Repo: coder/registry
Scope: Update the AgentAPI base module to configure state persistence and add shutdown script for log capture.
Changes:
- Add state_file_path variable (default /home/coder/.agentapi/state.json)
- Add pid_file_path variable (default /tmp/agentapi.pid)
- Add enable_state_persistence variable (default false; modules must opt in explicitly, since they need to bump their agentapi module version anyway)
- Pass flags to AgentAPI startup: --state-file, --load-state, --save-state, --pid-file
- Add a coder_script resource with run_on_stop = true that:
  - Sends SIGUSR1 to AgentAPI so it saves state
  - Fetches the last 10 messages from AgentAPI /messages (best-effort; may be fewer if truncated)
  - Truncates the payload if it exceeds 64KB
  - POSTs the snapshot to the coderd snapshot endpoint
  - Sends SIGTERM to AgentAPI
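A minimal sketch of how main.sh might assemble the new startup flags from the three variables. The helper name agentapi_state_flags is hypothetical. It assumes that --load-state and --save-state are boolean switches and that --state-file and --pid-file each take a path; confirm this against the AgentAPI CLI help before relying on it. The sketch also assumes that only the state flags are gated on enable_state_persistence and that --pid-file is always passed, because the shutdown script needs the PID file either way.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print AgentAPI state-related flags, one per line.
# Assumes --load-state/--save-state are boolean switches and
# --state-file/--pid-file take a path argument.
agentapi_state_flags() {
  local enable=$1 state_file=$2 pid_file=$3
  # Always write the PID file so the shutdown script can signal AgentAPI.
  printf '%s\n' --pid-file "$pid_file"
  # State persistence is opt-in (enable_state_persistence defaults to false).
  if [[ "$enable" == "true" ]]; then
    printf '%s\n' --state-file "$state_file" --load-state --save-state
  fi
}
```

In main.sh you could read the output into an array (for example with mapfile -t flags) and splice "${flags[@]}" into the existing AgentAPI launch command. That keeps each path a single argument even if it contains spaces.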
Script environment (available in the run_on_stop context):
- CODER_AGENT_TOKEN: agent auth token for coderd API calls
- CODER_AGENT_URL: base URL for the coderd API (via agent manifest)
- CODER_WORKSPACE_NAME: workspace name (via agent manifest)
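The agent supplies these variables at runtime, so a shutdown script can check them before attempting the POST. If they are missing, it can skip the snapshot instead of failing. A sketch, assuming the logic lives in scripts/shutdown.sh. The function name snapshot_env_ready is hypothetical, and it checks only the two variables the POST clearly needs. If you paste this code into the Terraform heredoc instead, write each shell ${...} as $${...} so Terraform does not try to interpolate it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: return 0 when the variables needed for the snapshot
# POST are non-empty; otherwise name the missing ones on stderr and return 1.
snapshot_env_ready() {
  local missing=() name
  for name in CODER_AGENT_URL CODER_AGENT_TOKEN; do
    # ${!name:-} is indirect expansion: the value of the variable named $name.
    if [[ -z "${!name:-}" ]]; then
      missing+=("$name")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    echo "skipping snapshot: missing ${missing[*]}" >&2
    return 1
  fi
}
```

A caller would then guard the POST with `if snapshot_env_ready; then ...; fi`, which keeps a missing token from aborting the script under set -e.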
Note on task ID: CODER_TASK_ID is available during Terraform provisioning but NOT during agent runtime. The shutdown script must therefore have the task ID embedded at provisioning time via Terraform interpolation (see the Terraform example below).
Startup script change (in main.sh):
- AgentAPI writes its own PID file via --pid-file. Registry scripts should not rely on $! for the AgentAPI PID.
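Reading the PID back is where a stale or garbled file can cause trouble, so a shutdown script may want to validate the PID before signaling it. A minimal sketch; read_live_pid is a hypothetical name. Note that kill -0 only proves that a process with that PID exists and can be signaled by this user. It does not prove the process is still AgentAPI, because PIDs can be reused.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the PID stored in the given file if it is a
# positive integer and that process is alive; otherwise print nothing and
# return 1.
read_live_pid() {
  local pid_file=$1 pid
  pid=$(cat "$pid_file" 2>/dev/null || true)
  pid=${pid//[[:space:]]/}            # strip stray whitespace/newlines
  [[ "$pid" =~ ^[1-9][0-9]*$ ]] || return 1
  kill -0 "$pid" 2>/dev/null || return 1
  printf '%s\n' "$pid"
}
```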
Terraform resource (in registry module):
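resource "coder_script" "shutdown" {
  # Inside a module, coder_agent.main is not in scope; this assumes the
  # module's agent ID input is named agent_id, as in other registry modules.
  agent_id     = var.agent_id
  display_name = "AgentAPI Shutdown" # display_name is required by coder_script
  run_on_stop  = true
  script       = <<-EOF
    #!/bin/bash
    set -euo pipefail

    # Task ID embedded at provisioning time.
    TASK_ID="${data.coder_task.me.id}"
    # Use the configurable PID file path instead of a hardcoded /tmp path.
    AGENTAPI_PID=$(cat "${var.pid_file_path}" 2>/dev/null || echo "")

    # Save state early (SIGUSR1 triggers a save without exiting).
    if [[ -n "$AGENTAPI_PID" ]] && kill -0 "$AGENTAPI_PID" 2>/dev/null; then
      kill -USR1 "$AGENTAPI_PID" || true
    fi

    # Capture and post snapshot (best-effort; --max-time keeps an
    # unresponsive AgentAPI from stalling shutdown).
    if curl -sf --max-time 5 http://localhost:4321/messages >/dev/null 2>&1; then
      # Fetch, truncate, post logic here
      ...
    fi

    # Terminate AgentAPI.
    if [[ -n "$AGENTAPI_PID" ]]; then
      kill -TERM "$AGENTAPI_PID" 2>/dev/null || true
    fi
  EOF
}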
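The elided fetch/truncate/post step can be sketched with two helpers. This issue gives neither the shape of the /messages response nor the coderd snapshot endpoint URL, so the sketch covers neither. The first helper, fit_messages (hypothetical), takes messages already extracted as JSON objects. It keeps at most the last 10 and drops the oldest until the array fits in 64KB, read here as 65536 bytes (an assumption). The second helper, best_effort (hypothetical), runs a command and swallows its failure, so set -euo pipefail cannot abort shutdown before the SIGTERM step. The sketch targets scripts/shutdown.sh. Inside the Terraform heredoc, each shell ${ would need escaping as $${.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Limits from the issue text; 64KB is assumed to mean 64 * 1024 bytes.
MAX_MESSAGES=10
MAX_BYTES=$((64 * 1024))

# Hypothetical helper. Each argument is one message already serialized as a
# JSON object. Prints a JSON array of the newest messages that fits within
# MAX_BYTES, dropping the oldest first (an oversized single message is
# dropped too, leaving []).
fit_messages() {
  local -a msgs=("$@")
  local start=0 joined size
  # Keep only the last MAX_MESSAGES entries.
  if (( ${#msgs[@]} > MAX_MESSAGES )); then
    start=$(( ${#msgs[@]} - MAX_MESSAGES ))
  fi
  while (( start < ${#msgs[@]} )); do
    # Join msgs[start..] with commas into a JSON array.
    joined=$(IFS=,; printf '[%s]' "${msgs[*]:start}")
    size=$(printf '%s' "$joined" | wc -c)   # byte count, not characters
    if (( size <= MAX_BYTES )); then
      printf '%s\n' "$joined"
      return 0
    fi
    start=$(( start + 1 ))  # drop the oldest remaining message and retry
  done
  printf '[]\n'
}

# Run a command; on failure, warn on stderr and still return 0 so that
# `set -e` does not abort the rest of the shutdown sequence.
best_effort() {
  if ! "$@"; then
    echo "warning: '$*' failed; continuing shutdown" >&2
  fi
  return 0
}
```

For example, `best_effort curl -sf --max-time 10 -X POST --data-binary "$payload" "$SNAPSHOT_URL"`, where SNAPSHOT_URL is a placeholder for the coderd endpoint. Authentication with CODER_AGENT_TOKEN still has to match whatever that endpoint requires. Dropping whole messages, rather than cutting bytes, keeps the payload valid JSON, which is consistent with "may be fewer if truncated".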
Files (in registry repo):
- coder/modules/agentapi/main.tf
- coder/modules/agentapi/scripts/main.sh (PID file)
- coder/modules/agentapi/scripts/shutdown.sh (new)
Acceptance criteria:
- Shutdown script uses set -euo pipefail, but snapshot POST failures don't abort shutdown
- Task ID is embedded from data.coder_task.me.id at provisioning time
- AgentAPI is started with --pid-file and the shutdown script uses it for signaling
Dependencies:
- … run_on_stop script completes coder#19467
References: