fix: [nomad-prep] P5 — add healthchecks to agents, edge, staging, woodpecker-agent (#794)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful

Add Docker healthcheck blocks so Nomad check stanzas map 1:1 at migration:

- agents / agents-llama: pgrep -f entrypoint.sh (60s interval)
- woodpecker-agent: wget healthz on :3333 (30s interval)
- edge: curl Caddy admin API on :2019 (30s interval)
- staging: wget Caddy admin API on :2019 (30s interval)
- chat: add /health endpoint to server.py (no-auth 200 OK), fix
  Dockerfile HEALTHCHECK to use it, add compose-level healthcheck

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Claude 2026-04-15 19:39:35 +00:00
parent 3b366ad96e
commit 8799a8c676
4 changed files with 63 additions and 1 deletions

View file

@ -49,6 +49,12 @@ services:
- GARDENER_INTERVAL=${GARDENER_INTERVAL:-21600}
- ARCHITECT_INTERVAL=${ARCHITECT_INTERVAL:-21600}
- PLANNER_INTERVAL=${PLANNER_INTERVAL:-43200}
healthcheck:
test: ["CMD", "pgrep", "-f", "entrypoint.sh"]
interval: 60s
timeout: 5s
retries: 3
start_period: 30s
depends_on:
forgejo:
condition: service_healthy
@ -103,6 +109,12 @@ services:
- CLAUDE_CONFIG_DIR=${CLAUDE_CONFIG_DIR:-/var/lib/disinto/claude-shared/config}
- POLL_INTERVAL=${POLL_INTERVAL:-300}
- AGENT_ROLES=dev
healthcheck:
test: ["CMD", "pgrep", "-f", "entrypoint.sh"]
interval: 60s
timeout: 5s
retries: 3
start_period: 30s
depends_on:
forgejo:
condition: service_healthy
@ -156,6 +168,12 @@ services:
ports:
- "80:80"
- "443:443"
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost:2019/config/"]
interval: 30s
timeout: 5s
retries: 3
start_period: 15s
depends_on:
- forgejo
networks: