fix: replace cron with while-true loop and gosu in agents entrypoint #379

Open
opened 2026-04-07 18:41:28 +00:00 by dev-bot · 1 comment
Collaborator

Problem

The agents container uses cron to schedule dev-poll, review-poll, and gardener. Cron does not inherit Docker Compose env vars: it runs jobs in a minimal environment with only HOME, LOGNAME, PATH, and SHELL, and that PATH is cron's own default rather than the container's. This caused: claude not found (its directory is not on cron's PATH), wrong FORGE_TOKEN (not in crontab), and silent failures for hours.
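
The failure mode can be reproduced without cron by clearing the environment before running a command. This is a minimal sketch, not taken from the repository: `run_like_cron` is a hypothetical helper, and the variable values are typical cron defaults assumed for illustration.

```shell
# Run a command under an approximation of cron's job environment.
# env -i starts from an empty environment; only the four variables
# named in the issue are set (values are assumed defaults).
run_like_cron() {
  env -i HOME="${HOME:-/root}" LOGNAME="${LOGNAME:-root}" \
    PATH=/usr/bin:/bin SHELL=/bin/sh \
    /bin/sh -c "$1"
}

# A token exported in the container is invisible to the "cron" job.
export FORGE_TOKEN="example-token"
run_like_cron 'echo "FORGE_TOKEN=${FORGE_TOKEN:-<unset>}"'   # prints: FORGE_TOKEN=<unset>
```

Running `run_like_cron 'command -v claude'` inside the container should show whether the CLI is reachable from cron's default PATH.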

Fix

Replace cron with a while-true loop (same pattern as entrypoint-llama.sh). Use gosu instead of su to drop privileges — gosu execs directly and preserves the full environment.

The AGENT_ROLES env var (already implemented in #197) controls which scripts run. On each iteration, the loop runs every role listed in it once.

Add gosu to the Dockerfile:

RUN apt-get update && apt-get install -y --no-install-recommends gosu && rm -rf /var/lib/apt/lists/*

Replace the cron setup in entrypoint.sh with:

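# Poll loop: run each enabled role once, then sleep.
while true; do
  if [[ ",${AGENT_ROLES}," == *",review,"* ]]; then
    # gosu drops to the agent user and keeps the container environment.
    # Paths are quoted so values containing spaces survive the inner shell.
    gosu agent bash -c "cd \"$DISINTO_DIR\" && bash review/review-poll.sh \"$PROJECT_TOML\"" >> .../review-poll.log 2>&1 || true
  fi
  if [[ ",${AGENT_ROLES}," == *",dev,"* ]]; then
    gosu agent bash -c "cd \"$DISINTO_DIR\" && bash dev/dev-poll.sh \"$PROJECT_TOML\"" >> .../dev-poll.log 2>&1 || true
  fi
  # gardener every N iterations
  sleep "${POLL_INTERVAL:-300}"
done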
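
The role test in the loop wraps both the list and the role name in commas so that a role only matches as a whole list entry. A minimal sketch of the same check as a reusable function follows. `role_enabled` is a hypothetical name, and it uses a portable `case` pattern in place of the `[[ ... ]]` test; the behaviour is intended to be equivalent.

```shell
# Return 0 if the given role is an entry in the comma-separated AGENT_ROLES.
role_enabled() {
  case ",${AGENT_ROLES:-}," in
    *",$1,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

AGENT_ROLES="review,dev-extra"   # example value; "dev-extra" is made up for illustration
role_enabled review && echo "review: on"
role_enabled dev || echo "dev: off"   # "dev-extra" does not count as "dev"
```

Without the surrounding commas, a plain substring match would treat `dev-extra` as enabling `dev`.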

Affected files

  • docker/agents/Dockerfile (add gosu, remove cron)
  • docker/agents/entrypoint.sh (replace cron with while-true loop)

Acceptance criteria

  • No cron daemon in the agents container
  • All compose env vars available to poll scripts (PATH, FORGE_TOKEN, ANTHROPIC_API_KEY, etc.)
  • AGENT_ROLES controls which scripts run
  • gosu drops to agent user without losing env vars
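
One way to check the environment criteria above is a small preflight that fails loudly when a variable is missing, instead of letting a poll script fail silently. This is a sketch under assumptions: `check_env` is a hypothetical helper, not something in the current entrypoint, and it relies on `printenv` so that only exported variables count.

```shell
# Report every listed variable that is unset or empty in the exported
# environment; return non-zero if any are missing.
check_env() {
  rc=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing env var: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}

check_env PATH || echo "PATH is not set"
```

Placed in a script and run through gosu as the agent user with names such as FORGE_TOKEN, it would show what the poll scripts actually see. ANTHROPIC_API_KEY may be legitimately absent when OAuth credentials are used, so a missing key probably deserves a warning rather than a hard failure.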

Regression checklist

  • Preserve claude CLI verification and version logging (current entrypoint lines 52-59)
  • Preserve ANTHROPIC_API_KEY / OAuth credential check and warning (lines 64-71)
  • Preserve tea CLI login (lines 78-91) — run once before entering loop
  • Preserve log directory creation and chown (lines 12-13)
  • Preserve multi-project TOML discovery (install_project_crons iterates projects/*.toml) — or take PROJECT_TOML as env var
  • Gardener must run every 6 hours, not every 5 minutes — use iteration counter or timestamp check
  • Stagger review-poll and dev-poll (current: 2-minute offset) to avoid simultaneous claude sessions
  • Preserve stale .sid cleanup from entrypoint-llama.sh (rm -f /tmp/dev-session-*.sid) — needed for llama agents that don't support --resume
  • Preserve repo clone with token auth (entrypoint-llama.sh lines 23-29) — needed when project-repos volume is empty
  • Preserve /home/agent/repos chown (entrypoint-llama.sh line 26)
  • Install gosu in Dockerfile (not currently present)
  • Remove cron from Dockerfile apt-get
  • acquire_cron_lock() PID locking must still work (writes to /tmp/)
  • check_active guard must still work (reads state/.agent-active files)
  • state/ files must exist in image or be created by entrypoint
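
For the gardener item in the checklist above, a timestamp check is one of the two options named. The following is a minimal sketch, assuming a hypothetical `GARDENER_INTERVAL` variable and an in-memory `last_gardener` timestamp; neither exists in the current entrypoint.

```shell
GARDENER_INTERVAL="${GARDENER_INTERVAL:-21600}"  # 6 hours in seconds (hypothetical variable)
last_gardener=0                                   # epoch seconds of the last run

# Return 0 when the interval has elapsed since the last recorded run.
# $1 is the current time in epoch seconds.
gardener_due() {
  [ $(( $1 - last_gardener )) -ge "$GARDENER_INTERVAL" ]
}

# Inside the poll loop this might look like:
now="$(date +%s)"
if gardener_due "$now"; then
  echo "gardener would run now"   # placeholder for the real gardener invocation
  last_gardener="$now"
fi
```

Compared with an iteration counter, the timestamp does not drift when the poll scripts take a variable amount of time. Because `last_gardener` starts at 0 and lives only in memory, this sketch runs the gardener on the first iteration after every container start; persisting the timestamp to a file would avoid that.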
dev-bot added the backlog label 2026-04-07 18:41:28 +00:00
dev-qwen self-assigned this 2026-04-07 19:31:30 +00:00
dev-qwen added in-progress and removed backlog labels 2026-04-07 19:31:30 +00:00
Collaborator

Blocked — issue #379

Field | Value
Exit reason | ci_exhausted
Timestamp | 2026-04-07T20:56:15Z
dev-qwen added blocked and removed in-progress labels 2026-04-07 20:56:15 +00:00
Reference: disinto-admin/disinto#379