vision: supervisor agent running on host level with full system visibility #232

Closed
opened 2026-04-05 16:11:28 +00:00 by dev-bot · 1 comment
Collaborator

Goal

The supervisor agent monitors factory health: system resources, Docker containers, CI pipelines, agent sessions, stale locks, and worktrees. It needs Docker socket access for container visibility, which the agents container should not have.

Decision: run supervisor in the edge container

The edge container already runs infrastructure concerns (Caddy reverse proxy, vault dispatcher). Adding the supervisor here keeps the architecture clean:

  • agents container — low-permission, runs AI workloads (dev, review, gardener, planner, architect, predictor)
  • edge container — infrastructure operations (proxy, dispatch, supervisor), gets Docker socket

This stays inside the disinto up/down lifecycle. No host-level services, no extra systemd units.

Existing code (all in the repo, ready to use)

Supervisor scripts

  • supervisor/supervisor-run.sh — complete cron wrapper following the standard *-run.sh pattern. Sources lib/env.sh, lib/formula-session.sh, lib/worktree.sh, lib/guard.sh, lib/agent-sdk.sh. Uses check_active, acquire_cron_lock, check_memory, load_formula_or_profile, build_context_block, formula_prepare_profile_context, formula_worktree_setup, agent_run, profile_write_journal.
  • supervisor/preflight.sh — metrics collection. Sources lib/env.sh, lib/ci-helpers.sh. Collects RAM/swap/disk/load from /proc, Docker container status, CI pipeline state, open PRs, issue status, stale worktrees.
  • formulas/run-supervisor.toml — 5-step formula: preflight, health-assessment, decide-actions, report, journal. Defines P0-P4 priority thresholds and auto-fix recipes.
  • supervisor/AGENTS.md — agent documentation.
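To illustrate the /proc collection that preflight.sh performs, here is a minimal sketch that reads available memory and the 1-minute load average. The function names (mem_available_pct, load_1m) are hypothetical and not taken from preflight.sh. The file path is a parameter so the functions can be pointed at fixture files.

```shell
#!/usr/bin/env bash
# Sketch of /proc-based metric collection in the spirit of
# supervisor/preflight.sh. Function names are hypothetical.
set -euo pipefail

# Print MemAvailable as an integer percentage of MemTotal, read from a
# meminfo-format file (defaults to /proc/meminfo). Fails if MemTotal is missing.
mem_available_pct() {
  local meminfo="${1:-/proc/meminfo}"
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END { if (total > 0) { printf "%d\n", int(avail * 100 / total) } else { exit 1 } }
  ' "$meminfo"
}

# Print the 1-minute load average (first field of a loadavg-format file).
load_1m() {
  local loadavg="${1:-/proc/loadavg}"
  cut -d ' ' -f 1 "$loadavg"
}
```

These readers work the same inside the edge container as on the host, because /proc is available in any container.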

Edge container (prior art for the integration pattern)

  • docker/edge/entrypoint-edge.sh — starts dispatcher in background, Caddy as main process. The supervisor should follow the same pattern: start supervisor loop in background alongside dispatcher.
  • docker/edge/dispatcher.sh — sources lib/env.sh, runs a while-true poll loop with sleep 60. The supervisor loop should follow this pattern (while-true with sleep, not cron: cron jobs don't inherit the container's environment variables).
  • docker/edge/Dockerfile — Alpine-based, already has bash, jq, curl, git, docker-cli.
  • Docker socket already mounted: /var/run/docker.sock:/var/run/docker.sock
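Because container visibility depends entirely on that mount, the supervisor could check for the socket before issuing docker commands. It could then report missing visibility once instead of failing on every call. This is a hypothetical helper and is not part of the existing edge scripts.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: is the Docker socket mounted at the expected path?
set -euo pipefail

# Succeeds only if the path exists and is a Unix socket.
docker_socket_available() {
  local sock="${1:-/var/run/docker.sock}"
  [ -S "$sock" ]
}

# Example use inside the edge container:
#   docker_socket_available || echo "no Docker socket; skipping container checks" >&2
```

`[ -S ]` only shows that a socket file is present. `docker info` would additionally confirm that the daemon answers.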

Standard *-run.sh pattern (follow planner-run.sh as the cleanest reference)

Every formula agent follows the same sequence:

  1. Source libs: env.sh, formula-session.sh, worktree.sh, guard.sh, agent-sdk.sh
  2. Set FORGE_TOKEN to agent-specific token
  3. Set LOG_FILE to ${DISINTO_LOG_DIR}/<agent>/<agent>.log (NOT $SCRIPT_DIR — see #210)
  4. Guards: check_active, acquire_cron_lock, check_memory
  5. Resolve agent identity via resolve_agent_identity() (NOT copy-pasted curl block — see #280)
  6. Load formula: load_formula_or_profile
  7. Build context: build_context_block
  8. Prepare profile: formula_prepare_profile_context
  9. Set up worktree: formula_worktree_setup
  10. Build prompt with context + formula steps
  11. Run: agent_run --worktree "$WORKTREE" "$PROMPT"
  12. Write journal: profile_write_journal
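Step 4's lock guard still matters when scheduling moves to a while-true loop, because a manual run could overlap a scheduled one. The sketch below shows the general idea with an atomic mkdir lock. It is NOT lib/guard.sh's acquire_cron_lock, and the function names and lock path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a non-blocking run lock (illustrative only; lib/guard.sh is
# authoritative). mkdir is atomic, so only one caller can create the dir.
set -euo pipefail

# Take the lock, recording our PID (useful for spotting stale locks).
# Returns 1 immediately if another run already holds it.
try_acquire_lock() {
  local lock_dir="$1"
  if mkdir "$lock_dir" 2>/dev/null; then
    echo "$$" > "$lock_dir/pid"
    return 0
  fi
  return 1
}

release_lock() {
  rm -rf "$1"
}

# Typical use at the top of a *-run.sh script:
#   LOCK_DIR="${TMPDIR:-/tmp}/supervisor-run.lock"
#   try_acquire_lock "$LOCK_DIR" || { echo "already running" >&2; exit 0; }
#   trap 'release_lock "$LOCK_DIR"' EXIT
```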

What needs to change

docker/edge/Dockerfile

  • Make the claude binary available: install it in the image, or bind-mount it using the same pattern as the agents container

docker-compose.yml (edge service)

  • Add claude binary bind-mount
  • Add FORGE_SUPERVISOR_TOKEN env var
  • Add ANTHROPIC_API_KEY (or mount OAuth credentials)
  • Add state/.supervisor-active to enable the guard
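A minimal sketch of the file-based switch follows. It assumes check_active works roughly like this: the agent runs only when state/.<agent>-active exists. The helper name agent_is_active is hypothetical, and lib/guard.sh remains the authoritative implementation.

```shell
#!/usr/bin/env bash
# Sketch of an "is this agent enabled?" check based on a marker file,
# e.g. state/.supervisor-active. Names are hypothetical.
set -euo pipefail

agent_is_active() {
  local state_dir="$1" agent="$2"
  [ -f "${state_dir}/.${agent}-active" ]
}

# Enable:  touch state/.supervisor-active
# Disable: rm -f state/.supervisor-active
```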

docker/edge/entrypoint-edge.sh

  • Add a background supervisor loop (same pattern as the dispatcher): a while-true loop that runs bash /opt/disinto/supervisor/supervisor-run.sh and then sleeps 1200 (20 min), just as the dispatcher sleeps 60; start the loop itself with &
  • Do NOT use cron for scheduling
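One way to write that loop for entrypoint-edge.sh is sketched below, modeled on the dispatcher's while-true/sleep pattern. SUPERVISOR_INTERVAL and SUPERVISOR_MAX_RUNS are hypothetical knobs. MAX_RUNS exists only so the loop can be exercised in a test, and 0 means run forever.

```shell
#!/usr/bin/env bash
# Sketch of a supervisor scheduling loop (hypothetical names).
set -euo pipefail

# Run the given command, sleep, repeat. A failed run is logged but does
# not terminate the loop.
supervisor_loop() {
  local interval="${SUPERVISOR_INTERVAL:-1200}"   # 20 min, per this issue
  local max_runs="${SUPERVISOR_MAX_RUNS:-0}"      # 0 = loop forever
  local runs=0
  while true; do
    "$@" || echo "supervisor run failed (exit $?)" >&2
    runs=$((runs + 1))
    if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
      return 0
    fi
    sleep "$interval"
  done
}

# In entrypoint-edge.sh, started in the background alongside the dispatcher:
#   supervisor_loop bash /opt/disinto/supervisor/supervisor-run.sh &
```

Each iteration runs as a fresh bash process, so lib state does not leak between runs. The loop is a child of the entrypoint and therefore inherits the container's environment, which is the reason this issue gives for avoiding cron.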

supervisor/supervisor-run.sh

  • Fix LOG_FILE: use ${DISINTO_LOG_DIR}/supervisor/supervisor.log (currently hardcoded to $SCRIPT_DIR, same bug as gardener had in #210)
  • The rest of the script follows the standard pattern and should work as-is
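The LOG_FILE fix amounts to deriving the path from DISINTO_LOG_DIR rather than $SCRIPT_DIR. The sketch below wraps that in a hypothetical helper, agent_log_file. The existing scripts may simply inline the assignment.

```shell
#!/usr/bin/env bash
# Sketch of the #210-style LOG_FILE fix. agent_log_file is a hypothetical
# helper; it fails loudly if DISINTO_LOG_DIR is unset or empty.
set -euo pipefail

# Print ${DISINTO_LOG_DIR}/<agent>/<agent>.log, creating the directory.
agent_log_file() {
  local agent="$1"
  local dir="${DISINTO_LOG_DIR:?DISINTO_LOG_DIR must be set}/${agent}"
  mkdir -p "$dir"
  printf '%s/%s.log\n' "$dir" "$agent"
}

# Before: LOG_FILE was derived from $SCRIPT_DIR (see #210).
# After:
#   LOG_FILE="$(agent_log_file supervisor)"
```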

supervisor/preflight.sh

  • Docker commands (docker ps, docker stats) will work natively since the socket is mounted
  • System metrics from /proc work in any container
  • Forgejo/Woodpecker API calls work via existing FORGE_URL/FORGE_TOKEN
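With the socket mounted, preflight can read container state straight from docker ps. The sketch below parses that output from stdin, so it can be tested without a daemon. The function name and the definition of "problem" (unhealthy or restarting) are assumptions and do not describe preflight.sh's actual logic.

```shell
#!/usr/bin/env bash
# Sketch: list containers that look unhealthy or are restart-looping.
set -euo pipefail

# Input: lines of "<name>\t<status>", as produced by
#   docker ps --all --format '{{.Names}}\t{{.Status}}'
# Output: one container name per line.
problem_containers() {
  awk -F '\t' '
    $2 ~ /\(unhealthy\)/ || $2 ~ /^Restarting/ { print $1 }
  '
}

# Usage inside the edge container:
#   docker ps --all --format '{{.Names}}\t{{.Status}}' | problem_containers
```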

Known issues to address

  • supervisor-run.sh LOG_FILE is still hardcoded to $SCRIPT_DIR (same class as #210)
  • supervisor/supervisor-poll.sh is the legacy bash orchestrator (superseded by supervisor-run.sh) and can be deleted
  • supervisor/update-prompt.sh: check whether it is still used; it is likely dead code
  • The .profile repo for supervisor-bot needs to exist (create it with hire-an-agent)

Acceptance criteria

  • Supervisor runs inside the edge container alongside dispatcher and Caddy
  • Uses while-true loop (not cron) for scheduling
  • Docker socket provides container visibility
  • Follows the standard *-run.sh pattern (planner-run.sh as reference)
  • Uses existing lib functions (no duplicated code)
  • disinto up starts the supervisor; disinto down stops it
  • Health journal written to ops repo after each run
dev-bot added the vision label 2026-04-05 16:11:29 +00:00
Author

Decomposed into #343 (cleanup) and #344 (edge integration). Closing.

No milestone
No project
No assignees
1 participant
Due date

No due date set.

Dependencies

No dependencies set.

Reference: disinto-admin/disinto#232