disinto/supervisor/AGENTS.md

<!-- last-reviewed: f2064ba67c3b6819f5e252300927c01e2825dd7c -->
# Supervisor Agent

**Role**: Health monitoring and auto-remediation, executed as a formula-driven
Claude agent. Collects system and project metrics via a bash pre-flight script,
then runs an interactive Claude session (sonnet) that assesses health, auto-fixes
issues, and writes a daily journal. When blocked on external
resources or human decisions, files vault items instead of escalating directly.

**Trigger**: `supervisor-run.sh` runs every 20 min via cron. Sources `lib/guard.sh`
and calls `check_active supervisor` first — skips if
`$FACTORY_ROOT/state/.supervisor-active` is absent. Then creates a tmux session
with `claude --model sonnet`, injects `formulas/run-supervisor.toml` with
pre-collected metrics as context, monitors the phase file, and cleans up on
completion or timeout (20 min max session). No action issues — the supervisor
runs directly from cron like the planner and predictor.

**Key files**:
- `supervisor/supervisor-run.sh` — Cron wrapper + orchestrator: lock, memory guard,
  runs preflight.sh, sources disinto project config, creates tmux session, injects
  formula prompt with metrics, monitors phase file, handles crash recovery via
  `run_formula_and_monitor`
- `supervisor/preflight.sh` — Data collection: system resources (RAM, disk, swap,
  load), Docker status, active tmux sessions + phase files, lock files, agent log
  tails, CI pipeline status, open PRs, issue counts, stale worktrees, blocked
  issues. Also performs **stale phase cleanup**: scans `/tmp/*-session-*.phase`
  files for `PHASE:escalate` entries and auto-removes any whose linked issue
  is confirmed closed (24h grace period after closure to avoid races)
- `formulas/run-supervisor.toml` — Execution spec: five steps (preflight review,
  health-assessment, decide-actions, report, journal) with `needs` dependencies.
  Claude evaluates all metrics and takes actions in a single interactive session
- `supervisor/journal/*.md` — Daily health logs from each supervisor run (local,
  committed periodically)
- `supervisor/PROMPT.md` — Best-practices reference for remediation actions
- `supervisor/best-practices/*.md` — Domain-specific remediation guides (memory,
  disk, CI, git, dev-agent, review-agent, forge)
- `supervisor/supervisor-poll.sh` — Legacy bash orchestrator (superseded by
  supervisor-run.sh + formula)

**Alert priorities**: P0 (memory crisis), P1 (disk), P2 (factory stopped/stalled),
P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping).

**Environment variables consumed**:
- `FORGE_TOKEN`, `FORGE_REPO`, `FORGE_API`, `PROJECT_NAME`, `PROJECT_REPO_ROOT`
- `PRIMARY_BRANCH`, `CLAUDE_MODEL` (set to sonnet by supervisor-run.sh)
- `WOODPECKER_TOKEN`, `WOODPECKER_SERVER`, `WOODPECKER_DB_PASSWORD`, `WOODPECKER_DB_USER`, `WOODPECKER_DB_HOST`, `WOODPECKER_DB_NAME` — CI database queries

**Lifecycle**: supervisor-run.sh (cron */20) → lock + memory guard → run
preflight.sh (collect metrics) → load formula + context → create tmux
session → Claude assesses health, auto-fixes, writes journal → `PHASE:done`.
fix: update AGENTS.md docs and handle stale PHASE:escalate in gardener Address review feedback: - gardener/AGENTS.md: remove escalation flow references, describe vault routing - supervisor/AGENTS.md: remove escalation flow references, describe vault routing - gardener-run.sh: treat PHASE:escalate as terminal (kills session) to prevent zombie sessions from stale/legacy escalation writes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-03-26 10:32:04 +00:00			`<!-- last-reviewed: f2064ba67c3b6819f5e252300927c01e2825dd7c -->`
chore: gardener housekeeping 2026-03-21 Progressive disclosure split of AGENTS.md (487→152 lines): - Extracted per-directory AGENTS.md files for all 8 agents + lib/ - Root AGENTS.md now serves as a table of contents with summary table - All watermarks updated to 16e430e Grooming results: - Promoted #469 (WATCH flow missing curl) and #436 (idle_pane_count bug) to backlog - 12 dust items classified, no groups ripe for bundling yet - No blocked issues, no AD violations 2026-03-21 12:44:23 +00:00			`# Supervisor Agent`

			`Role: Health monitoring and auto-remediation, executed as a formula-driven`
			`Claude agent. Collects system and project metrics via a bash pre-flight script,`
			`then runs an interactive Claude session (sonnet) that assesses health, auto-fixes`
fix: Remove Matrix integration — notifications move to forge + OpenClaw (#732) Remove all Matrix/Dendrite infrastructure: - Delete lib/matrix_listener.sh (long-poll daemon), lib/matrix_listener.service (systemd unit), lib/hooks/on-stop-matrix.sh (response streaming hook) - Remove matrix_send() and matrix_send_ctx() from lib/env.sh - Remove MATRIX_HOMESERVER auto-detection, MATRIX_THREAD_MAP from lib/env.sh - Remove [matrix] section parsing from lib/load-project.sh - Remove Matrix hook installation from lib/agent-session.sh - Remove notify/notify_ctx helpers and Matrix thread tracking from dev/dev-agent.sh and action/action-agent.sh - Remove all matrix_send calls from dev-poll.sh, phase-handler.sh, action-poll.sh, vault-poll.sh, vault-fire.sh, vault-reject.sh, review-poll.sh, review-pr.sh, supervisor-poll.sh, formula-session.sh - Remove Matrix listener startup from docker/agents/entrypoint.sh - Remove append_dendrite_compose() and setup_matrix() from bin/disinto - Remove --matrix flag from disinto init - Clean Matrix references from .env.example, projects/.toml.example, formulas/.toml, AGENTS.md, BOOTSTRAP.md, README.md, RESOURCES.md, PHASE-PROTOCOL.md, and all agent AGENTS.md/PROMPT.md files Status visibility now via Codeberg PR/issue activity. Human interaction via vault items through forge. Proactive alerts via OpenClaw heartbeats. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-03-26 14:53:56 +00:00			`issues, and writes a daily journal. When blocked on external`
fix: update AGENTS.md docs and handle stale PHASE:escalate in gardener Address review feedback: - gardener/AGENTS.md: remove escalation flow references, describe vault routing - supervisor/AGENTS.md: remove escalation flow references, describe vault routing - gardener-run.sh: treat PHASE:escalate as terminal (kills session) to prevent zombie sessions from stale/legacy escalation writes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-03-26 10:32:04 +00:00			`resources or human decisions, files vault items instead of escalating directly.`
chore: gardener housekeeping 2026-03-21 Progressive disclosure split of AGENTS.md (487→152 lines): - Extracted per-directory AGENTS.md files for all 8 agents + lib/ - Root AGENTS.md now serves as a table of contents with summary table - All watermarks updated to 16e430e Grooming results: - Promoted #469 (WATCH flow missing curl) and #436 (idle_pane_count bug) to backlog - 12 dust items classified, no groups ripe for bundling yet - No blocked issues, no AD violations 2026-03-21 12:44:23 +00:00
chore: gardener housekeeping 2026-03-25 2026-03-25 00:07:52 +00:00			Trigger: `supervisor-run.sh` runs every 20 min via cron. Sources `lib/guard.sh`
			and calls `check_active supervisor` first — skips if
			`$FACTORY_ROOT/state/.supervisor-active` is absent. Then creates a tmux session
			with `claude --model sonnet`, injects `formulas/run-supervisor.toml` with
			`pre-collected metrics as context, monitors the phase file, and cleans up on`
			`completion or timeout (20 min max session). No action issues — the supervisor`
chore: gardener housekeeping 2026-03-21 Progressive disclosure split of AGENTS.md (487→152 lines): - Extracted per-directory AGENTS.md files for all 8 agents + lib/ - Root AGENTS.md now serves as a table of contents with summary table - All watermarks updated to 16e430e Grooming results: - Promoted #469 (WATCH flow missing curl) and #436 (idle_pane_count bug) to backlog - 12 dust items classified, no groups ripe for bundling yet - No blocked issues, no AD violations 2026-03-21 12:44:23 +00:00			`runs directly from cron like the planner and predictor.`

			`Key files:`
			- `supervisor/supervisor-run.sh` — Cron wrapper + orchestrator: lock, memory guard,
			`runs preflight.sh, sources disinto project config, creates tmux session, injects`
			`formula prompt with metrics, monitors phase file, handles crash recovery via`
			`run_formula_and_monitor`
			- `supervisor/preflight.sh` — Data collection: system resources (RAM, disk, swap,
			`load), Docker status, active tmux sessions + phase files, lock files, agent log`
			`tails, CI pipeline status, open PRs, issue counts, stale worktrees, blocked`
fix: update AGENTS.md docs and handle stale PHASE:escalate in gardener Address review feedback: - gardener/AGENTS.md: remove escalation flow references, describe vault routing - supervisor/AGENTS.md: remove escalation flow references, describe vault routing - gardener-run.sh: treat PHASE:escalate as terminal (kills session) to prevent zombie sessions from stale/legacy escalation writes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-03-26 10:32:04 +00:00			issues. Also performs stale phase cleanup: scans `/tmp/-session-.phase`
fix: correct supervisor/AGENTS.md — stale escalation-reply text + phase name - Remove stale Matrix escalation-reply routing text (supervisor-run.sh no longer calls consume_escalation_reply) - Fix preflight description: PHASE:escalate (matches actual code), not PHASE:failed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-03-26 10:40:16 +00:00			files for `PHASE:escalate` entries and auto-removes any whose linked issue
fix: update AGENTS.md docs and handle stale PHASE:escalate in gardener Address review feedback: - gardener/AGENTS.md: remove escalation flow references, describe vault routing - supervisor/AGENTS.md: remove escalation flow references, describe vault routing - gardener-run.sh: treat PHASE:escalate as terminal (kills session) to prevent zombie sessions from stale/legacy escalation writes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-03-26 10:32:04 +00:00			`is confirmed closed (24h grace period after closure to avoid races)`
chore: gardener housekeeping 2026-03-21 Progressive disclosure split of AGENTS.md (487→152 lines): - Extracted per-directory AGENTS.md files for all 8 agents + lib/ - Root AGENTS.md now serves as a table of contents with summary table - All watermarks updated to 16e430e Grooming results: - Promoted #469 (WATCH flow missing curl) and #436 (idle_pane_count bug) to backlog - 12 dust items classified, no groups ripe for bundling yet - No blocked issues, no AD violations 2026-03-21 12:44:23 +00:00			- `formulas/run-supervisor.toml` — Execution spec: five steps (preflight review,
			health-assessment, decide-actions, report, journal) with `needs` dependencies.
			`Claude evaluates all metrics and takes actions in a single interactive session`
			- `supervisor/journal/*.md` — Daily health logs from each supervisor run (local,
			`committed periodically)`
			- `supervisor/PROMPT.md` — Best-practices reference for remediation actions
			- `supervisor/best-practices/*.md` — Domain-specific remediation guides (memory,
chore: gardener housekeeping 2026-03-23 2026-03-23 18:05:26 +00:00			`disk, CI, git, dev-agent, review-agent, forge)`
chore: gardener housekeeping 2026-03-21 Progressive disclosure split of AGENTS.md (487→152 lines): - Extracted per-directory AGENTS.md files for all 8 agents + lib/ - Root AGENTS.md now serves as a table of contents with summary table - All watermarks updated to 16e430e Grooming results: - Promoted #469 (WATCH flow missing curl) and #436 (idle_pane_count bug) to backlog - 12 dust items classified, no groups ripe for bundling yet - No blocked issues, no AD violations 2026-03-21 12:44:23 +00:00			- `supervisor/supervisor-poll.sh` — Legacy bash orchestrator (superseded by
			`supervisor-run.sh + formula)`

			`Alert priorities: P0 (memory crisis), P1 (disk), P2 (factory stopped/stalled),`
			`P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping).`

			`Environment variables consumed:`
fix: Replace Codeberg dependency with local Forgejo instance (#611) - Add setup_forge() to bin/disinto: provisions Forgejo via Docker, creates admin + bot users (dev-bot, review-bot), generates API tokens, creates repo, and pushes code — all automated - Rename env vars: CODEBERG_TOKEN→FORGE_TOKEN, REVIEW_BOT_TOKEN→ FORGE_REVIEW_TOKEN, CODEBERG_REPO→FORGE_REPO, CODEBERG_API→ FORGE_API, CODEBERG_WEB→FORGE_WEB, CODEBERG_BOT_USERNAMES→ FORGE_BOT_USERNAMES (with backwards-compat fallbacks) - Rename API helpers: codeberg_api()→forge_api(), codeberg_api_all() →forge_api_all() (with compat aliases) - Add forge_url field to project TOML; load-project.sh derives FORGE_API/FORGE_WEB from forge_url + repo - Update parse_repo_slug() to accept any host URL, not just codeberg - Forgejo data stored under ~/.disinto/forgejo/ (not in factory repo) - Update all 58 files: agent scripts, formulas, docs, site HTML Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-03-23 16:57:12 +00:00			- `FORGE_TOKEN`, `FORGE_REPO`, `FORGE_API`, `PROJECT_NAME`, `PROJECT_REPO_ROOT`
chore: gardener housekeeping 2026-03-21 Progressive disclosure split of AGENTS.md (487→152 lines): - Extracted per-directory AGENTS.md files for all 8 agents + lib/ - Root AGENTS.md now serves as a table of contents with summary table - All watermarks updated to 16e430e Grooming results: - Promoted #469 (WATCH flow missing curl) and #436 (idle_pane_count bug) to backlog - 12 dust items classified, no groups ripe for bundling yet - No blocked issues, no AD violations 2026-03-21 12:44:23 +00:00			- `PRIMARY_BRANCH`, `CLAUDE_MODEL` (set to sonnet by supervisor-run.sh)
			- `WOODPECKER_TOKEN`, `WOODPECKER_SERVER`, `WOODPECKER_DB_PASSWORD`, `WOODPECKER_DB_USER`, `WOODPECKER_DB_HOST`, `WOODPECKER_DB_NAME` — CI database queries

			`Lifecycle: supervisor-run.sh (cron */20) → lock + memory guard → run`
fix: Remove Matrix integration — notifications move to forge + OpenClaw (#732) Remove all Matrix/Dendrite infrastructure: - Delete lib/matrix_listener.sh (long-poll daemon), lib/matrix_listener.service (systemd unit), lib/hooks/on-stop-matrix.sh (response streaming hook) - Remove matrix_send() and matrix_send_ctx() from lib/env.sh - Remove MATRIX_HOMESERVER auto-detection, MATRIX_THREAD_MAP from lib/env.sh - Remove [matrix] section parsing from lib/load-project.sh - Remove Matrix hook installation from lib/agent-session.sh - Remove notify/notify_ctx helpers and Matrix thread tracking from dev/dev-agent.sh and action/action-agent.sh - Remove all matrix_send calls from dev-poll.sh, phase-handler.sh, action-poll.sh, vault-poll.sh, vault-fire.sh, vault-reject.sh, review-poll.sh, review-pr.sh, supervisor-poll.sh, formula-session.sh - Remove Matrix listener startup from docker/agents/entrypoint.sh - Remove append_dendrite_compose() and setup_matrix() from bin/disinto - Remove --matrix flag from disinto init - Clean Matrix references from .env.example, projects/.toml.example, formulas/.toml, AGENTS.md, BOOTSTRAP.md, README.md, RESOURCES.md, PHASE-PROTOCOL.md, and all agent AGENTS.md/PROMPT.md files Status visibility now via Codeberg PR/issue activity. Human interaction via vault items through forge. Proactive alerts via OpenClaw heartbeats. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-03-26 14:53:56 +00:00			`preflight.sh (collect metrics) → load formula + context → create tmux`
			session → Claude assesses health, auto-fixes, writes journal → `PHASE:done`.