<!-- last-reviewed: 043bf0f0217aef3f319b844f1a1277acd6327a1c -->

# Supervisor Agent

**Role**: Health monitoring and auto-remediation, executed as a formula-driven
Claude agent. Collects system and project metrics via a bash pre-flight script,
then runs an interactive Claude session (sonnet) that assesses health, auto-fixes
issues, escalates via Matrix, and writes a daily journal.

**Trigger**: `supervisor-run.sh` runs every 20 min via cron. It first sources
`lib/guard.sh` and calls `check_active supervisor`, skipping the run if
`$FACTORY_ROOT/state/.supervisor-active` is absent. It then creates a tmux session
with `claude --model sonnet`, injects `formulas/run-supervisor.toml` with
pre-collected metrics as context, monitors the phase file, and cleans up on
completion or timeout (20 min max session). No action issues: the supervisor
runs directly from cron, like the planner and predictor.
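
The guard step can be pictured with a minimal sketch. Only the function name `check_active` and the state-file path come from the text above; the function body and its return convention are assumptions, and the real `lib/guard.sh` may behave differently (for example, it may exit on its own instead of returning a status).

```shell
# Hypothetical sketch of the guard; not the actual contents of lib/guard.sh.
# Returns 0 when the agent's state file exists, non-zero otherwise.
check_active() {
    agent="$1"
    state_file="${FACTORY_ROOT:?FACTORY_ROOT must be set}/state/.${agent}-active"
    [ -f "$state_file" ]
}

# Assumed usage at the top of a cron wrapper:
#   check_active supervisor || exit 0
```

Under this reading, removing the state file is enough to pause the agent without editing the crontab.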

**Key files**:
- `supervisor/supervisor-run.sh` — Cron wrapper + orchestrator: lock, memory guard,
  runs `preflight.sh`, sources disinto project config, creates tmux session, injects
  formula prompt with metrics, monitors phase file, handles crash recovery via
  `run_formula_and_monitor`
- `supervisor/preflight.sh` — Data collection: system resources (RAM, disk, swap,
  load), Docker status, active tmux sessions + phase files, lock files, agent log
  tails, CI pipeline status, open PRs, issue counts, stale worktrees, blocked
  issues, Matrix escalation replies. Also performs **stale phase cleanup**: scans
  `/tmp/*-session-*.phase` files for `PHASE:escalate` entries and auto-removes any
  whose linked issue is confirmed closed (24h grace period after closure to avoid
  races)
- `formulas/run-supervisor.toml` — Execution spec: five steps (preflight review,
  health-assessment, decide-actions, report, journal) with `needs` dependencies.
  Claude evaluates all metrics and takes actions in a single interactive session
- `supervisor/journal/*.md` — Daily health logs from each supervisor run (local,
  committed periodically)
- `supervisor/PROMPT.md` — Best-practices reference for remediation actions
- `supervisor/best-practices/*.md` — Domain-specific remediation guides (memory,
  disk, CI, git, dev-agent, review-agent, forge)
- `supervisor/supervisor-poll.sh` — Legacy bash orchestrator (superseded by
  `supervisor-run.sh` + formula)
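
The stale phase cleanup described under `supervisor/preflight.sh` could look roughly like the sketch below. It is an illustration, not the script's actual code: `cleanup_stale_phases` and `issue_closed_at` are hypothetical names, and how a phase file is linked to its issue is left to the assumed helper because the text does not specify it. The sketch reads the grace period as "remove only once the issue has been closed for at least 24 hours".

```shell
# Hypothetical sketch; preflight.sh's real logic may differ.
GRACE_SECONDS=$((24 * 60 * 60))   # 24h grace period after closure

# issue_closed_at FILE is an assumed, caller-supplied helper: it prints the
# epoch time at which the issue linked to FILE was closed, or nothing when
# the issue is still open or its state cannot be confirmed.
cleanup_stale_phases() {
    phase_dir="$1"
    now="$2"
    for phase_file in "$phase_dir"/*-session-*.phase; do
        [ -f "$phase_file" ] || continue                    # glob matched nothing
        grep -q 'PHASE:escalate' "$phase_file" || continue  # only escalations
        closed_at="$(issue_closed_at "$phase_file")"
        [ -n "$closed_at" ] || continue                     # not confirmed closed
        if [ $((now - closed_at)) -ge "$GRACE_SECONDS" ]; then
            rm -f -- "$phase_file"
        fi
    done
}

# Assumed usage:
#   cleanup_stale_phases /tmp "$(date +%s)"
```

Files whose issue is open, unconfirmed, or only recently closed are left in place, which is the conservative choice when a session may still be writing to them.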

**Alert priorities**: P0 (memory crisis), P1 (disk), P2 (factory stopped/stalled),
P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping).
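
As a small illustration of how these levels might drive behaviour, the sketch below encodes the rule stated in this document that only P0 through P2 are escalated. The helper name `should_escalate` is hypothetical and does not come from the supervisor scripts.

```shell
# Hypothetical helper; the name is illustrative only.
# Succeeds (status 0) for priorities that the document says are escalated.
should_escalate() {
    case "$1" in
        P0|P1|P2) return 0 ;;
        *)        return 1 ;;
    esac
}

# Assumed usage:
#   should_escalate "$priority" && echo "escalating $priority"
```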

**Matrix integration**: The supervisor has its own Matrix thread. Posts health
summaries when there are changes, escalates P0-P2 issues, and processes replies
from humans ("ignore disk warning", "kill that agent", "what's stuck?"). The
Matrix listener routes thread replies to `/tmp/supervisor-escalation-reply`,
which `supervisor-run.sh` consumes atomically on each run.
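
One common way to consume such a file atomically is to rename it before reading it. The sketch below shows that pattern under stated assumptions: `consume_replies` and the `.consumed.$$` suffix are hypothetical, and `supervisor-run.sh` may use a different mechanism.

```shell
# Hypothetical sketch of rename-then-read; not taken from supervisor-run.sh.
# Prints any pending replies and leaves no reply file behind.
consume_replies() {
    reply_file="${1:-/tmp/supervisor-escalation-reply}"
    claimed="${reply_file}.consumed.$$"
    # A rename within one filesystem happens in a single step, so two
    # overlapping runs cannot both claim the same set of replies.
    mv -- "$reply_file" "$claimed" 2>/dev/null || return 0   # nothing pending
    cat -- "$claimed"
    rm -f -- "$claimed"
}

# Assumed usage:
#   replies="$(consume_replies)"
```

Replies that arrive after the rename land in a fresh file and are picked up on the next run.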

**Environment variables consumed**:
- `FORGE_TOKEN`, `FORGE_REPO`, `FORGE_API`, `PROJECT_NAME`, `PROJECT_REPO_ROOT`
- `PRIMARY_BRANCH`, `CLAUDE_MODEL` (set to sonnet by supervisor-run.sh)
- `WOODPECKER_TOKEN`, `WOODPECKER_SERVER`, `WOODPECKER_DB_PASSWORD`, `WOODPECKER_DB_USER`, `WOODPECKER_DB_HOST`, `WOODPECKER_DB_NAME` — CI database queries
- `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER` — Matrix notifications + human input

**Lifecycle**: `supervisor-run.sh` (cron `*/20`) → lock + memory guard → run
`preflight.sh` (collect metrics) → consume escalation replies → load formula +
context → create tmux session → Claude assesses health, auto-fixes, posts
Matrix summary, writes journal → `PHASE:done`.
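
Because the cron interval and the maximum session length are both 20 minutes, the lock at the start of the lifecycle is what keeps two runs from overlapping. The text does not say how the lock is implemented; the sketch below uses a `mkdir`-based lock purely as an illustration, and the function names and lock path are assumptions.

```shell
# Hypothetical mkdir-based lock; supervisor-run.sh may use another mechanism.
# mkdir either creates the directory or fails, so only one caller can win.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1" 2>/dev/null
}

# Assumed usage in a cron wrapper (the lock path is illustrative):
#   acquire_lock /tmp/supervisor.lock || exit 0
#   trap 'release_lock /tmp/supervisor.lock' EXIT
```

A lock like this can be left behind if a run is killed before its exit trap fires, so a real wrapper would likely need some form of stale-lock handling.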