fix: feat: supervisor as formula-driven agent — cron + Matrix escalation (#245)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 00:22:37 +00:00 · 2026-03-21 00:22:37 +00:00 · d8244742f1
commit d8244742f1
parent 7fe5ed0381
5 changed files with 585 additions and 12 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@ -22,7 +22,10 @@ disinto/
 ├── planner/       planner-run.sh — direct cron executor for run-planner formula
 │                  planner/journal/ — daily raw logs from each planner run
 │                  prediction-poll.sh, prediction-agent.sh — legacy predictor (superseded by predictor/)
-├── supervisor/    supervisor-poll.sh — health monitoring
+├── supervisor/    supervisor-run.sh — formula-driven health monitoring (cron wrapper)
+│                  preflight.sh — pre-flight data collection for supervisor formula
+│                  supervisor/journal/ — daily health logs from each run
+│                  supervisor-poll.sh — legacy bash orchestrator (superseded)
 ├── vault/         vault-poll.sh, vault-agent.sh, vault-fire.sh — action gating
 ├── action/        action-poll.sh, action-agent.sh — operational task execution
 ├── lib/           env.sh, agent-session.sh, ci-helpers.sh, ci-debug.sh, load-project.sh, parse-deps.sh, matrix_listener.sh
@ -138,26 +141,56 @@ issue is filed against).

 ### Supervisor (`supervisor/`)

-**Role**: Health monitoring and auto-remediation. Two-layer architecture:
-(1) factory infrastructure checks (RAM, disk, swap, docker, stale processes)
-that run once, and (2) per-project checks (CI, PRs, dev-agent health,
-circular deps, stale deps) that iterate over `projects/*.toml`.
+**Role**: Health monitoring and auto-remediation, executed as a formula-driven
+Claude agent. Collects system and project metrics via a bash pre-flight script,
+then runs an interactive Claude session (sonnet) that assesses health, auto-fixes
+issues, escalates via Matrix, and writes a daily journal.

-**Trigger**: `supervisor-poll.sh` runs every 10 min via cron.
+**Trigger**: `supervisor-run.sh` runs every 20 min via cron. It creates a tmux
+session with `claude --model sonnet`, injects `formulas/run-supervisor.toml`
+with pre-collected metrics as context, monitors the phase file, and cleans up
+on completion or timeout (20 min max session). No action issues — the supervisor
+runs directly from cron like the planner and predictor.

 **Key files**:
- `supervisor/supervisor-poll.sh` — All checks + auto-fixes (kill stale processes, rotate logs, drop caches, docker prune, abort stale rebases) then invokes `claude -p` for unresolved alerts
- `supervisor/update-prompt.sh` — Updates the supervisor prompt file
- `supervisor/PROMPT.md` — System prompt for the supervisor's Claude invocation
+- `supervisor/supervisor-run.sh` — Cron wrapper + orchestrator: lock, memory guard,
+  runs preflight.sh, sources disinto project config, creates tmux session, injects
+  formula prompt with metrics, monitors phase file, handles crash recovery via
+  `run_formula_and_monitor`
+- `supervisor/preflight.sh` — Data collection: system resources (RAM, disk, swap,
+  load), Docker status, active tmux sessions + phase files, lock files, agent log
+  tails, CI pipeline status, open PRs, issue counts, stale worktrees, pending
+  escalations, Matrix escalation replies
+- `formulas/run-supervisor.toml` — Execution spec: five steps (preflight review,
+  health-assessment, decide-actions, report, journal) with `needs` dependencies.
+  Claude evaluates all metrics and takes actions in a single interactive session
+- `supervisor/journal/*.md` — Daily health logs from each supervisor run (local,
+  committed periodically)
+- `supervisor/PROMPT.md` — Best-practices reference for remediation actions
+- `supervisor/best-practices/*.md` — Domain-specific remediation guides (memory,
+  disk, CI, git, dev-agent, review-agent, codeberg)
+- `supervisor/supervisor-poll.sh` — Legacy bash orchestrator (superseded by
+  supervisor-run.sh + formula)

 **Alert priorities**: P0 (memory crisis), P1 (disk), P2 (factory stopped/stalled),
 P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping).

+**Matrix integration**: The supervisor has its own Matrix thread. Posts health
+summaries when there are changes, escalates P0-P2 issues, and processes replies
+from humans ("ignore disk warning", "kill that agent", "what's stuck?"). The
+Matrix listener routes thread replies to `/tmp/supervisor-escalation-reply`,
+which `supervisor-run.sh` consumes atomically on each run.
+
 **Environment variables consumed**:
- All from `lib/env.sh` + per-project TOML overrides
+- `CODEBERG_TOKEN`, `CODEBERG_REPO`, `CODEBERG_API`, `PROJECT_NAME`, `PROJECT_REPO_ROOT`
+- `PRIMARY_BRANCH`, `CLAUDE_MODEL` (set to sonnet by supervisor-run.sh)
 - `WOODPECKER_TOKEN`, `WOODPECKER_SERVER`, `WOODPECKER_DB_PASSWORD`, `WOODPECKER_DB_USER`, `WOODPECKER_DB_HOST`, `WOODPECKER_DB_NAME` — CI database queries
- `CHECK_PRS`, `CHECK_DEV_AGENT`, `CHECK_PIPELINE_STALL` — Per-project monitoring toggles (from TOML `[monitoring]` section)
- `CHECK_INFRA_RETRY` — Infra failure retry toggle (env var only, defaults to `true`; not configurable via project TOML)
+- `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER` — Matrix notifications + human input
+
+**Lifecycle**: supervisor-run.sh (cron */20) → lock + memory guard → run
+preflight.sh (collect metrics) → consume escalation replies → load formula +
+context → create tmux session → Claude assesses health, auto-fixes, posts
+Matrix summary, writes journal → `PHASE:done`.

 ### Planner (`planner/`)

--- a/formulas/run-supervisor.toml
+++ b/formulas/run-supervisor.toml
@ -0,0 +1,237 @@
+# formulas/run-supervisor.toml — Supervisor formula (health monitoring + remediation)
+#
+# Executed by supervisor/supervisor-run.sh via cron (every 20 minutes).
+# supervisor-run.sh creates a tmux session with Claude (sonnet) and injects
+# this formula with pre-collected metrics as context.
+#
+# Steps: preflight → health-assessment → decide-actions → report → journal
+#
+# Key differences from planner/gardener:
+#   - Runs every 20min — lightweight health check
+#   - Primarily READS state, rarely WRITES (no PRs, just Matrix + journal)
+#   - Reactive to escalations — processes pending escalation events
+#   - Conversation memory via Matrix thread and journal
+
+name        = "run-supervisor"
+description = "Factory health monitoring: assess metrics, fix issues, report via Matrix, write journal"
+version     = 1
+model       = "sonnet"
+
+[context]
+files = ["AGENTS.md"]
+
+[[steps]]
+id    = "preflight"
+title = "Review pre-collected metrics"
+description = """
+The pre-flight metrics have already been collected by supervisor/preflight.sh
+and injected into your prompt above. Review them now.
+
+1. Read the injected metrics data carefully (System Resources, Docker,
+   Active Sessions, Phase Files, Lock Files, Agent Logs, CI Pipelines,
+   Open PRs, Issue Status, Stale Worktrees, Pending Escalations,
+   Escalation Replies).
+
+2. If there are escalation replies from Matrix (human messages), note them —
+   you will act on them in the decide-actions step.
+
+3. Read the supervisor journal for recent history:
+     JOURNAL_FILE="$FACTORY_ROOT/supervisor/journal/$(date -u +%Y-%m-%d).md"
+     if [ -f "$JOURNAL_FILE" ]; then cat "$JOURNAL_FILE"; fi
+
+4. Note any values that cross these thresholds:
+   - RAM available < 500MB or swap > 3GB → P0 (memory crisis)
+   - Disk > 80% → P1 (disk pressure)
+   - Agent sessions dead, CI stuck/pending, git in bad state → P2 (factory stopped)
+   - PRs stale, unreviewed, or with merge conflicts → P3 (factory degraded)
+   - Stale worktrees, old lock files → P4 (housekeeping)
+"""
+
+[[steps]]
+id    = "health-assessment"
+title = "Evaluate health of each subsystem"
+description = """
+Categorize every finding from the metrics into priority levels.
+
+### P0 — Memory crisis
+- RAM available < 500MB
+- Swap used > 3GB AND RAM available < 2000MB
+
+### P1 — Disk pressure
+- Disk usage > 80%
+
+### P2 — Factory stopped / stalled
+- CI pipelines stuck running > 20min or pending > 30min
+- Dev-agent lock file present but process dead
+- Dev-agent status unchanged for > 30min
+- Git repo on wrong branch or in broken rebase state
+- Pipeline stalled: backlog issues exist but no agent ran for > 20min
+- Dev-agent blocked: last N polls all report "no ready issues"
+- Dev sessions in PHASE:needs_human for > 24h
+
+### P3 — Factory degraded
+- PRs with CI pass but merge conflict (needs rebase)
+- PRs with CI failure stale > 30min
+- PRs with CI pass but no review for > 60min
+- Circular dependency deadlocks in backlog
+- Stale dependencies (blocked by issues open > 30 days)
+
+### P4 — Housekeeping
+- Stale worktrees > 2h old with no active process
+- Lock files for dead processes
+- Stale claude processes (> 3h old)
+
+List each finding with its priority level. If everything looks healthy,
+note "All systems healthy" and proceed.
+"""
+needs = ["preflight"]
+
+[[steps]]
+id    = "decide-actions"
+title = "Fix what you can, escalate what you cannot"
+description = """
+For each finding from the health assessment, decide and execute an action.
+
+### Auto-fixable (execute these directly)
+
+**P0 Memory crisis:**
+  # Kill stale one-shot claude processes (>3h old)
+  pgrep -f "claude -p" --older 10800 2>/dev/null | xargs kill 2>/dev/null || true
+  # Drop filesystem caches
+  sync && echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null 2>&1 || true
+
+**P1 Disk pressure:**
+  # Docker cleanup
+  sudo docker system prune -f >/dev/null 2>&1 || true
+  # Truncate logs > 10MB
+  for f in "$FACTORY_ROOT"/{dev,review,supervisor,gardener,planner,predictor}/*.log; do
+    [ -f "$f" ] && [ "$(du -k "$f" | cut -f1)" -gt 10240 ] && truncate -s 0 "$f"
+  done
+
+**P2 Dead lock files:**
+  rm -f /path/to/stale.lock
+
+**P2 Stale rebase:**
+  cd "$PROJECT_REPO_ROOT"
+  git rebase --abort 2>/dev/null
+  git checkout "$PRIMARY_BRANCH" 2>/dev/null
+
+**P2 Wrong branch:**
+  cd "$PROJECT_REPO_ROOT"
+  git checkout "$PRIMARY_BRANCH" 2>/dev/null
+
+**P4 Stale worktrees:**
+  git -C "$PROJECT_REPO_ROOT" worktree remove --force /tmp/stale-worktree 2>/dev/null
+  git -C "$PROJECT_REPO_ROOT" worktree prune 2>/dev/null
+
+**P4 Stale claude processes:**
+  pgrep -f "claude -p" --older 10800 2>/dev/null | xargs kill 2>/dev/null || true
+
+### Escalation replies (from Matrix)
+
+If there are escalation replies from a human, act on them:
+- "ignore X" → note in journal, do not alert on X this run
+- "kill that agent" → identify and kill the referenced session
+- "what's stuck?" → include detailed status in the Matrix report
+- Other instructions → follow them, use best judgment
+
+### Cannot auto-fix → escalate
+
+For P0-P2 issues that persist after auto-fix attempts, or issues requiring
+human judgment, prepare an escalation message for the report step.
+
+Read the relevant best-practices file before taking action:
+  cat "$FACTORY_ROOT/supervisor/best-practices/memory.md"    # P0
+  cat "$FACTORY_ROOT/supervisor/best-practices/disk.md"      # P1
+  cat "$FACTORY_ROOT/supervisor/best-practices/ci.md"        # P2 CI
+  cat "$FACTORY_ROOT/supervisor/best-practices/dev-agent.md" # P2 agent
+  cat "$FACTORY_ROOT/supervisor/best-practices/git.md"       # P2 git
+
+Track what you fixed and what needs escalation for the report step.
+"""
+needs = ["health-assessment"]
+
+[[steps]]
+id    = "report"
+title = "Post health summary to Matrix"
+description = """
+Post a status summary to Matrix. Use the matrix_send function:
+  source "$FACTORY_ROOT/lib/env.sh"
+  matrix_send "supervisor" "<message>"
+
+### When everything is healthy
+Post a brief "all clear" only if the PREVIOUS run had alerts (check journal).
+Do NOT post "all clear" every 20 minutes — that would be noise.
+
+### When there are findings
+Post a summary grouped by priority:
+  matrix_send "supervisor" "Supervisor health check:
+
+  Fixed:
+  - <what was auto-fixed>
+
+  Alerts:
+  - [P2] <description>
+  - [P3] <description>
+
+  Status: RAM=<X>MB Disk=<Y>% Load=<Z>"
+
+### When escalation is needed (P0-P2 unresolved)
+Escalate with a clear call to action:
+  matrix_send "supervisor" "ESCALATE: <what's wrong and why you can't fix it>
+
+  Suggested action: <what the human should do>"
+
+### Responding to escalation replies
+If you acted on a human's reply, confirm what you did:
+  matrix_send "supervisor" "Acted on your reply: <summary of action taken>"
+
+Keep messages concise. Do not post identical messages to what was posted
+in the previous run (check journal for prior messages).
+"""
+needs = ["decide-actions"]
+
+[[steps]]
+id    = "journal"
+title = "Write health journal entry"
+description = """
+Append a timestamped entry to the supervisor journal.
+
+File path:
+  $FACTORY_ROOT/supervisor/journal/$(date -u +%Y-%m-%d).md
+
+If the file already exists (multiple runs per day), append a new section.
+If it does not exist, create it.
+
+Format:
+  ## Supervisor run — HH:MM UTC
+
+  ### Health status
+  - RAM: <X>MB available, Swap: <X>MB
+  - Disk: <X>%
+  - Load: <X>
+  - Docker: <N> containers
+
+  ### Findings
+  - [P<N>] <finding> — <action taken or "escalated">
+  (or "No issues found — all systems healthy")
+
+  ### Actions taken
+  - <what was fixed>
+  (or "No actions needed")
+
+  ### Escalation replies processed
+  - <human said X, did Y>
+  (or "None")
+
+Keep each entry concise — 15-25 lines max. This journal provides
+run-to-run context so future supervisor runs can detect trends
+(e.g., "disk has been >75% for 3 consecutive runs").
+
+IMPORTANT: Do NOT commit or push the journal — it is a local working file.
+The journal directory is committed to git periodically by other agents.
+
+After writing the journal, write the phase signal:
+  echo 'PHASE:done' > '$PHASE_FILE'
+"""
+needs = ["report"]
--- a/supervisor/journal/.gitkeep
+++ b/supervisor/journal/.gitkeep
--- a/supervisor/preflight.sh
+++ b/supervisor/preflight.sh
@ -0,0 +1,188 @@
+#!/usr/bin/env bash
+# =============================================================================
+# preflight.sh — Collect system and project metrics for the supervisor formula
+#
+# Outputs structured text to stdout. Called by supervisor-run.sh before
+# launching the Claude session. The output is injected into the prompt.
+#
+# Usage:
+#   bash supervisor/preflight.sh [projects/disinto.toml]
+# =============================================================================
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+FACTORY_ROOT="$(dirname "$SCRIPT_DIR")"
+
+export PROJECT_TOML="${1:-$FACTORY_ROOT/projects/disinto.toml}"
+# shellcheck source=../lib/env.sh
+source "$FACTORY_ROOT/lib/env.sh"
+# shellcheck source=../lib/ci-helpers.sh
+source "$FACTORY_ROOT/lib/ci-helpers.sh"
+
+# ── System Resources ─────────────────────────────────────────────────────
+
+echo "## System Resources"
+
+_avail_mb=$(free -m | awk '/Mem:/{print $7}')
+_total_mb=$(free -m | awk '/Mem:/{print $2}')
+_swap_used=$(free -m | awk '/Swap:/{print $3}')
+_disk_pct=$(df -h / | awk 'NR==2{print $5}' | tr -d '%')
+_disk_used=$(df -h / | awk 'NR==2{print $3}')
+_disk_total=$(df -h / | awk 'NR==2{print $2}')
+_load=$(cat /proc/loadavg 2>/dev/null || echo "unknown")
+
+echo "RAM: ${_avail_mb}MB available / ${_total_mb}MB total, Swap: ${_swap_used}MB used"
+echo "Disk: ${_disk_pct}% used (${_disk_used}/${_disk_total} on /)"
+echo "Load: ${_load}"
+echo ""
+
+# ── Docker ────────────────────────────────────────────────────────────────
+
+echo "## Docker"
+if command -v docker &>/dev/null; then
+  docker ps --format 'table {{.Names}}\t{{.Status}}' 2>/dev/null || echo "Docker query failed"
+else
+  echo "Docker not available"
+fi
+echo ""
+
+# ── Active Sessions + Phase Files ─────────────────────────────────────────
+
+echo "## Active Sessions"
+if tmux list-sessions 2>/dev/null; then
+  :
+else
+  echo "No tmux sessions"
+fi
+echo ""
+
+echo "## Phase Files"
+_found_phase=false
+for _pf in /tmp/*-session-*.phase; do
+  [ -f "$_pf" ] || continue
+  _found_phase=true
+  _phase_content=$(head -1 "$_pf" 2>/dev/null || echo "unreadable")
+  _phase_age_min=$(( ($(date +%s) - $(stat -c %Y "$_pf" 2>/dev/null || echo 0)) / 60 ))
+  echo "  $(basename "$_pf"): ${_phase_content} (${_phase_age_min}min ago)"
+done
+[ "$_found_phase" = false ] && echo "  None"
+echo ""
+
+# ── Lock Files ────────────────────────────────────────────────────────────
+
+echo "## Lock Files"
+_found_lock=false
+for _lf in /tmp/*-poll.lock /tmp/*-run.lock /tmp/dev-agent-*.lock; do
+  [ -f "$_lf" ] || continue
+  _found_lock=true
+  _pid=$(cat "$_lf" 2>/dev/null || true)
+  _age_min=$(( ($(date +%s) - $(stat -c %Y "$_lf" 2>/dev/null || echo 0)) / 60 ))
+  _alive="dead"
+  [ -n "${_pid:-}" ] && kill -0 "$_pid" 2>/dev/null && _alive="alive"
+  echo "  $(basename "$_lf"): PID=${_pid:-?} ${_alive} age=${_age_min}min"
+done
+[ "$_found_lock" = false ] && echo "  None"
+echo ""
+
+# ── Agent Logs (last 15 lines each) ──────────────────────────────────────
+
+echo "## Recent Agent Logs"
+for _log in supervisor/supervisor.log dev/dev-agent.log review/review.log \
+            gardener/gardener.log planner/planner.log predictor/predictor.log \
+            action/action.log; do
+  _logpath="${FACTORY_ROOT}/${_log}"
+  if [ -f "$_logpath" ]; then
+    _log_age_min=$(( ($(date +%s) - $(stat -c %Y "$_logpath" 2>/dev/null || echo 0)) / 60 ))
+    echo "### ${_log} (last modified ${_log_age_min}min ago)"
+    tail -15 "$_logpath" 2>/dev/null || echo "(read failed)"
+    echo ""
+  fi
+done
+
+# ── CI Pipelines ──────────────────────────────────────────────────────────
+
+echo "## CI Pipelines (${PROJECT_NAME})"
+
+_recent_ci=$(wpdb -A -c "
+  SELECT number, status, branch,
+         ROUND(EXTRACT(EPOCH FROM (to_timestamp(finished) - to_timestamp(started)))/60)::int as dur_min
+  FROM pipelines
+  WHERE repo_id = ${WOODPECKER_REPO_ID}
+    AND finished > 0
+    AND to_timestamp(finished) > now() - interval '24 hours'
+  ORDER BY number DESC LIMIT 10;" 2>/dev/null || echo "CI database query failed")
+echo "$_recent_ci"
+
+_stuck=$(wpdb -c "
+  SELECT count(*) FROM pipelines
+  WHERE repo_id=${WOODPECKER_REPO_ID}
+    AND status='running'
+    AND EXTRACT(EPOCH FROM now() - to_timestamp(started)) > 1200;" 2>/dev/null | xargs || echo "?")
+
+_pending=$(wpdb -c "
+  SELECT count(*) FROM pipelines
+  WHERE repo_id=${WOODPECKER_REPO_ID}
+    AND status='pending'
+    AND EXTRACT(EPOCH FROM now() - to_timestamp(created)) > 1800;" 2>/dev/null | xargs || echo "?")
+
+echo "Stuck (>20min): ${_stuck}"
+echo "Pending (>30min): ${_pending}"
+echo ""
+
+# ── Open PRs ──────────────────────────────────────────────────────────────
+
+echo "## Open PRs (${PROJECT_NAME})"
+_open_prs=$(codeberg_api GET "/pulls?state=open&limit=10" 2>/dev/null || echo "[]")
+echo "$_open_prs" | jq -r '.[] | "#\(.number) [\(.head.ref)] \(.title) — updated \(.updated_at)"' 2>/dev/null || echo "No open PRs or query failed"
+echo ""
+
+# ── Backlog + In-Progress ─────────────────────────────────────────────────
+
+echo "## Issue Status (${PROJECT_NAME})"
+_backlog_count=$(codeberg_api GET "/issues?state=open&labels=backlog&type=issues&limit=1" 2>/dev/null | jq 'length' 2>/dev/null || echo "?")
+_in_progress_count=$(codeberg_api GET "/issues?state=open&labels=in-progress&type=issues&limit=1" 2>/dev/null | jq 'length' 2>/dev/null || echo "?")
+_blocked_count=$(codeberg_api GET "/issues?state=open&labels=blocked&type=issues&limit=1" 2>/dev/null | jq 'length' 2>/dev/null || echo "?")
+echo "Backlog: ${_backlog_count}, In-progress: ${_in_progress_count}, Blocked: ${_blocked_count}"
+echo ""
+
+# ── Stale Worktrees ───────────────────────────────────────────────────────
+
+echo "## Stale Worktrees"
+_found_wt=false
+for _wt in /tmp/*-worktree-* /tmp/*-review-*; do
+  [ -d "$_wt" ] || continue
+  _found_wt=true
+  _wt_age_min=$(( ($(date +%s) - $(stat -c %Y "$_wt" 2>/dev/null || echo 0)) / 60 ))
+  echo "  $(basename "$_wt"): ${_wt_age_min}min old"
+done
+[ "$_found_wt" = false ] && echo "  None"
+echo ""
+
+# ── Pending Escalations ──────────────────────────────────────────────────
+
+echo "## Pending Escalations"
+_found_esc=false
+for _esc_file in "${FACTORY_ROOT}/supervisor/escalations-"*.jsonl; do
+  [ -f "$_esc_file" ] || continue
+  [[ "$_esc_file" == *.done.jsonl ]] && continue
+  _esc_count=$(wc -l < "$_esc_file" 2>/dev/null || echo 0)
+  [ "${_esc_count:-0}" -gt 0 ] || continue
+  _found_esc=true
+  echo "### $(basename "$_esc_file") (${_esc_count} entries)"
+  cat "$_esc_file"
+  echo ""
+done
+[ "$_found_esc" = false ] && echo "  None"
+echo ""
+
+# ── Escalation Replies from Matrix ────────────────────────────────────────
+
+echo "## Escalation Replies (from Matrix)"
+if [ -s /tmp/supervisor-escalation-reply ]; then
+  cat /tmp/supervisor-escalation-reply
+  echo ""
+  echo "(This reply will be consumed after this run)"
+else
+  echo "  None"
+fi
+echo ""
--- a/supervisor/supervisor-run.sh
+++ b/supervisor/supervisor-run.sh
@ -0,0 +1,115 @@
+#!/usr/bin/env bash
+# =============================================================================
+# supervisor-run.sh — Cron wrapper: supervisor execution via Claude + formula
+#
+# Runs every 20 minutes (or on-demand). Guards against concurrent runs and
+# low memory. Collects metrics via preflight.sh, then creates a tmux session
+# with Claude (sonnet) reading formulas/run-supervisor.toml.
+#
+# Replaces supervisor-poll.sh (bash orchestrator + claude -p one-shot) with
+# formula-driven interactive Claude session matching the planner/predictor
+# pattern.
+#
+# Usage:
+#   supervisor-run.sh [projects/disinto.toml]   # project config (default: disinto)
+#
+# Cron: */20 * * * * cd /path/to/dark-factory && bash supervisor/supervisor-run.sh
+# =============================================================================
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+FACTORY_ROOT="$(dirname "$SCRIPT_DIR")"
+
+# Accept project config from argument; default to disinto
+export PROJECT_TOML="${1:-$FACTORY_ROOT/projects/disinto.toml}"
+# shellcheck source=../lib/env.sh
+source "$FACTORY_ROOT/lib/env.sh"
+# shellcheck source=../lib/agent-session.sh
+source "$FACTORY_ROOT/lib/agent-session.sh"
+# shellcheck source=../lib/formula-session.sh
+source "$FACTORY_ROOT/lib/formula-session.sh"
+
+LOG_FILE="$SCRIPT_DIR/supervisor.log"
+# shellcheck disable=SC2034  # consumed by run_formula_and_monitor
+SESSION_NAME="supervisor-${PROJECT_NAME}"
+PHASE_FILE="/tmp/supervisor-session-${PROJECT_NAME}.phase"
+
+# shellcheck disable=SC2034  # read by monitor_phase_loop in lib/agent-session.sh
+PHASE_POLL_INTERVAL=15
+
+SCRATCH_FILE="/tmp/supervisor-${PROJECT_NAME}-scratch.md"
+
+log() { echo "[$(date -u +%Y-%m-%dT%H:%M:%S)Z] $*" >> "$LOG_FILE"; }
+
+# ── Guards ────────────────────────────────────────────────────────────────
+acquire_cron_lock "/tmp/supervisor-run.lock"
+check_memory 2000
+
+log "--- Supervisor run start ---"
+
+# ── Collect pre-flight metrics ────────────────────────────────────────────
+log "Running preflight.sh"
+PREFLIGHT_OUTPUT=""
+if PREFLIGHT_OUTPUT=$(bash "$SCRIPT_DIR/preflight.sh" "$PROJECT_TOML" 2>&1); then
+  log "Preflight collected ($(echo "$PREFLIGHT_OUTPUT" | wc -l) lines)"
+else
+  log "WARNING: preflight.sh failed, continuing with partial data"
+fi
+
+# ── Consume escalation replies ────────────────────────────────────────────
+# Move the file atomically so matrix_listener can write a new one
+ESCALATION_REPLY=""
+if [ -s /tmp/supervisor-escalation-reply ]; then
+  _reply_tmp="/tmp/supervisor-escalation-reply.consumed.$$"
+  if mv /tmp/supervisor-escalation-reply "$_reply_tmp" 2>/dev/null; then
+    ESCALATION_REPLY=$(cat "$_reply_tmp")
+    rm -f "$_reply_tmp"
+    log "Consumed escalation reply: $(echo "$ESCALATION_REPLY" | head -1)"
+  fi
+fi
+
+# ── Load formula + context ───────────────────────────────────────────────
+load_formula "$FACTORY_ROOT/formulas/run-supervisor.toml"
+build_context_block AGENTS.md
+
+# ── Read scratch file (compaction survival) ───────────────────────────────
+SCRATCH_CONTEXT=$(read_scratch_context "$SCRATCH_FILE")
+SCRATCH_INSTRUCTION=$(build_scratch_instruction "$SCRATCH_FILE")
+
+# ── Build prompt ─────────────────────────────────────────────────────────
+build_prompt_footer
+
+# shellcheck disable=SC2034  # consumed by run_formula_and_monitor
+PROMPT="You are the supervisor agent for ${CODEBERG_REPO}. Work through the formula below. You MUST write PHASE:done to '${PHASE_FILE}' when finished — the orchestrator will time you out if you return to the prompt without signalling.
+
+You have full shell access and --dangerously-skip-permissions.
+Fix what you can. Escalate what you cannot. Do NOT ask permission — act first, report after.
+
+## Pre-flight metrics (collected $(date -u +%H:%M) UTC)
+${PREFLIGHT_OUTPUT}
+${ESCALATION_REPLY:+
+## Escalation Reply (from Matrix — human message)
+${ESCALATION_REPLY}
+
+Act on this reply in the decide-actions step.
+}
+## Project context
+${CONTEXT_BLOCK}
+${SCRATCH_CONTEXT:+${SCRATCH_CONTEXT}
+}
+## Formula
+${FORMULA_CONTENT}
+
+${SCRATCH_INSTRUCTION}
+
+${PROMPT_FOOTER}"
+
+# ── Run session ──────────────────────────────────────────────────────────
+export CLAUDE_MODEL="sonnet"
+run_formula_and_monitor "supervisor" 1200
+
+# ── Cleanup scratch file on normal exit ──────────────────────────────────
+# FINAL_PHASE already set by run_formula_and_monitor
+if [ "${FINAL_PHASE:-}" = "PHASE:done" ]; then
+  rm -f "$SCRATCH_FILE"
+fi