Merge pull request 'fix: Remove escalation — planner routes through vault instead (#721)' (#727) from fix/issue-721 into main

johba 2026-03-26 11:49:02 +01:00
commit 13bf487bef
13 changed files with 141 additions and 141 deletions

@ -96,7 +96,7 @@ The dev-agent is completely starved until they are promoted or resolved.
For each tier-0 issue: For each tier-0 issue:
- Read the full body: curl -sf -H "Authorization: token $FORGE_TOKEN" "$FORGE_API/issues/{number}" - Read the full body: curl -sf -H "Authorization: token $FORGE_TOKEN" "$FORGE_API/issues/{number}"
- If resolvable: promote to backlog add acceptance criteria, affected files, relabel - If resolvable: promote to backlog add acceptance criteria, affected files, relabel
- If needs human decision: add to ESCALATE block - If needs human decision: file a vault procurement item (vault/pending/<id>.md)
- If invalid / wontfix: close with explanation comment - If invalid / wontfix: close with explanation comment
After completing all tier-0, re-fetch to check for new blockers: After completing all tier-0, re-fetch to check for new blockers:
@ -135,8 +135,16 @@ DUPLICATE (>80% overlap after reading both bodies — confirm before closing):
Close: curl -X PATCH ... /issues/NNN -d '{"state":"closed"}' Close: curl -X PATCH ... /issues/NNN -d '{"state":"closed"}'
Write: echo "ACTION: closed #NNN as duplicate of #OLDER" >> "$RESULT_FILE" Write: echo "ACTION: closed #NNN as duplicate of #OLDER" >> "$RESULT_FILE"
ESCALATE (ambiguous scope, architectural question, needs human decision): VAULT (ambiguous scope, architectural question, needs human decision):
Collect into the ESCALATE block written to the result file at the end. File a vault procurement item at $PROJECT_REPO_ROOT/vault/pending/<id>.md:
# <What decision or resource is needed>
## What
<description>
## Why
<which issue this unblocks>
## Unblocks
- #NNN — <title>
Log: echo "VAULT: filed vault/pending/<id>.md for #NNN — <reason>" >> "$RESULT_FILE"
Dust vs ore rules: Dust vs ore rules:
Dust: comment fix, variable rename, whitespace/formatting, single-line edit, trivial cleanup with no behavior change Dust: comment fix, variable rename, whitespace/formatting, single-line edit, trivial cleanup with no behavior change
@ -179,7 +187,7 @@ Re-fetch ALL open tech-debt issues and count them:
Check each tier: Check each tier:
tier-0 count == 0 (HARD REQUIREMENT factory is blocked until zero) tier-0 count == 0 (HARD REQUIREMENT factory is blocked until zero)
tier-1 all processed or escalated tier-1 all processed or routed to vault
tier-2 all classified tier-2 all classified
If tier-0 > 0: If tier-0 > 0:
@ -195,8 +203,7 @@ If all tiers clear, write the completion summary and signal done:
echo "ACTION: grooming complete — 0 tech-debt remaining" >> "$RESULT_FILE" echo "ACTION: grooming complete — 0 tech-debt remaining" >> "$RESULT_FILE"
echo 'PHASE:done' > "$PHASE_FILE" echo 'PHASE:done' > "$PHASE_FILE"
Escalation format (for items needing human decision write to result file): Vault items filed during this run are picked up by vault-poll automatically.
printf 'ESCALATE\n1. #NNN "title" — reason (a) option1 (b) option2 (c) option3\n' >> "$RESULT_FILE"
On unrecoverable error (API unavailable, repeated failures): On unrecoverable error (API unavailable, repeated failures):
printf 'PHASE:failed\nReason: %s\n' 'describe what failed' > "$PHASE_FILE" printf 'PHASE:failed\nReason: %s\n' 'describe what failed' > "$PHASE_FILE"
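
The vault procurement template described in this formula could be filed with a small shell helper. This is a minimal sketch, not code from this commit: `file_vault_item` and its arguments are hypothetical. Only `$PROJECT_REPO_ROOT`, `$RESULT_FILE`, the `vault/pending/<id>.md` path, and the section headings come from the diff. The caller is assumed to pass the issue reference already formatted.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): writes one vault procurement
# item using the template from the formula and logs it to the result file.
# Usage: file_vault_item <id> <title> <what> <why> <issue-ref>
file_vault_item() {
  local id="$1" title="$2" what="$3" why="$4" issue_ref="$5"
  local dir="${PROJECT_REPO_ROOT:?PROJECT_REPO_ROOT is not set}/vault/pending"
  local item="$dir/$id.md"

  mkdir -p "$dir"
  # Assumption: an existing file means the item was already filed; keep it.
  if [ -e "$item" ]; then
    return 0
  fi

  # printf writes the caller's text literally into the template sections.
  printf '# %s\n## What\n%s\n## Why\n%s\n## Unblocks\n- %s\n' \
    "$title" "$what" "$why" "$issue_ref" > "$item"

  echo "VAULT: filed vault/pending/$id.md for $issue_ref" \
    >> "${RESULT_FILE:?RESULT_FILE is not set}"
}
```

Skipping an existing file is a design assumption made here so that repeated grooming runs do not file or log the same item twice. The diff itself does not say how duplicates are handled.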

@ -119,8 +119,16 @@ DUST (trivial — single-line edit, rename, comment, style, whitespace):
Do NOT close dust issues the dust-bundling step auto-bundles groups Do NOT close dust issues the dust-bundling step auto-bundles groups
of 3+ into one backlog issue. of 3+ into one backlog issue.
ESCALATE (needs human decision): VAULT (needs human decision or external resource):
printf 'ESCALATE\n1. #NNN "title" — reason (a) option1 (b) option2\n' >> "$RESULT_FILE" File a vault procurement item at $PROJECT_REPO_ROOT/vault/pending/<id>.md:
# <What decision or resource is needed>
## What
<description>
## Why
<which issue this unblocks>
## Unblocks
- #NNN — <title>
Log: echo "VAULT: filed vault/pending/<id>.md for #NNN — <reason>" >> "$RESULT_FILE"
CLEAN (only if truly nothing to do): CLEAN (only if truly nothing to do):
echo 'CLEAN' >> "$RESULT_FILE" echo 'CLEAN' >> "$RESULT_FILE"
@ -150,7 +158,7 @@ Sibling dependency rule (CRITICAL):
Only close for clear, unambiguous violations. If the issue is Only close for clear, unambiguous violations. If the issue is
borderline or could be interpreted as compatible, leave it open borderline or could be interpreted as compatible, leave it open
and ESCALATE instead. and file a VAULT item for human decision instead.
8. Quality gate backlog label enforcement: 8. Quality gate backlog label enforcement:
For each open issue labeled 'backlog', verify it has the required For each open issue labeled 'backlog', verify it has the required
@ -178,7 +186,7 @@ Processing order:
2. AD alignment check close backlog issues that violate architecture decisions 2. AD alignment check close backlog issues that violate architecture decisions
3. Quality gate strip backlog from issues missing acceptance criteria or affected files 3. Quality gate strip backlog from issues missing acceptance criteria or affected files
4. Process tech-debt issues by score (impact/effort) 4. Process tech-debt issues by score (impact/effort)
5. Classify remaining items as dust or escalate 5. Classify remaining items as dust or route to vault
Do NOT bundle dust yourself the dust-bundling step handles accumulation, Do NOT bundle dust yourself the dust-bundling step handles accumulation,
dedup, TTL expiry, and bundling into backlog issues. dedup, TTL expiry, and bundling into backlog issues.

@ -123,8 +123,9 @@ Update the tree:
Bounce/stuck detection for issues in the tree, fetch recent comments: Bounce/stuck detection for issues in the tree, fetch recent comments:
curl -sf -H "Authorization: token $FORGE_TOKEN" \ curl -sf -H "Authorization: token $FORGE_TOKEN" \
"$FORGE_API/issues/<number>/comments?limit=10" "$FORGE_API/issues/<number>/comments?limit=10"
Signals: BOUNCED (too_large, underspecified), ESCALATED (needs human decision), Signals: BOUNCED (too_large, underspecified),
LABEL_CHURN (3+ relabels between backlog/underspecified). LABEL_CHURN (3+ relabels between backlog/underspecified).
If an issue needs a human decision or external resource, it is HUMAN_BLOCKED.
Track as stuck_issues[] for constraint filing below. Track as stuck_issues[] for constraint filing below.
Hold the updated tree in memory written to disk in journal-and-commit. Hold the updated tree in memory written to disk in journal-and-commit.
@ -148,7 +149,17 @@ Graph bottlenecks (high betweenness centrality) and thin objectives inform ranki
Stuck issue handling: Stuck issue handling:
- BOUNCED/LABEL_CHURN: do NOT re-promote. Dispatch groom-backlog formula instead: - BOUNCED/LABEL_CHURN: do NOT re-promote. Dispatch groom-backlog formula instead:
tea_file_issue "chore: break down #<N> — bounced <count>x" "<body>" "action" tea_file_issue "chore: break down #<N> — bounced <count>x" "<body>" "action"
- ESCALATED: skip, mark in tree as "escalated — awaiting human decision" - HUMAN_BLOCKED (needs human decision or external resource): file a vault
procurement item instead of skipping. Write vault/pending/<resource-id>.md:
# <What is needed>
## What
<description of the resource or decision needed>
## Why
<which objective/issue this unblocks>
## Unblocks
- #<issue> — <title>
Then mark the prerequisite in the tree as "blocked-on-vault (vault/pending/<id>.md)".
Do NOT skip or mark as "awaiting human decision" the vault owns the human interface.
Filing gate (for non-stuck constraints): Filing gate (for non-stuck constraints):
1. Check if issue already exists (match by #number in tree or title search) 1. Check if issue already exists (match by #number in tree or title search)

@ -9,7 +9,7 @@
# Key differences from planner/gardener: # Key differences from planner/gardener:
# - Runs every 20min — lightweight health check # - Runs every 20min — lightweight health check
# - Primarily READS state, rarely WRITES (no PRs, just Matrix + journal) # - Primarily READS state, rarely WRITES (no PRs, just Matrix + journal)
# - Reactive to escalations — processes pending escalation events # - Checks vault state for pending procurement items
# - Conversation memory via Matrix thread and journal # - Conversation memory via Matrix thread and journal
name = "run-supervisor" name = "run-supervisor"
@ -29,14 +29,14 @@ and injected into your prompt above. Review them now.
1. Read the injected metrics data carefully (System Resources, Docker, 1. Read the injected metrics data carefully (System Resources, Docker,
Active Sessions, Phase Files, Stale Phase Cleanup, Lock Files, Agent Logs, Active Sessions, Phase Files, Stale Phase Cleanup, Lock Files, Agent Logs,
CI Pipelines, Open PRs, Issue Status, Stale Worktrees, Pending Escalations, CI Pipelines, Open PRs, Issue Status, Stale Worktrees).
Escalation Replies).
Note: preflight.sh auto-removes PHASE:escalate files for closed issues Note: preflight.sh auto-removes PHASE:escalate files for closed issues
(24h grace period). Check the "Stale Phase Cleanup" section for any (24h grace period). Check the "Stale Phase Cleanup" section for any
files cleaned or in grace period this run. files cleaned or in grace period this run.
2. If there are escalation replies from Matrix (human messages), note them 2. Check vault state: read vault/pending/*.md for any procurement items
you will act on them in the decide-actions step. the planner has filed. Note items relevant to the health assessment
(e.g. a blocked resource that explains why the pipeline is stalled).
3. Read the supervisor journal for recent history: 3. Read the supervisor journal for recent history:
JOURNAL_FILE="$FACTORY_ROOT/supervisor/journal/$(date -u +%Y-%m-%d).md" JOURNAL_FILE="$FACTORY_ROOT/supervisor/journal/$(date -u +%Y-%m-%d).md"
@ -70,9 +70,9 @@ Categorize every finding from the metrics into priority levels.
- Git repo on wrong branch or in broken rebase state - Git repo on wrong branch or in broken rebase state
- Pipeline stalled: backlog issues exist but no agent ran for > 20min - Pipeline stalled: backlog issues exist but no agent ran for > 20min
- Dev-agent blocked: last N polls all report "no ready issues" - Dev-agent blocked: last N polls all report "no ready issues"
- Dev/action sessions in PHASE:escalate for > 24h (escalation timeout) - Dev/action sessions in PHASE:escalate for > 24h (session timeout)
(Note: PHASE:escalate files for closed issues are auto-cleaned by preflight; (Note: PHASE:escalate files for closed issues are auto-cleaned by preflight;
this check covers escalations where the issue is still open) this check covers sessions where the issue is still open)
### P3 — Factory degraded ### P3 — Factory degraded
- PRs stale: CI finished >20min ago AND no git push to the PR branch since CI completed - PRs stale: CI finished >20min ago AND no git push to the PR branch since CI completed
@ -92,7 +92,7 @@ needs = ["preflight"]
[[steps]] [[steps]]
id = "decide-actions" id = "decide-actions"
title = "Fix what you can, escalate what you cannot" title = "Fix what you can, file vault items for what you cannot"
description = """ description = """
For each finding from the health assessment, decide and execute an action. For each finding from the health assessment, decide and execute an action.
@ -145,20 +145,21 @@ For each finding from the health assessment, decide and execute an action.
tmux send-keys -t "$SESSION" "# [supervisor] PR stale >20min — CI finished, please push or update" Enter tmux send-keys -t "$SESSION" "# [supervisor] PR stale >20min — CI finished, please push or update" Enter
fi fi
If no active tmux session exists, note it in the journal for the next dev-poll cycle. If no active tmux session exists, note it in the journal for the next dev-poll cycle.
Do NOT escalate stale PRs to Matrix unless they remain stale for >3 consecutive runs. Do NOT file vault items for stale PRs unless they remain stale for >3 consecutive runs.
### Escalation replies (from Matrix) ### Cannot auto-fix → file vault item
If there are escalation replies from a human, act on them:
- "ignore X" note in journal, do not alert on X this run
- "kill that agent" identify and kill the referenced session
- "what's stuck?" include detailed status in the Matrix report
- Other instructions follow them, use best judgment
### Cannot auto-fix → escalate
For P0-P2 issues that persist after auto-fix attempts, or issues requiring For P0-P2 issues that persist after auto-fix attempts, or issues requiring
human judgment, prepare an escalation message for the report step. human judgment, file a vault procurement item:
Write $PROJECT_REPO_ROOT/vault/pending/supervisor-<issue-slug>.md:
# <What is needed>
## What
<description of the problem and why the supervisor cannot fix it>
## Why
<impact on factory health reference the priority level>
## Unblocks
- Factory health: <what this resolves>
The vault-poll will notify the human and track the request.
Read the relevant best-practices file before taking action: Read the relevant best-practices file before taking action:
cat "$FACTORY_ROOT/supervisor/best-practices/memory.md" # P0 cat "$FACTORY_ROOT/supervisor/best-practices/memory.md" # P0
@ -167,7 +168,7 @@ Read the relevant best-practices file before taking action:
cat "$FACTORY_ROOT/supervisor/best-practices/dev-agent.md" # P2 agent cat "$FACTORY_ROOT/supervisor/best-practices/dev-agent.md" # P2 agent
cat "$FACTORY_ROOT/supervisor/best-practices/git.md" # P2 git cat "$FACTORY_ROOT/supervisor/best-practices/git.md" # P2 git
Track what you fixed and what needs escalation for the report step. Track what you fixed and what vault items you filed for the report step.
""" """
needs = ["health-assessment"] needs = ["health-assessment"]
@ -196,15 +197,14 @@ Post a summary grouped by priority:
Status: RAM=<X>MB Disk=<Y>% Load=<Z>" Status: RAM=<X>MB Disk=<Y>% Load=<Z>"
### When escalation is needed (P0-P2 unresolved) ### When vault items were filed (P0-P2 unresolved)
Escalate with a clear call to action: Note the vault items in the status summary:
matrix_send "supervisor" "ESCALATE: <what's wrong and why you can't fix it> matrix_send "supervisor" "Supervisor health check:
Suggested action: <what the human should do>" Filed vault items:
- vault/pending/<id>.md <summary>
### Responding to escalation replies Status: RAM=<X>MB Disk=<Y>% Load=<Z>"
If you acted on a human's reply, confirm what you did:
matrix_send "supervisor" "Acted on your reply: <summary of action taken>"
Keep messages concise. Do not post identical messages to what was posted Keep messages concise. Do not post identical messages to what was posted
in the previous run (check journal for prior messages). in the previous run (check journal for prior messages).
@ -233,15 +233,15 @@ Format:
- Docker: <N> containers - Docker: <N> containers
### Findings ### Findings
- [P<N>] <finding> <action taken or "escalated"> - [P<N>] <finding> <action taken or "filed vault item">
(or "No issues found — all systems healthy") (or "No issues found — all systems healthy")
### Actions taken ### Actions taken
- <what was fixed> - <what was fixed>
(or "No actions needed") (or "No actions needed")
### Escalation replies processed ### Vault items filed
- <human said X, did Y> - vault/pending/<id>.md <reason>
(or "None") (or "None")
Keep each entry concise 15-25 lines max. This journal provides Keep each entry concise 15-25 lines max. This journal provides
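
The preflight-review step in this formula asks the supervisor to read `vault/pending/*.md`. One way to get a quick overview is sketched below. The function name `list_pending_vault_items` is hypothetical, and the sketch assumes each item starts with a `# <What is needed>` heading as in the template. Only `$PROJECT_REPO_ROOT` and the directory layout are taken from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): prints one
# "<id><TAB><title>" line per pending vault item.
list_pending_vault_items() {
  local dir="${PROJECT_REPO_ROOT:?PROJECT_REPO_ROOT is not set}/vault/pending"
  local item
  for item in "$dir"/*.md; do
    # An unmatched glob stays literal; skip it so an empty vault prints nothing.
    [ -e "$item" ] || continue
    # The id is the file name without .md; the title is the first-line heading.
    printf '%s\t%s\n' "$(basename "$item" .md)" "$(sed -n '1s/^# //p' "$item")"
  done
}
```

The output could be pasted into the journal's "Vault items filed" section or used to explain a stalled pipeline in the health assessment.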

@ -1,29 +1,27 @@
<!-- last-reviewed: 043bf0f0217aef3f319b844f1a1277acd6327a1c --> <!-- last-reviewed: f2064ba67c3b6819f5e252300927c01e2825dd7c -->
# Gardener Agent # Gardener Agent
**Role**: Backlog grooming — detect duplicate issues, missing acceptance **Role**: Backlog grooming — detect duplicate issues, missing acceptance
criteria, oversized issues, stale issues, and circular dependencies. Enforces criteria, oversized issues, stale issues, and circular dependencies. Enforces
the quality gate: strips the `backlog` label from issues that lack acceptance the quality gate: strips the `backlog` label from issues that lack acceptance
criteria checkboxes (`- [ ]`) or an `## Affected files` section. Invokes criteria checkboxes (`- [ ]`) or an `## Affected files` section. Invokes
Claude to fix or escalate to a human via Matrix. Claude to fix what it can; files vault items for what it cannot.
**Trigger**: `gardener-run.sh` runs 4x/day via cron. Sources `lib/guard.sh` and **Trigger**: `gardener-run.sh` runs 4x/day via cron. Sources `lib/guard.sh` and
calls `check_active gardener` first — skips if `$FACTORY_ROOT/state/.gardener-active` calls `check_active gardener` first — skips if `$FACTORY_ROOT/state/.gardener-active`
is absent. Then creates a tmux session with `claude --model sonnet`, injects is absent. Then creates a tmux session with `claude --model sonnet`, injects
`formulas/run-gardener.toml` with escalation replies as context, monitors the `formulas/run-gardener.toml` as context, monitors the phase file, and cleans up
phase file, and cleans up on completion or timeout (2h max session). No action on completion or timeout (2h max session). No action issues — the gardener runs
issues — the gardener runs directly from cron like the planner, predictor, and directly from cron like the planner, predictor, and supervisor.
supervisor.
**Key files**: **Key files**:
- `gardener/gardener-run.sh` — Cron wrapper + orchestrator: lock, memory guard, - `gardener/gardener-run.sh` — Cron wrapper + orchestrator: lock, memory guard,
consumes escalation replies, sources disinto project config, creates tmux session, sources disinto project config, creates tmux session, injects formula prompt,
injects formula prompt, monitors phase file via custom `_gardener_on_phase_change` monitors phase file via custom `_gardener_on_phase_change` callback (passed to
callback (passed to `run_formula_and_monitor`). Kills session on `PHASE:escalate` `run_formula_and_monitor`). Stays alive through CI/review/merge cycle after
to prevent zombies. Stays alive through CI/review/merge cycle after `PHASE:awaiting_ci` `PHASE:awaiting_ci` — injects CI results and review feedback, re-signals
— injects CI results and review feedback, re-signals `PHASE:awaiting_ci` after `PHASE:awaiting_ci` after fixes, signals `PHASE:awaiting_review` on CI pass.
fixes, signals `PHASE:awaiting_review` on CI pass. Executes pending-actions Executes pending-actions manifest after PR merge.
manifest after PR merge.
- `formulas/run-gardener.toml` — Execution spec: preflight, grooming, dust-bundling, blocked-review, agents-update, commit-and-pr - `formulas/run-gardener.toml` — Execution spec: preflight, grooming, dust-bundling, blocked-review, agents-update, commit-and-pr
- `gardener/pending-actions.json` — Manifest of deferred repo actions (label changes, - `gardener/pending-actions.json` — Manifest of deferred repo actions (label changes,
closures, comments, issue creation). Written during grooming steps, committed to the closures, comments, issue creation). Written during grooming steps, committed to the
@ -35,10 +33,10 @@ supervisor.
- `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER` - `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER`
**Lifecycle**: gardener-run.sh (cron 0,6,12,18) → `check_active gardener` → lock + memory guard → **Lifecycle**: gardener-run.sh (cron 0,6,12,18) → `check_active gardener` → lock + memory guard →
consume escalation replies → load formula + context → create tmux session → load formula + context → create tmux session →
Claude grooms backlog (writes proposed actions to manifest), bundles dust, Claude grooms backlog (writes proposed actions to manifest), bundles dust,
reviews blocked issues, updates AGENTS.md, commits manifest + docs to PR → reviews blocked issues, updates AGENTS.md, commits manifest + docs to PR →
`PHASE:awaiting_ci` (stays alive) → CI pass → `PHASE:awaiting_review` `PHASE:awaiting_ci` (stays alive) → CI pass → `PHASE:awaiting_review`
review feedback → address + re-signal → merge → gardener-run.sh executes review feedback → address + re-signal → merge → gardener-run.sh executes
manifest actions via API → `PHASE:done`. On `PHASE:escalate`: session killed manifest actions via API → `PHASE:done`. When blocked on external resources
immediately. or human decisions, files a vault item instead of escalating.

@ -60,9 +60,6 @@ check_memory 2000
log "--- Gardener run start ---" log "--- Gardener run start ---"
# ── Consume escalation replies ────────────────────────────────────────────
consume_escalation_reply "gardener"
# ── Load formula + context ─────────────────────────────────────────────── # ── Load formula + context ───────────────────────────────────────────────
load_formula "$FACTORY_ROOT/formulas/run-gardener.toml" load_formula "$FACTORY_ROOT/formulas/run-gardener.toml"
build_context_block AGENTS.md build_context_block AGENTS.md
@ -114,13 +111,8 @@ If no file changes in commit-and-pr:
PROMPT="You are the issue gardener for ${FORGE_REPO}. Work through the formula below. Follow the phase protocol: if the commit-and-pr step creates a PR, write PHASE:awaiting_ci and wait for orchestrator CI/review/merge handling. If no file changes, write PHASE:done. The orchestrator will time you out if you return to the prompt without signalling. PROMPT="You are the issue gardener for ${FORGE_REPO}. Work through the formula below. Follow the phase protocol: if the commit-and-pr step creates a PR, write PHASE:awaiting_ci and wait for orchestrator CI/review/merge handling. If no file changes, write PHASE:done. The orchestrator will time you out if you return to the prompt without signalling.
You have full shell access and --dangerously-skip-permissions. You have full shell access and --dangerously-skip-permissions.
Fix what you can. Escalate what you cannot. Do NOT ask permission — act first, report after. Fix what you can. File vault items for what you cannot. Do NOT ask permission — act first, report after.
${ESCALATION_REPLY:+
## Escalation Reply (from Matrix — human message)
${ESCALATION_REPLY}
Act on this reply during the grooming step.
}
## Project context ## Project context
${CONTEXT_BLOCK} ${CONTEXT_BLOCK}
${SCRATCH_CONTEXT:+${SCRATCH_CONTEXT} ${SCRATCH_CONTEXT:+${SCRATCH_CONTEXT}
@ -337,8 +329,8 @@ _gardener_merge() {
printf 'PHASE:done\n' > "$PHASE_FILE" printf 'PHASE:done\n' > "$PHASE_FILE"
return 0 return 0
fi fi
log "gardener merge blocked (HTTP 405) — escalating" log "gardener merge blocked (HTTP 405)"
printf 'PHASE:escalate\nReason: gardener PR #%s merge blocked (HTTP 405)\n' \ printf 'PHASE:failed\nReason: gardener PR #%s merge blocked (HTTP 405)\n' \
"$_GARDENER_PR" > "$PHASE_FILE" "$_GARDENER_PR" > "$PHASE_FILE"
return 0 return 0
fi fi
@ -350,7 +342,7 @@ _gardener_merge() {
git fetch origin ${PRIMARY_BRANCH} && git rebase origin/${PRIMARY_BRANCH} git fetch origin ${PRIMARY_BRANCH} && git rebase origin/${PRIMARY_BRANCH}
git push --force-with-lease origin HEAD git push --force-with-lease origin HEAD
echo \"PHASE:awaiting_ci\" > \"${PHASE_FILE}\" echo \"PHASE:awaiting_ci\" > \"${PHASE_FILE}\"
If rebase fails, write PHASE:escalate with a reason." If rebase fails, write PHASE:failed with a reason."
} }
# shellcheck disable=SC2317 # called indirectly by monitor_phase_loop # shellcheck disable=SC2317 # called indirectly by monitor_phase_loop
@ -468,7 +460,7 @@ Write PHASE:awaiting_review to the phase file, then stop and wait:
if ! $ci_done; then if ! $ci_done; then
log "CI timeout for PR #${_GARDENER_PR}" log "CI timeout for PR #${_GARDENER_PR}"
agent_inject_into_session "${_MONITOR_SESSION:-$SESSION_NAME}" \ agent_inject_into_session "${_MONITOR_SESSION:-$SESSION_NAME}" \
"CI TIMEOUT: CI did not complete within 15 minutes for PR #${_GARDENER_PR}. Write PHASE:escalate if you cannot proceed." "CI TIMEOUT: CI did not complete within 15 minutes for PR #${_GARDENER_PR}. Write PHASE:failed with a reason if you cannot proceed."
return 0 return 0
fi fi
@ -484,7 +476,7 @@ Write PHASE:awaiting_review to the phase file, then stop and wait:
_GARDENER_CI_FIX_COUNT=$(( _GARDENER_CI_FIX_COUNT + 1 )) _GARDENER_CI_FIX_COUNT=$(( _GARDENER_CI_FIX_COUNT + 1 ))
if [ "$_GARDENER_CI_FIX_COUNT" -gt 3 ]; then if [ "$_GARDENER_CI_FIX_COUNT" -gt 3 ]; then
log "CI exhausted after ${_GARDENER_CI_FIX_COUNT} attempts" log "CI exhausted after ${_GARDENER_CI_FIX_COUNT} attempts"
printf 'PHASE:escalate\nReason: gardener CI exhausted after %d attempts\n' \ printf 'PHASE:failed\nReason: gardener CI exhausted after %d attempts\n' \
"$_GARDENER_CI_FIX_COUNT" > "$PHASE_FILE" "$_GARDENER_CI_FIX_COUNT" > "$PHASE_FILE"
return 0 return 0
fi fi
@ -625,7 +617,7 @@ Then stop and wait."
if [ "$review_elapsed" -ge "$review_timeout" ]; then if [ "$review_elapsed" -ge "$review_timeout" ]; then
log "review wait timed out for PR #${_GARDENER_PR}" log "review wait timed out for PR #${_GARDENER_PR}"
agent_inject_into_session "${_MONITOR_SESSION:-$SESSION_NAME}" \ agent_inject_into_session "${_MONITOR_SESSION:-$SESSION_NAME}" \
"No review received after ${review_timeout}s for PR #${_GARDENER_PR}. Write PHASE:escalate if you cannot proceed." "No review received after ${review_timeout}s for PR #${_GARDENER_PR}. Write PHASE:failed with a reason if you cannot proceed."
fi fi
} }
@ -644,14 +636,7 @@ _gardener_on_phase_change() {
PHASE:done|PHASE:merged) PHASE:done|PHASE:merged)
agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}" agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}"
;; ;;
PHASE:failed) PHASE:failed|PHASE:escalate)
agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}"
;;
PHASE:escalate)
local reason
reason=$(sed -n '2p' "$PHASE_FILE" 2>/dev/null | sed 's/^Reason: //' || true)
log "escalated: ${reason}"
matrix_send "gardener" "Gardener escalated: ${reason}" 2>/dev/null || true
agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}" agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}"
;; ;;
PHASE:crashed) PHASE:crashed)
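
The hunks above replace `PHASE:escalate` writes with `PHASE:failed` plus a `Reason:` line. The round trip through the phase file can be sketched as follows. The function names `write_phase_failed` and `read_phase_reason` are hypothetical. The two-line file layout and the `sed`-based extraction follow the `printf` and `sed` calls visible in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this commit) illustrating the phase-file
# layout: line 1 is the phase marker, line 2 is "Reason: <text>".
write_phase_failed() {
  printf 'PHASE:failed\nReason: %s\n' "$1" \
    > "${PHASE_FILE:?PHASE_FILE is not set}"
}

# Prints the reason text, or nothing if the file or the line is missing.
read_phase_reason() {
  sed -n '2s/^Reason: //p' "${PHASE_FILE:?PHASE_FILE is not set}" 2>/dev/null || true
}
```

Passing the reason as a `printf` argument rather than embedding it in the format string keeps any `%` characters in the reason literal.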

@ -17,14 +17,18 @@ Dismissed predictions get re-filed by the predictor with stronger evidence
if still valid. Phase 2 if still valid. Phase 2
(update-prerequisite-tree): scan repo state + open/closed issues, mark resolved (update-prerequisite-tree): scan repo state + open/closed issues, mark resolved
prerequisites, discover new ones, update the tree. **Also scans comments on prerequisites, discover new ones, update the tree. **Also scans comments on
referenced issues for bounce/stuck signals** (BOUNCED, ESCALATED, LABEL_CHURN) referenced issues for bounce/stuck signals** (BOUNCED, LABEL_CHURN)
to detect issues ping-ponging between backlog and underspecified. Phase 3 to detect issues ping-ponging between backlog and underspecified. Issues that
need human decisions or external resources are filed as vault procurement items
(`vault/pending/*.md`) instead of being escalated. Phase 3
(file-at-constraints): identify the top 3 unresolved prerequisites that block (file-at-constraints): identify the top 3 unresolved prerequisites that block
the most downstream objectives — file issues as either `backlog` (code changes, the most downstream objectives — file issues as either `backlog` (code changes,
dev-agent) or `action` (run existing formula, action-agent). **Stuck issues dev-agent) or `action` (run existing formula, action-agent). **Stuck issues
(detected BOUNCED/LABEL_CHURN) are dispatched to the `groom-backlog` formula (detected BOUNCED/LABEL_CHURN) are dispatched to the `groom-backlog` formula
in breakdown mode instead of being re-promoted** — this breaks the ping-pong in breakdown mode instead of being re-promoted** — this breaks the ping-pong
loop by splitting them into dev-agent-sized sub-issues. loop by splitting them into dev-agent-sized sub-issues. **Human-blocked issues
are routed through the vault** — the planner files a procurement item and marks
the prerequisite as blocked-on-vault in the tree.
Phase 4 (journal-and-memory): write updated prerequisite tree + daily journal Phase 4 (journal-and-memory): write updated prerequisite tree + daily journal
entry (committed to git) and update `planner/MEMORY.md` (committed to git). entry (committed to git) and update `planner/MEMORY.md` (committed to git).
Phase 5 (commit-and-pr): one commit with all file changes, push, create PR. Phase 5 (commit-and-pr): one commit with all file changes, push, create PR.

@ -24,8 +24,8 @@ Status: DONE — #395 closed
## Objective: Example project demonstrating full lifecycle (#466) ## Objective: Example project demonstrating full lifecycle (#466)
- [x] disinto init working (#393) - [x] disinto init working (#393)
- [ ] Human decision on implementation approach (external repo vs local demo) ⚠ escalated — awaiting human decision (since 2026-03-23) - [ ] Human decision on implementation approach (external repo vs local demo) — blocked-on-vault
Status: BLOCKED — bounced by dev-agent (too large), escalated by gardener, 3 days without human response Status: BLOCKED — bounced by dev-agent (too large), routed to vault for human decision
## Objective: Landing page communicating value proposition (#534) ## Objective: Landing page communicating value proposition (#534)
- [x] disinto init working (#393) - [x] disinto init working (#393)

@ -1,10 +1,11 @@
<!-- last-reviewed: 043bf0f0217aef3f319b844f1a1277acd6327a1c --> <!-- last-reviewed: f2064ba67c3b6819f5e252300927c01e2825dd7c -->
# Supervisor Agent # Supervisor Agent
**Role**: Health monitoring and auto-remediation, executed as a formula-driven **Role**: Health monitoring and auto-remediation, executed as a formula-driven
Claude agent. Collects system and project metrics via a bash pre-flight script, Claude agent. Collects system and project metrics via a bash pre-flight script,
then runs an interactive Claude session (sonnet) that assesses health, auto-fixes then runs an interactive Claude session (sonnet) that assesses health, auto-fixes
issues, escalates via Matrix, and writes a daily journal. issues, reports via Matrix, and writes a daily journal. When blocked on external
resources or human decisions, files vault items instead of escalating directly.
**Trigger**: `supervisor-run.sh` runs every 20 min via cron. Sources `lib/guard.sh` **Trigger**: `supervisor-run.sh` runs every 20 min via cron. Sources `lib/guard.sh`
and calls `check_active supervisor` first — skips if and calls `check_active supervisor` first — skips if
@ -22,10 +23,9 @@ runs directly from cron like the planner and predictor.
 - `supervisor/preflight.sh` — Data collection: system resources (RAM, disk, swap,
 load), Docker status, active tmux sessions + phase files, lock files, agent log
 tails, CI pipeline status, open PRs, issue counts, stale worktrees, blocked
-issues, Matrix escalation replies. Also performs **stale phase cleanup**: scans
-`/tmp/*-session-*.phase` files for `PHASE:escalate` entries and auto-removes any
-whose linked issue is confirmed closed (24h grace period after closure to avoid
-races)
+issues. Also performs **stale phase cleanup**: scans `/tmp/*-session-*.phase`
+files for `PHASE:escalate` entries and auto-removes any whose linked issue
+is confirmed closed (24h grace period after closure to avoid races)
 - `formulas/run-supervisor.toml` — Execution spec: five steps (preflight review,
 health-assessment, decide-actions, report, journal) with `needs` dependencies.
 Claude evaluates all metrics and takes actions in a single interactive session
@ -41,10 +41,8 @@ runs directly from cron like the planner and predictor.
 P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping).
 **Matrix integration**: The supervisor has its own Matrix thread. Posts health
-summaries when there are changes, escalates P0-P2 issues, and processes replies
-from humans ("ignore disk warning", "kill that agent", "what's stuck?"). The
-Matrix listener routes thread replies to `/tmp/supervisor-escalation-reply`,
-which `supervisor-run.sh` consumes atomically on each run.
+summaries when there are changes, reports P0-P2 issues, and processes replies
+from humans ("ignore disk warning", "kill that agent", "what's stuck?").
 **Environment variables consumed**:
 - `FORGE_TOKEN`, `FORGE_REPO`, `FORGE_API`, `PROJECT_NAME`, `PROJECT_REPO_ROOT`
@ -53,6 +51,6 @@ which `supervisor-run.sh` consumes atomically on each run.
 - `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER` — Matrix notifications + human input
 **Lifecycle**: supervisor-run.sh (cron */20) → lock + memory guard → run
-preflight.sh (collect metrics) → consume escalation replies → load formula +
+preflight.sh (collect metrics) → consume Matrix replies → load formula +
 context → create tmux session → Claude assesses health, auto-fixes, posts
 Matrix summary, writes journal → `PHASE:done`.

@ -53,7 +53,7 @@ stuck. Your job: figure out the correct dependency direction and fix the wrong o
 actually depends on which
 3. Edit the issue that has the incorrect dep to remove the `#NNN` reference from its
 `## Dependencies` section (replace with `- None` if it was the only dep)
-4. If the correct direction is unclear from code, escalate with both issue summaries
+4. If the correct direction is unclear from code, file a vault item with both issue summaries
 Use the forge API to edit issue bodies:
 ```bash
@ -70,25 +70,35 @@ obsolete or misprioritized. Investigate:
 1. Check if dep #B is still relevant (read its body, check if the code it targets changed)
 2. If the dep is obsolete → remove it from #A's `## Dependencies` section
-3. If the dep is still needed → escalate, suggesting to prioritize #B or split #A
+3. If the dep is still needed → file a vault item, suggesting to prioritize #B or split #A
 ### Dev-agent blocked (P2)
 When you see "Dev-agent blocked: last N polls all report 'no ready issues'":
 1. Check if circular deps exist (they'll appear as separate P3 alerts)
-2. Check if all backlog issues depend on a single unmerged issue — if so, escalate
-to prioritize that blocker
-3. If no clear blocker, escalate with the list of blocked issues and their deps
-## Escalation
-If you can't fix it, escalate via Matrix:
+2. Check if all backlog issues depend on a single unmerged issue — if so, file a vault
+item to prioritize that blocker
+3. If no clear blocker, file a vault item with the list of blocked issues and their deps
+## When you cannot fix it
+File a vault procurement item so the human is notified through the vault:
 ```bash
-source ${FACTORY_ROOT}/lib/env.sh
-matrix_send "supervisor" "🏭 ESCALATE: <what's wrong and why you can't fix it>"
+cat > "${PROJECT_REPO_ROOT}/vault/pending/supervisor-$(date -u +%Y%m%d-%H%M)-issue.md" <<'VAULT_EOF'
+# <What is needed>
+## What
+<description of the problem and why the supervisor cannot fix it>
+## Why
+<impact on factory health>
+## Unblocks
+- Factory health: <what this resolves>
+VAULT_EOF
 ```
-Do NOT escalate if you can fix it. Do NOT ask permission. Fix first, report after.
+The vault-poll will notify the human and track the request.
+Do NOT talk to the human directly. The vault is the factory's only interface
+to the human for resources and approvals. Fix first, report after.
 ## Output
@ -97,7 +107,7 @@ FIXED: <what you did>
 ```
 or
 ```
-ESCALATE: <what's wrong>
+VAULT: filed vault/pending/<id>.md — <what's needed>
 ```
 ## Learning
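
Note on the heredoc template in the hunk above: the same What / Why / Unblocks layout could be wrapped in a small function so that every caller produces an identical file. The following is a minimal sketch under stated assumptions, not code from this commit. The name `file_vault_item`, its four arguments, and the `mkdir -p` call are hypothetical; only the path pattern and the section headings come from the diff.

```shell
#!/usr/bin/env bash
# Sketch only: file_vault_item is a hypothetical helper, not part of the repository.
# It writes one vault item using the What / Why / Unblocks template and prints its path.
file_vault_item() {
  local title="$1" what="$2" why="$3" unblocks="$4"
  local dir="${PROJECT_REPO_ROOT:?PROJECT_REPO_ROOT must be set}/vault/pending"
  local path
  path="${dir}/supervisor-$(date -u +%Y%m%d-%H%M)-issue.md"
  # Assumption: the pending directory may not exist yet, so create it.
  mkdir -p "$dir"
  # printf writes the arguments literally; nothing in them is expanded or executed.
  printf '# %s\n## What\n%s\n## Why\n%s\n## Unblocks\n- %s\n' \
    "$title" "$what" "$why" "$unblocks" > "$path"
  printf '%s\n' "$path"
}
```

A caller would capture the printed path and then append the `VAULT:` result line itself. Because the file name only has minute resolution, two items filed within the same minute would overwrite each other; a real helper would likely need a more unique id.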

@ -214,14 +214,15 @@ else
 fi
 echo ""
-# ── Escalation Replies from Matrix ────────────────────────────────────────
-echo "## Escalation Replies (from Matrix)"
-if [ -s /tmp/supervisor-escalation-reply ]; then
-cat /tmp/supervisor-escalation-reply
-echo ""
-echo "(Reply already consumed by supervisor-run.sh before this session)"
-else
-echo " None"
-fi
+# ── Pending Vault Items ───────────────────────────────────────────────────
+echo "## Pending Vault Items"
+_found_vault=false
+for _vf in "${PROJECT_REPO_ROOT}/vault/pending/"*.md; do
+[ -f "$_vf" ] || continue
+_found_vault=true
+_vtitle=$(grep -m1 '^# ' "$_vf" | sed 's/^# //' || basename "$_vf")
+echo " $(basename "$_vf"): ${_vtitle}"
+done
+[ "$_found_vault" = false ] && echo " None"
 echo ""
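
Note on the title lookup in the loop above: in `grep ... | sed ... || basename`, the fallback runs only when the pipeline as a whole reports failure. Without `pipefail` the pipeline's status is that of `sed`, which succeeds even when `grep` matched nothing, so a file with no `# ` heading would get an empty title. Whether this script enables `pipefail` is not visible in this diff. The sketch below shows one way to make the fallback independent of that setting; `vault_item_title` is a hypothetical name, not something this commit adds.

```shell
#!/usr/bin/env bash
# Sketch only: vault_item_title is a hypothetical helper, not part of preflight.sh.
# Prints the first "# " heading of a file, or the file name if there is none.
vault_item_title() {
  local file="$1" title
  # On the first line starting with "# ": strip the marker, print it, then quit.
  title=$(sed -n '/^# /{s/^# //p;q;}' "$file")
  # Test the captured text, not an exit status, so pipefail does not matter.
  [ -n "$title" ] || title=$(basename "$file")
  printf '%s\n' "$title"
}
```

Inside the loop, the assignment would then read `_vtitle=$(vault_item_title "$_vf")`.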

@ -80,14 +80,6 @@ status() {
 flog "$*"
 }
-# ── Check for escalation replies from Matrix ──────────────────────────────
-ESCALATION_REPLY=""
-if [ -s /tmp/supervisor-escalation-reply ]; then
-ESCALATION_REPLY=$(cat /tmp/supervisor-escalation-reply)
-rm -f /tmp/supervisor-escalation-reply
-flog "Got escalation reply: $(echo "$ESCALATION_REPLY" | head -1)"
-fi
 # Alerts by priority
 P0_ALERTS=""
 P1_ALERTS=""
@ -813,13 +805,7 @@ Disk: $(df -h / | awk 'NR==2{printf "%s used of %s (%s)", $3, $2, $5}')
 Docker: $(sudo docker ps --format '{{.Names}}' 2>/dev/null | wc -l) containers running
 Claude procs: $(pgrep -f "claude" 2>/dev/null | wc -l)
-$(if [ -n "$ESCALATION_REPLY" ]; then echo "
-## Human Response to Previous Escalation
-${ESCALATION_REPLY}
-Act on this response."; fi)
-Fix what you can. Escalate what you can't. Read the relevant best-practices file first."
+Fix what you can. File vault items for what you can't. Read the relevant best-practices file first."
 CLAUDE_OUTPUT=$(timeout 300 claude -p --model sonnet --dangerously-skip-permissions \
 "$CLAUDE_PROMPT" 2>&1) || true

@ -59,9 +59,6 @@ else
 log "WARNING: preflight.sh failed, continuing with partial data"
 fi
-# ── Consume escalation replies ────────────────────────────────────────────
-consume_escalation_reply "supervisor"
 # ── Load formula + context ───────────────────────────────────────────────
 load_formula "$FACTORY_ROOT/formulas/run-supervisor.toml"
 build_context_block AGENTS.md
@ -77,16 +74,11 @@ build_prompt_footer
 PROMPT="You are the supervisor agent for ${FORGE_REPO}. Work through the formula below. You MUST write PHASE:done to '${PHASE_FILE}' when finished — the orchestrator will time you out if you return to the prompt without signalling.
 You have full shell access and --dangerously-skip-permissions.
-Fix what you can. Escalate what you cannot. Do NOT ask permission — act first, report after.
+Fix what you can. File vault items for what you cannot. Do NOT ask permission — act first, report after.
 ## Pre-flight metrics (collected $(date -u +%H:%M) UTC)
 ${PREFLIGHT_OUTPUT}
-${ESCALATION_REPLY:+
-## Escalation Reply (from Matrix — human message)
-${ESCALATION_REPLY}
-Act on this reply in the decide-actions step.
-}
 ## Project context
 ${CONTEXT_BLOCK}
 ${SCRATCH_CONTEXT:+${SCRATCH_CONTEXT}