fix: Remove escalation — planner routes through vault instead (#721)

Remove ESCALATED signal and escalation handling from planner, supervisor, and gardener. When blocked on external resources or human decisions, these agents now file vault procurement items (vault/pending/*.md) instead of escalating directly to the human.

Changes:
- Planner formula: ESCALATED signal replaced with HUMAN_BLOCKED; files vault items and marks prerequisites as blocked-on-vault
- Supervisor formula/prompt: escalation sections replaced with vault item filing; preflight now reports pending vault items instead of escalation replies
- Gardener formula: ESCALATE action replaced with VAULT action; files vault/pending/*.md for human decisions
- Groom-backlog formula: same ESCALATE→VAULT replacement
- Gardener shell: PHASE:escalate replaced with PHASE:failed for merge blocks and CI exhaustion; escalation reply consumption removed
- Supervisor shell: escalation reply consumption removed from both supervisor-run.sh and legacy supervisor-poll.sh
- Prerequisite tree: #466 updated from "escalated" to "blocked-on-vault"

The vault is the factory's only interface to the human for resources and approvals. Dev/action agents retain PHASE:escalate for operational session issues (CI timeouts, merge blocks) which are a different mechanism.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 09:09:58 +00:00
commit f2064ba67c
parent 850a8d743f
11 changed files with 117 additions and 113 deletions
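
The What / Why / Unblocks template that the formula changes below repeat can be sketched as a small shell helper. This is an illustrative sketch, not code from this commit: the function name `file_vault_item`, its argument order, and the skip-if-exists behaviour are assumptions. Only `$PROJECT_REPO_ROOT`, the `vault/pending/<id>.md` path, and the section headings come from the diff.

```shell
# Hypothetical helper, not part of this commit: writes one vault item using
# the What / Why / Unblocks template from the formulas.
# Usage: file_vault_item <id> <title> <what> <why> <unblocks-entry>...
file_vault_item() {
  if [ "$#" -lt 5 ]; then
    echo "usage: file_vault_item <id> <title> <what> <why> <unblocks-entry>..." >&2
    return 2
  fi
  if [ -z "${PROJECT_REPO_ROOT:-}" ]; then
    echo "PROJECT_REPO_ROOT must be set" >&2
    return 1
  fi

  local id="$1" title="$2" what="$3" why="$4"
  shift 4

  local dir="$PROJECT_REPO_ROOT/vault/pending"
  local item="$dir/$id.md"
  mkdir -p "$dir" || return 1

  # Assumption: an item that is already pending is left untouched, so a
  # repeated run does not overwrite a request the human has not handled yet.
  if [ -e "$item" ]; then
    echo "vault item already pending: $item" >&2
    return 0
  fi

  {
    printf '# %s\n' "$title"
    printf '## What\n%s\n' "$what"
    printf '## Why\n%s\n' "$why"
    printf '## Unblocks\n'
    # Each remaining argument becomes one bullet under "Unblocks".
    for entry in "$@"; do
      printf '%s\n' "- $entry"
    done
  } > "$item"
}
```

A caller would follow a successful call with the `VAULT: filed vault/pending/<id>.md ...` line appended to `$RESULT_FILE`, as the `Log:` step in the formulas describes. Logging is left out of the helper so that each agent keeps control of its own result file.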
--- a/formulas/groom-backlog.toml
+++ b/formulas/groom-backlog.toml
@@ -96,7 +96,7 @@ The dev-agent is completely starved until they are promoted or resolved.
 For each tier-0 issue:
  - Read the full body: curl -sf -H "Authorization: token $FORGE_TOKEN" "$FORGE_API/issues/{number}"
  - If resolvable: promote to backlog — add acceptance criteria, affected files, relabel
-  - If needs human decision: add to ESCALATE block
+  - If needs human decision: file a vault procurement item (vault/pending/<id>.md)
  - If invalid / wontfix: close with explanation comment

 After completing all tier-0, re-fetch to check for new blockers:
@@ -135,8 +135,16 @@ DUPLICATE (>80% overlap after reading both bodies — confirm before closing):
  Close:        curl -X PATCH ... /issues/NNN -d '{"state":"closed"}'
  Write: echo "ACTION: closed #NNN as duplicate of #OLDER" >> "$RESULT_FILE"

-ESCALATE (ambiguous scope, architectural question, needs human decision):
-  Collect into the ESCALATE block written to the result file at the end.
+VAULT (ambiguous scope, architectural question, needs human decision):
+  File a vault procurement item at $PROJECT_REPO_ROOT/vault/pending/<id>.md:
+    # <What decision or resource is needed>
+    ## What
+    <description>
+    ## Why
+    <which issue this unblocks>
+    ## Unblocks
+    - #NNN — <title>
+  Log: echo "VAULT: filed vault/pending/<id>.md for #NNN — <reason>" >> "$RESULT_FILE"

 Dust vs ore rules:
  Dust: comment fix, variable rename, whitespace/formatting, single-line edit, trivial cleanup with no behavior change
@@ -179,7 +187,7 @@ Re-fetch ALL open tech-debt issues and count them:

 Check each tier:
  tier-0 count == 0  (HARD REQUIREMENT — factory is blocked until zero)
-  tier-1 all processed or escalated
+  tier-1 all processed or routed to vault
  tier-2 all classified

 If tier-0 > 0:
@@ -195,8 +203,7 @@ If all tiers clear, write the completion summary and signal done:
  echo "ACTION: grooming complete — 0 tech-debt remaining" >> "$RESULT_FILE"
  echo 'PHASE:done' > "$PHASE_FILE"

-Escalation format (for items needing human decision — write to result file):
-  printf 'ESCALATE\n1. #NNN "title" — reason (a) option1 (b) option2 (c) option3\n' >> "$RESULT_FILE"
+Vault items filed during this run are picked up by vault-poll automatically.

 On unrecoverable error (API unavailable, repeated failures):
  printf 'PHASE:failed\nReason: %s\n' 'describe what failed' > "$PHASE_FILE"
--- a/formulas/run-gardener.toml
+++ b/formulas/run-gardener.toml
@@ -119,8 +119,16 @@ DUST (trivial — single-line edit, rename, comment, style, whitespace):
  Do NOT close dust issues — the dust-bundling step auto-bundles groups
  of 3+ into one backlog issue.

-ESCALATE (needs human decision):
-  printf 'ESCALATE\n1. #NNN "title" — reason (a) option1 (b) option2\n' >> "$RESULT_FILE"
+VAULT (needs human decision or external resource):
+  File a vault procurement item at $PROJECT_REPO_ROOT/vault/pending/<id>.md:
+    # <What decision or resource is needed>
+    ## What
+    <description>
+    ## Why
+    <which issue this unblocks>
+    ## Unblocks
+    - #NNN — <title>
+  Log: echo "VAULT: filed vault/pending/<id>.md for #NNN — <reason>" >> "$RESULT_FILE"

 CLEAN (only if truly nothing to do):
  echo 'CLEAN' >> "$RESULT_FILE"
@@ -150,7 +158,7 @@ Sibling dependency rule (CRITICAL):

   Only close for clear, unambiguous violations. If the issue is
   borderline or could be interpreted as compatible, leave it open
-   and ESCALATE instead.
+   and file a VAULT item for human decision instead.

 8. Quality gate — backlog label enforcement:
   For each open issue labeled 'backlog', verify it has the required
@@ -178,7 +186,7 @@ Processing order:
  2. AD alignment check — close backlog issues that violate architecture decisions
  3. Quality gate — strip backlog from issues missing acceptance criteria or affected files
  4. Process tech-debt issues by score (impact/effort)
-  5. Classify remaining items as dust or escalate
+  5. Classify remaining items as dust or route to vault

 Do NOT bundle dust yourself — the dust-bundling step handles accumulation,
 dedup, TTL expiry, and bundling into backlog issues.
--- a/formulas/run-planner.toml
+++ b/formulas/run-planner.toml
@@ -123,8 +123,9 @@ Update the tree:
 Bounce/stuck detection — for issues in the tree, fetch recent comments:
  curl -sf -H "Authorization: token $FORGE_TOKEN" \
    "$FORGE_API/issues/<number>/comments?limit=10"
-  Signals: BOUNCED (too_large, underspecified), ESCALATED (needs human decision),
+  Signals: BOUNCED (too_large, underspecified),
  LABEL_CHURN (3+ relabels between backlog/underspecified).
+  If an issue needs a human decision or external resource, it is HUMAN_BLOCKED.
  Track as stuck_issues[] for constraint filing below.

 Hold the updated tree in memory — written to disk in journal-and-commit.
@@ -148,7 +149,17 @@ Graph bottlenecks (high betweenness centrality) and thin objectives inform ranki
 Stuck issue handling:
  - BOUNCED/LABEL_CHURN: do NOT re-promote. Dispatch groom-backlog formula instead:
      tea_file_issue "chore: break down #<N> — bounced <count>x" "<body>" "action"
-  - ESCALATED: skip, mark in tree as "escalated — awaiting human decision"
+  - HUMAN_BLOCKED (needs human decision or external resource): file a vault
+    procurement item instead of skipping. Write vault/pending/<resource-id>.md:
+      # <What is needed>
+      ## What
+      <description of the resource or decision needed>
+      ## Why
+      <which objective/issue this unblocks>
+      ## Unblocks
+      - #<issue> — <title>
+    Then mark the prerequisite in the tree as "blocked-on-vault (vault/pending/<id>.md)".
+    Do NOT skip or mark as "awaiting human decision" — the vault owns the human interface.

 Filing gate (for non-stuck constraints):
  1. Check if issue already exists (match by #number in tree or title search)
--- a/formulas/run-supervisor.toml
+++ b/formulas/run-supervisor.toml
@@ -9,7 +9,7 @@
 # Key differences from planner/gardener:
 #   - Runs every 20min — lightweight health check
 #   - Primarily READS state, rarely WRITES (no PRs, just Matrix + journal)
-#   - Reactive to escalations — processes pending escalation events
+#   - Checks vault state for pending procurement items
 #   - Conversation memory via Matrix thread and journal

 name        = "run-supervisor"
@@ -29,14 +29,14 @@ and injected into your prompt above. Review them now.

 1. Read the injected metrics data carefully (System Resources, Docker,
   Active Sessions, Phase Files, Stale Phase Cleanup, Lock Files, Agent Logs,
-   CI Pipelines, Open PRs, Issue Status, Stale Worktrees, Pending Escalations,
-   Escalation Replies).
+   CI Pipelines, Open PRs, Issue Status, Stale Worktrees).
   Note: preflight.sh auto-removes PHASE:escalate files for closed issues
   (24h grace period). Check the "Stale Phase Cleanup" section for any
   files cleaned or in grace period this run.

-2. If there are escalation replies from Matrix (human messages), note them —
-   you will act on them in the decide-actions step.
+2. Check vault state: read vault/pending/*.md for any procurement items
+   the planner has filed. Note items relevant to the health assessment
+   (e.g. a blocked resource that explains why the pipeline is stalled).

 3. Read the supervisor journal for recent history:
     JOURNAL_FILE="$FACTORY_ROOT/supervisor/journal/$(date -u +%Y-%m-%d).md"
@@ -70,9 +70,9 @@ Categorize every finding from the metrics into priority levels.
 - Git repo on wrong branch or in broken rebase state
 - Pipeline stalled: backlog issues exist but no agent ran for > 20min
 - Dev-agent blocked: last N polls all report "no ready issues"
-- Dev/action sessions in PHASE:escalate for > 24h (escalation timeout)
+- Dev/action sessions in PHASE:escalate for > 24h (session timeout)
  (Note: PHASE:escalate files for closed issues are auto-cleaned by preflight;
-  this check covers escalations where the issue is still open)
+  this check covers sessions where the issue is still open)

 ### P3 — Factory degraded
 - PRs stale: CI finished >20min ago AND no git push to the PR branch since CI completed
@@ -92,7 +92,7 @@ needs = ["preflight"]

 [[steps]]
 id    = "decide-actions"
-title = "Fix what you can, escalate what you cannot"
+title = "Fix what you can, file vault items for what you cannot"
 description = """
 For each finding from the health assessment, decide and execute an action.

@@ -145,20 +145,21 @@ For each finding from the health assessment, decide and execute an action.
      tmux send-keys -t "$SESSION" "# [supervisor] PR stale >20min — CI finished, please push or update" Enter
    fi
  If no active tmux session exists, note it in the journal for the next dev-poll cycle.
-  Do NOT escalate stale PRs to Matrix unless they remain stale for >3 consecutive runs.
+  Do NOT file vault items for stale PRs unless they remain stale for >3 consecutive runs.

-### Escalation replies (from Matrix)
-
-If there are escalation replies from a human, act on them:
-- "ignore X" → note in journal, do not alert on X this run
-- "kill that agent" → identify and kill the referenced session
-- "what's stuck?" → include detailed status in the Matrix report
-- Other instructions → follow them, use best judgment
-
-### Cannot auto-fix → escalate
+### Cannot auto-fix → file vault item

 For P0-P2 issues that persist after auto-fix attempts, or issues requiring
-human judgment, prepare an escalation message for the report step.
+human judgment, file a vault procurement item:
+  Write $PROJECT_REPO_ROOT/vault/pending/supervisor-<issue-slug>.md:
+    # <What is needed>
+    ## What
+    <description of the problem and why the supervisor cannot fix it>
+    ## Why
+    <impact on factory health — reference the priority level>
+    ## Unblocks
+    - Factory health: <what this resolves>
+  The vault-poll will notify the human and track the request.

 Read the relevant best-practices file before taking action:
  cat "$FACTORY_ROOT/supervisor/best-practices/memory.md"    # P0
@@ -167,7 +168,7 @@ Read the relevant best-practices file before taking action:
  cat "$FACTORY_ROOT/supervisor/best-practices/dev-agent.md" # P2 agent
  cat "$FACTORY_ROOT/supervisor/best-practices/git.md"       # P2 git

-Track what you fixed and what needs escalation for the report step.
+Track what you fixed and what vault items you filed for the report step.
 """
 needs = ["health-assessment"]

@@ -196,15 +197,14 @@ Post a summary grouped by priority:

  Status: RAM=<X>MB Disk=<Y>% Load=<Z>"

-### When escalation is needed (P0-P2 unresolved)
-Escalate with a clear call to action:
-  matrix_send "supervisor" "ESCALATE: <what's wrong and why you can't fix it>
+### When vault items were filed (P0-P2 unresolved)
+Note the vault items in the status summary:
+  matrix_send "supervisor" "Supervisor health check:

-  Suggested action: <what the human should do>"
+  Filed vault items:
+  - vault/pending/<id>.md — <summary>

-### Responding to escalation replies
-If you acted on a human's reply, confirm what you did:
-  matrix_send "supervisor" "Acted on your reply: <summary of action taken>"
+  Status: RAM=<X>MB Disk=<Y>% Load=<Z>"

 Keep messages concise. Do not post identical messages to what was posted
 in the previous run (check journal for prior messages).
@@ -233,15 +233,15 @@ Format:
  - Docker: <N> containers

  ### Findings
-  - [P<N>] <finding> — <action taken or "escalated">
+  - [P<N>] <finding> — <action taken or "filed vault item">
  (or "No issues found — all systems healthy")

  ### Actions taken
  - <what was fixed>
  (or "No actions needed")

-  ### Escalation replies processed
-  - <human said X, did Y>
+  ### Vault items filed
+  - vault/pending/<id>.md — <reason>
  (or "None")

 Keep each entry concise — 15-25 lines max. This journal provides