fix: update AGENTS.md docs and handle stale PHASE:escalate in gardener

Address review feedback:
- gardener/AGENTS.md: remove escalation flow references, describe vault routing
- supervisor/AGENTS.md: remove escalation flow references, describe vault routing
- gardener-run.sh: treat PHASE:escalate as terminal (kills session) to prevent
  zombie sessions from stale/legacy escalation writes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
openhands 2026-03-26 10:32:04 +00:00
parent f2064ba67c
commit 5b6c7c962b
3 changed files with 23 additions and 25 deletions

View file

@ -1,29 +1,27 @@
<!-- last-reviewed: 043bf0f0217aef3f319b844f1a1277acd6327a1c --> <!-- last-reviewed: f2064ba67c3b6819f5e252300927c01e2825dd7c -->
# Gardener Agent # Gardener Agent
**Role**: Backlog grooming — detect duplicate issues, missing acceptance **Role**: Backlog grooming — detect duplicate issues, missing acceptance
criteria, oversized issues, stale issues, and circular dependencies. Enforces criteria, oversized issues, stale issues, and circular dependencies. Enforces
the quality gate: strips the `backlog` label from issues that lack acceptance the quality gate: strips the `backlog` label from issues that lack acceptance
criteria checkboxes (`- [ ]`) or an `## Affected files` section. Invokes criteria checkboxes (`- [ ]`) or an `## Affected files` section. Invokes
Claude to fix or escalate to a human via Matrix. Claude to fix what it can; files vault items for what it cannot.
**Trigger**: `gardener-run.sh` runs 4x/day via cron. Sources `lib/guard.sh` and **Trigger**: `gardener-run.sh` runs 4x/day via cron. Sources `lib/guard.sh` and
calls `check_active gardener` first — skips if `$FACTORY_ROOT/state/.gardener-active` calls `check_active gardener` first — skips if `$FACTORY_ROOT/state/.gardener-active`
is absent. Then creates a tmux session with `claude --model sonnet`, injects is absent. Then creates a tmux session with `claude --model sonnet`, injects
`formulas/run-gardener.toml` with escalation replies as context, monitors the `formulas/run-gardener.toml` as context, monitors the phase file, and cleans up
phase file, and cleans up on completion or timeout (2h max session). No action on completion or timeout (2h max session). No action issues — the gardener runs
issues — the gardener runs directly from cron like the planner, predictor, and directly from cron like the planner, predictor, and supervisor.
supervisor.
**Key files**: **Key files**:
- `gardener/gardener-run.sh` — Cron wrapper + orchestrator: lock, memory guard, - `gardener/gardener-run.sh` — Cron wrapper + orchestrator: lock, memory guard,
consumes escalation replies, sources disinto project config, creates tmux session, sources disinto project config, creates tmux session, injects formula prompt,
injects formula prompt, monitors phase file via custom `_gardener_on_phase_change` monitors phase file via custom `_gardener_on_phase_change` callback (passed to
callback (passed to `run_formula_and_monitor`). Kills session on `PHASE:escalate` `run_formula_and_monitor`). Stays alive through CI/review/merge cycle after
to prevent zombies. Stays alive through CI/review/merge cycle after `PHASE:awaiting_ci` `PHASE:awaiting_ci` — injects CI results and review feedback, re-signals
— injects CI results and review feedback, re-signals `PHASE:awaiting_ci` after `PHASE:awaiting_ci` after fixes, signals `PHASE:awaiting_review` on CI pass.
fixes, signals `PHASE:awaiting_review` on CI pass. Executes pending-actions Executes pending-actions manifest after PR merge.
manifest after PR merge.
- `formulas/run-gardener.toml` — Execution spec: preflight, grooming, dust-bundling, blocked-review, agents-update, commit-and-pr - `formulas/run-gardener.toml` — Execution spec: preflight, grooming, dust-bundling, blocked-review, agents-update, commit-and-pr
- `gardener/pending-actions.json` — Manifest of deferred repo actions (label changes, - `gardener/pending-actions.json` — Manifest of deferred repo actions (label changes,
closures, comments, issue creation). Written during grooming steps, committed to the closures, comments, issue creation). Written during grooming steps, committed to the
@ -35,10 +33,10 @@ supervisor.
- `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER` - `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER`
**Lifecycle**: gardener-run.sh (cron 0,6,12,18) → `check_active gardener` → lock + memory guard → **Lifecycle**: gardener-run.sh (cron 0,6,12,18) → `check_active gardener` → lock + memory guard →
consume escalation replies → load formula + context → create tmux session → load formula + context → create tmux session →
Claude grooms backlog (writes proposed actions to manifest), bundles dust, Claude grooms backlog (writes proposed actions to manifest), bundles dust,
reviews blocked issues, updates AGENTS.md, commits manifest + docs to PR → reviews blocked issues, updates AGENTS.md, commits manifest + docs to PR →
`PHASE:awaiting_ci` (stays alive) → CI pass → `PHASE:awaiting_review` `PHASE:awaiting_ci` (stays alive) → CI pass → `PHASE:awaiting_review`
review feedback → address + re-signal → merge → gardener-run.sh executes review feedback → address + re-signal → merge → gardener-run.sh executes
manifest actions via API → `PHASE:done`. On `PHASE:escalate`: session killed manifest actions via API → `PHASE:done`. When blocked on external resources
immediately. or human decisions, files a vault item instead of escalating.

View file

@ -636,7 +636,7 @@ _gardener_on_phase_change() {
PHASE:done|PHASE:merged) PHASE:done|PHASE:merged)
agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}" agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}"
;; ;;
PHASE:failed) PHASE:failed|PHASE:escalate)
agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}" agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}"
;; ;;
PHASE:crashed) PHASE:crashed)

View file

@ -1,10 +1,11 @@
<!-- last-reviewed: 043bf0f0217aef3f319b844f1a1277acd6327a1c --> <!-- last-reviewed: f2064ba67c3b6819f5e252300927c01e2825dd7c -->
# Supervisor Agent # Supervisor Agent
**Role**: Health monitoring and auto-remediation, executed as a formula-driven **Role**: Health monitoring and auto-remediation, executed as a formula-driven
Claude agent. Collects system and project metrics via a bash pre-flight script, Claude agent. Collects system and project metrics via a bash pre-flight script,
then runs an interactive Claude session (sonnet) that assesses health, auto-fixes then runs an interactive Claude session (sonnet) that assesses health, auto-fixes
issues, escalates via Matrix, and writes a daily journal. issues, reports via Matrix, and writes a daily journal. When blocked on external
resources or human decisions, files vault items instead of escalating directly.
**Trigger**: `supervisor-run.sh` runs every 20 min via cron. Sources `lib/guard.sh` **Trigger**: `supervisor-run.sh` runs every 20 min via cron. Sources `lib/guard.sh`
and calls `check_active supervisor` first — skips if and calls `check_active supervisor` first — skips if
@ -22,10 +23,9 @@ runs directly from cron like the planner and predictor.
- `supervisor/preflight.sh` — Data collection: system resources (RAM, disk, swap, - `supervisor/preflight.sh` — Data collection: system resources (RAM, disk, swap,
load), Docker status, active tmux sessions + phase files, lock files, agent log load), Docker status, active tmux sessions + phase files, lock files, agent log
tails, CI pipeline status, open PRs, issue counts, stale worktrees, blocked tails, CI pipeline status, open PRs, issue counts, stale worktrees, blocked
issues, Matrix escalation replies. Also performs **stale phase cleanup**: scans issues. Also performs **stale phase cleanup**: scans `/tmp/*-session-*.phase`
`/tmp/*-session-*.phase` files for `PHASE:escalate` entries and auto-removes any files for stale `PHASE:failed` entries and auto-removes any whose linked issue
whose linked issue is confirmed closed (24h grace period after closure to avoid is confirmed closed (24h grace period after closure to avoid races)
races)
- `formulas/run-supervisor.toml` — Execution spec: five steps (preflight review, - `formulas/run-supervisor.toml` — Execution spec: five steps (preflight review,
health-assessment, decide-actions, report, journal) with `needs` dependencies. health-assessment, decide-actions, report, journal) with `needs` dependencies.
Claude evaluates all metrics and takes actions in a single interactive session Claude evaluates all metrics and takes actions in a single interactive session
@ -41,7 +41,7 @@ runs directly from cron like the planner and predictor.
P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping). P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping).
**Matrix integration**: The supervisor has its own Matrix thread. Posts health **Matrix integration**: The supervisor has its own Matrix thread. Posts health
summaries when there are changes, escalates P0-P2 issues, and processes replies summaries when there are changes, reports P0-P2 issues, and processes replies
from humans ("ignore disk warning", "kill that agent", "what's stuck?"). The from humans ("ignore disk warning", "kill that agent", "what's stuck?"). The
Matrix listener routes thread replies to `/tmp/supervisor-escalation-reply`, Matrix listener routes thread replies to `/tmp/supervisor-escalation-reply`,
which `supervisor-run.sh` consumes atomically on each run. which `supervisor-run.sh` consumes atomically on each run.
@ -53,6 +53,6 @@ which `supervisor-run.sh` consumes atomically on each run.
- `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER` — Matrix notifications + human input - `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER` — Matrix notifications + human input
**Lifecycle**: supervisor-run.sh (cron */20) → lock + memory guard → run **Lifecycle**: supervisor-run.sh (cron */20) → lock + memory guard → run
preflight.sh (collect metrics) → consume escalation replies → load formula + preflight.sh (collect metrics) → consume Matrix replies → load formula +
context → create tmux session → Claude assesses health, auto-fixes, posts context → create tmux session → Claude assesses health, auto-fixes, posts
Matrix summary, writes journal → `PHASE:done`. Matrix summary, writes journal → `PHASE:done`.