From 5b6c7c962b0ace91842e0cfe6d90e34865b03f1d Mon Sep 17 00:00:00 2001 From: openhands Date: Thu, 26 Mar 2026 10:32:04 +0000 Subject: [PATCH] fix: update AGENTS.md docs and handle stale PHASE:escalate in gardener Address review feedback: - gardener/AGENTS.md: remove escalation flow references, describe vault routing - supervisor/AGENTS.md: remove escalation flow references, describe vault routing - gardener-run.sh: treat PHASE:escalate as terminal (kills session) to prevent zombie sessions from stale/legacy escalation writes Co-Authored-By: Claude Opus 4.6 (1M context) --- gardener/AGENTS.md | 30 ++++++++++++++---------------- gardener/gardener-run.sh | 2 +- supervisor/AGENTS.md | 16 ++++++++-------- 3 files changed, 23 insertions(+), 25 deletions(-) diff --git a/gardener/AGENTS.md b/gardener/AGENTS.md index 40b295f..342e1ec 100644 --- a/gardener/AGENTS.md +++ b/gardener/AGENTS.md @@ -1,29 +1,27 @@ - + # Gardener Agent **Role**: Backlog grooming — detect duplicate issues, missing acceptance criteria, oversized issues, stale issues, and circular dependencies. Enforces the quality gate: strips the `backlog` label from issues that lack acceptance criteria checkboxes (`- [ ]`) or an `## Affected files` section. Invokes -Claude to fix or escalate to a human via Matrix. +Claude to fix what it can; files vault items for what it cannot. **Trigger**: `gardener-run.sh` runs 4x/day via cron. Sources `lib/guard.sh` and calls `check_active gardener` first — skips if `$FACTORY_ROOT/state/.gardener-active` is absent. Then creates a tmux session with `claude --model sonnet`, injects -`formulas/run-gardener.toml` with escalation replies as context, monitors the -phase file, and cleans up on completion or timeout (2h max session). No action -issues — the gardener runs directly from cron like the planner, predictor, and -supervisor. +`formulas/run-gardener.toml` as context, monitors the phase file, and cleans up +on completion or timeout (2h max session). No action issues — the gardener runs +directly from cron like the planner, predictor, and supervisor. **Key files**: - `gardener/gardener-run.sh` — Cron wrapper + orchestrator: lock, memory guard, - consumes escalation replies, sources disinto project config, creates tmux session, - injects formula prompt, monitors phase file via custom `_gardener_on_phase_change` - callback (passed to `run_formula_and_monitor`). Kills session on `PHASE:escalate` - to prevent zombies. Stays alive through CI/review/merge cycle after `PHASE:awaiting_ci` - — injects CI results and review feedback, re-signals `PHASE:awaiting_ci` after - fixes, signals `PHASE:awaiting_review` on CI pass. Executes pending-actions - manifest after PR merge. + sources disinto project config, creates tmux session, injects formula prompt, + monitors phase file via custom `_gardener_on_phase_change` callback (passed to + `run_formula_and_monitor`). Stays alive through CI/review/merge cycle after + `PHASE:awaiting_ci` — injects CI results and review feedback, re-signals + `PHASE:awaiting_ci` after fixes, signals `PHASE:awaiting_review` on CI pass. + Executes pending-actions manifest after PR merge. - `formulas/run-gardener.toml` — Execution spec: preflight, grooming, dust-bundling, blocked-review, agents-update, commit-and-pr - `gardener/pending-actions.json` — Manifest of deferred repo actions (label changes, closures, comments, issue creation). Written during grooming steps, committed to the @@ -35,10 +33,10 @@ supervisor. - `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER` **Lifecycle**: gardener-run.sh (cron 0,6,12,18) → `check_active gardener` → lock + memory guard → -consume escalation replies → load formula + context → create tmux session → +load formula + context → create tmux session → Claude grooms backlog (writes proposed actions to manifest), bundles dust, reviews blocked issues, updates AGENTS.md, commits manifest + docs to PR → `PHASE:awaiting_ci` (stays alive) → CI pass → `PHASE:awaiting_review` → review feedback → address + re-signal → merge → gardener-run.sh executes -manifest actions via API → `PHASE:done`. On `PHASE:escalate`: session killed -immediately. +manifest actions via API → `PHASE:done`. When blocked on external resources +or human decisions, files a vault item instead of escalating. diff --git a/gardener/gardener-run.sh b/gardener/gardener-run.sh index 536ee22..58b9b1f 100755 --- a/gardener/gardener-run.sh +++ b/gardener/gardener-run.sh @@ -636,7 +636,7 @@ _gardener_on_phase_change() { PHASE:done|PHASE:merged) agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}" ;; - PHASE:failed) + PHASE:failed|PHASE:escalate) agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}" ;; PHASE:crashed) diff --git a/supervisor/AGENTS.md b/supervisor/AGENTS.md index b9fc223..c36e978 100644 --- a/supervisor/AGENTS.md +++ b/supervisor/AGENTS.md @@ -1,10 +1,11 @@ - + # Supervisor Agent **Role**: Health monitoring and auto-remediation, executed as a formula-driven Claude agent. Collects system and project metrics via a bash pre-flight script, then runs an interactive Claude session (sonnet) that assesses health, auto-fixes -issues, escalates via Matrix, and writes a daily journal. +issues, reports via Matrix, and writes a daily journal. When blocked on external +resources or human decisions, files vault items instead of escalating directly. **Trigger**: `supervisor-run.sh` runs every 20 min via cron. Sources `lib/guard.sh` and calls `check_active supervisor` first — skips if @@ -22,10 +23,9 @@ runs directly from cron like the planner and predictor. - `supervisor/preflight.sh` — Data collection: system resources (RAM, disk, swap, load), Docker status, active tmux sessions + phase files, lock files, agent log tails, CI pipeline status, open PRs, issue counts, stale worktrees, blocked - issues, Matrix escalation replies. Also performs **stale phase cleanup**: scans - `/tmp/*-session-*.phase` files for `PHASE:escalate` entries and auto-removes any - whose linked issue is confirmed closed (24h grace period after closure to avoid - races) + issues. Also performs **stale phase cleanup**: scans `/tmp/*-session-*.phase` + files for stale `PHASE:failed` entries and auto-removes any whose linked issue + is confirmed closed (24h grace period after closure to avoid races) - `formulas/run-supervisor.toml` — Execution spec: five steps (preflight review, health-assessment, decide-actions, report, journal) with `needs` dependencies. Claude evaluates all metrics and takes actions in a single interactive session @@ -41,7 +41,7 @@ runs directly from cron like the planner and predictor. P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping). **Matrix integration**: The supervisor has its own Matrix thread. Posts health -summaries when there are changes, escalates P0-P2 issues, and processes replies +summaries when there are changes, reports P0-P2 issues, and processes replies from humans ("ignore disk warning", "kill that agent", "what's stuck?"). The Matrix listener routes thread replies to `/tmp/supervisor-escalation-reply`, which `supervisor-run.sh` consumes atomically on each run. @@ -53,6 +53,6 @@ which `supervisor-run.sh` consumes atomically on each run. - `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER` — Matrix notifications + human input **Lifecycle**: supervisor-run.sh (cron */20) → lock + memory guard → run -preflight.sh (collect metrics) → consume escalation replies → load formula + +preflight.sh (collect metrics) → consume Matrix replies → load formula + context → create tmux session → Claude assesses health, auto-fixes, posts Matrix summary, writes journal → `PHASE:done`.