fix: Shared monitor_phase_loop idle_prompt behaviour undocumented for future agents (#265)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
openhands 2026-03-20 22:07:00 +00:00
parent 05884c7d1e
commit a3ca4b55b8
2 changed files with 31 additions and 1 deletions

View file

@ -310,7 +310,7 @@ sourced as needed.
| `lib/matrix_listener.sh` | Long-poll Matrix sync daemon. Dispatches thread replies to the correct agent via well-known files (`/tmp/{agent}-escalation-reply`). Handles supervisor, gardener, dev, review, vault, and action reply routing. Run as systemd service. | Standalone daemon | | `lib/matrix_listener.sh` | Long-poll Matrix sync daemon. Dispatches thread replies to the correct agent via well-known files (`/tmp/{agent}-escalation-reply`). Handles supervisor, gardener, dev, review, vault, and action reply routing. Run as systemd service. | Standalone daemon |
| `lib/formula-session.sh` | `acquire_cron_lock()`, `check_memory()`, `load_formula()`, `build_context_block()`, `start_formula_session()`, `formula_phase_callback()`, `build_prompt_footer()`, `run_formula_and_monitor()` — shared helpers for formula-driven cron agents (lock, memory guard, formula loading, prompt assembly, tmux session, monitor loop, crash recovery). | planner-run.sh, predictor-run.sh | | `lib/formula-session.sh` | `acquire_cron_lock()`, `check_memory()`, `load_formula()`, `build_context_block()`, `start_formula_session()`, `formula_phase_callback()`, `build_prompt_footer()`, `run_formula_and_monitor()` — shared helpers for formula-driven cron agents (lock, memory guard, formula loading, prompt assembly, tmux session, monitor loop, crash recovery). | planner-run.sh, predictor-run.sh |
| `lib/file-action-issue.sh` | `file_action_issue()` — dedup check, label lookup, and issue creation for formula-driven cron wrappers. Sets `FILED_ISSUE_NUM` on success. | gardener-run.sh | | `lib/file-action-issue.sh` | `file_action_issue()` — dedup check, label lookup, and issue creation for formula-driven cron wrappers. Sets `FILED_ISSUE_NUM` on success. | gardener-run.sh |
| `lib/agent-session.sh` | Shared tmux + Claude session helpers: `create_agent_session()`, `inject_formula()`, `agent_wait_for_claude_ready()`, `agent_inject_into_session()`, `agent_kill_session()`, `monitor_phase_loop()`, `read_phase()`. `create_agent_session(session, workdir, [phase_file])` optionally installs a PostToolUse hook (matcher `Bash\|Write`) that detects phase file writes in real-time — when Claude writes to the phase file, the hook writes a marker so `monitor_phase_loop` reacts on the next poll instead of waiting for mtime changes. Also installs a StopFailure hook (matcher `rate_limit\|server_error\|authentication_failed\|billing_error`) that writes `PHASE:failed` with an `api_error` reason to the phase file and touches the phase-changed marker, so the orchestrator discovers API errors within one poll cycle instead of waiting for idle timeout. When `MATRIX_THREAD_ID` is exported, also installs a Stop hook (`on-stop-matrix.sh`) that streams each Claude response to the Matrix thread. `monitor_phase_loop` sets `_MONITOR_LOOP_EXIT` to one of: `done`, `idle_timeout`, `idle_prompt` (Claude returned to `` for 3 consecutive polls without writing any phase — callback invoked with `PHASE:failed`, session already dead), `crashed`, or a `PHASE:*` string. Agents must handle `idle_prompt` in both their callback and their post-loop exit handler. | dev-agent.sh, gardener-agent.sh, action-agent.sh | | `lib/agent-session.sh` | Shared tmux + Claude session helpers: `create_agent_session()`, `inject_formula()`, `agent_wait_for_claude_ready()`, `agent_inject_into_session()`, `agent_kill_session()`, `monitor_phase_loop()`, `read_phase()`. `create_agent_session(session, workdir, [phase_file])` optionally installs a PostToolUse hook (matcher `Bash\|Write`) that detects phase file writes in real-time — when Claude writes to the phase file, the hook writes a marker so `monitor_phase_loop` reacts on the next poll instead of waiting for mtime changes. Also installs a StopFailure hook (matcher `rate_limit\|server_error\|authentication_failed\|billing_error`) that writes `PHASE:failed` with an `api_error` reason to the phase file and touches the phase-changed marker, so the orchestrator discovers API errors within one poll cycle instead of waiting for idle timeout. When `MATRIX_THREAD_ID` is exported, also installs a Stop hook (`on-stop-matrix.sh`) that streams each Claude response to the Matrix thread. `monitor_phase_loop` sets `_MONITOR_LOOP_EXIT` to one of: `done`, `idle_timeout`, `idle_prompt` (Claude returned to `` for 3 consecutive polls without writing any phase — callback invoked with `PHASE:failed`, session already dead), `crashed`, or a `PHASE:*` string. **Callers must handle `idle_prompt`** in both their callback and their post-loop exit handler — see [`docs/PHASE-PROTOCOL.md` § idle_prompt](docs/PHASE-PROTOCOL.md#idle_prompt-exit-reason) for the full contract. | dev-agent.sh, gardener-agent.sh, action-agent.sh |
--- ---

View file

@ -92,6 +92,36 @@ PHASE:failed → write escalation to supervisor/escalations-{project}.j
restore backlog label on issue restore backlog label on issue
``` ```
### `idle_prompt` exit reason
`monitor_phase_loop` (in `lib/agent-session.sh`) can exit with
`_MONITOR_LOOP_EXIT=idle_prompt`. This happens when Claude returns to the
interactive prompt (``) for **3 consecutive polls** without writing any phase
signal to the phase file.
**Trigger conditions:**
- The phase file is empty (no phase has ever been written), **and**
- The Stop-hook idle marker (`/tmp/claude-idle-{session}.ts`) is present
(meaning Claude finished a response), **and**
- This state persists across 3 consecutive poll cycles.
**Side-effects:**
1. The tmux session is **killed before** the callback is invoked — callbacks
that handle `PHASE:failed` must not assume the session is alive.
2. The callback is invoked with `PHASE:failed` even though the phase file is
empty. This is the only situation where `PHASE:failed` is passed to the
callback without the phase file actually containing that value.
**Agent requirements:**
- **Callback (`_on_phase_change` / `formula_phase_callback`):** Must handle
`PHASE:failed` defensively — the session is already dead, so any tmux
send-keys or session-dependent logic must be skipped or guarded.
- **Post-loop exit handler (`case $_MONITOR_LOOP_EXIT`):** Must include an
`idle_prompt)` branch. Typical actions: log the event, clean up temp files,
and (for agents that use escalation) write an escalation entry or notify via
Matrix. See `dev/dev-agent.sh`, `action/action-agent.sh`, and
`gardener/gardener-agent.sh` for reference implementations.
## Crash Recovery ## Crash Recovery
If the tmux session dies (Claude crash, OOM, kernel OOM-kill, compaction): If the tmux session dies (Claude crash, OOM, kernel OOM-kill, compaction):