Phase-Signaling Protocol for Persistent Claude Sessions
Overview
When dev-agent runs Claude in a persistent tmux session (rather than a
one-shot claude -p invocation), Claude needs a way to signal the
orchestrator (dev-poll.sh) that a phase has completed.
Claude writes a sentinel line to a phase file — a well-known path based on project name and issue number. The orchestrator watches that file and reacts accordingly.
Phase File Path Convention
/tmp/dev-session-{project}-{issue}.phase
Where:
- {project} = the project name from the TOML (name field), e.g. harb
- {issue} = the issue number, e.g. 42
Example: /tmp/dev-session-harb-42.phase
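A minimal sketch of how a script might derive the path from these two values. The helper name phase_file_path is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the phase file path from project name and issue number.
phase_file_path() {
  local project="$1" issue="$2"
  printf '/tmp/dev-session-%s-%s.phase\n' "$project" "$issue"
}

phase_file_path harb 42   # prints /tmp/dev-session-harb-42.phase
```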
Phase Values
Claude writes exactly one of these lines to the phase file when a phase ends:
| Sentinel | Meaning | Orchestrator action |
|---|---|---|
| PHASE:awaiting_ci | PR pushed, waiting for CI to run | Poll CI; inject result when done |
| PHASE:awaiting_review | CI passed, PR open, waiting for review | Wait for review-poll to inject feedback |
| PHASE:escalate | Needs human input (any reason) | Send vault/forge notification; session stays alive; 24h timeout → blocked |
| PHASE:done | Work complete, PR merged | Verify merge, kill tmux session, clean up |
| PHASE:failed | Unrecoverable failure | Escalate to gardener/supervisor |
Writing a phase (from within Claude's session)
PHASE_FILE="/tmp/dev-session-${PROJECT_NAME:-project}-${ISSUE:-0}.phase"
# Signal awaiting CI
echo "PHASE:awaiting_ci" > "$PHASE_FILE"
# Signal awaiting review
echo "PHASE:awaiting_review" > "$PHASE_FILE"
# Signal needs human
echo "PHASE:escalate" > "$PHASE_FILE"
# Signal done
echo "PHASE:done" > "$PHASE_FILE"
# Signal failure
echo "PHASE:failed" > "$PHASE_FILE"
The orchestrator reads with:
phase=$(head -1 "$PHASE_FILE" 2>/dev/null | tr -d '[:space:]')
Using head -1 is required: PHASE:failed may have a reason line on line 2,
and reading all lines would produce PHASE:failedReason:... which never matches.
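To illustrate why only the first line is read, the sketch below writes a failure sentinel with a reason line to a throwaway file and compares the two ways of reading it. The temp file, the reason text, and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Illustrative only: a throwaway phase file holding a sentinel plus a reason line.
PHASE_FILE="$(mktemp)"
printf 'PHASE:failed\nReason: %s\n' "tests do not compile" > "$PHASE_FILE"

# Reading only line 1 yields a clean sentinel.
first_line=$(head -1 "$PHASE_FILE" 2>/dev/null | tr -d '[:space:]')

# Reading the whole file fuses both lines once whitespace is stripped.
whole_file=$(tr -d '[:space:]' < "$PHASE_FILE")

echo "$first_line"   # PHASE:failed
echo "$whole_file"   # PHASE:failedReason:testsdonotcompile

rm -f "$PHASE_FILE"
```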
Orchestrator Reaction Matrix
PHASE:awaiting_ci     → poll CI every 30s
                        on success → inject "CI passed" into tmux session
                        on failure → inject CI error log into tmux session
                        on timeout → inject "CI timeout" + escalate
PHASE:awaiting_review → wait for review-poll.sh to post review comment
                        on REQUEST_CHANGES → inject review text into session
                        on APPROVE → inject "approved" into session
                        on timeout (3h) → inject "no review, escalating"
PHASE:escalate        → send vault/forge notification with context (issue/PR link, reason)
                        session stays alive waiting for human reply
                        on timeout → 24h: label issue blocked, kill session
PHASE:done            → verify PR merged on forge
                        if merged → kill tmux session, clean labels, close issue
                        if not → inject "PR not merged yet" into session
PHASE:failed          → label issue blocked, post diagnostic comment
                        kill tmux session
                        restore backlog label on issue
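The reaction matrix maps naturally onto a shell case statement. The following is a sketch, not the actual dev-poll.sh code: the function name react_to_phase is hypothetical, and each branch only prints a label for the reaction the orchestrator would carry out.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher: prints which reaction applies to a given sentinel.
react_to_phase() {
  local phase="$1"
  case "$phase" in
    PHASE:awaiting_ci)     echo "poll-ci" ;;
    PHASE:awaiting_review) echo "wait-for-review" ;;
    PHASE:escalate)        echo "notify-human" ;;
    PHASE:done)            echo "verify-merge" ;;
    PHASE:failed)          echo "mark-blocked" ;;
    "")                    echo "no-phase-yet" ;;
    *)                     echo "unknown-phase" ;;
  esac
}

react_to_phase "PHASE:awaiting_ci"          # prints poll-ci
react_to_phase "PHASE:failedReason:oops"    # prints unknown-phase
```

The second call shows what happens when the sentinel and the reason line are fused: the value matches no branch, which is why the reader must take only the first line.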
idle_prompt exit reason
monitor_phase_loop (in lib/agent-session.sh) can exit with
_MONITOR_LOOP_EXIT=idle_prompt. This happens when Claude returns to the
interactive prompt (❯) for 3 consecutive polls without writing any phase
signal to the phase file.
Trigger conditions:
- The phase file is empty (no phase has ever been written), and
- The Stop-hook idle marker (/tmp/claude-idle-{session}.ts) is present (meaning Claude finished a response), and
- This state persists across 3 consecutive poll cycles.
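The three conditions can be sketched as a small streak counter. This is an illustration under stated assumptions rather than the code in lib/agent-session.sh: the function names, the choice to pass both paths as arguments, and the idle_streak variable are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: true when the phase file is empty (or missing)
# and the Stop-hook idle marker exists.
looks_idle() {
  local phase_file="$1" idle_marker="$2"
  [ ! -s "$phase_file" ] && [ -e "$idle_marker" ]
}

# Hypothetical counter: returns 0 once the idle state has been observed on
# `needed` consecutive polls; any non-idle poll resets the streak.
idle_streak=0
poll_idle_once() {
  local phase_file="$1" idle_marker="$2" needed="${3:-3}"
  if looks_idle "$phase_file" "$idle_marker"; then
    idle_streak=$((idle_streak + 1))
  else
    idle_streak=0
  fi
  [ "$idle_streak" -ge "$needed" ]
}
```

A monitor loop would call poll_idle_once once per poll cycle and treat a zero return as the idle_prompt exit condition.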
Side-effects:
- The tmux session is killed before the callback is invoked, so callbacks that handle PHASE:failed must not assume the session is alive.
- The callback is invoked with PHASE:failed even though the phase file is empty. This is the only situation where PHASE:failed is passed to the callback without the phase file actually containing that value.
Agent requirements:
- Callback (_on_phase_change / formula_phase_callback): Must handle PHASE:failed defensively. The session is already dead, so any tmux send-keys or session-dependent logic must be skipped or guarded.
- Post-loop exit handler (case $_MONITOR_LOOP_EXIT): Must include an idle_prompt) branch. Typical actions: log the event, clean up temp files, and (for agents that use escalation) write an escalation entry or notify via vault/forge. See dev/dev-agent.sh, action/action-agent.sh, and gardener/gardener-agent.sh for reference implementations.
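A sketch of the two guards described above, under stated assumptions: on_phase_change_sketch and handle_loop_exit_sketch are hypothetical stand-ins for an agent's real callback and exit handler, the printed messages are placeholders for real actions, and exit reasons other than idle_prompt are left to a catch-all branch.

```shell
#!/usr/bin/env bash
# Sketch only: function names and messages are hypothetical.

# True only if the named tmux session still exists
# (also false when tmux itself is not installed).
session_alive() {
  tmux has-session -t "$1" 2>/dev/null
}

# Callback stand-in: never touches tmux unless the session is still there.
on_phase_change_sketch() {
  local phase="$1" session="$2"
  if [ "$phase" = "PHASE:failed" ] && ! session_alive "$session"; then
    echo "failed: session already gone, skipping tmux send-keys"
    return 0
  fi
  echo "handling $phase with live session"
}

# Post-loop handler stand-in with the required idle_prompt) branch.
handle_loop_exit_sketch() {
  case "$1" in
    idle_prompt) echo "idle_prompt: log event, clean up temp files, escalate" ;;
    *)           echo "other exit reason: $1" ;;
  esac
}

on_phase_change_sketch "PHASE:failed" "no-such-session-$$"
handle_loop_exit_sketch "idle_prompt"
```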
Crash Recovery
If the tmux session dies (Claude crash, OOM, kernel OOM-kill, compaction):
Detection
dev-poll.sh detects a crash via:
- tmux has-session -t "dev-{project}-{issue}" returns non-zero, OR
- Phase file is stale (mtime older than CLAUDE_TIMEOUT seconds with no PHASE:done)
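The staleness half of this check could look roughly like the sketch below. Assumptions: GNU coreutils stat (stat -c %Y) is available, the timeout is expressed in seconds, and the helper name phase_is_stale and its optional third argument (a clock override, handy for testing) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the phase file's mtime is more than
# `timeout` seconds old and its first line is not PHASE:done.
# Assumes GNU coreutils `stat -c %Y`; the third argument overrides "now".
phase_is_stale() {
  local file="$1" timeout="$2" now="${3:-$(date +%s)}"
  local mtime phase
  mtime=$(stat -c %Y "$file" 2>/dev/null) || return 1   # missing file: not reported as stale
  phase=$(head -1 "$file" 2>/dev/null | tr -d '[:space:]')
  [ "$phase" != "PHASE:done" ] && [ $((now - mtime)) -gt "$timeout" ]
}
```

A caller would combine it with the session check, for example treating the session as crashed when tmux has-session fails or phase_is_stale "$PHASE_FILE" "$CLAUDE_TIMEOUT" succeeds.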
Recovery procedure
# 1. Read current state from disk
git_diff=$(git -C "$WORKTREE" diff origin/main..HEAD --stat 2>/dev/null)
last_phase=$(head -1 "$PHASE_FILE" 2>/dev/null | tr -d '[:space:]')
last_phase="${last_phase:-PHASE:unknown}"
last_ci=$(cat "/tmp/ci-result-${PROJECT_NAME}-${ISSUE}.txt" 2>/dev/null || echo "")
review_comments=$(curl -sf ... "${API}/issues/${PR}/comments" | jq ...)
# 2. Spawn new tmux session in same worktree
tmux new-session -d -s "dev-${PROJECT_NAME}-${ISSUE}" \
  -c "$WORKTREE" \
  "claude --dangerously-skip-permissions"
# 3. Inject recovery context
tmux send-keys -t "dev-${PROJECT_NAME}-${ISSUE}" \
  "$(cat recovery-prompt.txt)" Enter
Recovery context injected into new session:
- Issue body (what to implement)
- git diff of work done so far (git is the checkpoint, not memory)
- Last known phase (where we left off)
- Last CI result (if phase was awaiting_ci)
- Latest review comments (if phase was awaiting_review)
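One way to assemble that context into the text that gets injected, as a sketch: the function build_recovery_prompt, its argument order, and the section headings it prints are assumptions for illustration, not the format the orchestrator actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical builder: prints a recovery prompt from already-collected state.
# Args: issue_body git_diff last_phase last_ci review_comments
build_recovery_prompt() {
  local issue_body="$1" git_diff="$2" last_phase="$3" last_ci="$4" reviews="$5"
  printf 'Issue:\n%s\n\n' "$issue_body"
  printf 'Work so far (git diff --stat):\n%s\n\n' "$git_diff"
  printf 'Last known phase: %s\n' "$last_phase"
  # CI output is only relevant when the session was waiting on CI.
  if [ "$last_phase" = "PHASE:awaiting_ci" ] && [ -n "$last_ci" ]; then
    printf '\nLast CI result:\n%s\n' "$last_ci"
  fi
  # Review comments are only relevant when the session was waiting on review.
  if [ "$last_phase" = "PHASE:awaiting_review" ] && [ -n "$reviews" ]; then
    printf '\nLatest review comments:\n%s\n' "$reviews"
  fi
}

build_recovery_prompt "Add a health endpoint" " src/app.sh | 12 ++++" \
  "PHASE:awaiting_ci" "lint failed" ""
```

Redirecting the output to a file such as recovery-prompt.txt would make it available to the tmux send-keys step.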
Key principle: Git is the checkpoint. The worktree persists across crashes.
Claude can read git log, git diff, and git status to understand exactly
what was done before the crash. No state needs to be stored beyond the phase
file and git history.
State files summary
| File | Created by | Purpose |
|---|---|---|
| /tmp/dev-session-{proj}-{issue}.phase | Claude (in session) | Current phase |
| /tmp/ci-result-{proj}-{issue}.txt | Orchestrator | Last CI output for injection |
| /tmp/dev-{proj}-{issue}.log | Orchestrator | Session transcript (aspirational; path TBD when tmux session manager is implemented in #80) |
| /tmp/dev-renotify-{proj}-{issue} | supervisor-poll.sh | Marker to prevent duplicate 6h re-notifications |
| WORKTREE (git worktree) | dev-agent.sh | Code checkpoint |
Sequence Diagram
Claude session                    Orchestrator (dev-poll.sh)
──────────────                    ──────────────────────────
implement issue
push PR branch
echo "PHASE:awaiting_ci" ───→     read phase file
                                  poll CI
                                  CI passes
                         ←──      tmux send-keys "CI passed"
echo "PHASE:awaiting_review" →    read phase file
                                  wait for review-poll
                                  review: REQUEST_CHANGES
                         ←──      tmux send-keys "Review: ..."
address review comments
push fixes
echo "PHASE:awaiting_review" →    read phase file
                                  review: APPROVE
                         ←──      tmux send-keys "Approved"
merge PR
echo "PHASE:done" ──────────→     read phase file
                                  verify merged
                                  kill session
                                  close issue
Notes
- The phase file is written once per phase, and each write overwrites the previous content (>). The orchestrator reads it, acts, then waits for the next write.
- Claude should write the phase sentinel as the last action of each phase, after any git push or other side effects are complete.
- If Claude writes PHASE:failed, it should include a reason on the next line:
  printf 'PHASE:failed\nReason: %s\n' "$reason" > "$PHASE_FILE"
- Phase files are cleaned up by the orchestrator after PHASE:done or PHASE:failed.
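These notes can be folded into a small writer helper. The sketch below is hypothetical (write_phase is not a function from the repository, and the "unspecified" fallback reason is an assumption): it rejects unknown sentinels, overwrites the file with >, and adds the reason line only for PHASE:failed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write_phase <phase_file> <sentinel> [reason]
write_phase() {
  local file="$1" sentinel="$2" reason="${3:-}"
  case "$sentinel" in
    PHASE:awaiting_ci|PHASE:awaiting_review|PHASE:escalate|PHASE:done)
      # Plain sentinels occupy a single line; > replaces any earlier phase.
      echo "$sentinel" > "$file" ;;
    PHASE:failed)
      # Failure carries its reason on line 2, so readers must use head -1.
      printf 'PHASE:failed\nReason: %s\n' "${reason:-unspecified}" > "$file" ;;
    *)
      echo "write_phase: unknown sentinel '$sentinel'" >&2
      return 1 ;;
  esac
}
```

Calling it as the final command of a phase, after git push has returned, keeps the sentinel from being visible before the side effects it announces.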