7.2 KiB
Phase-Signaling Protocol for Persistent Claude Sessions
Overview
When dev-agent runs Claude in a persistent tmux session (rather than a
one-shot claude -p invocation), Claude needs a way to signal the
orchestrator (dev-poll.sh) that a phase has completed.
Claude writes a sentinel line to a phase file — a well-known path based on project name and issue number. The orchestrator watches that file and reacts accordingly.
Phase File Path Convention
/tmp/dev-session-{project}-{issue}.phase
Where:
{project}= the project name from the TOML (namefield), e.g.harb{issue}= the issue number, e.g.42
Example: /tmp/dev-session-harb-42.phase
Phase Values
Claude writes exactly one of these lines to the phase file when a phase ends:
| Sentinel | Meaning | Orchestrator action |
|---|---|---|
PHASE:awaiting_ci |
PR pushed, waiting for CI to run | Poll CI; inject result when done |
PHASE:awaiting_review |
CI passed, PR open, waiting for review | Wait for review-poll to inject feedback |
PHASE:needs_human |
Blocked on human decision | Send Matrix notification; wait for reply |
PHASE:done |
Work complete, PR merged | Verify merge, kill tmux session, clean up |
PHASE:failed |
Unrecoverable failure | Escalate to gardener/supervisor |
Writing a phase (from within Claude's session)
PHASE_FILE="/tmp/dev-session-${PROJECT_NAME:-project}-${ISSUE:-0}.phase"
# Signal awaiting CI
echo "PHASE:awaiting_ci" > "$PHASE_FILE"
# Signal awaiting review
echo "PHASE:awaiting_review" > "$PHASE_FILE"
# Signal needs human
echo "PHASE:needs_human" > "$PHASE_FILE"
# Signal done
echo "PHASE:done" > "$PHASE_FILE"
# Signal failure
echo "PHASE:failed" > "$PHASE_FILE"
The orchestrator reads with:
phase=$(head -1 "$PHASE_FILE" 2>/dev/null | tr -d '[:space:]')
Using head -1 is required: PHASE:failed may have a reason line on line 2,
and reading all lines would produce PHASE:failedReason:... which never matches.
Orchestrator Reaction Matrix
PHASE:awaiting_ci → poll CI every 30s
on success → inject "CI passed" into tmux session
on failure → inject CI error log into tmux session
on timeout → inject "CI timeout" + escalate
PHASE:awaiting_review → wait for review-poll.sh to post review comment
on REQUEST_CHANGES → inject review text into session
on APPROVE → inject "approved" into session
on timeout (3h) → inject "no review, escalating"
PHASE:needs_human → send Matrix notification with issue/PR link
on reply → supervisor-poll.sh injects reply into tmux session
(gardener-poll.sh as backup if supervisor missed it)
reply file: /tmp/dev-escalation-reply (written by matrix_listener.sh)
on timeout → re-notify at 6h, escalate at 24h (supervisor-poll.sh)
PHASE:done → verify PR merged on Codeberg
if merged → kill tmux session, clean labels, close issue
if not → inject "PR not merged yet" into session
PHASE:failed → write escalation to supervisor/escalations-{project}.jsonl
kill tmux session
restore backlog label on issue
Crash Recovery
If the tmux session dies (Claude crash, OOM, kernel OOM-kill, compaction):
Detection
dev-poll.sh detects a crash via:
tmux has-session -t "dev-{project}-{issue}"returns non-zero, OR- Phase file is stale (mtime >
CLAUDE_TIMEOUTseconds with noPHASE:done)
Recovery procedure
# 1. Read current state from disk
git_diff=$(git -C "$WORKTREE" diff origin/main..HEAD --stat 2>/dev/null)
last_phase=$(head -1 "$PHASE_FILE" 2>/dev/null | tr -d '[:space:]')
last_phase="${last_phase:-PHASE:unknown}"
last_ci=$(cat "/tmp/ci-result-${PROJECT_NAME}-${ISSUE}.txt" 2>/dev/null || echo "")
review_comments=$(curl -sf ... "${API}/issues/${PR}/comments" | jq ...)
# 2. Spawn new tmux session in same worktree
tmux new-session -d -s "dev-${PROJECT_NAME}-${ISSUE}" \
-c "$WORKTREE" \
"claude --dangerously-skip-permissions"
# 3. Inject recovery context
tmux send-keys -t "dev-${PROJECT_NAME}-${ISSUE}" \
"$(cat recovery-prompt.txt)" Enter
Recovery context injected into new session:
- Issue body (what to implement)
git diffof work done so far (git is the checkpoint, not memory)- Last known phase (where we left off)
- Last CI result (if phase was
awaiting_ci) - Latest review comments (if phase was
awaiting_review)
Key principle: Git is the checkpoint. The worktree persists across crashes.
Claude can read git log, git diff, and git status to understand exactly
what was done before the crash. No state needs to be stored beyond the phase
file and git history.
State files summary
| File | Created by | Purpose |
|---|---|---|
/tmp/dev-session-{proj}-{issue}.phase |
Claude (in session) | Current phase |
/tmp/ci-result-{proj}-{issue}.txt |
Orchestrator | Last CI output for injection |
/tmp/dev-{proj}-{issue}.log |
Orchestrator | Session transcript (aspirational — path TBD when tmux session manager is implemented in #80) |
/tmp/dev-escalation-reply |
matrix_listener.sh | Human reply to needs_human escalation (consumed by supervisor-poll.sh) |
/tmp/dev-renotify-{proj}-{issue} |
supervisor-poll.sh | Marker to prevent duplicate 6h re-notifications |
WORKTREE (git worktree) |
dev-agent.sh | Code checkpoint |
Sequence Diagram
Claude session Orchestrator (dev-poll.sh)
────────────── ──────────────────────────
implement issue
push PR branch
echo "PHASE:awaiting_ci" ───→ read phase file
poll CI
CI passes
←── tmux send-keys "CI passed"
echo "PHASE:awaiting_review" → read phase file
wait for review-poll
review: REQUEST_CHANGES
←── tmux send-keys "Review: ..."
address review comments
push fixes
echo "PHASE:awaiting_review" → read phase file
review: APPROVE
←── tmux send-keys "Approved"
merge PR
echo "PHASE:done" ──────────→ read phase file
verify merged
kill session
close issue
Notes
- The phase file is write-once-per-phase (always overwritten with
>). The orchestrator reads it, acts, then waits for the next write. - Claude should write the phase sentinel as the last action of each phase, after any git push or other side effects are complete.
- If Claude writes
PHASE:failed, it should include a reason on the next line:printf 'PHASE:failed\nReason: %s\n' "$reason" > "$PHASE_FILE" - Phase files are cleaned up by the orchestrator after
PHASE:doneorPHASE:failed.