From a15623b747cb29012bb060ba7b98e12412a5a4db Mon Sep 17 00:00:00 2001
From: johba <johba@noreply.codeberg.org>
Date: Fri, 20 Mar 2026 17:39:44 +0100
Subject: [PATCH] =?UTF-8?q?fix:=20fix:=20action-agent=20shares=20phase=20h?=
 =?UTF-8?q?andler=20with=20dev-agent=20=E2=80=94=20review=20lifecycle=20+?=
 =?UTF-8?q?=20cleanup=20(#388)=20(#403)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes #388

## Changes
Action-agent now sources dev/phase-handler.sh and enters monitor_phase_loop after prompt injection. Two paths: (A) git output triggers the same PR/CI/review lifecycle as dev-agent, (B) no-git output writes PHASE:done for cleanup. Adds docker compose down on terminal phases, escalation to supervisor on idle timeout, and proper temp file cleanup.

Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/disinto/pulls/403
Reviewed-by: Disinto_bot <disinto_bot@noreply.codeberg.org>
---
 .woodpecker/agent-smoke.sh |   2 +-
 AGENTS.md                  |  18 ++---
 action/action-agent.sh     | 137 ++++++++++++++++++++++++++-----------
 dev/phase-handler.sh       | 106 ++++++++++++++++++++++++----
 4 files changed, 198 insertions(+), 65 deletions(-)

diff --git a/.woodpecker/agent-smoke.sh b/.woodpecker/agent-smoke.sh
index a6a1bda..87e148d 100644
--- a/.woodpecker/agent-smoke.sh
+++ b/.woodpecker/agent-smoke.sh
@@ -170,7 +170,7 @@ check_script vault/vault-fire.sh
 check_script vault/vault-poll.sh
 check_script vault/vault-reject.sh
 check_script action/action-poll.sh
-check_script action/action-agent.sh
+check_script action/action-agent.sh    dev/phase-handler.sh
 
 echo "function resolution check done"
 
diff --git a/AGENTS.md b/AGENTS.md
index 48a9b0e..10ab265 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -227,9 +227,9 @@ later triages these predictions into action/backlog issues or dismisses them.
 ### Action (`action/`)
 
 **Role**: Execute operational tasks described by action formulas — run scripts,
-call APIs, send messages, collect human approval. Unlike the dev-agent, the
-action-agent produces no PRs: Claude closes the issue directly after executing
-all formula steps.
+call APIs, send messages, collect human approval. Shares the same phase handler
+as the dev-agent: if an action produces code changes, the orchestrator creates a
+PR and drives the CI/review loop; otherwise Claude closes the issue directly.
 
 **Trigger**: `action-poll.sh` runs every 10 min via cron. It scans for open
 issues labeled `action` that have no active tmux session, then spawns
@@ -237,17 +237,17 @@ issues labeled `action` that have no active tmux session, then spawns
 
 **Key files**:
 - `action/action-poll.sh` — Cron scheduler: finds open action issues with no active tmux session, spawns action-agent.sh
-- `action/action-agent.sh` — Orchestrator: fetches issue body + prior comments, creates tmux session (`action-{issue_num}`) with interactive `claude`, injects formula prompt, monitors session until Claude exits or 4h idle timeout
+- `action/action-agent.sh` — Orchestrator: fetches issue body + prior comments, creates tmux session (`action-{issue_num}`) with interactive `claude`, injects formula prompt with phase protocol, enters `monitor_phase_loop` (shared via `dev/phase-handler.sh`) for CI/review lifecycle or direct completion
 
 **Session lifecycle**:
 1. `action-poll.sh` finds open `action` issues with no active tmux session.
 2. Spawns `action-agent.sh <issue_num>`.
 3. Agent creates Matrix thread, exports `MATRIX_THREAD_ID` so Claude's output streams to the thread via a Stop hook (`on-stop-matrix.sh`).
-4. Agent creates tmux session `action-{issue_num}`, injects prompt (formula + prior comments).
-5. Claude executes formula steps using Bash and other tools, posts progress as issue comments. Each Claude turn is streamed to the Matrix thread for real-time human visibility.
-6. For human input: Claude sends a Matrix message and waits; the reply is injected into the session by `matrix_listener.sh`.
-7. When complete: Claude closes the issue with a summary comment. Session exits.
-8. Poll detects no active session on next run — nothing further to do.
+4. Agent creates tmux session `action-{issue_num}`, injects prompt (formula + prior comments + phase protocol).
+5. Agent enters `monitor_phase_loop` (shared with dev-agent via `dev/phase-handler.sh`).
+6. **Path A (git output):** Claude pushes branch → `PHASE:awaiting_ci` → handler creates PR, polls CI → injects failures → Claude fixes → push → re-poll → CI passes → `PHASE:awaiting_review` → handler polls reviews → injects REQUEST_CHANGES → Claude fixes → approved → merge → cleanup.
+7. **Path B (no git output):** Claude posts results as comment, closes issue → `PHASE:done` → handler cleans up (kill session, docker compose down, remove temp files).
+8. For human input: Claude sends a Matrix message and waits; the reply is injected into the session by `matrix_listener.sh`.
 
 **Environment variables consumed**:
 - `CODEBERG_TOKEN`, `CODEBERG_REPO`, `CODEBERG_API`, `PROJECT_NAME`, `CODEBERG_WEB`
diff --git a/action/action-agent.sh b/action/action-agent.sh
index cca560b..80fe835 100644
--- a/action/action-agent.sh
+++ b/action/action-agent.sh
@@ -6,10 +6,13 @@
 # Lifecycle:
 #   1. Fetch issue body (action formula) + existing comments
 #   2. Create tmux session: action-{issue_num} with interactive claude
-#   3. Inject initial prompt: formula + comments + instructions
-#   4. Claude executes formula steps, posts progress comments, closes issue
+#   3. Inject initial prompt: formula + comments + phase protocol instructions
+#   4. Monitor phase file via monitor_phase_loop (shared with dev-agent)
+#   Path A (git output): Claude pushes → handler creates PR → CI poll → review
+#     injection → merge → cleanup (same loop as dev-agent via phase-handler.sh)
+#   Path B (no git output): Claude posts results → PHASE:done → cleanup
 #   5. For human input: Claude asks via Matrix; reply injected via matrix_listener
-#   6. Monitor session until Claude exits or idle timeout reached
+#   6. Cleanup on terminal phase: kill tmux, docker compose down, remove temp files
 #
 # Session:  action-{issue_num} (tmux)
 # Log:      action/action-poll-{project}.log
@@ -21,16 +24,52 @@ export PROJECT_TOML="${2:-${PROJECT_TOML:-}}"
 
 source "$(dirname "$0")/../lib/env.sh"
 source "$(dirname "$0")/../lib/agent-session.sh"
+# shellcheck source=../dev/phase-handler.sh
+source "$(dirname "$0")/../dev/phase-handler.sh"
 SESSION_NAME="action-${ISSUE}"
 LOCKFILE="/tmp/action-agent-${ISSUE}.lock"
 LOGFILE="${FACTORY_ROOT}/action/action-poll-${PROJECT_NAME:-harb}.log"
 THREAD_FILE="/tmp/action-thread-${ISSUE}"
 IDLE_TIMEOUT="${ACTION_IDLE_TIMEOUT:-14400}"  # 4h default
 
+# --- Phase handler globals (agent-specific; defaults in phase-handler.sh) ---
+# shellcheck disable=SC2034  # used by phase-handler.sh
+API="${CODEBERG_API}"
+BRANCH="action/issue-${ISSUE}"
+# shellcheck disable=SC2034  # used by phase-handler.sh
+WORKTREE="${PROJECT_REPO_ROOT}"
+PHASE_FILE="/tmp/action-session-${PROJECT_NAME:-harb}-${ISSUE}.phase"
+IMPL_SUMMARY_FILE="/tmp/action-impl-summary-${PROJECT_NAME:-harb}-${ISSUE}.txt"
+PREFLIGHT_RESULT="/tmp/action-preflight-${ISSUE}.json"
+
 log() {
-  printf '[%s] #%s %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$ISSUE" "$*" >> "$LOGFILE"
+  printf '[%s] action#%s %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$ISSUE" "$*" >> "$LOGFILE"
 }
 
+notify() {
+  local thread_id=""
+  [ -f "${THREAD_FILE:-}" ] && thread_id=$(cat "$THREAD_FILE" 2>/dev/null || true)
+  matrix_send "action" "⚡ #${ISSUE}: $*" "${thread_id}" 2>/dev/null || true
+}
+
+notify_ctx() {
+  local plain="$1" html="$2" thread_id=""
+  [ -f "${THREAD_FILE:-}" ] && thread_id=$(cat "$THREAD_FILE" 2>/dev/null || true)
+  if [ -n "$thread_id" ]; then
+    matrix_send_ctx "action" "⚡ #${ISSUE}: ${plain}" "⚡ #${ISSUE}: ${html}" "${thread_id}" 2>/dev/null || true
+  else
+    matrix_send "action" "⚡ #${ISSUE}: ${plain}" "" "${ISSUE}" 2>/dev/null || true
+  fi
+}
+
+status() {
+  log "$*"
+}
+
+# --- Action-specific stubs for phase-handler.sh ---
+cleanup_worktree() { :; }  # action agent uses PROJECT_REPO_ROOT directly — no separate git worktree to remove
+cleanup_labels() { :; }    # action agent doesn't use in-progress labels
+
 # --- Concurrency lock (per issue) ---
 if [ -f "$LOCKFILE" ]; then
   LOCK_PID=$(cat "$LOCKFILE" 2>/dev/null || echo "")
@@ -45,6 +84,9 @@ echo $$ > "$LOCKFILE"
 cleanup() {
   rm -f "$LOCKFILE"
   agent_kill_session "$SESSION_NAME"
+  # Best-effort docker cleanup for containers started during this action
+  (cd "${PROJECT_REPO_ROOT}" 2>/dev/null && docker compose down 2>/dev/null) || true
+  rm -f "$PHASE_FILE" "$IMPL_SUMMARY_FILE" "$PREFLIGHT_RESULT"
 }
 trap cleanup EXIT
 
@@ -129,8 +171,11 @@ if [ -n "$THREAD_ID" ]; then
    are routed back to this session."
 fi
 
+# Build phase protocol from shared function (Path B covered in Instructions section above)
+PHASE_PROTOCOL_INSTRUCTIONS="$(build_phase_protocol_prompt "$PHASE_FILE" "$IMPL_SUMMARY_FILE" "$BRANCH")"
+
 INITIAL_PROMPT="You are an action agent. Your job is to execute the action formula
-in the issue below and then close the issue.
+in the issue below.
 
 ## Issue #${ISSUE}: ${ISSUE_TITLE}
 
@@ -153,24 +198,35 @@ ${PRIOR_SECTION}## Instructions
    what you need, then wait. A human will reply and the reply will be injected
    into this session automatically.${THREAD_HINT}
 
-5. When all steps are complete, close issue #${ISSUE} with a summary:
-   curl -sf -X PATCH \\
-     -H \"Authorization: token \${CODEBERG_TOKEN}\" \\
-     -H 'Content-Type: application/json' \\
-     \"${CODEBERG_API}/issues/${ISSUE}\" \\
-     -d '{\"state\": \"closed\"}'
+### Path A: If this action produces code changes (e.g. config updates, baselines):
+   - Work in the project repo: cd ${PROJECT_REPO_ROOT}
+   - Create and switch to branch: git checkout -b ${BRANCH}
+   - Make your changes, commit, and push: git push origin ${BRANCH}
+   - Follow the phase protocol below — the orchestrator handles PR creation,
+     CI monitoring, and review injection.
 
-6. Environment variables available in your bash sessions:
+### Path B: If this action produces no code changes (investigation, report):
+   - Post results as a comment on issue #${ISSUE}.
+   - Close the issue:
+     curl -sf -X PATCH \\
+       -H \"Authorization: token \${CODEBERG_TOKEN}\" \\
+       -H 'Content-Type: application/json' \\
+       \"${CODEBERG_API}/issues/${ISSUE}\" \\
+       -d '{\"state\": \"closed\"}'
+   - Signal completion: echo \"PHASE:done\" > \"${PHASE_FILE}\"
+
+5. Environment variables available in your bash sessions:
    CODEBERG_TOKEN, CODEBERG_API, CODEBERG_REPO, CODEBERG_WEB, PROJECT_NAME
    (all sourced from ${FACTORY_ROOT}/.env)
 
-**Important**: You do NOT need to create PRs or write a phase file. Just execute
-the formula steps, post comments, and close the issue when done. If the prior
-comments above show work already completed, resume from where it left off."
+If the prior comments above show work already completed, resume from where it
+left off.
+
+${PHASE_PROTOCOL_INSTRUCTIONS}"
 
 # --- Create tmux session ---
 log "creating tmux session: ${SESSION_NAME}"
-if ! create_agent_session "${SESSION_NAME}" "${FACTORY_ROOT}"; then
+if ! create_agent_session "${SESSION_NAME}" "${FACTORY_ROOT}" "${PHASE_FILE}"; then
   log "ERROR: failed to create tmux session"
   exit 1
 fi
@@ -182,31 +238,30 @@ log "initial prompt injected into session"
 matrix_send "action" "⚡ #${ISSUE}: session started — ${ISSUE_TITLE}" \
   "${THREAD_ID}" 2>/dev/null || true
 
-# --- Monitor session until Claude exits or idle timeout ---
-log "monitoring session: ${SESSION_NAME} (idle_timeout=${IDLE_TIMEOUT}s)"
-IDLE_ELAPSED=0
-POLL_INTERVAL=30
-IDLE_MARKER="/tmp/claude-idle-${SESSION_NAME}.ts"
+# --- Monitor phase loop (shared with dev-agent) ---
+status "monitoring phase: ${PHASE_FILE} (action agent)"
+monitor_phase_loop "$PHASE_FILE" "$IDLE_TIMEOUT" _on_phase_change "$SESSION_NAME"
 
-while tmux has-session -t "${SESSION_NAME}" 2>/dev/null; do
-  sleep "$POLL_INTERVAL"
-
-  # Use the Stop hook idle marker to distinguish active vs idle:
-  # marker exists → Claude finished responding and is at the prompt (idle)
-  # marker absent → Claude is mid-turn or hasn't started yet (active)
-  if [ -f "$IDLE_MARKER" ]; then
-    IDLE_ELAPSED=$((IDLE_ELAPSED + POLL_INTERVAL))
-  else
-    IDLE_ELAPSED=0
-  fi
-
-  if [ "$IDLE_ELAPSED" -ge "$IDLE_TIMEOUT" ]; then
-    log "idle timeout (${IDLE_TIMEOUT}s) — killing session for issue #${ISSUE}"
-    matrix_send "action" "⚠️ #${ISSUE}: session idle for $((IDLE_TIMEOUT / 3600))h — killed" \
-      "${THREAD_ID}" 2>/dev/null || true
-    agent_kill_session "${SESSION_NAME}"
-    break
-  fi
-done
+# Handle exit reason from monitor_phase_loop
+case "${_MONITOR_LOOP_EXIT:-}" in
+  idle_timeout)
+    notify_ctx \
+      "session idle for $((IDLE_TIMEOUT / 3600))h — killed" \
+      "session idle for $((IDLE_TIMEOUT / 3600))h — killed"
+    # Escalate to supervisor (idle_prompt already escalated via _on_phase_change callback)
+    echo "{\"issue\":${ISSUE},\"pr\":${PR_NUMBER:-0},\"reason\":\"idle_timeout\",\"ts\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}" \
+      >> "${FACTORY_ROOT}/supervisor/escalations-${PROJECT_NAME}.jsonl"
+    rm -f "$PHASE_FILE" "$IMPL_SUMMARY_FILE" "$THREAD_FILE"
+    ;;
+  idle_prompt)
+    # Notification + escalation already handled by _on_phase_change(PHASE:failed) callback
+    rm -f "$PHASE_FILE" "$IMPL_SUMMARY_FILE" "$THREAD_FILE"
+    ;;
+  done)
+    # Belt-and-suspenders: callback handles primary cleanup,
+    # but ensure sentinel files are removed if callback was interrupted
+    rm -f "$PHASE_FILE" "$IMPL_SUMMARY_FILE" "$THREAD_FILE"
+    ;;
+esac
 
 log "action-agent finished for issue #${ISSUE}"
diff --git a/dev/phase-handler.sh b/dev/phase-handler.sh
index 45f5f6e..ae47e97 100644
--- a/dev/phase-handler.sh
+++ b/dev/phase-handler.sh
@@ -1,24 +1,97 @@
 #!/usr/bin/env bash
 # dev/phase-handler.sh — Phase callback functions for dev-agent.sh
 #
-# Source this file from dev-agent.sh after lib/agent-session.sh is loaded.
-# Defines: post_refusal_comment(), _on_phase_change()
+# Source this file from agent orchestrators after lib/agent-session.sh is loaded.
+# Defines: post_refusal_comment(), _on_phase_change(), build_phase_protocol_prompt()
 #
-# Required globals from dev-agent.sh:
+# Required globals (set by calling agent before or after sourcing):
 #   ISSUE, CODEBERG_TOKEN, API, CODEBERG_WEB, PROJECT_NAME, FACTORY_ROOT
-#   PR_NUMBER, BRANCH, PHASE_FILE, WORKTREE, IMPL_SUMMARY_FILE, THREAD_FILE
+#   BRANCH, PHASE_FILE, WORKTREE, IMPL_SUMMARY_FILE, THREAD_FILE
 #   PRIMARY_BRANCH, SESSION_NAME, LOGFILE, ISSUE_TITLE
-#   CI_POLL_TIMEOUT, MAX_CI_FIXES, MAX_REVIEW_ROUNDS, REVIEW_POLL_TIMEOUT
-#   CI_RETRY_COUNT, CI_FIX_COUNT, REVIEW_ROUND, CLAIMED
 #   WOODPECKER_REPO_ID, WOODPECKER_TOKEN, WOODPECKER_SERVER
 #
-# Calls back to dev-agent.sh-defined helpers:
-#   cleanup_worktree(), cleanup_labels()
+# Globals with defaults (agents can override after sourcing):
+#   PR_NUMBER, CI_POLL_TIMEOUT, MAX_CI_FIXES, MAX_REVIEW_ROUNDS,
+#   REVIEW_POLL_TIMEOUT, CI_RETRY_COUNT, CI_FIX_COUNT, REVIEW_ROUND,
+#   CLAIMED, PHASE_POLL_INTERVAL
+#
+# Calls back to agent-defined helpers:
+#   cleanup_worktree(), cleanup_labels(), notify(), notify_ctx(), status(), log()
 #
 # shellcheck shell=bash
 # shellcheck disable=SC2154  # globals are set in dev-agent.sh before calling
 # shellcheck disable=SC2034  # CLAIMED is read by cleanup() in dev-agent.sh
 
+# --- Default globals (agents can override after sourcing) ---
+: "${CI_POLL_TIMEOUT:=1800}"
+: "${REVIEW_POLL_TIMEOUT:=10800}"
+: "${MAX_CI_FIXES:=3}"
+: "${MAX_REVIEW_ROUNDS:=5}"
+: "${CI_RETRY_COUNT:=0}"
+: "${CI_FIX_COUNT:=0}"
+: "${REVIEW_ROUND:=0}"
+: "${PR_NUMBER:=}"
+: "${CLAIMED:=false}"
+: "${PHASE_POLL_INTERVAL:=30}"
+
+# --- Build phase protocol prompt (shared across agents) ---
+# Generates the phase-signaling instructions for Claude prompts.
+# Args: phase_file summary_file branch
+# Output: The protocol text (stdout)
+build_phase_protocol_prompt() {
+  local _pf="$1" _sf="$2" _br="$3"
+  cat <<_PHASE_PROTOCOL_EOF_
+## Phase-Signaling Protocol (REQUIRED)
+
+You are running in a persistent tmux session managed by an orchestrator.
+Communicate progress by writing to the phase file. The orchestrator watches
+this file and injects events (CI results, review feedback) back into this session.
+
+### Key files
+\`\`\`
+PHASE_FILE="${_pf}"
+SUMMARY_FILE="${_sf}"
+\`\`\`
+
+### Phase transitions — write these exactly:
+
+**After committing and pushing your branch:**
+\`\`\`bash
+git push origin ${_br}
+# Write a short summary of what you implemented:
+printf '%s' "<your summary>" > "\${SUMMARY_FILE}"
+# Signal the orchestrator to create the PR and watch for CI:
+echo "PHASE:awaiting_ci" > "${_pf}"
+\`\`\`
+Then STOP and wait. The orchestrator will inject CI results.
+
+**When you receive a "CI passed" injection:**
+\`\`\`bash
+echo "PHASE:awaiting_review" > "${_pf}"
+\`\`\`
+Then STOP and wait. The orchestrator will inject review feedback.
+
+**When you receive a "CI failed:" injection:**
+Fix the CI issue, commit, push, then:
+\`\`\`bash
+echo "PHASE:awaiting_ci" > "${_pf}"
+\`\`\`
+Then STOP and wait.
+
+**When you receive a "Review: REQUEST_CHANGES" injection:**
+Address ALL review feedback, commit, push, then:
+\`\`\`bash
+echo "PHASE:awaiting_ci" > "${_pf}"
+\`\`\`
+(CI runs again after each push — always write awaiting_ci, not awaiting_review)
+
+**On unrecoverable failure:**
+\`\`\`bash
+printf 'PHASE:failed\nReason: %s\n' "describe what failed" > "${_pf}"
+\`\`\`
+_PHASE_PROTOCOL_EOF_
+}
+
 # --- Merge helper ---
 # do_merge — attempt to merge PR via Codeberg API.
 # Args: pr_num
@@ -489,12 +562,17 @@ Instructions:
   # ── PHASE: done ─────────────────────────────────────────────────────────────
   # PR merged and issue closed (by orchestrator or Claude). Just clean up local state.
   elif [ "$phase" = "PHASE:done" ]; then
-    status "phase done — PR #${PR_NUMBER:-?} merged, cleaning up"
-
-    # Notify Matrix (issue already closed and labels removed via API)
-    notify_ctx \
-      "✅ PR #${PR_NUMBER:-?} merged! Issue #${ISSUE} done." \
-      "✅ PR <a href='${CODEBERG_WEB}/pulls/${PR_NUMBER:-?}'>#${PR_NUMBER:-?}</a> merged! <a href='${CODEBERG_WEB}/issues/${ISSUE}'>Issue #${ISSUE}</a> done."
+    if [ -n "${PR_NUMBER:-}" ]; then
+      status "phase done — PR #${PR_NUMBER} merged, cleaning up"
+      notify_ctx \
+        "✅ PR #${PR_NUMBER} merged! Issue #${ISSUE} done." \
+        "✅ PR <a href='${CODEBERG_WEB}/pulls/${PR_NUMBER}'>#${PR_NUMBER}</a> merged! <a href='${CODEBERG_WEB}/issues/${ISSUE}'>Issue #${ISSUE}</a> done."
+    else
+      status "phase done — issue #${ISSUE} complete, cleaning up"
+      notify_ctx \
+        "✅ Issue #${ISSUE} done." \
+        "✅ <a href='${CODEBERG_WEB}/issues/${ISSUE}'>Issue #${ISSUE}</a> done."
+    fi
 
     # Belt-and-suspenders: ensure in-progress label removed (idempotent)
     cleanup_labels