From 6d9d027d5ec0110ad97d507267ce35144df2172a Mon Sep 17 00:00:00 2001
From: openhands
Date: Fri, 20 Mar 2026 13:40:09 +0000
Subject: [PATCH 1/2] fix: planner runs directly from cron — no action issues (#359)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Rename planner-poll.sh → planner-run.sh: direct cron executor that
  creates a tmux session with Claude (opus), injects the formula as
  context, monitors phase file, handles crash recovery and cleanup.
  No action issues, no action-poll dependency.
- Source disinto project config explicitly (projects/disinto.toml)
  instead of defaulting to harb via env.sh.
- Update formulas/run-planner.toml (v2): remove agents-update step
  (now handled by gardener per #246), add journal-and-memory step
  (daily journal entries committed to git + local MEMORY.md update),
  add commit-and-pr step (one commit, one PR per run).
- Create planner/journal/ directory for daily raw logs.
- Update crontab: weekly Sunday 6AM call to planner-run.sh.
- Update AGENTS.md to reflect new architecture.
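For reviewers new to the phase-file handshake that planner-run.sh relies
on, here is a minimal sketch of the polling side. It is NOT the real
monitor_phase_loop from lib/agent-session.sh, which also handles idle
timeouts, crash detection and callbacks; the function name wait_for_phase
and its arguments are hypothetical and exist only for illustration.
Claude's side of the protocol is the one-line write shown in the prompt
(echo 'PHASE:done' > "$PHASE_FILE").

```bash
# Hypothetical sketch of the phase-file handshake used by planner-run.sh.
# NOT the real monitor_phase_loop from lib/agent-session.sh.
set -euo pipefail

# wait_for_phase PHASE_FILE TIMEOUT_SECS POLL_SECS
# Polls PHASE_FILE until its first line is a terminal phase or the timeout
# elapses. Prints the phase (or "timeout"); returns 0 only on PHASE:done.
wait_for_phase() {
  local phase_file="$1" timeout="$2" poll="$3"
  local waited=0 phase=""
  while [ "$waited" -lt "$timeout" ]; do
    if [ -f "$phase_file" ]; then
      # Only the first line carries the phase; PHASE:failed may be
      # followed by a "Reason: ..." line, which is ignored here.
      phase=$(head -n 1 "$phase_file")
      case "$phase" in
        PHASE:done) echo "$phase"; return 0 ;;
        PHASE:failed|PHASE:needs_human) echo "$phase"; return 1 ;;
      esac
    fi
    sleep "$poll"
    waited=$((waited + poll))
  done
  echo "timeout"
  return 1
}
```

Polling the file (rather than waiting on the tmux process) keeps the
orchestrator independent of how Claude exits; the real loop adds a
PostToolUse hook marker so it can react sooner than the poll interval.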
Co-Authored-By: Claude Opus 4.6 (1M context) --- AGENTS.md | 45 ++++++---- formulas/run-planner.toml | 157 +++++++++++++++++++-------------- planner/journal/.gitkeep | 0 planner/planner-poll.sh | 73 ---------------- planner/planner-run.sh | 180 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 297 insertions(+), 158 deletions(-) create mode 100644 planner/journal/.gitkeep delete mode 100755 planner/planner-poll.sh create mode 100755 planner/planner-run.sh diff --git a/AGENTS.md b/AGENTS.md index c1fe444..646752e 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -18,7 +18,8 @@ disinto/ ├── review/ review-poll.sh, review-pr.sh — PR review ├── gardener/ gardener-run.sh — files action issue for run-gardener formula │ gardener-poll.sh, gardener-agent.sh — recipe engine + grooming -├── planner/ planner-poll.sh — files action issue for run-planner formula +├── planner/ planner-run.sh — direct cron executor for run-planner formula +│ planner/journal/ — daily raw logs from each planner run │ prediction-poll.sh, prediction-agent.sh — evidence-based predictions ├── supervisor/ supervisor-poll.sh — health monitoring ├── vault/ vault-poll.sh, vault-agent.sh, vault-fire.sh — action gating @@ -154,36 +155,42 @@ P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping). ### Planner (`planner/`) -**Role**: Five-phase strategic planning, executed as an action formula. +**Role**: Strategic planning, executed directly from cron via tmux + Claude. Phase 0 (preflight): pull latest code, load persistent memory from -`planner/MEMORY.md`. Phase 1: update the AGENTS.md documentation tree to -reflect recent code changes (fast-track PR). Phase 1.5: triage +`planner/MEMORY.md`. Phase 1 (prediction-triage): triage `prediction/unreviewed` issues filed by the [Predictor](#predictor-planner) — for each prediction: promote to action, promote to backlog, watch (relabel to prediction/backlog), or dismiss with reasoning. Promoted predictions compete -with vision gaps for the per-cycle issue limit. 
Phase 2: strategic planning -via resource+leverage gap analysis — reasons about VISION.md, RESOURCES.md, +with vision gaps for the per-cycle issue limit. Phase 2 (strategic-planning): +resource+leverage gap analysis — reasons about VISION.md, RESOURCES.md, formula catalog, and project state to create up to 5 total issues (including -promotions) prioritized by leverage. Phase 3: persist learnings to -`planner/MEMORY.md`. +promotions) prioritized by leverage. Phase 3 (journal-and-memory): write +daily journal entry (committed to git) and update `planner/MEMORY.md` +(gitignored, local only). Phase 4 (commit-and-pr): one commit with all file +changes, push, create PR. AGENTS.md maintenance is handled by the +[Gardener](#gardener-gardener). -**Trigger**: `planner-poll.sh` runs weekly via cron. It files an `action` -issue referencing `formulas/run-planner.toml`; the [action-agent](#action-action) -picks it up and executes the planning steps in an interactive Claude tmux session. +**Trigger**: `planner-run.sh` runs weekly via cron. It creates a tmux session +with `claude --model opus`, injects `formulas/run-planner.toml` as context, +monitors the phase file, and cleans up on completion or timeout. No action +issues — the planner is a nervous system component, not work. **Key files**: -- `planner/planner-poll.sh` — Cron wrapper: memory guard, dedup check, files action issue -- `formulas/run-planner.toml` — Execution spec: five steps (preflight, agents-update, - triage-predictions, strategic-planning, memory-update) with `needs` dependencies. - Steps 2 and 3 are independent; step 4 depends on both. 
Claude executes all steps - in a single interactive session with tool access +- `planner/planner-run.sh` — Cron wrapper + orchestrator: lock, memory guard, + sources disinto project config, creates tmux session, injects formula prompt, + monitors phase file, handles crash recovery, cleans up +- `formulas/run-planner.toml` — Execution spec: five steps (preflight, + prediction-triage, strategic-planning, journal-and-memory, commit-and-pr) + with `needs` dependencies. Claude executes all steps in a single interactive + session with tool access - `planner/MEMORY.md` — Persistent memory across runs (gitignored, local only) +- `planner/journal/*.md` — Daily raw logs from each planner run (committed to git) **Future direction**: The [Predictor](#predictor-planner) already reads `evidence/` JSON and files prediction issues for the planner to triage. The next step is evidence-gated deployment (see `docs/EVIDENCE-ARCHITECTURE.md`): replacing human "ship it" decisions with automated gates across dimensions (holdout, red-team, user-test, evolution fitness, protocol metrics, funnel). Not yet implemented. -**Environment variables consumed** (by the action-agent session): +**Environment variables consumed**: - `CODEBERG_TOKEN`, `CODEBERG_REPO`, `CODEBERG_API`, `PROJECT_NAME`, `PROJECT_REPO_ROOT` -- `PRIMARY_BRANCH` +- `PRIMARY_BRANCH`, `CLAUDE_MODEL` (set to opus by planner-run.sh) - `MATRIX_TOKEN`, `MATRIX_ROOM_ID`, `MATRIX_HOMESERVER` ### Predictor (`planner/`) @@ -284,7 +291,7 @@ sourced as needed. | `lib/load-project.sh` | Parses a `projects/*.toml` file into env vars (`PROJECT_NAME`, `CODEBERG_REPO`, `WOODPECKER_REPO_ID`, monitoring toggles, Matrix config, etc.). | env.sh (when `PROJECT_TOML` is set), supervisor-poll (per-project iteration) | | `lib/parse-deps.sh` | Extracts dependency issue numbers from an issue body (stdin → stdout, one number per line). Matches `## Dependencies` / `## Depends on` / `## Blocked by` sections and inline `depends on #N` patterns. 
Not sourced — executed via `bash lib/parse-deps.sh`. | dev-poll, supervisor-poll | | `lib/matrix_listener.sh` | Long-poll Matrix sync daemon. Dispatches thread replies to the correct agent via well-known files (`/tmp/{agent}-escalation-reply`). Handles supervisor, gardener, dev, review, vault, and action reply routing. Run as systemd service. | Standalone daemon | -| `lib/file-action-issue.sh` | `file_action_issue()` — dedup check, label lookup, and issue creation for formula-driven cron wrappers. Sets `FILED_ISSUE_NUM` on success. | gardener-run.sh, planner-poll.sh | +| `lib/file-action-issue.sh` | `file_action_issue()` — dedup check, label lookup, and issue creation for formula-driven cron wrappers. Sets `FILED_ISSUE_NUM` on success. | gardener-run.sh | | `lib/agent-session.sh` | Shared tmux + Claude session helpers: `create_agent_session()`, `inject_formula()`, `agent_wait_for_claude_ready()`, `agent_inject_into_session()`, `agent_kill_session()`, `monitor_phase_loop()`, `read_phase()`. `create_agent_session(session, workdir, [phase_file])` optionally installs a PostToolUse hook (matcher `Bash\|Write`) that detects phase file writes in real-time — when Claude writes to the phase file, the hook writes a marker so `monitor_phase_loop` reacts on the next poll instead of waiting for mtime changes. Also installs a StopFailure hook (matcher `rate_limit\|server_error\|authentication_failed\|billing_error`) that writes `PHASE:failed` with an `api_error` reason to the phase file and touches the phase-changed marker, so the orchestrator discovers API errors within one poll cycle instead of waiting for idle timeout. When `MATRIX_THREAD_ID` is exported, also installs a Stop hook (`on-stop-matrix.sh`) that streams each Claude response to the Matrix thread. 
`monitor_phase_loop` sets `_MONITOR_LOOP_EXIT` to one of: `done`, `idle_timeout`, `idle_prompt` (Claude returned to `❯` for 3 consecutive polls without writing any phase — callback invoked with `PHASE:failed`, session already dead), `crashed`, or a `PHASE:*` string. Agents must handle `idle_prompt` in both their callback and their post-loop exit handler. | dev-agent.sh, gardener-agent.sh, action-agent.sh | --- diff --git a/formulas/run-planner.toml b/formulas/run-planner.toml index 137029e..2391ac6 100644 --- a/formulas/run-planner.toml +++ b/formulas/run-planner.toml @@ -1,13 +1,18 @@ # formulas/run-planner.toml — Strategic planning formula # -# Executed by the action-agent via cron-filed action issues. -# planner-poll.sh files an action issue referencing this formula weekly; -# action-poll.sh picks it up and spawns a tmux session where Claude -# executes these steps autonomously. +# Executed directly by planner-run.sh via cron — no action issues. +# planner-run.sh creates a tmux session with Claude (opus) and injects +# this formula as context. Claude executes all steps autonomously. +# +# Steps: preflight → prediction-triage → strategic-planning +# → journal-and-memory → commit-and-pr +# +# AGENTS.md maintenance is handled by the gardener (#246). +# All git writes (journal entry) happen in one commit at the end. name = "run-planner" -description = "Strategic planning: update docs, triage predictions, resource+leverage gap analysis" -version = 1 +description = "Strategic planning: triage predictions, resource+leverage gap analysis, journal" +version = 2 model = "opus" [context] @@ -27,7 +32,7 @@ Set up the working environment for this planning run. git checkout "$PRIMARY_BRANCH" --quiet git pull --ff-only origin "$PRIMARY_BRANCH" --quiet -3. Record the current HEAD SHA — you will need it for AGENTS.md watermarks: +3. 
Record the current HEAD SHA: HEAD_SHA=$(git rev-parse HEAD) echo "$HEAD_SHA" > /tmp/planner-head-sha @@ -37,59 +42,7 @@ Set up the working environment for this planning run. """ [[steps]] -id = "agents-update" -title = "Update AGENTS.md documentation tree" -description = """ -Check all AGENTS.md files for staleness and update any that are outdated. - -1. Read the HEAD SHA from preflight: - HEAD_SHA=$(cat /tmp/planner-head-sha) - -2. Find all AGENTS.md files: - find "$PROJECT_REPO_ROOT" -name "AGENTS.md" -not -path "*/.git/*" - -3. For each file, read the watermark from line 1: - - -4. Check for changes since the watermark: - git log --oneline ..HEAD -- - If zero changes, the file is current — skip it. - -5. For stale files: - - Read the AGENTS.md and the source files in that directory - - Update the documentation to reflect code changes since the watermark - - Set the watermark to the HEAD SHA from the preflight step - - Conventions: max ~200 lines, architecture and WHY not implementation details - -6. If you made changes: - a. Create a branch: - git checkout -B "chore/planner-agents-$(date -u +%Y%m%d)" - b. Stage only AGENTS.md files: - find . -name "AGENTS.md" -not -path "./.git/*" -exec git add {} + - c. Commit: - git commit -m "chore: planner update AGENTS.md tree" - d. Push: - git push -f origin "chore/planner-agents-$(date -u +%Y%m%d)" - e. Create a PR (failure here is non-fatal — log and continue): - curl -sf -X POST \ - -H "Authorization: token $CODEBERG_TOKEN" \ - -H "Content-Type: application/json" \ - "$CODEBERG_API/pulls" \ - -d '{"title":"chore: planner update AGENTS.md tree", - "head":"","base":"", - "body":"Automated AGENTS.md update — review-agent fast-tracks doc-only PRs."}' - f. Return to primary branch: - git checkout "$PRIMARY_BRANCH" - -7. If no AGENTS.md files need updating, skip this step entirely. - -CRITICAL: If this step fails for any reason, log the failure and move on. 
-Do NOT let an AGENTS.md failure prevent prediction triage or strategic planning. -""" -needs = ["preflight"] - -[[steps]] -id = "triage-predictions" +id = "prediction-triage" title = "Triage prediction/unreviewed issues" description = """ Triage prediction issues filed by the predictor (goblin). @@ -213,7 +166,7 @@ Read these inputs: - Open issues (fetched via API) — what's already planned - $FACTORY_ROOT/metrics/supervisor-metrics.jsonl — operational trends (may not exist) - Planner memory (loaded in preflight) - - Promoted predictions from triage-predictions (these count toward the + - Promoted predictions from prediction-triage (these count toward the per-cycle issue limit — they compete with vision gaps for priority) Reason through these five questions: @@ -238,7 +191,7 @@ Reason through these five questions: Things that depend on blocked resources or aren't high-leverage right now. Do NOT create issues for these. -Then create up to 5 issues total (including promotions from triage-predictions), +Then create up to 5 issues total (including promotions from prediction-triage), prioritized by leverage: For formula-matching gaps, include YAML front matter in the body: @@ -271,13 +224,42 @@ Rules: If there are no gaps, note that the backlog is aligned with the vision. """ -needs = ["agents-update", "triage-predictions"] +needs = ["prediction-triage"] [[steps]] -id = "memory-update" -title = "Persist learnings to planner/MEMORY.md" +id = "journal-and-memory" +title = "Write journal entry and update planner memory" description = """ -Reflect on this planning run and write the updated memory file. +Two outputs from this step: + +### 1. Journal entry (committed to git) + +Create a daily journal file at: + $FACTORY_ROOT/planner/journal/$(date -u +%Y-%m-%d).md + +If the file already exists (multiple runs per day), append a new section +with a timestamp header. 
+ +Format: + # Planner run — YYYY-MM-DD HH:MM UTC + + ## Predictions triaged + - #NNN: PROMOTE_ACTION/PROMOTE_BACKLOG/WATCH/DISMISS — reasoning + (or "No unreviewed predictions" if none) + + ## Issues created + - #NNN: title — why + (or "No new issues — backlog aligned with vision" if none) + + ## Observations + - Key patterns, resource state, metric trends noticed during this run + + ## Deferred + - Items considered but deferred, and why + +Keep each entry concise — 30-50 lines max. + +### 2. Memory update (gitignored, local only) Write to: $FACTORY_ROOT/planner/MEMORY.md (replace the entire file) @@ -298,3 +280,46 @@ Rules: Format: simple markdown with dated sections. """ needs = ["strategic-planning"] + +[[steps]] +id = "commit-and-pr" +title = "One commit with all file changes, push, create PR" +description = """ +Collect all file changes from this run into a single commit. +API calls (issue creation, prediction triage) already happened during the +run — only file changes (journal entries) need the PR. + +1. Check for staged or unstaged changes: + cd "$PROJECT_REPO_ROOT" + git status --porcelain + + If there are no file changes, skip this entire step — no commit, no PR. + +2. If there are changes: + a. Create a branch: + BRANCH="chore/planner-$(date -u +%Y%m%d-%H%M)" + git checkout -B "$BRANCH" + b. Stage journal entries: + git add planner/journal/ 2>/dev/null || true + c. Stage any other tracked files modified during the run: + git add -u + d. Check if there is anything to commit: + git diff --cached --quiet && echo "Nothing staged" && skip + e. Commit: + git commit -m "chore: planner run $(date -u +%Y-%m-%d)" + f. Push: + git push -u origin "$BRANCH" + g. Create a PR: + curl -sf -X POST \ + -H "Authorization: token $CODEBERG_TOKEN" \ + -H "Content-Type: application/json" \ + "$CODEBERG_API/pulls" \ + -d '{"title":"chore: planner run journal", + "head":"","base":"", + "body":"Automated planner run — journal entry from strategic planning session."}' + h. 
Return to primary branch: + git checkout "$PRIMARY_BRANCH" + +3. If the PR creation fails, log and continue — the journal is committed locally. +""" +needs = ["journal-and-memory"] diff --git a/planner/journal/.gitkeep b/planner/journal/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/planner/planner-poll.sh b/planner/planner-poll.sh deleted file mode 100755 index 510709b..0000000 --- a/planner/planner-poll.sh +++ /dev/null @@ -1,73 +0,0 @@ -#!/usr/bin/env bash -# ============================================================================= -# planner-poll.sh — Cron wrapper: files action issue for run-planner formula -# -# Runs weekly (or on-demand). Guards against concurrent runs and low memory. -# Files an action issue referencing formulas/run-planner.toml; the action-agent -# picks it up and executes the planning steps in an interactive Claude session. -# ============================================================================= -set -euo pipefail - -SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" -FACTORY_ROOT="$(dirname "$SCRIPT_DIR")" - -# shellcheck source=../lib/env.sh -source "$FACTORY_ROOT/lib/env.sh" -# shellcheck source=../lib/file-action-issue.sh -source "$FACTORY_ROOT/lib/file-action-issue.sh" - -LOG_FILE="$SCRIPT_DIR/planner.log" -LOCK_FILE="/tmp/planner-poll.lock" - -log() { echo "[$(date -u +%Y-%m-%dT%H:%M:%S)Z] $*" >> "$LOG_FILE"; } - -# ── Lock ────────────────────────────────────────────────────────────────── -if [ -f "$LOCK_FILE" ]; then - LOCK_PID=$(cat "$LOCK_FILE" 2>/dev/null || true) - if [ -n "$LOCK_PID" ] && kill -0 "$LOCK_PID" 2>/dev/null; then - log "poll: planner running (PID $LOCK_PID)" - exit 0 - fi - rm -f "$LOCK_FILE" -fi -echo $$ > "$LOCK_FILE" -trap 'rm -f "$LOCK_FILE"' EXIT - -# ── Memory guard ────────────────────────────────────────────────────────── -AVAIL_MB=$(free -m | awk '/Mem:/{print $7}') -if [ "${AVAIL_MB:-0}" -lt 2000 ]; then - log "poll: skipping — only ${AVAIL_MB}MB available (need 2000)" - exit 0 
-fi - -log "--- Planner poll start ---" - -# ── File action issue for run-planner formula ───────────────────────────── -ISSUE_BODY="--- -formula: run-planner -model: opus ---- - -Periodic strategic planning run. The action-agent reads \`formulas/run-planner.toml\` -and executes the five phases: preflight, AGENTS.md update, prediction triage, -strategic planning (resource+leverage gap analysis), and memory update. - -Filed automatically by \`planner-poll.sh\`." - -_rc=0 -file_action_issue "run-planner" "action: run-planner — periodic strategic planning" "$ISSUE_BODY" || _rc=$? -case "$_rc" in - 0) ;; - 1) log "poll: open run-planner action issue already exists — skipping" - log "--- Planner poll done ---" - exit 0 ;; - 2) log "ERROR: 'action' label not found — cannot file planner issue" - exit 1 ;; - *) log "ERROR: failed to create action issue for run-planner" - exit 1 ;; -esac - -log "Filed action issue #${FILED_ISSUE_NUM} for run-planner formula" -matrix_send "planner" "Filed action #${FILED_ISSUE_NUM}: run-planner — periodic strategic planning" 2>/dev/null || true - -log "--- Planner poll done ---" diff --git a/planner/planner-run.sh b/planner/planner-run.sh new file mode 100755 index 0000000..dbc913f --- /dev/null +++ b/planner/planner-run.sh @@ -0,0 +1,180 @@ +#!/usr/bin/env bash +# ============================================================================= +# planner-run.sh — Cron wrapper: direct planner execution via Claude + formula +# +# Runs weekly (or on-demand). Guards against concurrent runs and low memory. +# Creates a tmux session with Claude (opus) reading formulas/run-planner.toml. +# No action issues — the planner is a nervous system component, not work. +# +# The planner plans for ALL projects (harb + disinto) but is itself disinto +# infrastructure — always sources projects/disinto.toml. 
+# ============================================================================= +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +FACTORY_ROOT="$(dirname "$SCRIPT_DIR")" + +# Source disinto project config — the planner is disinto infrastructure +export PROJECT_TOML="$FACTORY_ROOT/projects/disinto.toml" +# shellcheck source=../lib/env.sh +source "$FACTORY_ROOT/lib/env.sh" +# shellcheck source=../lib/agent-session.sh +source "$FACTORY_ROOT/lib/agent-session.sh" + +LOG_FILE="$SCRIPT_DIR/planner.log" +LOCK_FILE="/tmp/planner-run.lock" +SESSION_NAME="planner-${PROJECT_NAME}" +PHASE_FILE="/tmp/planner-session-${PROJECT_NAME}.phase" + +# shellcheck disable=SC2034 # read by monitor_phase_loop in lib/agent-session.sh +PHASE_POLL_INTERVAL=15 + +log() { echo "[$(date -u +%Y-%m-%dT%H:%M:%S)Z] $*" >> "$LOG_FILE"; } + +# ── Lock ────────────────────────────────────────────────────────────────── +if [ -f "$LOCK_FILE" ]; then + LOCK_PID=$(cat "$LOCK_FILE" 2>/dev/null || true) + if [ -n "$LOCK_PID" ] && kill -0 "$LOCK_PID" 2>/dev/null; then + log "run: planner running (PID $LOCK_PID)" + exit 0 + fi + rm -f "$LOCK_FILE" +fi +echo $$ > "$LOCK_FILE" +trap 'rm -f "$LOCK_FILE"' EXIT + +# ── Memory guard ────────────────────────────────────────────────────────── +AVAIL_MB=$(free -m | awk '/Mem:/{print $7}') +if [ "${AVAIL_MB:-0}" -lt 2000 ]; then + log "run: skipping — only ${AVAIL_MB}MB available (need 2000)" + exit 0 +fi + +log "--- Planner run start ---" + +# ── Load formula ───────────────────────────────────────────────────────── +FORMULA_FILE="$FACTORY_ROOT/formulas/run-planner.toml" +if [ ! 
-f "$FORMULA_FILE" ]; then + log "ERROR: formula not found: $FORMULA_FILE" + exit 1 +fi +FORMULA_CONTENT=$(cat "$FORMULA_FILE") + +# ── Read context files ─────────────────────────────────────────────────── +CONTEXT_BLOCK="" +for ctx in VISION.md AGENTS.md RESOURCES.md; do + ctx_path="${PROJECT_REPO_ROOT}/${ctx}" + if [ -f "$ctx_path" ]; then + CONTEXT_BLOCK="${CONTEXT_BLOCK} +### ${ctx} +$(cat "$ctx_path") +" + fi +done + +# ── Read planner memory ───────────────────────────────────────────────── +MEMORY_BLOCK="" +MEMORY_FILE="$FACTORY_ROOT/planner/MEMORY.md" +if [ -f "$MEMORY_FILE" ]; then + MEMORY_BLOCK=" +### planner/MEMORY.md (persistent memory from prior runs) +$(cat "$MEMORY_FILE") +" +fi + +# ── Build prompt ───────────────────────────────────────────────────────── +PROMPT="You are the strategic planner for ${CODEBERG_REPO}. Work through the formula below. You MUST write PHASE:done to '${PHASE_FILE}' when finished — the orchestrator will time you out if you return to the prompt without signalling. 
+ +## Project context +${CONTEXT_BLOCK}${MEMORY_BLOCK} + +## Formula +${FORMULA_CONTENT} + +## Codeberg API reference +Base URL: ${CODEBERG_API} +Auth header: -H \"Authorization: token \$CODEBERG_TOKEN\" + Read issue: curl -sf -H \"Authorization: token \$CODEBERG_TOKEN\" '${CODEBERG_API}/issues/{number}' | jq '.body' + Create issue: curl -sf -X POST -H \"Authorization: token \$CODEBERG_TOKEN\" -H 'Content-Type: application/json' '${CODEBERG_API}/issues' -d '{\"title\":\"...\",\"body\":\"...\",\"labels\":[LABEL_ID]}' + Relabel: curl -sf -H \"Authorization: token \$CODEBERG_TOKEN\" -X PUT -H 'Content-Type: application/json' '${CODEBERG_API}/issues/{number}/labels' -d '{\"labels\":[LABEL_ID]}' + Comment: curl -sf -H \"Authorization: token \$CODEBERG_TOKEN\" -X POST -H 'Content-Type: application/json' '${CODEBERG_API}/issues/{number}/comments' -d '{\"body\":\"...\"}' + Close: curl -sf -H \"Authorization: token \$CODEBERG_TOKEN\" -X PATCH -H 'Content-Type: application/json' '${CODEBERG_API}/issues/{number}' -d '{\"state\":\"closed\"}' + List labels: curl -sf -H \"Authorization: token \$CODEBERG_TOKEN\" '${CODEBERG_API}/labels' +NEVER echo or include the actual token value in output — always reference \$CODEBERG_TOKEN. + +## Environment +FACTORY_ROOT=${FACTORY_ROOT} +PROJECT_REPO_ROOT=${PROJECT_REPO_ROOT} +PRIMARY_BRANCH=${PRIMARY_BRANCH} + +## Phase protocol (REQUIRED) +When all work is done: + echo 'PHASE:done' > '${PHASE_FILE}' +On unrecoverable error: + printf 'PHASE:failed\nReason: %s\n' 'describe error' > '${PHASE_FILE}'" + +# ── Reset phase file + kill stale session ──────────────────────────────── +agent_kill_session "$SESSION_NAME" +rm -f "$PHASE_FILE" + +# ── Create tmux session ───────────────────────────────────────────────── +log "Creating tmux session: ${SESSION_NAME}" +export CLAUDE_MODEL="opus" +if ! 
create_agent_session "$SESSION_NAME" "$PROJECT_REPO_ROOT" "$PHASE_FILE"; then + log "ERROR: failed to create tmux session ${SESSION_NAME}" + exit 1 +fi + +agent_inject_into_session "$SESSION_NAME" "$PROMPT" +log "Prompt sent to tmux session" +matrix_send "planner" "Planner session started for ${CODEBERG_REPO}" 2>/dev/null || true + +# ── Phase monitoring loop ──────────────────────────────────────────────── +log "Monitoring phase file: ${PHASE_FILE}" +PLANNER_CRASH_COUNT=0 + +planner_phase_callback() { + local phase="$1" + log "phase: ${phase}" + case "$phase" in + PHASE:crashed) + if [ "$PLANNER_CRASH_COUNT" -gt 0 ]; then + log "ERROR: session crashed again after recovery — giving up" + return 0 + fi + PLANNER_CRASH_COUNT=$((PLANNER_CRASH_COUNT + 1)) + log "WARNING: tmux session died unexpectedly — attempting recovery" + if create_agent_session "${_MONITOR_SESSION:-$SESSION_NAME}" "$PROJECT_REPO_ROOT" "$PHASE_FILE" 2>/dev/null; then + agent_inject_into_session "${_MONITOR_SESSION:-$SESSION_NAME}" "$PROMPT" + log "Recovery session started" + else + log "ERROR: could not restart session after crash" + fi + ;; + PHASE:done|PHASE:failed|PHASE:needs_human|PHASE:merged) + agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}" + ;; + esac +} + +monitor_phase_loop "$PHASE_FILE" 7200 "planner_phase_callback" + +FINAL_PHASE=$(read_phase "$PHASE_FILE") +log "Final phase: ${FINAL_PHASE:-none}" + +if [ "$FINAL_PHASE" != "PHASE:done" ]; then + case "${_MONITOR_LOOP_EXIT:-}" in + idle_prompt) + log "planner: Claude returned to prompt without writing phase signal" + ;; + idle_timeout) + log "planner: timed out after 2h with no phase signal" + ;; + *) + log "planner finished without PHASE:done (phase: ${FINAL_PHASE:-none}, exit: ${_MONITOR_LOOP_EXIT:-})" + ;; + esac +fi + +matrix_send "planner" "Planner session finished (${FINAL_PHASE:-no phase})" 2>/dev/null || true +log "--- Planner run done ---" From 5bac4a84090fbd558611b9bc299558aa4ee049bd Mon Sep 17 00:00:00 2001 From: 
openhands
Date: Fri, 20 Mar 2026 13:53:33 +0000
Subject: [PATCH 2/2] fix: extract lib/formula-session.sh to eliminate duplicate code blocks

Shared helpers for formula-driven cron agents: lock, memory guard,
formula loading, context building, session startup, crash recovery.

- planner-run.sh uses shared helpers instead of inline code
- gardener-agent.sh delegates crash recovery to formula_phase_callback
- agent-smoke.sh updated for renamed planner script + new lib file

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .woodpecker/agent-smoke.sh |   4 +-
 AGENTS.md                  |   1 +
 gardener/gardener-agent.sh |  33 +++-------
 lib/formula-session.sh     | 125 +++++++++++++++++++++++++++++++++++++
 planner/planner-run.sh     |  82 ++++--------------------
 5 files changed, 148 insertions(+), 97 deletions(-)
 create mode 100644 lib/formula-session.sh

diff --git a/.woodpecker/agent-smoke.sh b/.woodpecker/agent-smoke.sh
index 40d95b8..a6a1bda 100644
--- a/.woodpecker/agent-smoke.sh
+++ b/.woodpecker/agent-smoke.sh
@@ -91,7 +91,7 @@ echo "=== 2/2 Function resolution ==="
 
 # Functions provided by shared lib files (available to all agent scripts via source)
 LIB_FUNS=$(
-  for f in lib/agent-session.sh lib/env.sh lib/ci-helpers.sh lib/load-project.sh lib/file-action-issue.sh; do
+  for f in lib/agent-session.sh lib/env.sh lib/ci-helpers.sh lib/load-project.sh lib/file-action-issue.sh lib/formula-session.sh; do
     if [ -f "$f" ]; then get_fns "$f"; fi
   done | sort -u
 )
@@ -162,7 +162,7 @@ check_script gardener/gardener-poll.sh
 check_script gardener/gardener-run.sh
 check_script review/review-pr.sh
 check_script review/review-poll.sh
-check_script planner/planner-poll.sh
+check_script planner/planner-run.sh
 check_script supervisor/supervisor-poll.sh
 check_script supervisor/update-prompt.sh
 check_script vault/vault-agent.sh
diff --git a/AGENTS.md b/AGENTS.md
index 646752e..d9d9249 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -291,6 +291,7 @@ sourced as needed.
 | `lib/load-project.sh` | Parses a `projects/*.toml` file into env vars (`PROJECT_NAME`, `CODEBERG_REPO`, `WOODPECKER_REPO_ID`, monitoring toggles, Matrix config, etc.). | env.sh (when `PROJECT_TOML` is set), supervisor-poll (per-project iteration) |
 | `lib/parse-deps.sh` | Extracts dependency issue numbers from an issue body (stdin → stdout, one number per line). Matches `## Dependencies` / `## Depends on` / `## Blocked by` sections and inline `depends on #N` patterns. Not sourced — executed via `bash lib/parse-deps.sh`. | dev-poll, supervisor-poll |
 | `lib/matrix_listener.sh` | Long-poll Matrix sync daemon. Dispatches thread replies to the correct agent via well-known files (`/tmp/{agent}-escalation-reply`). Handles supervisor, gardener, dev, review, vault, and action reply routing. Run as systemd service. | Standalone daemon |
+| `lib/formula-session.sh` | `acquire_cron_lock()`, `check_memory()`, `load_formula()`, `build_context_block()`, `start_formula_session()`, `formula_phase_callback()` — shared helpers for formula-driven cron agents (lock, memory guard, formula loading, tmux session, crash recovery). | planner-run.sh, gardener-agent.sh |
 | `lib/file-action-issue.sh` | `file_action_issue()` — dedup check, label lookup, and issue creation for formula-driven cron wrappers. Sets `FILED_ISSUE_NUM` on success. | gardener-run.sh |
 | `lib/agent-session.sh` | Shared tmux + Claude session helpers: `create_agent_session()`, `inject_formula()`, `agent_wait_for_claude_ready()`, `agent_inject_into_session()`, `agent_kill_session()`, `monitor_phase_loop()`, `read_phase()`. `create_agent_session(session, workdir, [phase_file])` optionally installs a PostToolUse hook (matcher `Bash\|Write`) that detects phase file writes in real-time — when Claude writes to the phase file, the hook writes a marker so `monitor_phase_loop` reacts on the next poll instead of waiting for mtime changes.
Also installs a StopFailure hook (matcher `rate_limit\|server_error\|authentication_failed\|billing_error`) that writes `PHASE:failed` with an `api_error` reason to the phase file and touches the phase-changed marker, so the orchestrator discovers API errors within one poll cycle instead of waiting for idle timeout. When `MATRIX_THREAD_ID` is exported, also installs a Stop hook (`on-stop-matrix.sh`) that streams each Claude response to the Matrix thread. `monitor_phase_loop` sets `_MONITOR_LOOP_EXIT` to one of: `done`, `idle_timeout`, `idle_prompt` (Claude returned to `❯` for 3 consecutive polls without writing any phase — callback invoked with `PHASE:failed`, session already dead), `crashed`, or a `PHASE:*` string. Agents must handle `idle_prompt` in both their callback and their post-loop exit handler. | dev-agent.sh, gardener-agent.sh, action-agent.sh | diff --git a/gardener/gardener-agent.sh b/gardener/gardener-agent.sh index 3a3f658..699e8be 100644 --- a/gardener/gardener-agent.sh +++ b/gardener/gardener-agent.sh @@ -29,6 +29,8 @@ export PROJECT_TOML="${1:-}" source "$FACTORY_ROOT/lib/env.sh" # shellcheck source=../lib/agent-session.sh source "$FACTORY_ROOT/lib/agent-session.sh" +# shellcheck source=../lib/formula-session.sh +source "$FACTORY_ROOT/lib/formula-session.sh" LOG_FILE="$SCRIPT_DIR/gardener.log" SESSION_NAME="gardener-${PROJECT_NAME}" @@ -275,32 +277,15 @@ matrix_send "gardener" "🌱 Gardener session started for ${CODEBERG_REPO}" 2>/d # ── Phase monitoring loop ───────────────────────────────────────────────── log "Monitoring phase file: ${PHASE_FILE}" -GARDENER_CRASH_COUNT=0 +_FORMULA_CRASH_COUNT=0 gardener_phase_callback() { - local phase="$1" - log "phase: ${phase}" - case "$phase" in - PHASE:crashed) - if [ "$GARDENER_CRASH_COUNT" -gt 0 ]; then - log "ERROR: session crashed again after recovery — giving up" - return 0 - fi - GARDENER_CRASH_COUNT=$((GARDENER_CRASH_COUNT + 1)) - log "WARNING: tmux session died unexpectedly — attempting recovery" - 
rm -f "$RESULT_FILE" - touch "$RESULT_FILE" - if create_agent_session "${_MONITOR_SESSION:-$SESSION_NAME}" "$PROJECT_REPO_ROOT" "$PHASE_FILE" 2>/dev/null; then - agent_inject_into_session "${_MONITOR_SESSION:-$SESSION_NAME}" "$PROMPT" - log "Recovery session started" - else - log "ERROR: could not restart session after crash" - fi - ;; - PHASE:done|PHASE:failed|PHASE:needs_human|PHASE:merged) - agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}" - ;; - esac + # Gardener-specific cleanup before shared crash recovery + if [ "$1" = "PHASE:crashed" ]; then + rm -f "$RESULT_FILE" + touch "$RESULT_FILE" + fi + formula_phase_callback "$1" } monitor_phase_loop "$PHASE_FILE" 7200 "gardener_phase_callback" diff --git a/lib/formula-session.sh b/lib/formula-session.sh new file mode 100644 index 0000000..3d73983 --- /dev/null +++ b/lib/formula-session.sh @@ -0,0 +1,125 @@ +#!/usr/bin/env bash +# formula-session.sh — Shared helpers for formula-driven cron agents +# +# Provides reusable functions for the common cron-wrapper + tmux-session +# pattern used by planner-run.sh and gardener-agent.sh. +# +# Functions: +# acquire_cron_lock LOCK_FILE — PID lock with stale cleanup +# check_memory [MIN_MB] — skip if available RAM too low +# load_formula FORMULA_FILE — sets FORMULA_CONTENT +# build_context_block FILE [FILE ...] — sets CONTEXT_BLOCK +# start_formula_session SESSION WORKDIR PHASE_FILE — create tmux + claude +# formula_phase_callback PHASE — standard crash-recovery callback +# +# Requires: lib/agent-session.sh sourced first (for create_agent_session, +# agent_kill_session, agent_inject_into_session). +# Globals used by formula_phase_callback: SESSION_NAME, PHASE_FILE, +# PROJECT_REPO_ROOT, PROMPT (set by the calling script). + +# ── Cron guards ────────────────────────────────────────────────────────── + +# acquire_cron_lock LOCK_FILE +# Acquires a PID lock. Exits 0 if another instance is running. +# Sets an EXIT trap to clean up the lock file. 
+acquire_cron_lock() {
+  _CRON_LOCK_FILE="$1"
+  if [ -f "$_CRON_LOCK_FILE" ]; then
+    local lock_pid
+    lock_pid=$(cat "$_CRON_LOCK_FILE" 2>/dev/null || true)
+    if [ -n "$lock_pid" ] && kill -0 "$lock_pid" 2>/dev/null; then
+      log "run: already running (PID $lock_pid)"
+      exit 0
+    fi
+    rm -f "$_CRON_LOCK_FILE"
+  fi
+  echo $$ > "$_CRON_LOCK_FILE"
+  trap 'rm -f "$_CRON_LOCK_FILE"' EXIT
+}
+
+# check_memory [MIN_MB]
+# Exits 0 (skip) if available memory is below MIN_MB (default 2000).
+check_memory() {
+  local min_mb="${1:-2000}"
+  local avail_mb
+  avail_mb=$(free -m | awk '/Mem:/{print $7}')
+  if [ "${avail_mb:-0}" -lt "$min_mb" ]; then
+    log "run: skipping — only ${avail_mb}MB available (need ${min_mb})"
+    exit 0
+  fi
+}
+
+# ── Formula loading ──────────────────────────────────────────────────────
+
+# load_formula FORMULA_FILE
+# Reads formula TOML into FORMULA_CONTENT. Exits 1 if missing.
+load_formula() {
+  local formula_file="$1"
+  if [ ! -f "$formula_file" ]; then
+    log "ERROR: formula not found: $formula_file"
+    exit 1
+  fi
+  # shellcheck disable=SC2034 # consumed by the calling script
+  FORMULA_CONTENT=$(cat "$formula_file")
+}
+
+# build_context_block FILE [FILE ...]
+# Reads each file from $PROJECT_REPO_ROOT and builds CONTEXT_BLOCK.
+build_context_block() {
+  CONTEXT_BLOCK=""
+  local ctx ctx_path
+  for ctx in "$@"; do
+    ctx_path="${PROJECT_REPO_ROOT}/${ctx}"
+    if [ -f "$ctx_path" ]; then
+      CONTEXT_BLOCK="${CONTEXT_BLOCK}
+### ${ctx}
+$(cat "$ctx_path")
+"
+    fi
+  done
+}
+
+# ── Session management ───────────────────────────────────────────────────
+
+# start_formula_session SESSION WORKDIR PHASE_FILE
+# Kills stale session, resets phase file, creates new tmux + claude session.
+# Returns 0 on success, 1 on failure.
+start_formula_session() {
+  local session="$1" workdir="$2" phase_file="$3"
+  agent_kill_session "$session"
+  rm -f "$phase_file"
+  log "Creating tmux session: ${session}"
+  if ! create_agent_session "$session" "$workdir" "$phase_file"; then
+    log "ERROR: failed to create tmux session ${session}"
+    return 1
+  fi
+}
+
+# formula_phase_callback PHASE
+# Standard crash-recovery phase callback for formula sessions.
+# Requires globals: SESSION_NAME, PHASE_FILE, PROJECT_REPO_ROOT, PROMPT.
+# Uses _FORMULA_CRASH_COUNT (auto-initialized) for single-retry limit.
+# shellcheck disable=SC2154 # SESSION_NAME, PHASE_FILE, PROJECT_REPO_ROOT, PROMPT set by caller
+formula_phase_callback() {
+  local phase="$1"
+  log "phase: ${phase}"
+  case "$phase" in
+    PHASE:crashed)
+      if [ "${_FORMULA_CRASH_COUNT:-0}" -gt 0 ]; then
+        log "ERROR: session crashed again after recovery — giving up"
+        return 0
+      fi
+      _FORMULA_CRASH_COUNT=$(( ${_FORMULA_CRASH_COUNT:-0} + 1 ))
+      log "WARNING: tmux session died unexpectedly — attempting recovery"
+      if create_agent_session "${_MONITOR_SESSION:-$SESSION_NAME}" "$PROJECT_REPO_ROOT" "$PHASE_FILE" 2>/dev/null; then
+        agent_inject_into_session "${_MONITOR_SESSION:-$SESSION_NAME}" "$PROMPT"
+        log "Recovery session started"
+      else
+        log "ERROR: could not restart session after crash"
+      fi
+      ;;
+    PHASE:done|PHASE:failed|PHASE:needs_human|PHASE:merged)
+      agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}"
+      ;;
+  esac
+}
diff --git a/planner/planner-run.sh b/planner/planner-run.sh
index dbc913f..2ad01cc 100755
--- a/planner/planner-run.sh
+++ b/planner/planner-run.sh
@@ -20,9 +20,10 @@ export PROJECT_TOML="$FACTORY_ROOT/projects/disinto.toml"
 source "$FACTORY_ROOT/lib/env.sh"
 # shellcheck source=../lib/agent-session.sh
 source "$FACTORY_ROOT/lib/agent-session.sh"
+# shellcheck source=../lib/formula-session.sh
+source "$FACTORY_ROOT/lib/formula-session.sh"
 
 LOG_FILE="$SCRIPT_DIR/planner.log"
-LOCK_FILE="/tmp/planner-run.lock"
 SESSION_NAME="planner-${PROJECT_NAME}"
 PHASE_FILE="/tmp/planner-session-${PROJECT_NAME}.phase"
 
@@ -31,46 +32,15 @@ PHASE_POLL_INTERVAL=15
 
 log() { echo "[$(date -u +%Y-%m-%dT%H:%M:%S)Z] $*" >> "$LOG_FILE"; }
 
-# ── Lock ──────────────────────────────────────────────────────────────────
-if [ -f "$LOCK_FILE" ]; then
-  LOCK_PID=$(cat "$LOCK_FILE" 2>/dev/null || true)
-  if [ -n "$LOCK_PID" ] && kill -0 "$LOCK_PID" 2>/dev/null; then
-    log "run: planner running (PID $LOCK_PID)"
-    exit 0
-  fi
-  rm -f "$LOCK_FILE"
-fi
-echo $$ > "$LOCK_FILE"
-trap 'rm -f "$LOCK_FILE"' EXIT
-
-# ── Memory guard ──────────────────────────────────────────────────────────
-AVAIL_MB=$(free -m | awk '/Mem:/{print $7}')
-if [ "${AVAIL_MB:-0}" -lt 2000 ]; then
-  log "run: skipping — only ${AVAIL_MB}MB available (need 2000)"
-  exit 0
-fi
+# ── Guards ────────────────────────────────────────────────────────────────
+acquire_cron_lock "/tmp/planner-run.lock"
+check_memory 2000
 
 log "--- Planner run start ---"
 
-# ── Load formula ─────────────────────────────────────────────────────────
-FORMULA_FILE="$FACTORY_ROOT/formulas/run-planner.toml"
-if [ ! -f "$FORMULA_FILE" ]; then
-  log "ERROR: formula not found: $FORMULA_FILE"
-  exit 1
-fi
-FORMULA_CONTENT=$(cat "$FORMULA_FILE")
-
-# ── Read context files ───────────────────────────────────────────────────
-CONTEXT_BLOCK=""
-for ctx in VISION.md AGENTS.md RESOURCES.md; do
-  ctx_path="${PROJECT_REPO_ROOT}/${ctx}"
-  if [ -f "$ctx_path" ]; then
-    CONTEXT_BLOCK="${CONTEXT_BLOCK}
-### ${ctx}
-$(cat "$ctx_path")
-"
-  fi
-done
+# ── Load formula + context ───────────────────────────────────────────────
+load_formula "$FACTORY_ROOT/formulas/run-planner.toml"
+build_context_block VISION.md AGENTS.md RESOURCES.md
 
 # ── Read planner memory ─────────────────────────────────────────────────
 MEMORY_BLOCK=""
@@ -113,15 +83,9 @@ When all work is done:
 On unrecoverable error:
 printf 'PHASE:failed\nReason: %s\n' 'describe error' > '${PHASE_FILE}'"
 
-# ── Reset phase file + kill stale session ────────────────────────────────
-agent_kill_session "$SESSION_NAME"
-rm -f "$PHASE_FILE"
-
 # ── Create tmux session ─────────────────────────────────────────────────
-log "Creating tmux session: ${SESSION_NAME}"
 export CLAUDE_MODEL="opus"
-if ! create_agent_session "$SESSION_NAME" "$PROJECT_REPO_ROOT" "$PHASE_FILE"; then
-  log "ERROR: failed to create tmux session ${SESSION_NAME}"
+if ! start_formula_session "$SESSION_NAME" "$PROJECT_REPO_ROOT" "$PHASE_FILE"; then
   exit 1
 fi
 
@@ -131,33 +95,9 @@ matrix_send "planner" "Planner session started for ${CODEBERG_REPO}" 2>/dev/null
 
 # ── Phase monitoring loop ────────────────────────────────────────────────
 log "Monitoring phase file: ${PHASE_FILE}"
-PLANNER_CRASH_COUNT=0
+_FORMULA_CRASH_COUNT=0
 
-planner_phase_callback() {
-  local phase="$1"
-  log "phase: ${phase}"
-  case "$phase" in
-    PHASE:crashed)
-      if [ "$PLANNER_CRASH_COUNT" -gt 0 ]; then
-        log "ERROR: session crashed again after recovery — giving up"
-        return 0
-      fi
-      PLANNER_CRASH_COUNT=$((PLANNER_CRASH_COUNT + 1))
-      log "WARNING: tmux session died unexpectedly — attempting recovery"
-      if create_agent_session "${_MONITOR_SESSION:-$SESSION_NAME}" "$PROJECT_REPO_ROOT" "$PHASE_FILE" 2>/dev/null; then
-        agent_inject_into_session "${_MONITOR_SESSION:-$SESSION_NAME}" "$PROMPT"
-        log "Recovery session started"
-      else
-        log "ERROR: could not restart session after crash"
-      fi
-      ;;
-    PHASE:done|PHASE:failed|PHASE:needs_human|PHASE:merged)
-      agent_kill_session "${_MONITOR_SESSION:-$SESSION_NAME}"
-      ;;
-  esac
-}
-
-monitor_phase_loop "$PHASE_FILE" 7200 "planner_phase_callback"
+monitor_phase_loop "$PHASE_FILE" 7200 "formula_phase_callback"
 
 FINAL_PHASE=$(read_phase "$PHASE_FILE")
 log "Final phase: ${FINAL_PHASE:-none}"