Look up the backlog label ID via the Gitea labels API (with fallback to
1300815) and replace '{"labels":["backlog"]}' with the integer ID form
at both call sites (cleanup() line 135 and idle_timeout handler line 713).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Look up backlog label ID via Codeberg API at the start of the PHASE:failed
branch and replace '{"labels":["backlog"]}' at lines 547 and 628 with
the numeric ID, matching the pattern already used in gardener.
- Fix set -e bug: use `_merge_rc=0; do_merge ... || _merge_rc=$?` so non-zero
returns don't kill the agent before _merge_rc is captured
- Fix sentinel path: skip sentinel break for APPROVE so do_merge() always runs,
even when review-poll.sh injected the verdict first
- Fix fragile grep: match HTTP 405 alone instead of `grep -qi "not enough"` —
any 405 from the merge endpoint is a structural block (approvals, branch
protection), not a transient error
- Fix stale comment/status in PHASE:done handler: "orchestrator or Claude"
instead of "agent"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- phase-handler.sh: remove do_merge(); on APPROVAL inject exact API
commands for agent to merge+close directly; PHASE:done now only
does local cleanup (tmux, worktree, labels) — merge already done
- dev-agent.sh: update PHASE_PROTOCOL_INSTRUCTIONS — Approved means
merge via API, close issue, then write PHASE:done
- dev-poll.sh: remove try_merge_or_rebase(); for approved+CI-green
orphaned PRs, spawn dev-agent (recovery mode) to merge instead
- .env.example: document new token roles (CODEBERG_TOKEN = bot for
push/PR/merge; REVIEW_BOT_TOKEN = human account for approvals)
- AGENTS.md: update token descriptions to match new roles
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Lost during #160 refactor. These are dev-agent specific (reference
$ISSUE, $THREAD_FILE, $LOGFILE) so they belong in the agent script,
not the shared library.
Library functions need explicit session name argument — they no longer
have closure over $SESSION_NAME from the parent script.
- agent_kill_session: add $SESSION_NAME to all 11 call sites
- agent_inject_into_session: add $SESSION_NAME to all call sites in
phase-handler.sh and gardener-agent.sh
- agent_kill_session: guard against missing arg (defensive)
kill_tmux_session → agent_kill_session
inject_into_session → agent_inject_into_session
wait_for_claude_ready → agent_wait_for_claude_ready
Also restore status() function lost during #160 refactor.
Fixes dev-agent and gardener-agent crash on startup:
line 149: status: command not found
line 280: kill_tmux_session: command not found
Fixes#160
## Changes
Extracted phase callback functions (post_refusal_comment, do_merge, _on_phase_change) from dev/dev-agent.sh into new dev/phase-handler.sh. dev-agent.sh now sources both lib/agent-session.sh and dev/phase-handler.sh. Replaced inline dependency extraction with lib/parse-deps.sh. dev-agent.sh reduced from 1516 to 684 lines (55% reduction). AGENTS.md shellcheck command updated to include the new files.
Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/disinto/pulls/173
Reviewed-by: Disinto_bot <disinto_bot@noreply.codeberg.org>
- `ci_fix_check_and_increment` now accepts an optional `check_only` arg:
- count < 3, check_only: returns `ok:N` without incrementing (deferred
to launch time, preserving the WAITING_PRS protection)
- count < 3, non-check_only: increments and returns `ok:N` (unchanged)
- count == 3 (any mode): atomically bumps to 4 and returns
`exhausted_first_time:3` — only one concurrent poller can win this
- count > 3 (any mode): returns `exhausted:N` with no write
- `handle_ci_exhaustion` unified to a single code path for both
check_only and non-check_only:
- Writes escalation JSONL + matrix_send only when sentinel is
`exhausted_first_time` — never on a bare integer comparison outside
a lock
- Removes the two separate `ci_fix_increment` bump-to-4 calls that
were racy (the sentinel bump is now inside the flock in Python)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous commit introduced a counter leak in the backlog scan path:
handle_ci_exhaustion (without check_only) atomically incremented the CI
fix counter before the WAITING_PRS guard, so an exit 0 that never spawned
a dev-agent would silently consume one of the three allowed fix attempts.
Restore the READY_PR_FOR_INCREMENT / deferred-increment mechanism:
- Backlog scan calls handle_ci_exhaustion with "check_only" (read-only,
no increment) to detect exhaustion without touching the counter.
- The counter is bumped atomically at LAUNCH time via handle_ci_exhaustion
(without check_only), so the increment only happens when we are certain
a dev-agent is being spawned. If a concurrent poller already exhausted
the counter between scan and launch, the LAUNCH call returns 0 and we
bail out cleanly without double-spawning.
The in-progress, stuck-PR, and try_merge_or_rebase paths are unaffected:
they call handle_ci_exhaustion without check_only, which continues to use
the atomic ci_fix_check_and_increment to prevent concurrent double-spawning.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add ci_fix_check_and_increment() that performs read + threshold-check +
conditional increment in a single flock-protected Python call, replacing
the prior three-step sequence (ci_fix_count / bash check / ci_fix_increment)
that allowed two concurrent poll invocations to both pass the threshold and
spawn duplicate dev-agents for the same PR.
handle_ci_exhaustion now calls ci_fix_check_and_increment atomically and
returns the new count in CI_FIX_ATTEMPTS; all separate ci_fix_increment
calls after handle_ci_exhaustion (including the deferred READY_PR_FOR_INCREMENT
mechanism) are removed. Log messages updated from CI_FIX_ATTEMPTS+1 to
CI_FIX_ATTEMPTS to reflect the post-increment count.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wrap ci_fix_count(), ci_fix_increment(), and ci_fix_reset() with flock
on a shared lockfile to prevent concurrent modification of the JSON
tracker. Uses flock(1) in command-wrapping mode so each Python process
holds an exclusive lock for the duration of its read-modify-write cycle.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract CI-exhaustion check/escalate logic into handle_ci_exhaustion() helper.
All three call sites (orphaned PRs, stuck PRs, backlog PRs) now use the shared
function, eliminating future drift between the copies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- dev-agent.sh: add explicit guard that skips formula-labeled issues with a
clear log message instead of silently producing no formula behavior
- BOOTSTRAP.md: rewrite formula label entry to state it is not yet functional
and that dev-agent will skip such issues until feat/formula is merged
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix SC2164: add || exit 1 to bare cd in update-prompt.sh
- Fix SC2155: separate declare and assign in env.sh, supervisor-poll.sh, dev-agent.sh
- Fix SC2034: inline suppression for vars used by sourced helpers
- Remove unused `mergeable` declaration, rename unused loop var to `_w`
- Remove || true from shellcheck CI step — failures are now blocking
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix double-injection bug: flat-file write only when direct tmux inject didn't happen
- Fix ci_exhausted href='#' fallback to use CODEBERG_WEB/pulls/N
- Remove duplicate $THREAD_FILE in rm command
- HTML-escape CI snippet before embedding in <pre> block
- notify_ctx falls back to plain matrix_send when no thread exists
- Thread root uses HTML-formatted message for consistency
- Deduplicate _ci_pipeline_url variable
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix dev-agent.sh comment: gardener-poll.sh is the backup injector, not review-poll.sh
- Add renotify marker cleanup to gardener injection path
- Use atomic mv to claim reply file, preventing double-injection race between supervisor and gardener
- Add break after supervisor injection for symmetry with gardener
- Remove overly prescriptive PHASE:awaiting_ci hardcode from injection instructions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace fixed sleep(3) + paste-buffer race with a wait_for_claude_ready()
function that polls the tmux pane for the ❯ prompt (up to 120s). This
fixes the bug where the initial prompt was pasted before Claude Code
finished initializing, resulting in a stuck session with an empty prompt.
Observed on issue #81: session sat idle for 42+ minutes because the
paste arrived during Claude's startup splash screen.
Changes:
- Add wait_for_claude_ready() that polls tmux capture-pane for ❯
- Call it inside inject_into_session() before every paste
- Use inject_into_session() for initial prompt (was inline paste-buffer)
- Remove fixed sleep(3) from session creation and recovery paths
- Fail hard if claude doesn't become ready within timeout
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add missing MAX_CI_FIXES=3 and MAX_REVIEW_ROUNDS=5 constants to the
config section; referencing undefined variables with set -euo pipefail
caused an abort on first CI failure or REQUEST_CHANGES review.
- cleanup() trap now calls kill_tmux_session() so any unexpected exit
(SIGTERM, errexit, unbound variable) kills the Claude session rather
than leaving it running autonomously without an orchestrator.
- do_merge() initial CI wait loop now breaks and returns 1 immediately
on failure/error states, avoiding a full 10-minute poll before a
merge attempt that would also fail.
- Inner review-poll loop no longer updates LAST_PHASE_MTIME when it
detects a mid-wait phase-file change; leaving it stale ensures the
outer loop detects and dispatches the new phase on its next tick
(previously the phase was silently swallowed).
- post_refusal_comment dedup now fetches the last 5 comments and checks
any of them, so a human reply between two agent runs no longer causes
a duplicate refusal comment.
- Remove duplicate DELETE labels/backlog call in claim section.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace fire-and-forget `claude -p` calls with a persistent tmux session
that Claude Code runs in interactively. The orchestrator (dev-agent.sh)
monitors a phase file and reacts to Claude's signals:
- Session lifecycle: create `dev-{project}-{issue}` tmux session, send
the full initial prompt (issue body + phase protocol instructions) via
`tmux load-buffer` / `tmux paste-buffer`, then enter a phase monitor loop.
- Phase monitor loop: polls `/tmp/dev-session-{project}-{issue}.phase`
every 30s for mtime changes. Handles all five phase sentinels:
- PHASE:awaiting_ci → create PR if needed, poll CI, inject result
- PHASE:awaiting_review → poll for review comment, inject verdict
- PHASE:needs_human → send Matrix notification, wait for injection
- PHASE:done → call do_merge(), exit on success
- PHASE:failed → detect refusal JSON vs genuine failure, post
comment / escalate, kill session, restore backlog
- Crash recovery: if the tmux session dies unexpectedly, dev-agent.sh
restarts it in the same worktree and injects a recovery prompt with
the last known phase and git diff.
- Idle timeout: 2h with no phase update kills the session gracefully.
- PR creation moved into the PHASE:awaiting_ci handler; Claude pushes the
branch and writes the phase, orchestrator creates the PR and starts CI.
- Summary file `/tmp/dev-impl-summary-{project}-{issue}.txt` carries the
implementation summary (for PR body) and refusal JSON between Claude and
the orchestrator.
- All existing logic preserved: dep preflight, label management, do_merge()
with rebase retry, CI escalation, prior art detection, log rotation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- supervisor: skip *.done.jsonl in escalation glob (bug: wildcard matched
harb.done.jsonl producing spurious 'pending' log noise every cycle)
- supervisor: use wc -l instead of grep -c . for line counting (style nit)
- supervisor: consume gardener-esc-resolved.log via fixed() so escalation
resolutions appear in end-of-cycle supervisor reporting
- dev-poll: update all 'escalated to supervisor' log/matrix strings to
'escalated to gardener' (lines 263, 268, 344, 420)
- gardener: track _esc_total_created across all escalation entries and
write count to supervisor/gardener-esc-resolved.log after processing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- dev-poll.sh: write escalations to per-project files
(supervisor/escalations-{PROJECT_NAME}.jsonl) and add "project" field
so each project's escalations are isolated; update is_escalated() to
read from the same per-project paths
- gardener-poll.sh: add escalation processing block that reads the
per-project escalation file, fetches CI logs via Woodpecker, and
creates per-file ShellCheck sub-issues or generic CI failure issues
labeled backlog — runs with the correct CODEBERG_API and
WOODPECKER_REPO_ID already loaded from the project TOML
- supervisor-poll.sh: remove the escalation processing block; replace
with a simple flog report counting pending escalations per project
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The wait-for-CI loop sleeps 30s × 60 iterations waiting for CI
to report. Projects with WOODPECKER_REPO_ID=0 never get a status,
so the agent times out after 30min without merging approved PRs.
Now detects no-CI early and treats as success immediately.
- Race condition: mv escalations.jsonl to a PID-stamped snapshot before
processing so concurrent dev-poll appends go to a fresh file; rm snapshot
after loop — no entries are ever silently dropped
- SQL injection: validate ESC_PR_SHA is a 40-char hex string before
interpolating into the wpdb query
- sc_codes scope: compute per-file from file_errors (already filtered to
that file) instead of the entire step log; also switch grep to -F so
dots in filenames are not treated as regex wildcards
- step_pid validation: reject non-integer values from Woodpecker API before
passing as CLI argument
- Fallback body now distinguishes "CI logs unavailable" from "logs found
but issue creation API calls failed"
- ESC_GENERIC_FAIL: avoid leading blank line by using conditional separator
and fix code-block opening newline
- is_escalated(): remove dead esc_file/done_file locals; add Python-level
int() guard so empty/non-numeric issue or pr values fail cleanly instead
of producing a syntax error suppressed by 2>/dev/null
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- supervisor-poll.sh: replace P3 escalation log with actionable sub-issue creation.
For each entry in escalations.jsonl: fetch CI logs via woodpecker-cli, create one
sub-issue per file for ShellCheck failures, one combined issue for other CI failures,
or a fallback investigation issue if logs are unavailable. Move processed entries to
escalations.done.jsonl and clear escalations.jsonl.
- dev-poll.sh: add is_escalated() helper that checks both escalations.jsonl and
escalations.done.jsonl; use it (alongside ci_fix_count >= 3) in all three CI-fix
spawn paths so escalated PRs are skipped even if the ci-fixes tracker is reset.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs after #53 merged:
1. Escalation written every poll cycle (4 entries in 30min) — now writes once, bumps counter to 4 to skip
2. Exit after escalation blocked backlog work — now falls through to pick up next issue
Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/disinto/pulls/59
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
Dev-poll spawned a fresh agent every 10min for CI failures. Each agent started with CI_FIX_COUNT=0 — infinite loop.
Now tracks attempts per PR in `/tmp/dev-poll-ci-fixes-{project}.json`. After 3 failed rounds:
- Writes escalation to `supervisor/escalations.jsonl`
- Sends Matrix alert
- Stops respawning
Part of #52 (supervisor escalation pipeline).
Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/disinto/pulls/53
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
dev-poll.sh had 5 places checking CI_STATE='success', all blocking
projects without CI. Extracted ci_passed() helper that treats
empty/pending/unknown as pass when WOODPECKER_REPO_ID=0.