
169 commits

Author SHA1 Message Date
openhands
7bf13567fd fix: TOCTOU in handle_ci_exhaustion: check-then-act not atomic (#125)
Add ci_fix_check_and_increment() that performs read + threshold-check +
conditional increment in a single flock-protected Python call, replacing
the prior three-step sequence (ci_fix_count / bash check / ci_fix_increment)
that allowed two concurrent poll invocations to both pass the threshold and
spawn duplicate dev-agents for the same PR.

handle_ci_exhaustion now calls ci_fix_check_and_increment atomically and
returns the new count in CI_FIX_ATTEMPTS; all separate ci_fix_increment
calls after handle_ci_exhaustion (including the deferred READY_PR_FOR_INCREMENT
mechanism) are removed. Log messages updated from CI_FIX_ATTEMPTS+1 to
CI_FIX_ATTEMPTS to reflect the post-increment count.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 10:22:24 +00:00
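The check-then-act fix can be illustrated with a short Python sketch. This is not the repository's code. The function signature, the JSON tracker layout (PR number mapped to attempt count), and the `.lock` sibling file are all assumptions. The sketch shows the essential property: the read, the threshold check, and the conditional write all happen under one exclusive `flock`, so two concurrent polls cannot both see a count below the limit. `fcntl` is Unix-only.

```python
import fcntl
import json
import os


def ci_fix_check_and_increment(tracker_path: str, pr: int, max_fixes: int):
    """Atomically read a PR's CI-fix count, compare it with max_fixes, and
    increment it only if the limit has not been reached.

    Returns (allowed, count). allowed is True if the caller may spawn a fix
    attempt. count is the post-increment value, or the unchanged value when
    the limit was already reached.
    Hypothetical sketch: the tracker is assumed to be a JSON object mapping
    PR numbers (as strings) to attempt counts.
    """
    lock_path = tracker_path + ".lock"  # assumed lockfile naming
    with open(lock_path, "a") as lock_file:
        # One exclusive lock covers the whole read-check-write cycle.
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
        try:
            try:
                with open(tracker_path) as f:
                    data = json.load(f)
            except (FileNotFoundError, json.JSONDecodeError):
                data = {}
            count = int(data.get(str(pr), 0))
            if count >= max_fixes:
                return False, count  # limit reached: do not increment
            count += 1
            data[str(pr)] = count
            # Write a temp file, then rename, so the tracker is never half-written.
            tmp_path = tracker_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(data, f)
            os.replace(tmp_path, tracker_path)
            return True, count
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
```

Returning the post-increment count mirrors the commit's change from logging CI_FIX_ATTEMPTS+1 to CI_FIX_ATTEMPTS: the caller never has to add one itself.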
openhands
7d51e5e333 fix: Add formula guard to backlog scan path (#127) 2026-03-18 09:49:44 +00:00
openhands
deeedd0cbf fix: CODEBERG_WEB not exported from lib/env.sh — other agents may hit the same gap (#129)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 09:40:20 +00:00
openhands
19a245fe5e fix: Coordinate review injection between review-poll.sh and dev-agent.sh to prevent double-injection (#90)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 09:01:50 +00:00
openhands
9fa4846581 fix: ALL_COMMENTS fetch is capped at limit=50 — watermark search may miss reviews on high-comment PRs (#100)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 08:13:43 +00:00
openhands
9d2b92f0d5 fix: needs_human notification sent every poll cycle pre-PR (#103)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 07:35:13 +00:00
openhands
88f2268bc6 fix: idle timeout does not escalate — session dies silently (#123)
1. Timeout handler (dev-agent.sh): write escalation to project-suffixed
   file, restore backlog label, clean up phase file on idle timeout.
2. Fix escalation file naming: escalations.jsonl → escalations-${PROJECT_NAME}.jsonl
   everywhere in dev-agent.sh so gardener actually picks them up.
3. Gardener (gardener-poll.sh): handle idle_timeout reason before CI-specific
   recipe logic — create investigation sub-issue instead of silently returning.
4. Update .gitignore to match new escalations-*.jsonl pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 07:02:33 +00:00
openhands
32ee53517f fix: In-progress formula issue causes infinite dev-agent respawn (#115)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 06:41:43 +00:00
openhands
1352620c3d fix: ci_fix_count/ci_fix_increment not atomic — potential race under concurrent polls (#118)
Wrap ci_fix_count(), ci_fix_increment(), and ci_fix_reset() with flock
on a shared lockfile to prevent concurrent modification of the JSON
tracker. Uses flock(1) in command-wrapping mode so each Python process
holds an exclusive lock for the duration of its read-modify-write cycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 06:30:17 +00:00
openhands
cf8446b451 fix: try_merge_or_rebase rebase-failure spawn bypasses ci_fix_increment (#56)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 04:05:18 +00:00
openhands
ff02b1e653 fix: Three near-identical CI-exhaustion blocks should be a shared function (#58)
Extract CI-exhaustion check/escalate logic into handle_ci_exhaustion() helper.
All three call sites (orphaned PRs, stuck PRs, backlog PRs) now use the shared
function, eliminating future drift between the copies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 03:21:27 +00:00
openhands
2446543545 fix: feat/formula not merged but formula templates and label docs already on main (#69)
- dev-agent.sh: add explicit guard that skips formula-labeled issues with a
  clear log message instead of silently producing no formula behavior
- BOOTSTRAP.md: rewrite formula label entry to state it is not yet functional
  and that dev-agent will skip such issues until feat/formula is merged

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 02:17:18 +00:00
openhands
8e600787c1 fix: ci_passed() still lives in dev/dev-poll.sh, not lib/ (#70)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 02:05:54 +00:00
openhands
bd02330b22 fix: shellcheck TODO has no enforcement — || true may never be removed (#71)
- Fix SC2164: add || exit 1 to bare cd in update-prompt.sh
- Fix SC2155: separate declare and assign in env.sh, supervisor-poll.sh, dev-agent.sh
- Fix SC2034: inline suppression for vars used by sourced helpers
- Remove unused `mergeable` declaration, rename unused loop var to `_w`
- Remove || true from shellcheck CI step — failures are now blocking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 01:53:02 +00:00
openhands
8034b50315 fix: address review findings from issue #76
- Fix double-injection bug: flat-file write only when direct tmux inject didn't happen
- Fix ci_exhausted href='#' fallback to use CODEBERG_WEB/pulls/N
- Remove duplicate $THREAD_FILE in rm command
- HTML-escape CI snippet before embedding in <pre> block
- notify_ctx falls back to plain matrix_send when no thread exists
- Thread root uses HTML-formatted message for consistency
- Deduplicate _ci_pipeline_url variable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 00:42:00 +00:00
openhands
814706bf90 fix: feat: Matrix notifications — contextual, linked, conversational (#76)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 00:20:11 +00:00
openhands
bfe0c09b5c fix: address review findings from issue #81
- Fix dev-agent.sh comment: gardener-poll.sh is the backup injector, not review-poll.sh
- Add renotify marker cleanup to gardener injection path
- Use atomic mv to claim reply file, preventing double-injection race between supervisor and gardener
- Add break after supervisor injection for symmetry with gardener
- Remove overly prescriptive PHASE:awaiting_ci hardcode from injection instructions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 22:40:54 +00:00
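Claiming the reply file with an atomic move can be sketched in Python as follows. The commit does this with `mv` in shell. The `claim_reply_file` helper and the claimed-file naming are hypothetical. On a single POSIX filesystem `os.rename` is atomic. If the supervisor and the gardener race, exactly one rename succeeds. The other finds the source gone and skips injection.

```python
import os
from typing import Optional


def claim_reply_file(reply_path: str, claimant: str) -> Optional[str]:
    """Try to take exclusive ownership of a pending human-reply file.

    Returns the new (claimed) path on success, or None if another
    process already renamed the file away.
    Hypothetical helper; the claimed-file naming is an assumption.
    """
    # A unique destination per claimant/process so two winners cannot collide.
    claimed_path = f"{reply_path}.claimed-by-{claimant}.{os.getpid()}"
    try:
        os.rename(reply_path, claimed_path)  # atomic on the same filesystem
    except FileNotFoundError:
        return None  # already claimed by someone else (or never written)
    return claimed_path
```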
openhands
48683e508c fix: feat: supervisor-poll.sh and gardener-poll.sh inject human replies into needs_human dev sessions (#81)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 22:33:28 +00:00
openhands
515c528e10 fix: poll for claude readiness before injecting prompt into tmux
Replace fixed sleep(3) + paste-buffer race with a wait_for_claude_ready()
function that polls the tmux pane for the ❯ prompt (up to 120s). This
fixes the bug where the initial prompt was pasted before Claude Code
finished initializing, resulting in a stuck session with an empty prompt.

Observed on issue #81: session sat idle for 42+ minutes because the
paste arrived during Claude's startup splash screen.

Changes:
- Add wait_for_claude_ready() that polls tmux capture-pane for ❯
- Call it inside inject_into_session() before every paste
- Use inject_into_session() for initial prompt (was inline paste-buffer)
- Remove fixed sleep(3) from session creation and recovery paths
- Fail hard if claude doesn't become ready within timeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 23:23:12 +01:00
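A Python analogue of the readiness poll might look like the sketch below. It is illustrative only. `wait_for_ready` is a hypothetical name, the pane-capture callable stands in for the output of `tmux capture-pane -p`, and the 2-second poll interval is an assumption. The commit states only the ❯ marker and the 120s limit. Returning False on timeout, rather than pasting anyway, corresponds to the commit's "fail hard" choice.

```python
import time
from typing import Callable


def wait_for_ready(
    capture_pane: Callable[[], str],
    prompt_marker: str = "❯",
    timeout: float = 120.0,
    interval: float = 2.0,  # assumed poll interval
) -> bool:
    """Poll a pane snapshot until the prompt marker appears or time runs out.

    capture_pane is any callable returning the current pane text.
    Returns True once the marker is seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if prompt_marker in capture_pane():
            return True  # Claude Code has finished starting up
        if time.monotonic() >= deadline:
            return False  # caller should abort instead of pasting blindly
        time.sleep(interval)
```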
openhands
d59c09eb5b fix: address review findings from issue #80 phase protocol
- Add missing MAX_CI_FIXES=3 and MAX_REVIEW_ROUNDS=5 constants to the
  config section; referencing undefined variables with set -euo pipefail
  caused an abort on first CI failure or REQUEST_CHANGES review.

- cleanup() trap now calls kill_tmux_session() so any unexpected exit
  (SIGTERM, errexit, unbound variable) kills the Claude session rather
  than leaving it running autonomously without an orchestrator.

- do_merge() initial CI wait loop now breaks and returns 1 immediately
  on failure/error states, avoiding a full 10-minute poll before a
  merge attempt that would also fail.

- Inner review-poll loop no longer updates LAST_PHASE_MTIME when it
  detects a mid-wait phase-file change; leaving it stale ensures the
  outer loop detects and dispatches the new phase on its next tick
  (previously the phase was silently swallowed).

- post_refusal_comment dedup now fetches the last 5 comments and checks
  any of them, so a human reply between two agent runs no longer causes
  a duplicate refusal comment.

- Remove duplicate DELETE labels/backlog call in claim section.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 20:40:35 +00:00
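The refusal-comment dedup described above amounts to a window check. The sketch below assumes comments are ordered oldest to newest and that each refusal comment contains a recognizable marker string. Both the marker and the `should_post_refusal` name are hypothetical. Checking only the newest comment would miss an earlier refusal whenever a human replied in between.

```python
from typing import Sequence


def should_post_refusal(recent_comments: Sequence[str], marker: str, window: int = 5) -> bool:
    """Return True only if none of the last `window` comments already
    contains the refusal marker (hypothetical marker convention)."""
    return not any(marker in body for body in recent_comments[-window:])
```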
openhands
db92bc13b5 fix: feat: tmux session manager in dev-agent.sh (#80)
Replace fire-and-forget `claude -p` calls with a persistent tmux session
that Claude Code runs in interactively. The orchestrator (dev-agent.sh)
monitors a phase file and reacts to Claude's signals:

- Session lifecycle: create `dev-{project}-{issue}` tmux session, send
  the full initial prompt (issue body + phase protocol instructions) via
  `tmux load-buffer` / `tmux paste-buffer`, then enter a phase monitor loop.

- Phase monitor loop: polls `/tmp/dev-session-{project}-{issue}.phase`
  every 30s for mtime changes. Handles all five phase sentinels:
  - PHASE:awaiting_ci   → create PR if needed, poll CI, inject result
  - PHASE:awaiting_review → poll for review comment, inject verdict
  - PHASE:needs_human  → send Matrix notification, wait for injection
  - PHASE:done         → call do_merge(), exit on success
  - PHASE:failed       → detect refusal JSON vs genuine failure, post
                          comment / escalate, kill session, restore backlog

- Crash recovery: if the tmux session dies unexpectedly, dev-agent.sh
  restarts it in the same worktree and injects a recovery prompt with
  the last known phase and git diff.

- Idle timeout: 2h with no phase update kills the session gracefully.

- PR creation moved into the PHASE:awaiting_ci handler; Claude pushes the
  branch and writes the phase, orchestrator creates the PR and starts CI.

- Summary file `/tmp/dev-impl-summary-{project}-{issue}.txt` carries the
  implementation summary (for PR body) and refusal JSON between Claude and
  the orchestrator.

- All existing logic preserved: dep preflight, label management, do_merge()
  with rebase retry, CI escalation, prior art detection, log rotation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 20:20:38 +00:00
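One tick of the mtime-driven phase monitor can be sketched in Python. The real loop is bash and sleeps 30s between ticks. `poll_phase_once` and the handler table are hypothetical. Comparing mtimes rather than contents means a rewrite of the same sentinel still triggers its handler, for example a second PHASE:awaiting_ci after a CI fix.

```python
import os
from typing import Callable, Dict, Optional


def poll_phase_once(
    phase_file: str,
    last_mtime: Optional[float],
    handlers: Dict[str, Callable[[], None]],
) -> Optional[float]:
    """Run one tick of a phase monitor.

    If the phase file's mtime differs from last_mtime, read the sentinel on
    its first line (e.g. "PHASE:awaiting_ci") and call the matching handler.
    Returns the mtime to remember for the next tick.
    """
    try:
        mtime = os.stat(phase_file).st_mtime
    except FileNotFoundError:
        return last_mtime  # no phase written yet
    if mtime == last_mtime:
        return last_mtime  # nothing new since the previous tick
    with open(phase_file) as f:
        phase = f.readline().strip()
    handler = handlers.get(phase)
    if handler is not None:
        handler()
    return mtime
```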
openhands
1b29baebc3 fix: feat: auto-pull factory code on every agent spawn (#85)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 19:35:29 +00:00
openhands
2b534bb7ec fix: address review findings from issue #79 phase protocol
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 19:27:11 +00:00
openhands
275b92e8b5 fix: address review findings from issue #79 phase protocol
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 19:21:01 +00:00
openhands
d87b7db8f3 fix: feat: define phase-signaling protocol for persistent Claude sessions (#79)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 18:53:15 +00:00
openhands
df2522a7cb fix: address review findings from issue #67 escalation refactor
- supervisor: skip *.done.jsonl in escalation glob (bug: wildcard matched
  harb.done.jsonl producing spurious 'pending' log noise every cycle)
- supervisor: use wc -l instead of grep -c . for line counting (style nit)
- supervisor: consume gardener-esc-resolved.log via fixed() so escalation
  resolutions appear in end-of-cycle supervisor reporting
- dev-poll: update all 'escalated to supervisor' log/matrix strings to
  'escalated to gardener' (lines 263, 268, 344, 420)
- gardener: track _esc_total_created across all escalation entries and
  write count to supervisor/gardener-esc-resolved.log after processing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 18:30:57 +00:00
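The glob fix is easy to see in a small Python illustration. In the repository it is a shell glob plus `wc -l`. The directory layout and the archive name escalations-harb.done.jsonl are assumed from the file names in these commits. The pattern `escalations-*.jsonl` also matches the archive, so it has to be filtered out explicitly.

```python
import glob
import os
from typing import Dict


def pending_escalations(supervisor_dir: str) -> Dict[str, int]:
    """Count pending escalation entries (one JSON object per line) per project,
    skipping already-processed *.done.jsonl archives."""
    counts: Dict[str, int] = {}
    for path in sorted(glob.glob(os.path.join(supervisor_dir, "escalations-*.jsonl"))):
        name = os.path.basename(path)
        if name.endswith(".done.jsonl"):
            continue  # archive of processed entries, not pending work
        project = name[len("escalations-"):-len(".jsonl")]
        with open(path) as f:
            counts[project] = sum(1 for _ in f)  # line count, like wc -l
    return counts
```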
openhands
150ede5605 fix: refactor: move escalation processing from supervisor to gardener (#67)
- dev-poll.sh: write escalations to per-project files
  (supervisor/escalations-{PROJECT_NAME}.jsonl) and add "project" field
  so each project's escalations are isolated; update is_escalated() to
  read from the same per-project paths
- gardener-poll.sh: add escalation processing block that reads the
  per-project escalation file, fetches CI logs via Woodpecker, and
  creates per-file ShellCheck sub-issues or generic CI failure issues
  labeled backlog — runs with the correct CODEBERG_API and
  WOODPECKER_REPO_ID already loaded from the project TOML
- supervisor-poll.sh: remove the escalation processing block; replace
  with a simple flog report counting pending escalations per project

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 17:32:56 +00:00
johba
552b6edbaf Merge pull request 'fix: dev-agent CI wait loop blocks forever without CI' (#62) from fix/agent-ci-wait-no-ci into main
Reviewed-on: https://codeberg.org/johba/disinto/pulls/62
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
2026-03-17 17:00:54 +01:00
openhands
8c816d6e7b fix: dev-agent CI wait loop blocks forever for projects without CI
The wait-for-CI loop sleeps 30s × 60 iterations waiting for CI
to report. Projects with WOODPECKER_REPO_ID=0 never get a status,
so the agent times out after 30min without merging approved PRs.

Now detects the no-CI case early and treats it as success immediately.
2026-03-17 15:35:40 +00:00
openhands
13bc948b1d fix: address review findings for escalation race condition, SQL injection, and sc_codes scope
- Race condition: mv escalations.jsonl to a PID-stamped snapshot before
  processing so concurrent dev-poll appends go to a fresh file; rm snapshot
  after loop — no entries are ever silently dropped
- SQL injection: validate ESC_PR_SHA is a 40-char hex string before
  interpolating into the wpdb query
- sc_codes scope: compute per-file from file_errors (already filtered to
  that file) instead of the entire step log; also switch grep to -F so
  dots in filenames are not treated as regex wildcards
- step_pid validation: reject non-integer values from Woodpecker API before
  passing as CLI argument
- Fallback body now distinguishes "CI logs unavailable" from "logs found
  but issue creation API calls failed"
- ESC_GENERIC_FAIL: avoid leading blank line by using conditional separator
  and fix code-block opening newline
- is_escalated(): remove dead esc_file/done_file locals; add Python-level
  int() guard so empty/non-numeric issue or pr values fail cleanly instead
  of producing a syntax error suppressed by 2>/dev/null

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 15:11:53 +00:00
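Three of the fixes above, the snapshot-before-processing, the SHA check, and the step_pid check, can be sketched together in Python. All names here are hypothetical, and the snapshot suffix is an assumption. Renaming the queue aside means any writer that opens the path after the rename creates a fresh file, so its entries wait for the next cycle. The validators reject anything that is not a full hex SHA or a plain integer before it can reach a query or a CLI argument.

```python
import os
import re
from typing import Callable, Optional

SHA_RE = re.compile(r"[0-9a-fA-F]{40}")
INT_RE = re.compile(r"[0-9]+")


def valid_sha(value: str) -> bool:
    """Accept only a full 40-character hex commit SHA."""
    return SHA_RE.fullmatch(value) is not None


def parse_step_pid(value: object) -> Optional[int]:
    """Return the step PID as an int, or None if it is not a plain integer."""
    if isinstance(value, bool):
        return None  # bool is a subclass of int; reject it explicitly
    if isinstance(value, int):
        return value
    if isinstance(value, str) and INT_RE.fullmatch(value):
        return int(value)
    return None


def process_queue(path: str, handle_line: Callable[[str], None]) -> int:
    """Move the queue to a PID-stamped snapshot, process it, then delete it.

    Returns the number of non-blank lines handled.
    """
    snapshot = f"{path}.{os.getpid()}.snapshot"  # assumed naming
    try:
        os.rename(path, snapshot)
    except FileNotFoundError:
        return 0  # nothing queued
    handled = 0
    with open(snapshot) as f:
        for line in f:
            if line.strip():
                handle_line(line)
                handled += 1
    os.remove(snapshot)
    return handled
```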
openhands
d9520f48a6 fix: feat: supervisor breaks down escalated CI failures into sub-issues (#52)
- supervisor-poll.sh: replace P3 escalation log with actionable sub-issue creation.
  For each entry in escalations.jsonl: fetch CI logs via woodpecker-cli, create one
  sub-issue per file for ShellCheck failures, one combined issue for other CI failures,
  or a fallback investigation issue if logs are unavailable. Move processed entries to
  escalations.done.jsonl and clear escalations.jsonl.
- dev-poll.sh: add is_escalated() helper that checks both escalations.jsonl and
  escalations.done.jsonl; use it (alongside ci_fix_count >= 3) in all three CI-fix
  spawn paths so escalated PRs are skipped even if the ci-fixes tracker is reset.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 14:32:41 +00:00
johba
531ae5cf71 fix: escalate once then continue to backlog (#59)
Two bugs after #53 merged:
1. Escalation was written every poll cycle (4 entries in 30min); it is now written once, and the counter is bumped to 4 so later cycles skip the PR
2. Exiting after escalation blocked backlog work; it now falls through to pick up the next issue

Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/disinto/pulls/59
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
2026-03-17 15:14:48 +01:00
johba
c24adc4ea2 fix: limit CI fix respawn to 3 attempts, then escalate to supervisor (#53)
Dev-poll spawned a fresh agent every 10min for CI failures. Each agent started with CI_FIX_COUNT=0, so the attempt limit never took effect and the respawns looped forever.

Now tracks attempts per PR in `/tmp/dev-poll-ci-fixes-{project}.json`. After 3 failed rounds:
- Writes escalation to `supervisor/escalations.jsonl`
- Sends Matrix alert
- Stops respawning

Part of #52 (supervisor escalation pipeline).

Co-authored-by: openhands <openhands@all-hands.dev>
Reviewed-on: https://codeberg.org/johba/disinto/pulls/53
Reviewed-by: review_bot <review_bot@noreply.codeberg.org>
2026-03-17 13:15:49 +01:00
openhands
ef77c56217 fix: extract ci_passed() helper — fix all CI gates for no-CI projects
dev-poll.sh had 5 places checking CI_STATE='success', all blocking
projects without CI. Extracted ci_passed() helper that treats
empty/pending/unknown as pass when WOODPECKER_REPO_ID=0.
2026-03-17 09:51:18 +00:00
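The rule behind ci_passed() is small enough to state as code. The helper itself is bash. This Python rendering assumes the state strings shown ("success", "pending", "unknown", or empty) and takes the repo ID as an integer.

```python
def ci_passed(ci_state: str, woodpecker_repo_id: int) -> bool:
    """Decide whether the CI gate is satisfied.

    Projects with CI (non-zero repo ID) must report "success".
    Projects without CI (repo ID 0) never receive a status, so an empty,
    pending, or unknown state counts as a pass.
    """
    if ci_state == "success":
        return True
    if woodpecker_repo_id == 0:
        return ci_state in ("", "pending", "unknown")
    return False
```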
johba
915ff45cc6 Merge pull request 'fix: dev-agent merge gate blocks projects without CI' (#43) from fix/dev-agent-merge-no-ci into main
Reviewed-on: https://codeberg.org/johba/disinto/pulls/43
2026-03-17 10:49:19 +01:00
openhands
ad9d68e525 fix: dev-agent merge gate requires CI even for projects without CI
Same pattern as review-poll — projects with WOODPECKER_REPO_ID=0
treat empty/unknown CI as pass for the merge gate.
2026-03-17 09:48:13 +00:00
johba
0490a4b8d8 Merge pull request 'fix: auto-close issues when dev-agent detects already_done' (#42) from fix/close-already-done into main
Reviewed-on: https://codeberg.org/johba/disinto/pulls/42
2026-03-17 10:39:22 +01:00
openhands
9445e36a1e fix: auto-close issues when dev-agent detects already_done
Previously the agent unclaimed the issue but left it open, causing
an infinite claim/refuse/unclaim loop on every poll cycle.
2026-03-17 09:38:08 +00:00
openhands
1b3559bba7 fix: enforce single-threaded pipeline per project
Don't start new issues while open PRs are waiting for review/CI.
This prevents dev-agent from churning through backlog issues
without reviews landing first.
2026-03-17 09:17:02 +00:00
openhands
ea033d3f04 fix: TMPDIR unbound variable crashes already_done handler
TMPDIR is not guaranteed to be set, and referencing it while unset aborts the script with an unbound-variable error. Replaced it with /tmp/ directly.
This caused harb dev-agent to crash when posting refusal comments,
leaving issues stuck in a retry loop.
2026-03-17 09:00:43 +00:00
openhands
b376fbc25e fix: dev-agent.sh also needs per-project lock file
dev-poll.sh was fixed but dev-agent.sh still used hardcoded
/tmp/dev-agent.lock. The disinto agent locked out harb's agent.
2026-03-17 08:41:15 +00:00
openhands
249eef86c1 fix: per-project lock and log files for dev-poll
Hardcoded /tmp/dev-agent.lock meant harb and disinto dev-polls shared
a lock — one project's running agent blocked the other. Now uses
/tmp/dev-agent-{project}.lock and dev-agent-{project}.log.
2026-03-17 08:18:24 +00:00
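Per-project locking can be sketched in Python as follows. The commit only says the lock and log paths became `/tmp/dev-agent-{project}.lock` and `dev-agent-{project}.log`. Using a non-blocking `flock` here is an assumption, and `acquire_project_lock` is a hypothetical name. Because each project gets its own file, a running harb agent no longer blocks disinto.

```python
import fcntl
import os
from typing import Optional, TextIO


def acquire_project_lock(project: str, lock_dir: str = "/tmp") -> Optional[TextIO]:
    """Take a non-blocking exclusive lock scoped to one project.

    Returns the open lock file (keep it open to hold the lock), or None if
    another agent for the same project already holds it.
    """
    path = os.path.join(lock_dir, f"dev-agent-{project}.lock")
    lock_file = open(path, "a")
    try:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None  # same project already running
    return lock_file
```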
johba
9050413994 refactor: split supervisor into infra + per-project, make poll scripts config-driven
Supervisor split (#26):
- Layer 1 (infra): P0 memory, P1 disk, P4 housekeeping — runs once, project-agnostic
- Layer 2 (per-project): P2 CI/dev-agent, P3 PRs/deps — iterates projects/*.toml
- Adding a new project requires only a new TOML file, no code changes

Poll scripts accept project TOML arg (#27):
- dev-poll.sh, review-poll.sh, gardener-poll.sh accept optional project TOML as $1
- env.sh loads PROJECT_TOML if set, overriding .env defaults
- Cron: `dev-poll.sh projects/versi.toml` targets that project

New files:
- lib/load-project.sh: TOML to env var loader (Python tomllib)
- projects/versi.toml: current project config extracted from .env

Backwards compatible: scripts without a TOML arg fall back to .env config.

Closes #26, Closes #27

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 08:57:18 +01:00
openhands
5aa0b42481 fix: dev-agent preflight treats ## Related refs as dependencies
The broad regex `(?:^|\n)\s*-\s*#\K[0-9]+` matched ANY bullet with #NNN,
including ## Related sections. This caused #893 (and likely others) to be
permanently blocked by sibling issues that aren't actual dependencies.

Now only extracts deps from:
- Inline 'depends on #NNN' / 'blocked by #NNN' phrases
- ## Dependencies / ## Depends on / ## Blocked by sections

This matches the same logic used by dev-poll.sh get_deps().
2026-03-17 05:58:54 +00:00
johba
98f0c40106 refactor: rewrite parse-deps.py as pure bash, remove only Python from repo
Replace lib/parse-deps.py with lib/parse-deps.sh to keep the toolchain
all-bash. Rewrite supervisor P3b cycle detection and P3c stale dep check
as pure bash using associative arrays and DFS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 21:22:53 +01:00
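The P3b cycle check is done in bash with associative arrays and DFS. The Python below is a compact illustration of the same technique, three-colour DFS, and is not a port of that code. The graph shape (issue mapped to the issues it depends on) and `find_cycle` are assumptions.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[int, List[int]]) -> Optional[List[int]]:
    """Return one dependency cycle as a list of issue numbers, or None.

    A dependency that is still on the current DFS path (GRAY) closes a cycle.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[int, int] = {}
    stack: List[int] = []

    def visit(node: int) -> Optional[List[int]]:
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is on the current path: slice the cycle out of the stack.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK  # fully explored, no cycle through this node
        return None

    for start in graph:
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None
```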
johba
6cf580c010 refactor: extract shared dep parser to lib/parse-deps.py (Closes #20)
Single source of truth for dependency parsing, replacing three copies:
- dev-poll.sh get_deps() now calls parse-deps.py
- supervisor P3b/P3c import parse_deps() via importlib

Supports stdin, argument, and --json modes for different callers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 21:16:49 +01:00
johba
77cb4c4643 refactor: rename factory/ → supervisor/, factory-poll → supervisor-poll
The supervisor agent was confusingly named "factory" (same as the
project). Rename directory, script, log, lock, status, and escalation
files. Update all references across scripts and docs.

FACTORY_ROOT env var unchanged (refers to project root, not agent).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 18:06:25 +01:00
openhands
0a0d5e8e24 fix: inline merge+rebase in recovery path (do_merge not yet defined)
do_merge() is defined at line 876, but recovery mode calls it at line ~498.
Bash only registers a function once execution reaches its definition, so
the earlier call fails. Inlined the merge→rebase→re-approve→retry logic directly.
2026-03-15 14:10:21 +00:00
openhands
2c527cef4a fix: dev-agent handles approved+stuck PRs in recovery mode
1. Recovery mode: if PR already has approval + green CI, try merge
   immediately instead of entering the review wait loop forever.
2. do_merge: on 405/merge failure, rebase → force push → wait CI →
   re-approve via review_bot → retry merge. Covers the stale-approval
   dismissal problem end-to-end.
3. Codeberg's mergeable field is unreliable, so rebase on any merge failure.
2026-03-15 14:09:33 +00:00
openhands
c5fdd8ac50 fix: always rebase on merge failure, don't trust mergeable field
Codeberg's mergeable field flickers between true/false — unreliable
for deciding whether to rebase. Just attempt a rebase on any non-200/204 response.
Worst case it's a no-op. Also added git fetch before rebase.
2026-03-15 10:51:09 +00:00
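The resulting merge policy can be sketched in Python with the API and git steps passed in as placeholder callables. `merge_with_rebase_retry` and its parameters are hypothetical. The real do_merge() is bash and calls the Codeberg API, git, and review_bot directly. The mergeable flag is never consulted. Any status other than 200/204 triggers one rebase, a CI wait with re-approval, and a single retry.

```python
from typing import Callable


def merge_with_rebase_retry(
    merge: Callable[[], int],
    rebase_and_push: Callable[[], bool],
    wait_ci_and_reapprove: Callable[[], bool],
) -> bool:
    """Try to merge; on any non-200/204 status, rebase once and retry.

    merge returns an HTTP status code; the other callables return success.
    """
    if merge() in (200, 204):
        return True
    if not rebase_and_push():
        return False  # rebase conflict or push failure: give up
    if not wait_ci_and_reapprove():
        return False  # CI failed after rebase, or re-approval failed
    return merge() in (200, 204)
```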