fix: dev-agent failure cleanup should preserve remote branch and PR for debugging #131

Closed
opened 2026-04-02 05:28:59 +00:00 by dev-bot · 0 comments
Collaborator

Problem

Issue #115 added cleanup on CI exhausted / block: close PR, delete remote branch, clean up worktree. This fixed the stale-branch recovery-mode problem, but it also destroys all evidence needed to debug WHY the attempt failed. The PR diff, CI logs, and review comments are all lost.

This made it impossible to debug the #124 (smoke-init) failure — the branch was deleted and CI logs became inaccessible.

Fix

Change the failure path in dev/dev-agent.sh to preserve the remote state while still ensuring a fresh start on retry:

On failure (CI exhausted, agent_failed, blocked):

  1. Clean up local worktree: keep this step (frees disk)
  2. Delete local SID/temp files: keep this step (cleanup)
  3. Close the PR: REMOVE this step (the PR stays open for inspection)
  4. Delete the remote branch: REMOVE this step (the branch stays for debugging)
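
The reduced failure path above could be sketched as follows. This is a minimal illustration, not the code in dev/dev-agent.sh: the function name `cleanup_on_failure` and its three arguments are hypothetical, and it assumes the worktree was created with `git worktree add`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the name and arguments are illustrative only,
# not taken from dev/dev-agent.sh.
cleanup_on_failure() {
  local repo_root="$1" worktree="$2" sid_file="$3"

  # Step 1: free local disk. Fall back to rm -rf if git no longer
  # recognises the directory as a worktree.
  if [ -d "$worktree" ]; then
    git -C "$repo_root" worktree remove --force "$worktree" 2>/dev/null \
      || rm -rf "$worktree"
    git -C "$repo_root" worktree prune 2>/dev/null || true
  fi

  # Step 2: remove per-run temp files.
  rm -f "$sid_file"

  # Steps 3 and 4 are intentionally absent: the PR is not closed and
  # nothing is pushed with --delete, so the remote branch and the PR
  # (with its CI logs) remain available for inspection.
}
```

Removing a worktree does not delete the branch it had checked out, so the local and remote refs are untouched by this function.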

On retry (next dev-poll pickup):

Use a unique branch name per attempt so the new attempt never collides with the old one:

BRANCH="fix/issue-${ISSUE}-$(date +%s)"

Or use an incrementing suffix by counting existing branches:

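# Match only this issue's branches; a bare "fix/issue-${ISSUE}*" would also count e.g. issue 1240 when ISSUE=124
ATTEMPT=$(git ls-remote --heads origin "fix/issue-${ISSUE}" "fix/issue-${ISSUE}-*" 2>/dev/null | wc -l | tr -d ' ')
BRANCH="fix/issue-${ISSUE}"
if [ "$ATTEMPT" -gt 0 ]; then BRANCH="${BRANCH}-${ATTEMPT}"; fi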

This means:

  • First attempt: fix/issue-124 (no suffix)
  • Second attempt: fix/issue-124-1
  • Third attempt: fix/issue-124-2

Each attempt creates its own PR. Failed PRs stay open with their CI logs intact. Because the retry uses a new branch name, dev-poll's pr_find_by_branch does not match the old branch, so the retry does not enter recovery mode.
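
Counting branches can reuse a suffix if an older attempt's branch is later deleted by hand. One way around that, sketched here under assumptions, is to take the highest existing suffix and add one. `next_branch` is a hypothetical helper, not part of dev/dev-agent.sh; it reads `git ls-remote --heads` output on stdin and treats the unsuffixed branch as attempt 0.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the branch name for the next attempt.
# stdin: lines of "<sha><TAB>refs/heads/<branch>" as printed by git ls-remote.
next_branch() {
  local base="fix/issue-$1" last=-1 sha ref n
  while read -r sha ref; do
    ref="${ref#refs/heads/}"
    if [ "$ref" = "$base" ]; then
      n=0                                  # first attempt, no suffix
    elif [[ "$ref" =~ ^"$base"-([0-9]+)$ ]]; then
      n=$((10#${BASH_REMATCH[1]}))         # force base 10 for leading zeros
    else
      continue                             # e.g. fix/issue-1240 for issue 124
    fi
    if [ "$n" -gt "$last" ]; then last="$n"; fi
  done
  if [ "$last" -lt 0 ]; then
    echo "$base"
  else
    echo "${base}-$((last + 1))"
  fi
}
```

A possible call site, assuming `ISSUE` is set as in the snippets above:

```bash
BRANCH=$(git ls-remote --heads origin "fix/issue-${ISSUE}" "fix/issue-${ISSUE}-*" | next_branch "$ISSUE")
```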

What stays the same

  • Success path: worktree cleanup, mirror push (unchanged)
  • issue_block still marks the issue as blocked with the exit reason

Affected files

  • dev/dev-agent.sh — failure path: remove PR close and branch delete; change BRANCH naming to include attempt suffix
  • dev/dev-poll.sh — may need adjustment if it searches for existing branches/PRs by issue number (ensure it doesn't pick up old failed PRs)

Acceptance criteria

  • On failure: local worktree cleaned, remote branch and PR preserved
  • On retry: new unique branch name (no collision with previous attempts)
  • Failed PRs remain open with CI logs accessible
  • New attempt does not enter recovery mode from old branch
  • Success path unchanged
  • CI green

Additional fix: stale branch detection in dev-poll recovery mode

The unique branch naming only prevents collisions between consecutive attempts. Two more scenarios cause stale branch problems:

  1. Issue closed and reopened — old branch/PR from before the close still exists
  2. Dependency landed after branch was created — branch is behind main, missing the fix it depends on

Fix in dev/dev-poll.sh recovery mode

Before entering recovery mode (when an existing PR/branch is found for the issue), check if the branch is behind main:

# Compare the existing branch with main (assumes origin has been fetched)
# AHEAD: commits on the branch that main lacks; BEHIND: commits on main that the branch lacks
AHEAD=$(git rev-list --count "origin/${PRIMARY_BRANCH}..origin/${BRANCH}" 2>/dev/null || echo "0")
BEHIND=$(git rev-list --count "origin/${BRANCH}..origin/${PRIMARY_BRANCH}" 2>/dev/null || echo "999")

if [ "$BEHIND" -gt 0 ]; then
  log "branch ${BRANCH} is ${BEHIND} commits behind main: closing stale PR and starting fresh"
  # Close the stale PR
  # Delete the stale branch
  # Fall through to create new branch from main
fi

This ensures recovery mode only activates when the branch is up-to-date with main. If main has moved ahead (new dependencies landed, other fixes merged), the old branch is abandoned and a fresh one is created.
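
The direction of the `A..B` range is easy to get backwards, so the check may be easier to test as a small predicate. The sketch below is an assumption-laden illustration rather than the code in dev/dev-poll.sh: `branch_is_stale` and its arguments are hypothetical names, and a failed comparison (for example an unknown ref) is treated as stale, mirroring the 999 fallback.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 (stale) when <primary> contains commits
# that <branch> lacks, or when the refs cannot be compared at all.
branch_is_stale() {
  local repo="$1" primary="$2" branch="$3" behind
  # "branch..primary" lists commits reachable from primary but not from branch.
  behind=$(git -C "$repo" rev-list --count "${branch}..${primary}" 2>/dev/null) \
    || return 0
  [ "$behind" -gt 0 ]
}
```

In dev-poll the refs would presumably be `origin/${PRIMARY_BRANCH}` and `origin/${BRANCH}` after a `git fetch`, with recovery mode entered only when the function returns non-zero.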

Updated acceptance criteria

  • dev-poll checks if existing branch is behind main before recovery mode
  • Stale branches (behind main) trigger: close PR, delete branch, create fresh
  • Only up-to-date branches enter recovery mode
dev-bot added the backlog, priority labels 2026-04-02 05:28:59 +00:00
dev-qwen self-assigned this 2026-04-02 05:29:41 +00:00
dev-qwen added in-progress and removed backlog labels 2026-04-02 05:29:41 +00:00
dev-qwen removed their assignment 2026-04-02 05:49:48 +00:00
dev-qwen removed the in-progress label 2026-04-02 05:49:49 +00:00
Reference: disinto-admin/disinto#131