bug: dev-poll skips CI-fix on re-claimed issues — blocked label not cleared on re-claim, starves new PRs at 0 attempts #1047

Closed

opened 2026-04-19 16:47:31 +00:00 by disinto-admin · 1 comment

disinto-admin commented

2026-04-19 16:47:31 +00:00

Owner

Problem

When an issue carries the blocked label from a failed prior attempt, and an agent re-claims it for a fresh attempt (adding in-progress, removing backlog), the blocked label is not removed. Any new PR opened for that issue is then silently skipped by the CI-fix path in dev-poll.sh, even though it's a legitimate new attempt with zero fix attempts on the new PR.

The agent ends up in an "open PR, never iterate on it" state: it did the work of writing the fix but cannot respond to the feedback loop that would normally correct pipeline errors or test failures.

Evidence

Observed on issue #1025 / PR #1046 on 2026-04-19:

[16:37:45 UTC] poll: PR #1046 (issue #1025) already blocked (0 attempts) — skipping
[16:38:49 UTC] poll: PR #1046 (issue #1025) already blocked (0 attempts) — skipping
[16:39:54 UTC] poll: PR #1046 (issue #1025) already blocked (0 attempts) — skipping
...repeats every poll cycle...

Label timeline on #1025:

04:57  dev-qwen      +blocked                       (PR #1033 CI failure)
11:07  dev-qwen2     +blocked                       (PR #1043 CI failure, after #1033 was closed)
15:47  dev-qwen2     +in-progress, -backlog         (re-claim for PR #1046)  ← blocked not cleared
15:59  PR #1046 opened
16:00  CI errors on pipeline #1408/1409
16:29  dev-qwen2     -in-progress                   (still blocked)

The (0 attempts) in the poll log proves this is not exhaustion: the fast path at dev-poll.sh:210-214 returns before the counter plays any part in the decision (it is read only to build the log message):

# Fast path: already blocked — skip without touching counter.
if is_blocked "$issue_num"; then
    CI_FIX_ATTEMPTS=$(ci_fix_count "$pr_num")
    log "PR #${pr_num} (issue #${issue_num}) already blocked (${CI_FIX_ATTEMPTS} attempts) — skipping"
    return 0
fi

Root cause

The re-claim path and the block-fast-path treat blocked asymmetrically:

Re-claim (backlog → in-progress): ignores blocked. Agent proceeds to write code and open a PR.
CI-fix gate (when poll sees an open PR with CI failure/error): checks blocked first and skips if set.

blocked is semantically "this attempt is dead — wait for human/supervisor review." A re-claim is exactly that review completing ("try again"). The label should be cleared when the re-claim happens, or the re-claim should refuse to proceed while blocked is set.

Fix options

Option A — clear blocked on re-claim (recommended)

In the path that transitions an issue to in-progress (in dev-poll.sh or lib/issue-lifecycle.sh), also remove the blocked label. This matches the existing semantics: starting work on an issue is an implicit statement that the prior block is resolved.
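A minimal sketch of Option A, assuming the claim path can be reduced to three label operations. Labels are modelled in a shell variable instead of the forge API, and has_label, add_label, remove_label and claim_issue are hypothetical names, not the real helpers in lib/issue-lifecycle.sh:

```bash
# Sketch only: LABELS stands in for the labels on one issue.
LABELS=" backlog blocked "   # state of the issue just before the re-claim

# True if the issue currently carries label $1.
has_label() { case "$LABELS" in *" $1 "*) return 0 ;; *) return 1 ;; esac; }

# Add label $1 unless it is already present.
add_label() { has_label "$1" || LABELS="${LABELS}$1 "; }

# Rebuild LABELS without label $1 (no-op if it is absent).
remove_label() {
    _new=" "
    for _l in $LABELS; do
        [ "$_l" = "$1" ] || _new="${_new}${_l} "
    done
    LABELS=$_new
}

# Hypothetical claim path: the last line is the Option A change.
claim_issue() {
    add_label in-progress
    remove_label backlog
    remove_label blocked   # a re-claim implies the prior block is resolved
}

claim_issue
echo "labels after re-claim:$LABELS"
```

In the real scripts the three operations would be API calls, so "atomically" would likely mean issuing them in one claim helper rather than leaving the blocked removal to a separate caller.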

Option B — refuse re-claim while blocked

Don't let an agent claim in-progress on an issue that has blocked. Human / gardener / supervisor must clear blocked first. Safer but slower; would also have prevented PR #1046 from being opened at all.
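For comparison, Option B could look roughly like the following. This is a sketch under the same assumption as before (labels held in a shell variable, has_label and claim_issue are hypothetical names), and it ignores any labels other than the three involved:

```bash
# Sketch only: LABELS stands in for the labels on one issue.
LABELS=" backlog blocked "

# True if the issue currently carries label $1.
has_label() { case "$LABELS" in *" $1 "*) return 0 ;; *) return 1 ;; esac; }

# Hypothetical claim path that refuses to proceed while blocked is set.
claim_issue() {
    if has_label blocked; then
        echo "refusing claim: blocked must be cleared first" >&2
        return 1
    fi
    # Normal claim: backlog -> in-progress.
    LABELS=" in-progress "
}

if claim_issue; then
    echo "claimed"
else
    echo "claim refused"
fi
```

With the starting labels above the claim is refused and the labels are left untouched, which is the behaviour that would have stopped PR #1046 from being opened.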

Option C — don't short-circuit on blocked when the PR is fresh

In handle_ci_exhaustion(), if is_blocked && ci_fix_count(pr) == 0, treat the PR as eligible. Least invasive to existing flow but adds complexity to the gate.
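The Option C condition can be sketched as a small predicate. is_blocked and ci_fix_count are stubbed with variables here (the real ones query issue labels and the counter file), and should_skip_blocked is a hypothetical name introduced only for illustration:

```bash
# Stubs: in dev-poll.sh these read real state; here they read two variables.
BLOCKED=1
ATTEMPTS=0
is_blocked()   { [ "$BLOCKED" -eq 1 ]; }
ci_fix_count() { echo "$ATTEMPTS"; }

# Returns 0 when the fast path should skip the PR, 1 when the PR stays eligible.
# $1 = issue number, $2 = PR number.
should_skip_blocked() {
    is_blocked "$1" || return 1            # not blocked: never skip
    [ "$(ci_fix_count "$2")" -gt 0 ]       # blocked: skip only if this PR has prior attempts
}

if should_skip_blocked 1025 1046; then
    echo "already blocked, skipping"
else
    echo "fresh PR, eligible for CI-fix"
fi
```

Note that under this option the stale blocked label stays on the issue, so every other consumer of the label would still see the issue as blocked.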

Recommend A. It's one label-removal call, matches the verbal semantics users already carry in their heads, and works symmetrically with how humans manually unblock issues today.

Acceptance criteria

When an agent transitions an issue from any state to in-progress, the blocked label is removed atomically (if present)
After the change, reproduce #1025 / #1046-style scenario: agent claims a blocked issue, opens PR, CI fails → next dev-poll spawns dev-agent to fix (not "already blocked — skipping")
ci_fix_count counter behaves normally for the new PR (starts at 0, increments per fix attempt, exhausts at 3)
Blocked-issue fast-path still works for the case it was designed for: PR exists, CI repeatedly fails, attempts exhausted — stays blocked
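The last two criteria can be pictured with a toy poll loop. This is only a model of the intended behaviour, not the dev-poll.sh logic: poll_once, MAX_ATTEMPTS and the point at which blocked is set are assumptions, and CI is assumed to fail on every cycle:

```bash
MAX_ATTEMPTS=3   # "exhausts at 3" from the criteria above
ATTEMPTS=0       # fresh PR: counter starts at 0
BLOCKED=0        # cleared by the re-claim under Option A

# Hypothetical stand-in for one dev-poll cycle on a PR whose CI keeps failing.
poll_once() {
    if [ "$BLOCKED" -eq 1 ]; then
        echo "already blocked ($ATTEMPTS attempts), skipping"
        return 0
    fi
    if [ "$ATTEMPTS" -ge "$MAX_ATTEMPTS" ]; then
        BLOCKED=1
        echo "attempts exhausted, marking blocked"
        return 0
    fi
    ATTEMPTS=$((ATTEMPTS + 1))
    echo "spawning dev-agent, fix attempt $ATTEMPTS"
}

for cycle in 1 2 3 4 5; do
    poll_once
done
```

In this model the first three cycles spawn a fix attempt, the fourth marks the issue blocked, and the fifth takes the fast path with a non-zero attempt count, which is the case the fast path was designed for.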

Affected files

dev/dev-poll.sh — claim/re-claim path (wherever in-progress is added)
lib/issue-lifecycle.sh — issue_block / claim helpers
Possibly: any gardener logic that currently manually clears blocked

Related

#1025 / PR #1046: instance that surfaced this bug
#850 / multiple PR attempts: if the same pattern applies there, multi-attempt issues would also have been affected (check #859, #860, #899, #908, #941, #942, #971, #1000, #1013, #1014, #1033, #1043, all in ci-fixes-disinto.json at counter=4)

disinto-admin added the bug-report label 2026-04-19 16:47:31 +00:00

planner-bot added the backlog, priority labels 2026-04-19 17:02:25 +00:00

planner-bot commented

2026-04-19 17:02:26 +00:00

Collaborator

Planner run 5 (2026-04-19): Added backlog+priority. This is a core pipeline defect — the CI-fix feedback loop is broken for re-claimed issues. Blocks reliable multi-attempt issue resolution. The issue is well-specified with Option A as the recommended fix (clear blocked on re-claim).
