bug: dev-poll skips CI-fix on re-claimed issues — blocked label not cleared on re-claim, starves new PRs at 0 attempts #1047

Closed
opened 2026-04-19 16:47:31 +00:00 by disinto-admin · 1 comment

Problem

When an issue carries the blocked label from a failed prior attempt, and an agent re-claims it for a fresh attempt (adding in-progress, removing backlog), the blocked label is not removed. Any new PR opened for that issue is then silently skipped by the CI-fix path in dev-poll.sh, even though it's a legitimate new attempt with zero fix attempts on the new PR.

The agent ends up in an "open PR, never iterate on it" state: it did the work of writing the fix but cannot respond to the feedback loop that would normally correct pipeline errors or test failures.

Evidence

Observed on issue #1025 / PR #1046 on 2026-04-19:

[16:37:45 UTC] poll: PR #1046 (issue #1025) already blocked (0 attempts) — skipping
[16:38:49 UTC] poll: PR #1046 (issue #1025) already blocked (0 attempts) — skipping
[16:39:54 UTC] poll: PR #1046 (issue #1025) already blocked (0 attempts) — skipping
...repeats every poll cycle...

Label timeline on #1025:

04:57  dev-qwen      +blocked                       (PR #1033 CI failure)
11:07  dev-qwen2     +blocked                       (PR #1043 CI failure, after #1033 was closed)
15:47  dev-qwen2     +in-progress, -backlog         (re-claim for PR #1046)  ← blocked not cleared
15:59  PR #1046 opened
16:00  CI errors on pipeline #1408/1409
16:29  dev-qwen2     -in-progress                   (still blocked)

The (0 attempts) in the poll log shows this is not exhaustion. The fast-path at dev-poll.sh:210-214 returns as soon as the issue is blocked; it reads the counter only to print it in the log message and never uses it to decide eligibility:

# Fast path: already blocked — skip without touching counter.
if is_blocked "$issue_num"; then
    CI_FIX_ATTEMPTS=$(ci_fix_count "$pr_num")
    log "PR #${pr_num} (issue #${issue_num}) already blocked (${CI_FIX_ATTEMPTS} attempts) — skipping"
    return 0
fi

Root cause

The re-claim path and the block-fast-path treat blocked asymmetrically:

  • Re-claim (backlog → in-progress): ignores blocked. Agent proceeds to write code and open a PR.
  • CI-fix gate (when poll sees an open PR with CI failure/error): checks blocked first and skips if set.

blocked is semantically "this attempt is dead — wait for human/supervisor review." A re-claim is exactly that review completing ("try again"). The label should be cleared when the re-claim happens, or the re-claim should refuse to proceed while blocked is set.
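The asymmetry can be reproduced in a small self-contained model. Everything in it is hypothetical: labels live in a shell variable rather than on the forge, and claim_issue and ci_fix_gate are illustrative names, not functions taken from dev-poll.sh.

```shell
# Hypothetical model: the labels of one issue as a space-padded list.
LABELS=" backlog blocked "

has_label() { case "$LABELS" in *" $1 "*) return 0 ;; *) return 1 ;; esac; }

# Re-claim as described above: swaps backlog for in-progress and never
# looks at blocked.
claim_issue() {
    has_label backlog || return 1
    before=${LABELS%%" backlog "*}
    after=${LABELS#*" backlog "}
    LABELS="$before in-progress $after"
}

# CI-fix gate: the blocked check comes first, so the attempt count
# (passed as $1) is only ever printed, never used to decide.
ci_fix_gate() {
    if has_label blocked; then
        echo "already blocked ($1 attempts): skipping"
        return 0
    fi
    echo "eligible for CI fix"
}

claim_issue
ci_fix_gate 0
```

After the claim the issue carries both in-progress and blocked, so the gate skips a PR that has never had a fix attempt.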

Fix options

Option A — clear blocked on re-claim (recommended)

In the path that transitions an issue to in-progress (in dev-poll.sh or lib/issue-lifecycle.sh), also remove the blocked label. This matches the existing semantics: starting work on an issue is an implicit statement that the prior block is resolved.
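A minimal sketch of Option A, assuming the claim path can be reduced to three label operations. The in-memory LABELS variable and the helpers has_label, add_label, remove_label and claim_issue are hypothetical stand-ins; the real helpers in lib/issue-lifecycle.sh may be named and shaped differently.

```shell
# Hypothetical in-memory label store for one issue (space-padded list).
# The real scripts would talk to the forge instead.
LABELS=" backlog blocked "

has_label() { case "$LABELS" in *" $1 "*) return 0 ;; *) return 1 ;; esac; }

add_label() { has_label "$1" || LABELS="$LABELS$1 "; }

remove_label() {
    has_label "$1" || return 0      # absent label: nothing to do
    before=${LABELS%%" $1 "*}       # text left of the label
    after=${LABELS#*" $1 "}         # text right of the label
    LABELS="$before $after"
}

# Claim with the proposed extra step: clearing blocked is part of the claim.
claim_issue() {
    add_label in-progress
    remove_label backlog
    remove_label blocked
}

claim_issue
echo "labels after claim:$LABELS"
```

Because remove_label is a no-op when the label is absent, the same claim path serves both first claims and re-claims. The sketch does not address atomicity; whether the label changes can be applied in a single request depends on the forge API.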

Option B — refuse re-claim while blocked

Don't let an agent claim in-progress on an issue that has blocked. Human / gardener / supervisor must clear blocked first. Safer but slower; would also have prevented PR #1046 from being opened at all.
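Option B could look roughly like the following, again with hypothetical names and an in-memory label list in place of the forge. The only assumption beyond the text above is that a refused claim returns non-zero so the caller can move on to another issue.

```shell
# Hypothetical label state for one issue (space-padded list).
LABELS=" backlog blocked "

has_label() { case "$LABELS" in *" $1 "*) return 0 ;; *) return 1 ;; esac; }

# Refuse the claim while blocked is set; leave all labels untouched.
claim_issue() {
    if has_label blocked; then
        echo "issue is blocked: refusing claim" >&2
        return 1
    fi
    has_label backlog || return 1
    before=${LABELS%%" backlog "*}
    after=${LABELS#*" backlog "}
    LABELS="$before in-progress $after"
}

if claim_issue; then
    echo "claimed"
else
    echo "left for human/supervisor review"
fi
```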

Option C — don't short-circuit on blocked when the PR is fresh

In handle_ci_exhaustion(), if is_blocked && ci_fix_count(pr) == 0, treat the PR as eligible. Least invasive to existing flow but adds complexity to the gate.
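A sketch of the Option C gate. The is_blocked and ci_fix_count functions here are stubs with made-up data (issue 10, PRs 1 and 2), standing in for the real helpers of the same names; should_skip_ci_fix is a hypothetical wrapper and not the actual body of handle_ci_exhaustion().

```shell
# Stub data, for illustration only: issue 10 is blocked, PR 1 has used
# three fix attempts, any other PR is fresh.
BLOCKED_ISSUES=" 10 "

is_blocked() { case "$BLOCKED_ISSUES" in *" $1 "*) return 0 ;; *) return 1 ;; esac; }

ci_fix_count() {
    case "$1" in
        1) echo 3 ;;
        *) echo 0 ;;
    esac
}

# Returns 0 (skip) only when the issue is blocked AND this PR has already
# had at least one fix attempt. A blocked issue with a fresh PR falls through.
should_skip_ci_fix() {
    issue_num=$1
    pr_num=$2
    if is_blocked "$issue_num" && [ "$(ci_fix_count "$pr_num")" -gt 0 ]; then
        return 0
    fi
    return 1
}

for pr in 1 2; do
    if should_skip_ci_fix 10 "$pr"; then
        echo "PR #$pr: skipping"
    else
        echo "PR #$pr: eligible"
    fi
done
```

The extra condition is what the option's description calls added complexity: the gate now depends on two pieces of state instead of one.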

Recommend A. It's one label-removal call, matches the verbal semantics users already carry in their heads, and works symmetrically with how humans manually unblock issues today.

Acceptance criteria

  • When an agent transitions an issue from any state to in-progress, the blocked label is removed atomically (if present)
  • After the change, reproduce #1025 / #1046-style scenario: agent claims a blocked issue, opens PR, CI fails → next dev-poll spawns dev-agent to fix (not "already blocked — skipping")
  • ci_fix_count counter behaves normally for the new PR (starts at 0, increments per fix attempt, exhausts at 3)
  • Blocked-issue fast-path still works for the case it was designed for: PR exists, CI repeatedly fails, attempts exhausted — stays blocked
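The counter behaviour in the last two criteria can be sketched as a small state machine. This is an assumption about how the pieces fit together, not the project's code: the limit of 3 comes from the criteria above, while on_ci_failure, attempts and blocked are hypothetical names.

```shell
MAX_CI_FIX_ATTEMPTS=3   # limit taken from the acceptance criteria
attempts=0              # hypothetical per-PR counter, starts at 0
blocked=no              # hypothetical stand-in for the blocked label

# Called once per poll cycle in which the PR's CI has failed.
on_ci_failure() {
    if [ "$blocked" = yes ]; then
        echo "already blocked: skipping"
        return 0
    fi
    if [ "$attempts" -ge "$MAX_CI_FIX_ATTEMPTS" ]; then
        blocked=yes
        echo "attempts exhausted: marking blocked"
        return 0
    fi
    attempts=$((attempts + 1))
    echo "spawning fix attempt $attempts"
}

# Five failing poll cycles: three fix attempts, then exhaustion, then skip.
for i in 1 2 3 4 5; do
    on_ci_failure
done
```

In this model the fast-path only fires after exhaustion on the same PR, which is the case the criteria say it was designed for.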

Affected files

  • dev/dev-poll.sh — claim/re-claim path (wherever in-progress is added)
  • lib/issue-lifecycle.sh — issue_block / claim helpers
  • Possibly: any gardener logic that currently manually clears blocked

Related

  • #1025 / PR #1046 — instance that surfaced this bug
  • #850 / multiple PR attempts — if the same pattern applies there, multi-attempt issues would also have been affected (check #859, #860, #899, #908, #941, #942, #971, #1000, #1013, #1014, #1033, #1043 — all in ci-fixes-disinto.json at counter=4)

disinto-admin added the bug-report label 2026-04-19 16:47:31 +00:00
planner-bot added the backlog and priority labels 2026-04-19 17:02:25 +00:00
Collaborator

Planner run 5 (2026-04-19): Added backlog+priority. This is a core pipeline defect — the CI-fix feedback loop is broken for re-claimed issues. Blocks reliable multi-attempt issue resolution. The issue is well-specified with Option A as the recommended fix (clear blocked on re-claim).

dev-qwen2 self-assigned this 2026-04-19 17:02:37 +00:00
dev-qwen2 added in-progress and removed backlog labels 2026-04-19 17:02:38 +00:00
dev-qwen2 removed their assignment 2026-04-19 17:11:08 +00:00
dev-qwen2 removed the in-progress label 2026-04-19 17:11:09 +00:00
gardener-bot added the backlog label 2026-04-19 17:15:58 +00:00
Reference: disinto-admin/disinto#1047