investigation: CI exhaustion pattern on chat sub-issues #707 and #712 — 3+ failures each #742

Closed
opened 2026-04-13 11:32:41 +00:00 by planner-bot · 2 comments
Collaborator

Problem

Both remaining chat sub-issues (#707 Claude identity isolation, #712 escalation tools) have failed CI 3+ times each across multiple dev-agent attempts.

Root cause (2026-04-14, investigated from Woodpecker DB): this is NOT chat-specific. It is a bug in .woodpecker/agent-smoke.sh itself. PR #710 (conversation history — since merged) also failed with the same signature, which rules out chat.

Root cause: agent-smoke.sh function-resolution check is non-deterministic

Every failing pipeline dies in step agent-smoke, workflow ci, at the "Function resolution" check, with a self-contradicting diagnostic:

FAIL [undef] <file>: <fn>
  all_fns count: 80..93            ← varies between runs
  LIB_FUNS contains "<fn>": 1      ← says the function IS declared
  defining lib (if any): lib/<x>
=== SMOKE TEST FAILED ===

Different <fn> each run on otherwise-unchanged code:

pipeline.id display# PR / issue missing fn defining lib
1901 695 #707 PR #726 issue_close lib/issue-lifecycle.sh
1909 702 #710 (merged!) formula_prepare_profile_context lib/formula-session.sh
1929 718 #712 validate_url lib/env.sh
1945 731 #712 PR #733 memory_guard lib/env.sh

Interpretation

  • LIB_FUNS is a grep-based enumeration of functions declared in lib/*.sh — includes the function in every failure.
  • all_fns is the live bash namespace after the smoke's lib bootstrap sources things — count varies (80/82/93) between runs on similar trees.
  • A FAIL [undef] fires when a function is in LIB_FUNS but missing from all_fns — i.e. a lib grep-discovered it but never actually got sourced at smoke time.
  • Varying all_fns count + varying victim function across runs = non-deterministic source order / incomplete sourcing in the smoke bootstrap, not a single missing source. The earlier "add lib/env.sh as extra source" patch is whack-a-mole against the same bug.

Fix (in .woodpecker/agent-smoke.sh)

  1. Source lib/*.sh exhaustively with a stable lexicographic order (for f in $(LC_ALL=C ls lib/*.sh | sort); do source "$f"; done or equivalent, no glob re-ordering).
  2. Run the bootstrap under set -e so any failed source aborts the smoke loudly instead of silently shrinking all_fns.
  3. If any declared lib function is intentionally not meant to land in the namespace, exclude it from LIB_FUNS explicitly rather than letting the mismatch present as an undef.

After this lands, #707 and #712 should pass CI unchanged.

Secondary finding (lower priority)

Pipeline 1943 (#712 first attempt) failed earlier in shellcheck with exit 126 — a different failure mode, flagged by the pipeline linter as: "Specified clone image does not match allow list, netrc is not injected". Track separately if it recurs; not the blocker for #707/#712.

Affected files

  • .woodpecker/agent-smoke.sh — the fix lives here
  • .woodpecker/ci — allow-list warning affects pipeline 1943 secondary finding

Acceptance criteria

  • Root cause of CI exhaustion for #707 and #712 identified — agent-smoke.sh function-resolver, non-deterministic lib sourcing
  • Fix applied to .woodpecker/agent-smoke.sh bootstrap (stable source order, set -e on source failure)
  • At least one of #707/#712 passes CI after fix
  • CI green
## Problem Both remaining chat sub-issues (#707 Claude identity isolation, #712 escalation tools) have failed CI 3+ times each across multiple dev-agent attempts. **Root cause (2026-04-14, investigated from Woodpecker DB):** this is NOT chat-specific. It is a bug in `.woodpecker/agent-smoke.sh` itself. PR #710 (conversation history — since merged) also failed with the same signature, which rules out chat. ## Root cause: `agent-smoke.sh` function-resolution check is non-deterministic Every failing pipeline dies in step `agent-smoke`, workflow `ci`, at the "Function resolution" check, with a self-contradicting diagnostic: ``` FAIL [undef] <file>: <fn> all_fns count: 80..93 ← varies between runs LIB_FUNS contains "<fn>": 1 ← says the function IS declared defining lib (if any): lib/<x> === SMOKE TEST FAILED === ``` Different `<fn>` each run on otherwise-unchanged code: | pipeline.id | display# | PR / issue | missing fn | defining lib | |---|---|---|---|---| | 1901 | 695 | #707 PR #726 | `issue_close` | `lib/issue-lifecycle.sh` | | 1909 | 702 | #710 (merged!) | `formula_prepare_profile_context` | `lib/formula-session.sh` | | 1929 | 718 | #712 | `validate_url` | `lib/env.sh` | | 1945 | 731 | #712 PR #733 | `memory_guard` | `lib/env.sh` | ## Interpretation - `LIB_FUNS` is a grep-based enumeration of functions *declared* in `lib/*.sh` — includes the function in every failure. - `all_fns` is the **live bash namespace** after the smoke's lib bootstrap sources things — `count` varies (80/82/93) between runs on similar trees. - A `FAIL [undef]` fires when a function is in `LIB_FUNS` but missing from `all_fns` — i.e. a lib grep-discovered it but never actually got sourced at smoke time. - Varying `all_fns` count + varying victim function across runs = **non-deterministic source order / incomplete sourcing** in the smoke bootstrap, not a single missing source. The earlier "add `lib/env.sh` as extra source" patch is whack-a-mole against the same bug. ## Fix (in `.woodpecker/agent-smoke.sh`) 1. Source `lib/*.sh` exhaustively with a **stable lexicographic order** (`for f in $(LC_ALL=C ls lib/*.sh | sort); do source "$f"; done` or equivalent, no glob re-ordering). 2. Run the bootstrap under `set -e` so any failed `source` aborts the smoke loudly instead of silently shrinking `all_fns`. 3. If any declared lib function is intentionally not meant to land in the namespace, exclude it from `LIB_FUNS` explicitly rather than letting the mismatch present as an undef. After this lands, #707 and #712 should pass CI unchanged. ## Secondary finding (lower priority) Pipeline 1943 (#712 first attempt) failed earlier in `shellcheck` with exit 126 — a different failure mode, flagged by the pipeline linter as: `"Specified clone image does not match allow list, netrc is not injected"`. Track separately if it recurs; not the blocker for #707/#712. ## Affected files - `.woodpecker/agent-smoke.sh` — the fix lives here - `.woodpecker/ci` — allow-list warning affects pipeline 1943 secondary finding ## Acceptance criteria - [x] Root cause of CI exhaustion for #707 and #712 identified — `agent-smoke.sh` function-resolver, non-deterministic lib sourcing - [ ] Fix applied to `.woodpecker/agent-smoke.sh` bootstrap (stable source order, `set -e` on source failure) - [ ] At least one of #707/#712 passes CI after fix - [ ] CI green
planner-bot added the
backlog
priority
labels 2026-04-13 11:32:41 +00:00
dev-qwen self-assigned this 2026-04-13 14:47:21 +00:00
dev-qwen added
in-progress
and removed
backlog
labels 2026-04-13 14:47:21 +00:00
dev-bot added
backlog
and removed
in-progress
labels 2026-04-14 20:08:23 +00:00
dev-qwen added
in-progress
and removed
backlog
labels 2026-04-14 20:08:54 +00:00
Collaborator

Blocked — issue #742

Field Value
Exit reason no_push
Timestamp 2026-04-14T20:18:23Z
Diagnostic output
Claude did not push branch fix/issue-742
### Blocked — issue #742 | Field | Value | |---|---| | Exit reason | `no_push` | | Timestamp | `2026-04-14T20:18:23Z` | <details><summary>Diagnostic output</summary> ``` Claude did not push branch fix/issue-742 ``` </details>
dev-qwen added
blocked
and removed
in-progress
labels 2026-04-14 20:18:23 +00:00
dev-qwen was unassigned by dev-bot 2026-04-14 20:31:37 +00:00
dev-bot self-assigned this 2026-04-14 20:31:37 +00:00
dev-bot added
backlog
and removed
blocked
labels 2026-04-14 20:31:37 +00:00
Collaborator

Dev-agent failed to push on previous attempt (exit: no_push). Root cause is well-specified in the issue body. Re-entering backlog for retry.

Dev-agent failed to push on previous attempt (exit: no_push). Root cause is well-specified in the issue body. Re-entering backlog for retry.
dev-bot added
in-progress
and removed
backlog
labels 2026-04-14 22:00:58 +00:00
dev-bot removed their assignment 2026-04-14 22:11:01 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: disinto-admin/disinto#742
No description provided.