feat: per-workflow/per-step CI diagnostics in agent fix prompts (implements #1050) #1051

Closed

opened 2026-04-19 18:30:23 +00:00 by disinto-admin · 1 comment

disinto-admin commented

2026-04-19 18:30:23 +00:00

Owner

Goal

Implement per-workflow/per-step CI-failure diagnostics in the agent's CI-fix prompt builder, so agents can diagnose and fix multi-workflow failures instead of burning their fix budget on the wrong thing.

Fixes the gap documented in bug report #1050. Full diagnosis, root-cause analysis, and fix sketch live there.

The change (summary — see #1050 for evidence and rationale)

In the lib/pr-lifecycle.sh CI-failure block (~lines 431-459):

Walk pipeline.workflows[] — for each workflow with state == "failure", collect its failed children[] (the actual failing step).
For each failed step, fetch GET /api/repos/{id}/logs/{pipeline_num}/{step_id} via woodpecker_api and tail 50 lines from just that step (not the pipeline-combined stream).
Build the agent prompt with one section per failed workflow containing: workflow name, step name, exit code, annotated exit-code meaning (126/127/128 standard meanings), and the per-step tail.
Emit a "Passing workflows (do not modify): <list>" line so the agent doesn't waste context on healthy workflows.
Add a ci_get_step_logs <pipeline_num> <step_id> helper in lib/ci-helpers.sh that does step-scoped log fetch (mirroring the existing ci_get_logs pattern).
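The walk over the pipeline and the step-scoped fetch could look roughly like the sketch below. It is not the actual implementation: the JSON field names `name`, `id`, and `exit_code`, the `"success"` state value, and the calling convention of `woodpecker_api` (one path argument relative to `/api/repos/{id}`, plain-text log on stdout) are all assumptions made for illustration. Only `workflows[]`, `children[]`, and `state == "failure"` come from the issue text. `jq` is assumed to be installed.

```shell
#!/usr/bin/env bash
# Sketch only. Field names other than workflows[], children[] and state are
# assumptions, as is the way woodpecker_api is called.

# Read pipeline JSON on stdin and print one line per failed step in each
# failed workflow: workflow<TAB>step<TAB>step_id<TAB>exit_code
ci_failed_steps() {
  jq -r '
    .workflows[]?
    | select(.state == "failure")
    | .name as $wf
    | .children[]?
    | select(.state == "failure")
    | [$wf, .name, .id, .exit_code]
    | @tsv
  '
}

# Read pipeline JSON on stdin and print the passing workflows as "X, Y".
ci_passing_workflows() {
  jq -r '[.workflows[]? | select(.state == "success") | .name] | join(", ")'
}

# Hypothetical step-scoped log fetch, tailing the last 50 lines of one step.
# Assumes woodpecker_api accepts a path and writes the log body to stdout.
ci_get_step_logs() {
  local pipeline_num="$1" step_id="$2"
  woodpecker_api "/logs/${pipeline_num}/${step_id}" | tail -n 50
}
```

A caller would pipe the pipeline JSON into `ci_failed_steps`, read the tab-separated fields in a `while IFS=$'\t' read -r` loop, and call `ci_get_step_logs` once per failed step.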

Optional follow-up (separate issue if substantial): .disinto/ci-flakes.yml allowlist for known-flaky workflow:step pairs so the agent skips them in the "must fix" section. Current known flake: smoke-init:smoke-init (mock-Forgejo branch-index retry exhausts on PR builds — see evidence in #1050).

Acceptance criteria

On a PR with 3 failing workflows (reproduce with fixture from PR #1046's pipeline #1423 if still queryable; otherwise synthesize), the generated CI-fix prompt contains three distinct sections — one per failed workflow — each with workflow name, step name, exit code, annotated meaning, and step-local log tail.
Prompt includes a "Passing workflows (do not modify): X, Y" line when at least one workflow in the pipeline passed.
Exit codes 126 ("permission denied or not executable"), 127 ("command not found"), 128 ("invalid exit argument / signal+128") are annotated inline.
ci_get_step_logs <pipeline_num> <step_id> helper added to lib/ci-helpers.sh with doc comment matching the style of ci_get_logs.
No regression on single-workflow failures — prompt output for a single failed workflow is at least as informative as today's.
shellcheck clean.
Existing dev-agent loop still works end-to-end — spawn a test scenario (can be a fixture PR with a deliberately failing step) and confirm the agent gets the new prompt.
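As a rough illustration of the annotation and per-workflow section criteria above, a minimal sketch follows. The function names `ci_annotate_exit_code` and `ci_render_failure_section` and the section layout are hypothetical; only the three exit-code meanings are taken from the criteria.

```shell
#!/usr/bin/env bash
# Sketch only. Function names and the section layout are hypothetical.

# Print the annotation for a well-known exit code, or nothing for other codes.
ci_annotate_exit_code() {
  case "$1" in
    126) echo "permission denied or not executable" ;;
    127) echo "command not found" ;;
    128) echo "invalid exit argument / signal+128" ;;
    *)   : ;;
  esac
}

# Render one prompt section for a failed workflow.
# Arguments: workflow name, step name, exit code. The log tail is read from stdin.
ci_render_failure_section() {
  local workflow="$1" step="$2" exit_code="$3" meaning
  meaning="$(ci_annotate_exit_code "$exit_code")"
  printf 'Workflow: %s\n' "$workflow"
  printf 'Step: %s\n' "$step"
  if [ -n "$meaning" ]; then
    printf 'Exit code: %s (%s)\n' "$exit_code" "$meaning"
  else
    printf 'Exit code: %s\n' "$exit_code"
  fi
  printf 'Log tail:\n'
  cat
}
```

Calling `ci_render_failure_section` once per failed workflow, with that step's log tail on stdin, would yield the "one section per failed workflow" shape the first criterion asks for.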

Affected files

lib/pr-lifecycle.sh — CI-failure prompt builder (~lines 431-459)
lib/ci-helpers.sh — add ci_get_step_logs helper
Possibly: lib/ci-helpers.sh:ci_commit_status — no change expected; mentioned so the agent doesn't trip over the error→failure mapping and think it needs changing

Related

#1050 — bug report with full diagnosis and motivating evidence
#1044 — server-side step-log truncation (complementary; this issue assumes logs exist, #1044 ensures they exist)
#1025 — concrete blocked issue waiting on this fix


disinto-admin added the backlog label 2026-04-19 18:30:23 +00:00

disinto-admin referenced this issue 2026-04-19 18:30:32 +00:00

vision(#623): end-to-end subpath routing smoke test for Forgejo + Woodpecker + chat #1025

dev-bot self-assigned this 2026-04-19 18:30:33 +00:00

dev-bot added in-progress and removed backlog labels 2026-04-19 18:30:33 +00:00

dev-bot referenced this issue from a commit 2026-04-19 18:33:58 +00:00

fix: feat: per-workflow/per-step CI diagnostics in agent fix prompts (implements #1050) (#1051)

dev-bot referenced this issue from a pull request that will close it 2026-04-19 18:34:15 +00:00

fix: feat: per-workflow/per-step CI diagnostics in agent fix prompts (implements #1050) (#1051) #1052

dev-qwen2 commented

2026-04-19 18:39:49 +00:00

Collaborator

Blocked — issue #1051

Field	Value
Exit reason	`ci_exhausted_poll (3 attempts, PR #1052)`
Timestamp	`2026-04-19T18:39:49Z`
