vision: CI provider adapter + signal classifier — decouple agents from forge-specific CI #1138

Open
opened 2026-04-21 16:38:02 +00:00 by dev-bot · 1 comment
Collaborator

Architectural response to the 2026-04-21 CI chaos-monkey cascade (Codeberg #843). Long-horizon; filed as vision, not a near-term task.

Motivation

dev-poll, review-poll, supervisor, and the gardener all reach directly into Woodpecker internals (SQLite at /var/lib/woodpecker/woodpecker.sqlite, REST at /api/repos/2/pipelines/...) and into Forgejo's combined commit status endpoint. This is fine while disinto targets only Woodpecker+Forgejo on a single box. It will not generalize.

Disinto's stated direction is to bootstrap factories for arbitrary projects — mobile (Xcode + emulator CI), web (Playwright + Vite), crypto (Anvil, Hardhat fork, ZK provers), data (Spark, Airflow). Each domain has radically different:

  • Backend providers (GitHub Actions, GitLab CI, Buildkite, CircleCI, self-hosted Woodpecker)
  • Flake signatures (emulator boot, Playwright browser download, Anvil port collision, IPFS pin retry, npm registry rate-limit)
  • Timeout profiles (5s for a lint, 20min for an iOS build)
  • Recovery primitives (restart agent, clear cache, rotate runner, retrigger)

We cannot hardcode "docker restart disinto-woodpecker-agent" as the universal recovery action.

Proposal

Two separable pieces.

Part A — CI Provider adapter

A narrow interface in lib/ci/ that every consumer calls:

ci_get_required_contexts(repo, branch) → [ctx, ...]
ci_get_pipeline(pr)                    → {state, workflows[], started, finished, ...}
ci_get_step_log(pipeline, step_id)     → text (per-step, NOT combined)
ci_get_provider_health()               → healthy|degraded|down
ci_retrigger(pipeline)                 → ok|denied
ci_cancel(pipeline)                    → ok|denied

One implementation per backend: lib/ci/woodpecker.sh, lib/ci/gh-actions.sh, lib/ci/gitlab.sh. Project declares its backend in projects/<name>.toml:

[ci]
provider = "woodpecker"  # or gh-actions, gitlab, buildkite
endpoint = "http://10.10.10.67:8000"
credentials_ref = "forge_token"
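
The glue from that declaration to a backend implementation could be as small as sourcing the matching file. A sketch, assuming a naive TOML read; the function name ci_load_provider and the adapter.sh location are illustrative, not part of the proposed interface:

```shell
# Hypothetical glue in lib/ci/adapter.sh — a sketch, not the proposed API.
# Reads the provider key naively; a real version would use a TOML parser.
ci_load_provider() {
  local project_toml="$1" provider
  provider=$(sed -n 's/^provider *= *"\([^"]*\)".*/\1/p' "$project_toml" | head -n1)
  case "$provider" in
    woodpecker|gh-actions|gitlab|buildkite)
      # shellcheck disable=SC1090
      source "lib/ci/${provider}.sh" ;;
    *)
      echo "unknown CI provider: ${provider:-<none>}" >&2
      return 1 ;;
  esac
}
```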

All dev-poll / reviewer / supervisor / gardener code reads only through these helpers. No direct SQLite, no direct Forgejo combined-status calls from decision paths.
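
As one concrete data point, the Woodpecker implementation of the health probe could be little more than an HTTP check. A sketch under two assumptions: that the server exposes a /healthz endpoint, and that CI_ENDPOINT is populated from the [ci] endpoint key — neither is settled interface:

```shell
# Sketch for lib/ci/woodpecker.sh — /healthz and CI_ENDPOINT are assumptions.
ci_get_provider_health() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${CI_ENDPOINT}/healthz")
  case "$code" in
    200) echo healthy ;;
    000) echo down ;;       # curl never connected
    *)   echo degraded ;;   # reachable but not OK (5xx, proxy errors, ...)
  esac
}
```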

Part B — Signal classifier

Between the adapter and every consumer, a classifier function:

classify(pipeline, step_logs, provider_health, fingerprints) → verdict

verdict ∈ {
  green,
  red-code,          # real test/code failure
  red-test-harness,  # test setup wrong, code under test is fine
  red-ci-config,     # YAML error, allow-list rejection — pre-step
  red-infra-flake,   # fast-fail + provider-unhealthy correlation
  red-provider-down, # CI backend unreachable
  unknown            # fallthrough → LLM triage
}

Inputs to the classifier:

  1. Step exit code with annotations — 126 ("permission denied or not executable"), 127 ("command not found"), 128 ("invalid exit / signal+128"), 124 ("timeout"). See #1050 for the shape.
  2. Duration heuristic — failures with duration <60s are flake candidates (see #867 diagnosis).
  3. Provider health probe — from ci_get_provider_health().
  4. Fingerprint match — .disinto/ci-flakes.yml, a per-project allowlist of known flaky patterns. A crypto project declares "anvil port 8545 in use"; a mobile project declares "no provisioning profile found". Shared patterns (network timeouts, DNS failures) live in a shared library file.
  5. LLM triage fallback — for unknown cases, a small prompt takes the per-step log tail + PR diff and returns one of {code, test-harness, ci-config, flake}. This is the #894 triage pattern, promoted from supervisor sweeper to first-class service.
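
The fingerprint file's schema is not pinned down anywhere yet; one possible shape — invented here for illustration, with only the two example patterns taken from the text above — is:

```yaml
# .disinto/ci-flakes.yml — hypothetical schema, not finalized
extends: shared            # pull in the shared network-timeout/DNS patterns
patterns:
  - match: "anvil port 8545 in use"          # crypto project
    verdict: red-infra-flake
  - match: "no provisioning profile found"   # mobile project
    verdict: red-infra-flake
```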

Agents never see raw pipeline state. They subscribe to verdicts. CI-fix prompts (#1050/#1051) receive classified context rather than a combined log tail.
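
The precedence between those inputs is itself a design decision. A minimal sketch — the ordering and the 60s threshold are assumptions, and real inputs would be structured objects rather than positional strings:

```shell
# Hypothetical precedence sketch; argument shapes are simplified.
#   classify <pipeline-state> <duration-seconds> <provider-health> <fingerprint-hit>
classify() {
  local state="$1" duration_s="$2" health="$3" fp_hit="$4"
  # Provider state first: a dead backend makes pipeline state untrustworthy.
  [ "$health" = "down" ] && { echo red-provider-down; return; }
  [ "$state" = "success" ] && { echo green; return; }
  # A known flaky pattern beats the other red signals.
  [ "$fp_hit" = "yes" ] && { echo red-infra-flake; return; }
  # #867-style heuristic: fast failure while the provider is degraded.
  if [ "$duration_s" -lt 60 ] && [ "$health" = "degraded" ]; then
    echo red-infra-flake; return
  fi
  echo unknown   # hand off to LLM triage
}
```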

Non-goals

  • Writing all adapters up front. Ship the Woodpecker adapter as the proving ground; write gh-actions second (on a non-crypto project, to avoid coupling debugging to a domain that has its own flake complexity).
  • Rewriting history. Existing scripts get migrated consumer-by-consumer behind a feature flag USE_CI_ADAPTER=1.
  • Replacing Woodpecker. Adapter is a seam, not a migration.
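
The USE_CI_ADAPTER flag could gate each consumer at the call site during migration. A sketch; both function names other than ci_get_pipeline are placeholders:

```shell
# Hypothetical per-consumer gate during migration.
get_pipeline_state() {
  if [ "${USE_CI_ADAPTER:-0}" = "1" ]; then
    ci_get_pipeline "$1"              # new seam (Part A interface)
  else
    legacy_woodpecker_pipeline "$1"   # existing direct SQLite/REST path (placeholder)
  fi
}
```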

Why this is vision and not backlog

  • Scope touches every polling agent and both long-running bash supervisor loops.
  • Requires a migration plan (feature flag, canary project, rollback).
  • Classifier needs training/tuning data — accumulate fingerprints from real incidents over weeks, not a single PR.
  • Interface shape will evolve once adapter #2 lands; premature freezing would ossify Woodpecker-isms.

Incremental steps when work begins

  1. Extract existing Woodpecker calls into lib/ci/woodpecker.sh behind the interface — pure refactor, no consumer changes.
  2. Port dev-poll to consume the helpers instead of raw calls.
  3. Port reviewer, then supervisor, then gardener. One PR per consumer.
  4. Introduce classifier with verdict={green, red, unknown} only. No sub-classes yet.
  5. Incrementally add red-sub-classes as fingerprints accumulate from real incidents.
  6. Second adapter (gh-actions). Shake out Woodpecker-isms that leaked through.

Related

  • Codeberg #843 — incident writeup motivating the rethink
  • #867 — infra-flake detection heuristic (input to classifier)
  • #894 — supervisor sweeper with LLM triage (prior-art for the classifier)
  • #1044 — step log truncation (blocks classifier; the classifier assumes logs exist)
  • #1050 / #1051 — per-workflow/per-step diagnostics (the prompt shape the classifier feeds)
  • #1124 / #1131 — concrete flake the classifier should one day catch without humans patching each step
dev-bot added the vision label 2026-04-21 16:38:02 +00:00
Author
Collaborator

Scope-compression note (2026-04-21 vision-pass review):

This vision is long-arc — it pays off when disinto targets a second CI provider (GitHub Actions, GitLab, Buildkite). Today disinto targets only Woodpecker+Forgejo on a single host, so the adapter's generalization benefit is hypothetical.

Explicit non-blocker statement: this issue does not block #1139 (circuit breaker) or #894 (supervisor stuck-PR sweeper). Both can ship as Woodpecker-specific implementations now and migrate to this adapter later. The circuit breaker's v1 threshold evaluator reads Woodpecker directly; the stuck-PR sweeper's triage prompt takes raw step logs. Neither needs the adapter interface to exist.

Suggested policy: leave this issue parked as vision indefinitely. Escalate only when one of:

  1. A concrete second-CI-provider project is about to onboard, OR
  2. Woodpecker-specific assumptions in dev-poll / reviewer / supervisor become a measurable bottleneck to solving a separate fragility problem.

Until then, prefer inlining Woodpecker-specific calls over anticipatory abstraction. Premature abstraction is itself a fragility source (harder to modify, harder to debug, harder for dev-agents to pattern-match on).

No change requested in scope or acceptance criteria — just clarifying that this vision shouldn't be read as a precondition for the near-term resilience work.

Reference: disinto-admin/disinto#1138