CI: edge-subpath/caddy-validate step times out on docker.sock (context deadline exceeded) #1124
Reference: disinto-admin/disinto#1124
Symptom
The `caddy-validate` step in the `edge-subpath` workflow fails intermittently with a `context deadline exceeded` error on docker.sock.

Exit code on the step: `126`. Downstream steps (`caddyfile-routing-test`, `test-caddyfile-routing`, etc.) get skipped, and the workflow reports `failure`.

This showed up on PR #1108 (gardener housekeeping, commit `0946ca9828`, pipeline 1597, workflow id 3470, step pid 12). The sibling workflows for PR #1112 (pipeline 1599) and PR #1113 (pipeline 1601) are also pending forever. Those two are not confirmed to be the same failure mode, but the delay pattern matches (`edge-subpath` sits pending long after `ci`, `nomad-validate`, `secret-scan` complete). Worth verifying on a fresh run.

The `edge-subpath` workflow is not in the required-status-contexts list (branch protection requires `ci/woodpecker/pr/ci` and `ci/woodpecker/push/ci` only), so this doesn't block merge by itself. But it does leave combined commit status at `failure`/`pending` and forces `force_merge: true` on any admin-merge path. The reviewer-agent almost certainly gates on combined status as well, so every legitimate review flow stalls here.

Reproduction
Happens under load when multiple pipelines queue up. Triggered every time while the WP agent was wedged on 2026-04-20: dozens of `wait(): code: DeadlineExceeded` lines in `docker logs disinto-woodpecker-agent` between 11:17Z and 13:10Z. After restarting the agent, pipeline 1597 completed except for this step.

Likely cause
The step mounts the host's `/var/run/docker.sock` inside the workflow container (typical pattern when a CI step needs to spin up sibling containers; caddy-validate presumably spawns a Caddy container to render/validate the Caddyfile). The `Get /v1.41/containers/.../json` call is Docker-in-Docker introspection and it times out.

Candidates:
- `GET container` requests exceed the default deadline.
- `context.WithTimeout` that doesn't account for a busy Docker daemon.

Proposal
Pick one based on what the step actually does:
- `caddy validate` is a single binary call, no container-orchestration needed for a linter-style check.
- `when.failure: ignore` or move edge-subpath into a separate optional pipeline so it stops showing up as combined-status `failure` on otherwise-green PRs.

Acceptance
`ci` workflow also produces a green (or explicitly-optional) `edge-subpath` result, with no `context deadline exceeded` in the step logs over ten consecutive runs.

Context
Observed 2026-04-21 during triage of why PRs were backing up in queue. WP agent restart drained the queue for most workflows; this one step remained stuck or timing out. The merged commit for #1108 shipped with this check in `failure`.

- `header_up` at handle-block top level - Caddy rejects config (#1117) #1125
- `disinto init --with edge` auto-adds `chat` dep but chat Dockerfile was deleted in PR #1085 (#1115) #1122
- `service "forgejo"` (Consul) but factory runs Nomad native service discovery (#1114) #1118
- `disinto validate` rejects CI steps with no timeout declared #1137
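
The acceptance criterion above (no `context deadline exceeded` in the step logs over ten consecutive runs) could be checked mechanically. The following is only a rough sketch under stated assumptions: the step logs have already been collected as plain strings, ordered oldest to newest, and the function names `run_is_clean` and `acceptance_met` are hypothetical helpers, not part of disinto or Woodpecker. How the logs get fetched is deliberately left out.

```python
import re

# Assumption: the failure signature is the literal phrase from the issue title.
TIMEOUT_PATTERN = re.compile(r"context deadline exceeded", re.IGNORECASE)

# "ten consecutive runs" from the acceptance criterion.
REQUIRED_CLEAN_RUNS = 10


def run_is_clean(step_log: str) -> bool:
    """Return True if a single step log contains no timeout signature."""
    return TIMEOUT_PATTERN.search(step_log) is None


def acceptance_met(step_logs: list[str], required: int = REQUIRED_CLEAN_RUNS) -> bool:
    """Check that the most recent `required` step logs are all clean.

    `step_logs` is ordered oldest to newest. With fewer than `required`
    logs the criterion cannot be confirmed yet, so the result is False.
    """
    if len(step_logs) < required:
        return False
    return all(run_is_clean(log) for log in step_logs[-required:])


if __name__ == "__main__":
    # Placeholder log contents, purely illustrative.
    history = ["context deadline exceeded"] + ["ok"] * 10
    print(acceptance_met(history))
```

Only the trailing window is inspected, so an old timeout does not fail the check once ten clean runs have followed it. A run that was skipped or stayed pending produces no log at all and would need separate handling.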