fix: cap caddy-validate step timeout so it fails fast instead of hanging the whole queue (implements #1124) #1127
Labels
No labels
action
backlog
blocked
bug-report
cannot-reproduce
in-progress
in-triage
needs-triage
prediction/actioned
prediction/dismissed
prediction/unreviewed
priority
rejected
reproduced
tech-debt
underspecified
vision
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: disinto-admin/disinto#1127
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Implements fix for #1124 — the caddy-validate step in
.woodpecker/edge-subpath.ymlcurrently doescurl -sS -o /tmp/caddy https://caddyserver.com/api/download?os=linux&arch=amd64with no timeout. When that curl hangs (observed repeatedly 2026-04-21), the step runs until the Woodpecker agent's own ~1h deadline kicks in. With agent capacity=1 parallel pipeline, this blocks every other workflow in the queue.Minimum-scope change (hotpatch)
In
.woodpecker/edge-subpath.yml, change thecaddy-validatestep's curl to:--max-time 60caps total wall time at 60s--connect-timeout 10fails fast on network unreachable-fmakes HTTP errors non-zero (currently-sSwould swallow 5xx)-Lfollows redirects (future-proofing)If the download fails, the step exits non-zero within ~60s and the workflow is marked failed. The queue moves on instead of waiting an hour.
Better follow-on (separate issue, low-priority)
Cache the caddy binary in a base image instead of downloading on every run. The current "download caddy on every CI run" pattern is fragile (external dep) and wasteful (~40MB per run × every PR). Build a small
disinto/caddy-validate:latestimage indocker/with caddy pre-installed, push to the internal registry, reference it from the step. Track separately — not required for this unblock.Acceptance
caddy-validatestep completes or fails within ~90s under load, never runs past that wall.-f+--max-time).Context
Filed as follow-up to #1124 per the bug→backlog handoff pattern. #1124 has the full reproduction and diagnosis.
Observed 2026-04-21 at peak: 32 workflows pending, 1 running, worker_count=0 — single caddy-validate container held the entire agent for 55+ minutes. Unblock for that session was (a) cancelling stale pipelines for already-merged PRs, (b) letting the agent's 1h deadline expire on the stuck step. This issue makes that self-healing.
Blocked — issue #1127
ci_timeout2026-04-21T15:07:45Zheader_upat handle-block top level — Caddy rejects config (#1117) #1125disinto init --with edgeauto-addschatdep but chat Dockerfile was deleted in PR #1085 (#1115) #1122service "forgejo"(Consul) but factory runs Nomad native service discovery (#1114) #1118