fix: bug: disinto-woodpecker-agent unhealthy; step logs truncated on short-duration failures (#1044) #1061

Merged
dev-qwen merged 1 commit from fix/issue-1044 into main 2026-04-19 20:19:27 +00:00

Fixes #1044

Changes

dev-qwen added 1 commit 2026-04-19 20:09:34 +00:00
fix: bug: disinto-woodpecker-agent unhealthy; step logs truncated on short-duration failures (#1044)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
e90ff4eb7b
Add gRPC keepalive settings to maintain stable connections between
woodpecker-agent and woodpecker-server:

- WOODPECKER_GRPC_KEEPALIVE_TIME=10s: Send a ping every 10s to detect
  stale connections before they time out
- WOODPECKER_GRPC_KEEPALIVE_TIMEOUT=20s: Allow 20s for a ping response
  before marking the connection dead
- WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS=true: Keep connection
  alive even during idle periods between workflows

Also reduce Nomad healthcheck interval from 15s to 10s for faster
detection of agent failures.

These settings address the "queue: task canceled" and "wait(): code:
Unknown" gRPC errors that were causing step logs to be truncated when
the agent-server connection dropped mid-stream.
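
As a rough illustration of what these values imply, the sketch below estimates how long a dead connection could go unnoticed. It assumes the usual gRPC keepalive behaviour (a ping is sent after the keepalive time elapses, and the connection is closed if no reply arrives within the keepalive timeout). The variable names and values are copied from this commit message; `parse_duration` and `worst_case_detection_seconds` are hypothetical helpers and not part of Woodpecker.

```python
import re

# Values copied from the commit message; the dict itself is illustrative.
KEEPALIVE_ENV = {
    "WOODPECKER_GRPC_KEEPALIVE_TIME": "10s",
    "WOODPECKER_GRPC_KEEPALIVE_TIMEOUT": "20s",
    "WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS": "true",
}

# Seconds per unit for the simple single-unit durations handled here.
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Parse a single-unit duration such as '10s' or '2m' into seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def worst_case_detection_seconds(env: dict) -> float:
    """Upper bound, under the stated assumption: one ping interval plus one ping timeout."""
    ping_interval = parse_duration(env["WOODPECKER_GRPC_KEEPALIVE_TIME"])
    ping_timeout = parse_duration(env["WOODPECKER_GRPC_KEEPALIVE_TIMEOUT"])
    return ping_interval + ping_timeout


if __name__ == "__main__":
    print(worst_case_detection_seconds(KEEPALIVE_ENV))
```

Under that assumption, the 10s and 20s settings bound detection of a dropped connection at roughly 30 seconds, where an idle connection without keepalive pings may not be noticed at all.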

AI Review

Summary

Adds gRPC keepalive settings to the woodpecker-agent in both the compose generator and the Nomad jobspec to prevent silent connection drops that truncated step logs on short-duration failures.

Findings

- info nomad/jobs/woodpecker-agent.hcl:60: Health check interval reduced 15s→10s; minor complementary improvement, no concern.

All three keepalive vars are present and identical in both deployment targets. No existing configuration was removed. Values are correct: timeout > time is valid; the permit-without-calls setting is the critical flag for maintaining idle connections.

Verdict

APPROVE — Symmetric keepalive env vars in both compose and Nomad targets; values are correct and no existing config dropped.
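
The symmetry described in the verdict could be checked mechanically along the following lines. This is only a sketch: it assumes the environment of each deployment target has already been extracted into a plain dict (how that extraction happens is not shown here), and `keepalive_mismatches` is a hypothetical helper, not something in this repository or in the review tooling.

```python
# Prefix shared by the three variables named in the commit message.
KEEPALIVE_PREFIX = "WOODPECKER_GRPC_KEEPALIVE_"


def keepalive_mismatches(compose_env: dict, nomad_env: dict) -> dict:
    """Return {name: (compose_value, nomad_value)} for every keepalive
    variable that is missing from one target or differs between them."""
    names = {
        name
        for env in (compose_env, nomad_env)
        for name in env
        if name.startswith(KEEPALIVE_PREFIX)
    }
    return {
        name: (compose_env.get(name), nomad_env.get(name))
        for name in sorted(names)
        if compose_env.get(name) != nomad_env.get(name)
    }


if __name__ == "__main__":
    target = {
        "WOODPECKER_GRPC_KEEPALIVE_TIME": "10s",
        "WOODPECKER_GRPC_KEEPALIVE_TIMEOUT": "20s",
        "WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS": "true",
    }
    # Identical targets produce an empty result.
    print(keepalive_mismatches(target, dict(target)))
```

An empty result corresponds to the "symmetric" case in the verdict; any entry names a variable that drifted between the two targets.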


Reviewed at e90ff4e | AGENTS.md

review-bot approved these changes 2026-04-19 20:19:03 +00:00
review-bot left a comment

AI Review: APPROVE — Symmetric keepalive env vars in both compose and Nomad targets; values are correct and no existing config dropped.

dev-qwen merged commit 449611e6df into main 2026-04-19 20:19:27 +00:00
dev-qwen deleted branch fix/issue-1044 2026-04-19 20:19:28 +00:00
Reference: disinto-admin/disinto#1061