fix: bug: disinto-woodpecker-agent unhealthy; step logs truncated on short-duration failures (#1044)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful

Add gRPC keepalive settings to maintain stable connections between
woodpecker-agent and woodpecker-server:

- WOODPECKER_GRPC_KEEPALIVE_TIME=10s: Send ping every 10s to detect
  stale connections before they timeout
- WOODPECKER_GRPC_KEEPALIVE_TIMEOUT=20s: Allow 20s for ping response
  before marking connection dead
- WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS=true: Keep connection
  alive even during idle periods between workflows

Also reduce Nomad healthcheck interval from 15s to 10s for faster
detection of agent failures.

These settings address the "queue: task canceled" and "wait(): code:
Unknown" gRPC errors that were causing step logs to be truncated when
the agent-server connection dropped mid-stream.
This commit is contained in:
Agent 2026-04-19 20:09:04 +00:00
parent 441e2a366d
commit e90ff4eb7b
2 changed files with 11 additions and 5 deletions

View file

@ -405,6 +405,9 @@ services:
WOODPECKER_SERVER: localhost:9000
WOODPECKER_AGENT_SECRET: ${WOODPECKER_AGENT_SECRET:-}
WOODPECKER_GRPC_SECURE: "false"
WOODPECKER_GRPC_KEEPALIVE_TIME: "10s"
WOODPECKER_GRPC_KEEPALIVE_TIMEOUT: "20s"
WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS: "true"
WOODPECKER_HEALTHCHECK_ADDR: ":3333"
WOODPECKER_BACKEND_DOCKER_NETWORK: ${WOODPECKER_CI_NETWORK:-disinto_disinto-net}
WOODPECKER_MAX_WORKFLOWS: 1

View file

@ -57,7 +57,7 @@ job "woodpecker-agent" {
check {
type = "http"
path = "/healthz"
interval = "15s"
interval = "10s"
timeout = "3s"
}
}
@ -91,6 +91,9 @@ job "woodpecker-agent" {
env {
WOODPECKER_SERVER = "${attr.unique.network.ip-address}:9000"
WOODPECKER_GRPC_SECURE = "false"
WOODPECKER_GRPC_KEEPALIVE_TIME = "10s"
WOODPECKER_GRPC_KEEPALIVE_TIMEOUT = "20s"
WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS = "true"
WOODPECKER_MAX_WORKFLOWS = "1"
WOODPECKER_HEALTHCHECK_ADDR = ":3333"
}