fix: bug: disinto-woodpecker-agent unhealthy; step logs truncated on short-duration failures (#1044)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Add gRPC keepalive settings to keep the connection between woodpecker-agent and woodpecker-server stable:

- WOODPECKER_GRPC_KEEPALIVE_TIME=10s: send a ping after 10s without activity, so stale connections are detected before they time out
- WOODPECKER_GRPC_KEEPALIVE_TIMEOUT=20s: allow 20s for the ping response before marking the connection dead
- WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS=true: keep pinging even when no calls are active, i.e. during idle periods between workflows

Also reduce the Nomad healthcheck interval from 15s to 10s for faster detection of agent failures.

These settings address the "queue: task canceled" and "wait(): code: Unknown" gRPC errors that were truncating step logs whenever the agent-server connection dropped mid-stream.
parent 441e2a366d
commit e90ff4eb7b
2 changed files with 11 additions and 5 deletions

@@ -405,6 +405,9 @@ services:
 WOODPECKER_SERVER: localhost:9000
 WOODPECKER_AGENT_SECRET: ${WOODPECKER_AGENT_SECRET:-}
 WOODPECKER_GRPC_SECURE: "false"
+WOODPECKER_GRPC_KEEPALIVE_TIME: "10s"
+WOODPECKER_GRPC_KEEPALIVE_TIMEOUT: "20s"
+WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS: "true"
 WOODPECKER_HEALTHCHECK_ADDR: ":3333"
 WOODPECKER_BACKEND_DOCKER_NETWORK: ${WOODPECKER_CI_NETWORK:-disinto_disinto-net}
 WOODPECKER_MAX_WORKFLOWS: 1

@@ -57,7 +57,7 @@ job "woodpecker-agent" {
   check {
     type = "http"
     path = "/healthz"
-    interval = "15s"
+    interval = "10s"
     timeout = "3s"
   }
 }
|
@@ -89,10 +89,13 @@ job "woodpecker-agent" {
 # Nomad's port stanza to the allocation's IP (not localhost), so the
 # agent must use the LXC's eth0 IP, not 127.0.0.1.
 env {
-  WOODPECKER_SERVER = "${attr.unique.network.ip-address}:9000"
-  WOODPECKER_GRPC_SECURE = "false"
-  WOODPECKER_MAX_WORKFLOWS = "1"
-  WOODPECKER_HEALTHCHECK_ADDR = ":3333"
+  WOODPECKER_SERVER = "${attr.unique.network.ip-address}:9000"
+  WOODPECKER_GRPC_SECURE = "false"
+  WOODPECKER_GRPC_KEEPALIVE_TIME = "10s"
+  WOODPECKER_GRPC_KEEPALIVE_TIMEOUT = "20s"
+  WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS = "true"
+  WOODPECKER_MAX_WORKFLOWS = "1"
+  WOODPECKER_HEALTHCHECK_ADDR = ":3333"
 }
 
 # ── Vault-templated agent secret ──────────────────────────────────