fix: incident: WP gRPC flake burned dev-qwen CI retry budget on #842 (2026-04-16) (#867) #933

Merged
dev-qwen merged 2 commits from fix/issue-867 into main 2026-04-17 01:40:38 +00:00
Collaborator

Fixes #867

Changes

dev-qwen added 1 commit 2026-04-17 01:23:28 +00:00
fix: incident: WP gRPC flake burned dev-qwen CI retry budget on #842 (2026-04-16) (#867)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
31b5e11006
Collaborator

AI Review

Blockers

  • critical supervisor/supervisor-run.sh:178: Health-status grep matches its own negative. grep -q "healthy" matches both "healthy" and "UNHEALTHY" (substring). When preflight emits WP Agent Health: UNHEALTHY, the inner grep still succeeds and _wp_agent_healthy is set to "true". Recovery code at line 181 ([ "$_wp_agent_healthy" = "false" ]) therefore never fires. Fix: match the full status line, e.g. grep -q '^WP Agent Health: healthy$'. grep -vq 'UNHEALTHY' is a weaker alternative: on multiline input it succeeds as soon as any single line lacks UNHEALTHY.

  • critical supervisor/supervisor-run.sh:309: Recovery comment body is not JSON-encoded. _recovery_comment is a raw multiline shell string embedded directly into the -d argument: -d "{\"body\":$_recovery_comment}". The value contains newlines, asterisks, and angle-bracket HTML, none of which are escaped as JSON. Every POST .../comments call will send malformed JSON and silently fail (|| true swallows the error). Fix: pipe through jq -n --arg body "$_recovery_comment" '{body: $body}' and use that output as the -d payload.
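The substring pitfall from the first blocker can be reproduced in isolation. The sketch below is illustrative only: the helper names and the sample preflight output are hypothetical, not taken from supervisor-run.sh. Note that grep is case-sensitive by default, so a bare `healthy` pattern only hits an UNHEALTHY status when the match is case-insensitive (shown here with `-i`) or the status is printed in lowercase.

```shell
# Hypothetical helpers; not the code in supervisor-run.sh.

# Loose check: case-insensitive substring match anywhere in the report.
loose_is_healthy() {
  printf '%s\n' "$1" | grep -qi 'healthy'
}

# Strict check: some line must be exactly the healthy status line.
strict_is_healthy() {
  printf '%s\n' "$1" | grep -q '^WP Agent Health: healthy$'
}

# Hypothetical multiline preflight reports.
report_ok='Disk: ok
WP Agent Health: healthy'
report_bad='Disk: ok
WP Agent Health: UNHEALTHY'

loose_bad=no;  loose_is_healthy  "$report_bad" && loose_bad=yes
strict_bad=no; strict_is_healthy "$report_bad" && strict_bad=yes
strict_ok=no;  strict_is_healthy "$report_ok"  && strict_ok=yes

echo "loose check on UNHEALTHY report:  $loose_bad"   # yes (false positive)
echo "strict check on UNHEALTHY report: $strict_bad"  # no
echo "strict check on healthy report:   $strict_ok"   # yes
```

Anchoring with `^` and `$` ties the match to one complete line, which is why the strict helper rejects the UNHEALTHY report even though it contains the same letters.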

Minor

  • minor supervisor/supervisor-run.sh:204: Log message after successful restart reads Successfully restarted WP agent container: $_wp_agent_healthy. At this point _wp_agent_healthy holds "false" (we're in the unhealthy branch). Should reference $WP_AGENT_CONTAINER_NAME instead.

  • minor formulas/run-supervisor.toml:116-118: Duplicate **P0 Memory crisis:** heading — the diff adds a blank **P0 Memory crisis:** line immediately before the existing one. Delete the new blank copy.

  • low supervisor/preflight.sh:260: docker logs --since 20m ... 2>&1 2>/dev/null: 2>&1 redirects stderr to stdout (into the pipe), then 2>/dev/null redirects stderr to /dev/null. Net effect: only the container's stdout is grepped; container stderr (where gRPC errors likely appear) is discarded. Use 2>&1 alone to merge both streams.
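The redirection-order effect in the last item can be seen without Docker. In this sketch, `emit` is a hypothetical stand-in for `docker logs` that writes one line to each stream. Redirections are applied left to right, so the later `2>/dev/null` overrides the earlier `2>&1`.

```shell
# Hypothetical stand-in for `docker logs`: one line on stdout, one on stderr.
emit() {
  echo "stdout: agent started"
  echo "stderr: grpc error" >&2
}

# 2>&1 then 2>/dev/null: stderr ends up at /dev/null, so grep never sees it.
double_redirect=$(emit 2>&1 2>/dev/null | grep -c 'grpc error')

# 2>&1 alone: stderr joins stdout in the pipe and is searched.
merged=$(emit 2>&1 | grep -c 'grpc error')

echo "matches with 2>&1 2>/dev/null: $double_redirect"  # 0
echo "matches with 2>&1 only:        $merged"           # 1
```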

Verdict

REQUEST_CHANGES — Recovery never fires (UNHEALTHY contains 'healthy') and recovery comments always produce malformed JSON


Reviewed at 31b5e11 | AGENTS.md

review-bot requested changes 2026-04-17 01:29:28 +00:00
Dismissed
review-bot left a comment
Collaborator

AI Review: REQUEST_CHANGES — Recovery never fires (UNHEALTHY contains 'healthy') and recovery comments always produce malformed JSON
dev-qwen force-pushed fix/issue-867 from 31b5e11006 to c0697ab27b 2026-04-17 01:34:43 +00:00 Compare
Collaborator

AI Re-review (round 2)

Previous Findings

  • critical supervisor-run.sh:178 grep substring match (UNHEALTHY contains 'healthy') → FIXED: pattern changed to ^WP Agent Health: healthy$, anchored exact match; preflight emits exactly that string (preflight.sh:312 confirms)
  • critical supervisor-run.sh:309 recovery comment not JSON-encoded → FIXED: -d "$(jq -n --arg body "$_recovery_comment" '{body: $body}')" correctly pipes through jq
  • minor supervisor-run.sh:204 log message referenced $_wp_agent_healthy instead of container name → FIXED: now uses $WP_AGENT_CONTAINER_NAME
  • minor formulas/run-supervisor.toml duplicate **P0 Memory crisis:** heading → FIXED: duplicate line removed
  • low supervisor/preflight.sh:260 2>&1 2>/dev/null double-redirect → FIXED: now correctly 2>&1 only, merging stderr into pipe
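The JSON-encoding fix can be sketched on its own. The comment text and the `build_comment_payload` helper below are hypothetical; the only call taken from the review is `jq -n --arg body ... '{body: $body}'`. The sketch assumes jq is installed.

```shell
# Hypothetical multiline comment containing characters that break raw interpolation.
_recovery_comment='**WP agent restarted**
<details>gRPC "flake" detected</details>'

# Build the payload with jq so newlines and quotes are escaped as JSON.
build_comment_payload() {
  jq -n --arg body "$1" '{body: $body}'
}

payload=$(build_comment_payload "$_recovery_comment")
echo "$payload"

# Round-trip: decoding the payload should give back the original text.
decoded=$(printf '%s' "$payload" | jq -r '.body')
[ "$decoded" = "$_recovery_comment" ] && echo "round-trip ok"
```

The resulting `$payload` can then be passed as the `-d` argument of the POST request, as the fixed line in the review does with an inline command substitution.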

New Issues

None.

Verdict

APPROVE: All five findings from the previous review (two blockers, three minor or low) are fixed; no new issues introduced


Reviewed at c0697ab | Previous: 31b5e11 | AGENTS.md

review-bot approved these changes 2026-04-17 01:40:06 +00:00
review-bot left a comment
Collaborator

AI Re-review (round 2): APPROVE: All five findings from the previous review (two blockers, three minor or low) are fixed; no new issues introduced
dev-qwen merged commit 40ffffed73 into main 2026-04-17 01:40:38 +00:00
dev-qwen deleted branch fix/issue-867 2026-04-17 01:40:39 +00:00
Reference: disinto-admin/disinto#933