fix: [nomad-step-2] S2.6 — CI: vault policy fmt + validate + roles.yaml check (#884) #903

Merged

dev-qwen2 merged 1 commit from fix/issue-884-1 into main

2026-04-16 18:27:35 +00:00

dev-bot commented

2026-04-16 18:15:38 +00:00

Collaborator

Fixes #884

Changes

Fixes #884 ## Changes

dev-bot added 1 commit 2026-04-16 18:15:39 +00:00

fix: [nomad-step-2] S2.6 — CI: vault policy fmt + validate + roles.yaml check (#884 )

ci/woodpecker/push/ci Pipeline was successful

Details

ci/woodpecker/push/nomad-validate Pipeline was successful

Details

ci/woodpecker/pr/ci Pipeline was successful

Details

ci/woodpecker/pr/nomad-validate Pipeline was successful

Details

ci/woodpecker/pr/secret-scan Pipeline was successful

Details

6e73c6dd1f

Extend .woodpecker/nomad-validate.yml with three new fail-closed steps
that guard every artifact under vault/policies/ and vault/roles.yaml
before it can land:

  4. vault-policy-fmt      — cp+fmt+diff idempotence check (vault 1.18.5
                             has no `policy fmt -check` flag, so we
                             build the non-destructive check out of
                             `vault policy fmt` on a /tmp copy + diff
                             against the original)
  5. vault-policy-validate — HCL syntax + capability validation via
                             `vault policy write` against an inline
                             dev-mode Vault server (no offline
                             `policy validate` subcommand exists;
                             dev-mode writes are ephemeral so this is
                             a validator, not a deploy)
  6. vault-roles-validate  — yamllint + PyYAML-based role→policy
                             reference check (every role's `policy:`
                             field must match a vault/policies/*.hcl
                             basename; also checks the four required
                             fields name/policy/namespace/job_id)

Secret-scan coverage for vault/policies/*.hcl is already provided by
the P11 gate (.woodpecker/secret-scan.yml) via its `vault/**/*` trigger
path — this pipeline intentionally does NOT duplicate that gate to
avoid the inline-heredoc / YAML-parse failure mode that sank the prior
attempt at this issue (PR #896).

Trigger paths extended: `vault/policies/**` and `vault/roles.yaml`.
`lib/init/nomad/vault-*.sh` is already covered by the existing
`lib/init/nomad/**` glob.

Docs: nomad/AGENTS.md and vault/policies/AGENTS.md updated with the
policy lifecycle, the CI enforcement table, and the common failure
modes authors will see.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

review-bot commented

2026-04-16 18:25:30 +00:00

Collaborator

AI Review

Summary

CI pipeline extended from 5 to 8 steps: vault-policy-fmt (format idempotence), vault-policy-validate (HCL syntax via inline dev Vault), and vault-roles-validate (yamllint + role→policy reference check). Documentation in nomad/AGENTS.md and vault/policies/AGENTS.md updated consistently throughout.

Findings

low .woodpecker/nomad-validate.yml:172: vault policy fmt "$tmp" >/dev/null 2>&1 silences stderr. On a completely malformed HCL file, vault policy fmt will exit non-zero and set -e aborts the loop — but with no diagnostic visible in CI logs. The author sees a failed step with no message. Step 5 would have caught it with a clear Vault parser error, but step 5 never runs. Fix: drop the 2>&1 so only stdout is suppressed (no noise on success) while stderr remains visible for failure cases. Non-blocking because CI still blocks merge, and the scenario requires a fully-unparseable policy file.

Verification

Format check (copy → fmt → diff) logic is correct. The [ -f "$f" ] || continue guard handles the no-match POSIX glob case properly.
Dev-mode Vault lifecycle: trap ... EXIT INT TERM fires on set -e abort (EXIT trap is correct here; the RETURN-trap gotcha only applies inside functions). Server is reliably killed on any step exit.
vault status inside an if condition is exempt from set -e, so the readiness poll does not abort early on transient non-zero returns during startup.
Python step: os.listdir('vault/policies') would throw FileNotFoundError if the directory is absent, but the trigger paths (vault/policies/**) guarantee the directory exists whenever this pipeline runs. Edge-case only if vault/roles.yaml exists with no sibling vault/policies/ directory — not a real scenario given current repo layout.
roles.yaml cross-check: all 18 policy basenames in vault/policies/ have matching entries in roles.yaml and vice versa.
Trigger paths (vault/policies/**, vault/roles.yaml) are correct; secret-scan delegation to P11 is properly noted and not duplicated.
Step numbering (4-8) is consistent across the pipeline YAML, nomad/AGENTS.md, and vault/policies/AGENTS.md.

Verdict

APPROVE — Three new CI steps are logically sound and well-documented; one minor stderr-suppression DX issue noted but non-blocking.

Reviewed at 6e73c6d | AGENTS.md

## AI Review  ### Summary CI pipeline extended from 5 to 8 steps: vault-policy-fmt (format idempotence), vault-policy-validate (HCL syntax via inline dev Vault), and vault-roles-validate (yamllint + role→policy reference check). Documentation in nomad/AGENTS.md and vault/policies/AGENTS.md updated consistently throughout. ### Findings - **low** `.woodpecker/nomad-validate.yml:172`: `vault policy fmt "$tmp" >/dev/null 2>&1` silences stderr. On a completely malformed HCL file, `vault policy fmt` will exit non-zero and `set -e` aborts the loop — but with no diagnostic visible in CI logs. The author sees a failed step with no message. Step 5 would have caught it with a clear Vault parser error, but step 5 never runs. Fix: drop the `2>&1` so only stdout is suppressed (no noise on success) while stderr remains visible for failure cases. Non-blocking because CI still blocks merge, and the scenario requires a fully-unparseable policy file. ### Verification - Format check (copy → fmt → diff) logic is correct. The `[ -f "$f" ] || continue` guard handles the no-match POSIX glob case properly. - Dev-mode Vault lifecycle: `trap ... EXIT INT TERM` fires on `set -e` abort (EXIT trap is correct here; the RETURN-trap gotcha only applies inside functions). Server is reliably killed on any step exit. - `vault status` inside an `if` condition is exempt from `set -e`, so the readiness poll does not abort early on transient non-zero returns during startup. - Python step: `os.listdir('vault/policies')` would throw `FileNotFoundError` if the directory is absent, but the trigger paths (`vault/policies/**`) guarantee the directory exists whenever this pipeline runs. Edge-case only if `vault/roles.yaml` exists with no sibling `vault/policies/` directory — not a real scenario given current repo layout. - roles.yaml cross-check: all 18 policy basenames in `vault/policies/` have matching entries in `roles.yaml` and vice versa. - Trigger paths (`vault/policies/**`, `vault/roles.yaml`) are correct; secret-scan delegation to P11 is properly noted and not duplicated. - Step numbering (4-8) is consistent across the pipeline YAML, nomad/AGENTS.md, and vault/policies/AGENTS.md. ### Verdict **APPROVE** — Three new CI steps are logically sound and well-documented; one minor stderr-suppression DX issue noted but non-blocking. --- *Reviewed at `6e73c6d` | [AGENTS.md](AGENTS.md)*

review-bot approved these changes 2026-04-16 18:25:30 +00:00

review-bot left a comment

Collaborator

AI Review: APPROVE — Three new CI steps are logically sound and well-documented; one minor stderr-suppression DX issue noted but non-blocking.

AI Review: **APPROVE** — Three new CI steps are logically sound and well-documented; one minor stderr-suppression DX issue noted but non-blocking.