Compare commits
3 commits
24d6441e14
...
f1f60e555c
| Author | SHA1 | Date |
|---|---|---|
| | f1f60e555c | |
| | dd61d0d29e | |
| | 6e73c6dd1f | |
4 changed files with 301 additions and 23 deletions
|
|
@ -1,16 +1,21 @@
|
||||||
# =============================================================================
|
# =============================================================================
|
||||||
# .woodpecker/nomad-validate.yml — Static validation for Nomad+Vault artifacts
|
# .woodpecker/nomad-validate.yml — Static validation for Nomad+Vault artifacts
|
||||||
#
|
#
|
||||||
# Part of the Nomad+Vault migration (S0.5, issue #825). Locks in the
|
# Part of the Nomad+Vault migration (S0.5, issue #825; extended in S2.6,
|
||||||
# "no-ad-hoc-steps" principle: every HCL/shell artifact under nomad/ or
|
# issue #884). Locks in the "no-ad-hoc-steps" principle: every HCL/shell
|
||||||
# lib/init/nomad/, plus the `disinto init` dispatcher, gets checked
|
# artifact under nomad/, lib/init/nomad/, vault/policies/, plus the
|
||||||
# before it can land.
|
# `disinto init` dispatcher and vault/roles.yaml, gets checked before it
|
||||||
|
# can land.
|
||||||
#
|
#
|
||||||
# Triggers on PRs (and pushes) that touch any of:
|
# Triggers on PRs (and pushes) that touch any of:
|
||||||
# nomad/** — HCL configs (server, client, vault)
|
# nomad/** — HCL configs (server, client, vault)
|
||||||
# lib/init/nomad/** — cluster-up / install / systemd / vault-init
|
# lib/init/nomad/** — cluster-up / install / systemd / vault-init /
|
||||||
|
# vault-nomad-auth (S2.6 trigger: vault-*.sh
|
||||||
|
# is a subset of this glob)
|
||||||
# bin/disinto — `disinto init --backend=nomad` dispatcher
|
# bin/disinto — `disinto init --backend=nomad` dispatcher
|
||||||
# tests/disinto-init-nomad.bats — the bats suite itself
|
# tests/disinto-init-nomad.bats — the bats suite itself
|
||||||
|
# vault/policies/** — Vault ACL policy HCL files (S2.1, S2.6)
|
||||||
|
# vault/roles.yaml — JWT-auth role bindings (S2.3, S2.6)
|
||||||
# .woodpecker/nomad-validate.yml — the pipeline definition
|
# .woodpecker/nomad-validate.yml — the pipeline definition
|
||||||
#
|
#
|
||||||
# Steps (all fail-closed — any error blocks merge):
|
# Steps (all fail-closed — any error blocks merge):
|
||||||
|
|
@ -19,8 +24,22 @@
|
||||||
# nomad/jobs/*.hcl (new jobspecs get
|
# nomad/jobs/*.hcl (new jobspecs get
|
||||||
# CI coverage automatically)
|
# CI coverage automatically)
|
||||||
# 3. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl
|
# 3. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl
|
||||||
# 4. shellcheck-nomad — shellcheck the cluster-up + install scripts + disinto
|
# 4. vault-policy-fmt — `vault policy fmt` idempotence check on
|
||||||
# 5. bats-init-nomad — `disinto init --backend=nomad --dry-run` smoke tests
|
# every vault/policies/*.hcl (format drift =
|
||||||
|
# CI fail; non-destructive via cp+diff)
|
||||||
|
# 5. vault-policy-validate — HCL syntax + capability validation for every
|
||||||
|
# vault/policies/*.hcl via `vault policy write`
|
||||||
|
# against an inline dev-mode Vault server
|
||||||
|
# 6. vault-roles-validate — yamllint + role→policy reference check on
|
||||||
|
# vault/roles.yaml (every referenced policy
|
||||||
|
# must exist as vault/policies/<name>.hcl)
|
||||||
|
# 7. shellcheck-nomad — shellcheck the cluster-up + install scripts + disinto
|
||||||
|
# 8. bats-init-nomad — `disinto init --backend=nomad --dry-run` smoke tests
|
||||||
|
#
|
||||||
|
# Secret-scan coverage: vault/policies/*.hcl is already scanned by the
|
||||||
|
# P11 gate (.woodpecker/secret-scan.yml, issue #798) — its trigger path
|
||||||
|
# `vault/**/*` covers everything under this directory. We intentionally
|
||||||
|
# do NOT duplicate that gate here; one scanner, one source of truth.
|
||||||
#
|
#
|
||||||
# Pinned image versions match lib/init/nomad/install.sh (nomad 1.9.5 /
|
# Pinned image versions match lib/init/nomad/install.sh (nomad 1.9.5 /
|
||||||
# vault 1.18.5). Bump there AND here together — drift = CI passing on
|
# vault 1.18.5). Bump there AND here together — drift = CI passing on
|
||||||
|
|
@ -34,6 +53,8 @@ when:
|
||||||
- "lib/init/nomad/**"
|
- "lib/init/nomad/**"
|
||||||
- "bin/disinto"
|
- "bin/disinto"
|
||||||
- "tests/disinto-init-nomad.bats"
|
- "tests/disinto-init-nomad.bats"
|
||||||
|
- "vault/policies/**"
|
||||||
|
- "vault/roles.yaml"
|
||||||
- ".woodpecker/nomad-validate.yml"
|
- ".woodpecker/nomad-validate.yml"
|
||||||
|
|
||||||
# Authenticated clone — same pattern as .woodpecker/ci.yml. Forgejo is
|
# Authenticated clone — same pattern as .woodpecker/ci.yml. Forgejo is
|
||||||
|
|
@ -123,7 +144,176 @@ steps:
|
||||||
*) echo "vault config: hard failure (rc=$rc)" >&2; exit "$rc" ;;
|
*) echo "vault config: hard failure (rc=$rc)" >&2; exit "$rc" ;;
|
||||||
esac
|
esac
|
||||||
|
|
||||||
# ── 4. Shellcheck ────────────────────────────────────────────────────────
|
# ── 4. Vault policy fmt idempotence check ────────────────────────────────
|
||||||
|
# `vault policy fmt <file>` formats a local HCL policy file in place.
|
||||||
|
# There's no `-check`/dry-run flag (vault 1.18.5), so we implement a
|
||||||
|
# non-destructive check as cp → fmt-on-copy → diff against original.
|
||||||
|
# Any diff means the committed file would be rewritten by `vault policy
|
||||||
|
# fmt` — failure steers the author to run `vault policy fmt <file>`
|
||||||
|
# locally before pushing.
|
||||||
|
#
|
||||||
|
# Scope: vault/policies/*.hcl only. The `[ -f "$f" ]` guard handles the
|
||||||
|
# no-match case (POSIX sh does not nullglob) so an empty policies/
|
||||||
|
# directory does not fail this step.
|
||||||
|
#
|
||||||
|
# Note: `vault policy fmt` is purely local (HCL text transform) and does
|
||||||
|
# not require a running Vault server, which is why this step can run
|
||||||
|
# without starting one.
|
||||||
|
- name: vault-policy-fmt
|
||||||
|
image: hashicorp/vault:1.18.5
|
||||||
|
commands:
|
||||||
|
- |
|
||||||
|
set -e
|
||||||
|
failed=0
|
||||||
|
for f in vault/policies/*.hcl; do
|
||||||
|
[ -f "$f" ] || continue
|
||||||
|
tmp="/tmp/$(basename "$f").fmt"
|
||||||
|
cp "$f" "$tmp"
|
||||||
|
vault policy fmt "$tmp" >/dev/null 2>&1
|
||||||
|
if ! diff -u "$f" "$tmp"; then
|
||||||
|
echo "ERROR: $f is not formatted — run 'vault policy fmt $f' locally" >&2
|
||||||
|
failed=1
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
if [ "$failed" -gt 0 ]; then
|
||||||
|
echo "vault-policy-fmt: formatting drift detected" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
echo "vault-policy-fmt: all policies formatted correctly"
|
||||||
|
|
||||||
|
# ── 5. Vault policy HCL syntax + capability validation ───────────────────
|
||||||
|
# Vault has no offline `vault policy validate` subcommand — the closest
|
||||||
|
# in-CLI validator is `vault policy write`, which sends the HCL to a
|
||||||
|
# running server which parses it, checks capability names against the
|
||||||
|
# known set (read, list, create, update, delete, patch, sudo, deny),
|
||||||
|
# and rejects unknown stanzas / malformed path blocks. We start an
|
||||||
|
# inline dev-mode Vault (in-memory, no persistence, root token = "root")
|
||||||
|
# for the duration of this step and loop `vault policy write` over every
|
||||||
|
# vault/policies/*.hcl; the policies never leave the ephemeral dev
|
||||||
|
# server, so this is strictly a validator — not a deploy.
|
||||||
|
#
|
||||||
|
# Exit-code handling:
|
||||||
|
# - `vault policy write` exits 0 on success, non-zero on any parse /
|
||||||
|
# semantic error. We aggregate failures across all files so a single
|
||||||
|
# CI run surfaces every broken policy (not just the first).
|
||||||
|
# - The dev server is killed on any step exit via EXIT trap so the
|
||||||
|
# step tears down cleanly even on failure.
|
||||||
|
#
|
||||||
|
# Why dev-mode is sufficient: we're not persisting secrets, only asking
|
||||||
|
# Vault to parse policy text. The factory's production Vault is NOT
|
||||||
|
# contacted.
|
||||||
|
- name: vault-policy-validate
|
||||||
|
image: hashicorp/vault:1.18.5
|
||||||
|
commands:
|
||||||
|
- |
|
||||||
|
set -e
|
||||||
|
vault server -dev -dev-root-token-id=root -dev-listen-address=127.0.0.1:8200 >/tmp/vault-dev.log 2>&1 &
|
||||||
|
VAULT_PID=$!
|
||||||
|
trap 'kill "$VAULT_PID" 2>/dev/null || true' EXIT INT TERM
|
||||||
|
export VAULT_ADDR=http://127.0.0.1:8200
|
||||||
|
export VAULT_TOKEN=root
|
||||||
|
ready=0
|
||||||
|
i=0
|
||||||
|
while [ "$i" -lt 30 ]; do
|
||||||
|
if vault status >/dev/null 2>&1; then
|
||||||
|
ready=1
|
||||||
|
break
|
||||||
|
fi
|
||||||
|
i=$((i + 1))
|
||||||
|
sleep 0.5
|
||||||
|
done
|
||||||
|
if [ "$ready" -ne 1 ]; then
|
||||||
|
echo "vault-policy-validate: dev server failed to start after 15s" >&2
|
||||||
|
cat /tmp/vault-dev.log >&2 || true
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
failed=0
|
||||||
|
for f in vault/policies/*.hcl; do
|
||||||
|
[ -f "$f" ] || continue
|
||||||
|
name=$(basename "$f" .hcl)
|
||||||
|
echo "validate: $f"
|
||||||
|
if ! vault policy write "$name" "$f"; then
|
||||||
|
echo " ERROR: $f failed validation" >&2
|
||||||
|
failed=1
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
if [ "$failed" -gt 0 ]; then
|
||||||
|
echo "vault-policy-validate: validation errors found" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
echo "vault-policy-validate: all policies valid"
|
||||||
|
|
||||||
|
# ── 6. vault/roles.yaml validator ────────────────────────────────────────
|
||||||
|
# Validates the JWT-auth role bindings file (S2.3). Two checks:
|
||||||
|
#
|
||||||
|
# a. `yamllint` — catches YAML syntax errors and indentation drift.
|
||||||
|
# Uses a relaxed config (line length bumped to 200) because
|
||||||
|
# roles.yaml's comments are wide by design.
|
||||||
|
# b. role → policy reference check — every role's `policy:` field
|
||||||
|
# must match a basename in vault/policies/*.hcl. A role pointing
|
||||||
|
# at a non-existent policy = runtime "permission denied" at job
|
||||||
|
# placement; catching the drift here turns it into a CI failure.
|
||||||
|
# Also verifies each role entry has the four required fields
|
||||||
|
# (name, policy, namespace, job_id) per the file's documented
|
||||||
|
# format.
|
||||||
|
#
|
||||||
|
# Parsing is done with PyYAML (the roles.yaml format is a strict
|
||||||
|
# subset that awk-level parsing in tools/vault-apply-roles.sh handles
|
||||||
|
# too, but PyYAML in CI gives us structural validation for free). If
|
||||||
|
# roles.yaml is ever absent (e.g. reverted), the step skips rather
|
||||||
|
# than fails — presence is enforced by S2.3's own tooling, not here.
|
||||||
|
- name: vault-roles-validate
|
||||||
|
image: python:3.12-alpine
|
||||||
|
commands:
|
||||||
|
- pip install --quiet --disable-pip-version-check pyyaml yamllint
|
||||||
|
- |
|
||||||
|
set -e
|
||||||
|
if [ ! -f vault/roles.yaml ]; then
|
||||||
|
echo "vault-roles-validate: vault/roles.yaml not present, skipping"
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
yamllint -d '{extends: relaxed, rules: {line-length: {max: 200}}}' vault/roles.yaml
|
||||||
|
echo "vault-roles-validate: yamllint OK"
|
||||||
|
python3 - <<'PY'
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import yaml
|
||||||
|
|
||||||
|
with open('vault/roles.yaml') as f:
|
||||||
|
data = yaml.safe_load(f) or {}
|
||||||
|
roles = data.get('roles') or []
|
||||||
|
if not roles:
|
||||||
|
print("vault-roles-validate: no roles defined in vault/roles.yaml", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
existing = {
|
||||||
|
os.path.splitext(e)[0]
|
||||||
|
for e in os.listdir('vault/policies')
|
||||||
|
if e.endswith('.hcl')
|
||||||
|
}
|
||||||
|
required = ('name', 'policy', 'namespace', 'job_id')
|
||||||
|
failed = 0
|
||||||
|
for r in roles:
|
||||||
|
if not isinstance(r, dict):
|
||||||
|
print(f"ERROR: role entry is not a mapping: {r!r}", file=sys.stderr)
|
||||||
|
failed = 1
|
||||||
|
continue
|
||||||
|
for field in required:
|
||||||
|
if r.get(field) in (None, ''):
|
||||||
|
print(f"ERROR: role entry missing required field '{field}': {r}", file=sys.stderr)
|
||||||
|
failed = 1
|
||||||
|
policy = r.get('policy')
|
||||||
|
if policy and policy not in existing:
|
||||||
|
print(
|
||||||
|
f"ERROR: role '{r.get('name')}' references policy '{policy}' "
|
||||||
|
f"but vault/policies/{policy}.hcl does not exist",
|
||||||
|
file=sys.stderr,
|
||||||
|
)
|
||||||
|
failed = 1
|
||||||
|
sys.exit(failed)
|
||||||
|
PY
|
||||||
|
echo "vault-roles-validate: all role→policy references valid"
|
||||||
|
|
||||||
|
# ── 7. Shellcheck ────────────────────────────────────────────────────────
|
||||||
# Covers the new lib/init/nomad/*.sh scripts plus bin/disinto (which owns
|
# Covers the new lib/init/nomad/*.sh scripts plus bin/disinto (which owns
|
||||||
# the backend dispatcher). bin/disinto has no .sh extension so the
|
# the backend dispatcher). bin/disinto has no .sh extension so the
|
||||||
# repo-wide shellcheck in .woodpecker/ci.yml skips it — this step is the
|
# repo-wide shellcheck in .woodpecker/ci.yml skips it — this step is the
|
||||||
|
|
@ -133,7 +323,7 @@ steps:
|
||||||
commands:
|
commands:
|
||||||
- shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto
|
- shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto
|
||||||
|
|
||||||
# ── 5. bats: `disinto init --backend=nomad --dry-run` ────────────────────
|
# ── 8. bats: `disinto init --backend=nomad --dry-run` ────────────────────
|
||||||
# Smoke-tests the CLI dispatcher: both --backend=nomad variants exit 0
|
# Smoke-tests the CLI dispatcher: both --backend=nomad variants exit 0
|
||||||
# with the expected step list, and --backend=docker stays on the docker
|
# with the expected step list, and --backend=docker stays on the docker
|
||||||
# path (regression guard). Pure dry-run — no sudo, no network.
|
# path (regression guard). Pure dry-run — no sudo, no network.
|
||||||
|
|
|
||||||
|
|
@ -135,7 +135,6 @@ vault_request() {
|
||||||
local vault_env="${FACTORY_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)}/action-vault/vault-env.sh"
|
local vault_env="${FACTORY_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)}/action-vault/vault-env.sh"
|
||||||
if [ ! -f "$vault_env" ]; then
|
if [ ! -f "$vault_env" ]; then
|
||||||
echo "ERROR: vault-env.sh not found at $vault_env" >&2
|
echo "ERROR: vault-env.sh not found at $vault_env" >&2
|
||||||
rm -f "$tmp_toml"
|
|
||||||
return 1
|
return 1
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
|
@ -145,14 +144,13 @@ vault_request() {
|
||||||
if ! source "$vault_env"; then
|
if ! source "$vault_env"; then
|
||||||
FORGE_TOKEN="${_saved_forge_token:-}"
|
FORGE_TOKEN="${_saved_forge_token:-}"
|
||||||
echo "ERROR: failed to source vault-env.sh" >&2
|
echo "ERROR: failed to source vault-env.sh" >&2
|
||||||
rm -f "$tmp_toml"
|
|
||||||
return 1
|
return 1
|
||||||
fi
|
fi
|
||||||
# Restore caller's FORGE_TOKEN after validation
|
# Restore caller's FORGE_TOKEN after validation
|
||||||
FORGE_TOKEN="${_saved_forge_token:-}"
|
FORGE_TOKEN="${_saved_forge_token:-}"
|
||||||
|
|
||||||
# Set trap AFTER sourcing vault-env.sh to avoid RETURN trap firing during source
|
# Set trap AFTER sourcing vault-env.sh to avoid RETURN trap firing during source
|
||||||
trap 'rm -f "$tmp_toml"' EXIT
|
trap 'rm -f "$tmp_toml"' RETURN
|
||||||
|
|
||||||
# Run validation
|
# Run validation
|
||||||
if ! validate_vault_action "$tmp_toml"; then
|
if ! validate_vault_action "$tmp_toml"; then
|
||||||
|
|
|
||||||
|
|
@ -61,8 +61,8 @@ convention, KV path summary, and JWT-auth role bindings (S2.1/S2.3).
|
||||||
## How CI validates these files
|
## How CI validates these files
|
||||||
|
|
||||||
`.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`
|
`.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`
|
||||||
(including `nomad/jobs/`), `lib/init/nomad/`, or `bin/disinto`. Five
|
(including `nomad/jobs/`), `lib/init/nomad/`, `bin/disinto`,
|
||||||
fail-closed steps:
|
`vault/policies/`, or `vault/roles.yaml`. Eight fail-closed steps:
|
||||||
|
|
||||||
1. **`nomad config validate nomad/server.hcl nomad/client.hcl`**
|
1. **`nomad config validate nomad/server.hcl nomad/client.hcl`**
|
||||||
— parses the HCL, fails on unknown blocks, bad port ranges, invalid
|
— parses the HCL, fails on unknown blocks, bad port ranges, invalid
|
||||||
|
|
@ -87,19 +87,47 @@ fail-closed steps:
|
||||||
disables the runtime checks (CI containers don't have
|
disables the runtime checks (CI containers don't have
|
||||||
`/var/lib/vault/data` or port 8200). Exit 2 (advisory warnings only,
|
`/var/lib/vault/data` or port 8200). Exit 2 (advisory warnings only,
|
||||||
e.g. TLS-disabled listener) is tolerated; exit 1 blocks merge.
|
e.g. TLS-disabled listener) is tolerated; exit 1 blocks merge.
|
||||||
4. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`**
|
4. **`vault policy fmt` idempotence check on every `vault/policies/*.hcl`**
|
||||||
|
(S2.6) — `vault policy fmt` has no `-check` flag in 1.18.5, so the
|
||||||
|
step copies each file to `/tmp`, runs `vault policy fmt` on the copy,
|
||||||
|
and diffs against the original. Any non-empty diff means the
|
||||||
|
committed file would be rewritten by `fmt` and the step fails — the
|
||||||
|
author is pointed at `vault policy fmt <file>` to heal the drift.
|
||||||
|
5. **`vault policy write`-based validation against an inline dev-mode Vault**
|
||||||
|
(S2.6) — Vault 1.18.5 has no offline `policy validate` subcommand;
|
||||||
|
the CI step starts a dev-mode server, loops `vault policy write
|
||||||
|
<basename> <file>` over each `vault/policies/*.hcl`, and aggregates
|
||||||
|
failures so one CI run surfaces every broken policy. The server is
|
||||||
|
ephemeral and torn down on step exit — no persistence, no real
|
||||||
|
secrets. Catches unknown capability names (e.g. `"frobnicate"`),
|
||||||
|
malformed `path` blocks, and other semantic errors `fmt` does not.
|
||||||
|
6. **`vault/roles.yaml` validator** (S2.6) — yamllint + a PyYAML-based
|
||||||
|
check that every role's `policy:` field matches a basename under
|
||||||
|
`vault/policies/`, and that every role entry carries all four
|
||||||
|
required fields (`name`, `policy`, `namespace`, `job_id`). Drift
|
||||||
|
between `roles.yaml` and the policies directory is a scheduling-time "permission denied"
|
||||||
|
in production; this step turns it into a CI failure at PR time.
|
||||||
|
7. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`**
|
||||||
— all init/dispatcher shell clean. `bin/disinto` has no `.sh`
|
— all init/dispatcher shell clean. `bin/disinto` has no `.sh`
|
||||||
extension so the repo-wide shellcheck in `.woodpecker/ci.yml` skips
|
extension so the repo-wide shellcheck in `.woodpecker/ci.yml` skips
|
||||||
it — this is the one place it gets checked.
|
it — this is the one place it gets checked.
|
||||||
5. **`bats tests/disinto-init-nomad.bats`**
|
8. **`bats tests/disinto-init-nomad.bats`**
|
||||||
— exercises the dispatcher: `disinto init --backend=nomad --dry-run`,
|
— exercises the dispatcher: `disinto init --backend=nomad --dry-run`,
|
||||||
`… --empty --dry-run`, and the `--backend=docker` regression guard.
|
`… --empty --dry-run`, and the `--backend=docker` regression guard.
|
||||||
|
|
||||||
|
**Secret-scan coverage.** Policy HCL files under `vault/policies/` are
|
||||||
|
already swept by the P11 secret-scan gate
|
||||||
|
(`.woodpecker/secret-scan.yml`, #798), whose `vault/**/*` trigger path
|
||||||
|
covers everything in this directory. `nomad-validate.yml` intentionally
|
||||||
|
does NOT duplicate that gate — one scanner, one source of truth.
|
||||||
|
|
||||||
If a PR breaks `nomad/server.hcl` (e.g. typo in a block name), step 1
|
If a PR breaks `nomad/server.hcl` (e.g. typo in a block name), step 1
|
||||||
fails with a clear error; if it breaks a jobspec (e.g. misspells
|
fails with a clear error; if it breaks a jobspec (e.g. misspells
|
||||||
`task` as `tsak`, or adds a `volume` stanza without a `source`), step
|
`task` as `tsak`, or adds a `volume` stanza without a `source`), step
|
||||||
2 fails instead. The fix makes it pass. PRs that don't touch any of
|
2 fails; a typo in a `path "..."` block in a vault policy fails step 5
|
||||||
the trigger paths skip this pipeline entirely.
|
with the Vault parser's error; a `roles.yaml` entry that points at a
|
||||||
|
policy basename that does not exist fails step 6. PRs that don't touch
|
||||||
|
any of the trigger paths skip this pipeline entirely.
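
The copy, format, diff pattern behind step 4 can be sketched in Python as follows. This is a minimal illustration rather than the pipeline's implementation: `format_drift` and `strip_trailing_whitespace` are hypothetical names, and the stand-in formatter only strips trailing whitespace, whereas the real step runs `vault policy fmt` on the copy.

```python
import difflib
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def strip_trailing_whitespace(path: Path) -> None:
    """Stand-in formatter (hypothetical): rewrites the file in place."""
    lines = path.read_text().splitlines()
    path.write_text("".join(line.rstrip() + "\n" for line in lines))


def format_drift(original: Path, formatter: Callable[[Path], None]) -> str:
    """Return a unified diff of the original against its formatted copy.

    An empty string means the formatter would not change the file.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        copy = Path(tmpdir) / original.name
        shutil.copy(original, copy)  # the committed file is never modified
        formatter(copy)              # in-place formatting touches the copy only
        before = original.read_text().splitlines(keepends=True)
        after = copy.read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(before, after, str(original), "formatted")
    )
```

Because the formatter is passed in as a callable, the same check could wrap any in-place formatter that lacks a check-only mode; a non-empty return value corresponds to the step failing.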
|
||||||
|
|
||||||
## Version pinning
|
## Version pinning
|
||||||
|
|
||||||
|
|
@ -119,5 +147,13 @@ accept (or vice versa).
|
||||||
|
|
||||||
- `lib/init/nomad/` — installer + systemd units + cluster-up orchestrator.
|
- `lib/init/nomad/` — installer + systemd units + cluster-up orchestrator.
|
||||||
- `.woodpecker/nomad-validate.yml` — this directory's CI pipeline.
|
- `.woodpecker/nomad-validate.yml` — this directory's CI pipeline.
|
||||||
|
- `vault/policies/` — Vault ACL policy HCL files (S2.1); the
|
||||||
|
`vault-policy-fmt` / `vault-policy-validate` CI steps above enforce
|
||||||
|
their shape. See [`../vault/policies/AGENTS.md`](../vault/policies/AGENTS.md)
|
||||||
|
for the policy lifecycle, CI enforcement details, and common failure
|
||||||
|
modes.
|
||||||
|
- `vault/roles.yaml` — JWT-auth role → policy bindings (S2.3); the
|
||||||
|
`vault-roles-validate` CI step above keeps it in lockstep with the
|
||||||
|
policies directory.
|
||||||
- Top-of-file headers in `server.hcl` / `client.hcl` / `vault.hcl`
|
- Top-of-file headers in `server.hcl` / `client.hcl` / `vault.hcl`
|
||||||
document the per-file ownership contract.
|
document the per-file ownership contract.
|
||||||
|
|
|
||||||
|
|
@ -49,12 +49,17 @@ validation.
|
||||||
1. Drop a file matching one of the four naming patterns above. Use an
|
1. Drop a file matching one of the four naming patterns above. Use an
|
||||||
existing file in the same family as the template — comment header,
|
existing file in the same family as the template — comment header,
|
||||||
capability list, and KV path layout should match the family.
|
capability list, and KV path layout should match the family.
|
||||||
2. Run `tools/vault-apply-policies.sh --dry-run` to confirm the new
|
2. Run `vault policy fmt <file>` locally so the formatting matches what
|
||||||
|
the CI fmt-check (step 4 of `.woodpecker/nomad-validate.yml`) will
|
||||||
|
accept. The fmt check runs non-destructively in CI but a dirty file
|
||||||
|
fails the step; running `fmt` locally before pushing is the fastest
|
||||||
|
path.
|
||||||
|
3. Add the matching entry to `../roles.yaml` (see "JWT-auth roles" below)
|
||||||
|
so the CI role-reference check (step 6) stays green.
|
||||||
|
4. Run `tools/vault-apply-policies.sh --dry-run` to confirm the new
|
||||||
basename appears in the planned-work list with the expected SHA.
|
basename appears in the planned-work list with the expected SHA.
|
||||||
3. Run `tools/vault-apply-policies.sh` against a Vault instance to
|
5. Run `tools/vault-apply-policies.sh` against a Vault instance to
|
||||||
create it; re-run to confirm it reports `unchanged`.
|
create it; re-run to confirm it reports `unchanged`.
|
||||||
4. The CI fmt + validate step lands in S2.6 (#884). Until then
|
|
||||||
`vault policy fmt <file>` locally is the fastest sanity check.
|
|
||||||
|
|
||||||
## JWT-auth roles (S2.3)
|
## JWT-auth roles (S2.3)
|
||||||
|
|
||||||
|
|
@ -118,6 +123,56 @@ would let one service's tokens outlive the others — add a field to
|
||||||
`vault/roles.yaml` and the applier at the same time if that ever
|
`vault/roles.yaml` and the applier at the same time if that ever
|
||||||
becomes necessary.
|
becomes necessary.
|
||||||
|
|
||||||
|
## Policy lifecycle
|
||||||
|
|
||||||
|
Adding a policy that an actual workload consumes is a three-step chain;
|
||||||
|
the CI pipeline guards each link.
|
||||||
|
|
||||||
|
1. **Add the policy HCL** — `vault/policies/<name>.hcl`, formatted with
|
||||||
|
`vault policy fmt`. Capabilities must be drawn from the Vault-recognized
|
||||||
|
set (`read`, `list`, `create`, `update`, `delete`, `patch`, `sudo`,
|
||||||
|
`deny`); a typo fails CI step 5 (HCL written to an inline dev-mode Vault
|
||||||
|
via `vault policy write` — a real parser, not a regex).
|
||||||
|
2. **Update `../roles.yaml`** — add a JWT-auth role entry whose `policy:`
|
||||||
|
field matches the new basename (without `.hcl`). CI step 6 re-checks
|
||||||
|
every role in this file against the policy set, so a drift between the
|
||||||
|
role file and the policies directory fails the step.
|
||||||
|
3. **Reference from a Nomad jobspec** — add `vault { role = "<name>" }` in
|
||||||
|
`nomad/jobs/<service>.hcl` (owned by S2.4). Policies do not take effect
|
||||||
|
until a Nomad job asks for a token via that role.
|
||||||
|
|
||||||
|
See the "Adding a new service" walkthrough below for the applier-script
|
||||||
|
flow once steps 1–3 are committed.
|
||||||
|
|
||||||
|
## CI enforcement (`.woodpecker/nomad-validate.yml`)
|
||||||
|
|
||||||
|
The pipeline triggers on any PR touching `vault/policies/**`,
`vault/roles.yaml`, or `lib/init/nomad/vault-*.sh`. Three vault-scoped
steps run in it (in addition to the nomad-scoped steps already in
place), and the separate P11 secret-scan gate provides a fourth check:
|
||||||
|
|
||||||
|
| Step | Tool | What it catches |
|---|---|---|
| 4. `vault-policy-fmt` | `vault policy fmt` + `diff` | formatting drift: trailing whitespace, wrong indentation, missing newlines |
| 5. `vault-policy-validate` | `vault policy write` against inline dev Vault | HCL syntax errors, unknown stanzas, invalid capability names (e.g. `"frobnicate"`), malformed `path "..." {}` blocks |
| 6. `vault-roles-validate` | yamllint + PyYAML | roles.yaml syntax drift, missing required fields, role→policy references with no matching `.hcl` |
| P11 | `lib/secret-scan.sh` via `.woodpecker/secret-scan.yml` | literal secret leaked into a policy HCL (rare copy-paste mistake); already covers `vault/**/*`, so no duplicate step here |
|
||||||
|
|
||||||
|
All four checks are fail-closed; any error blocks merge. The pipeline
pins `hashicorp/vault:1.18.5` (matching `lib/init/nomad/install.sh`);
bump the runtime version and the CI image together so the two do not
drift.
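
Step 5 relies on two small control-flow patterns: a bounded readiness poll for the dev-mode server, and aggregation of failures across every policy file instead of stopping at the first. The sketch below illustrates both in Python under stated assumptions: `wait_until_ready` and `collect_failures` are hypothetical names, and the `probe` and `check` callables stand in for `vault status` and `vault policy write`, which the real step invokes from shell.

```python
import time
from typing import Callable, Iterable, List


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() up to `attempts` times, sleeping `delay` after each miss."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(delay)
    return False  # caller should fail the step and dump the server log


def collect_failures(
    items: Iterable[str], check: Callable[[str], bool]
) -> List[str]:
    """Run check() on every item and return the ones that failed."""
    # Every item is checked, so one run reports all broken files.
    return [item for item in items if not check(item)]
```

With the defaults, the poll gives up after 30 attempts of 0.5 s each, which matches the 15 s budget in the step's error message.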
|
||||||
|
|
||||||
|
## Common failure modes
|
||||||
|
|
||||||
|
| Symptom in CI logs | Root cause | Fix |
|---|---|---|
| `vault-policy-fmt: … is not formatted — run 'vault policy fmt <file>'` | Trailing whitespace / mixed indent in an HCL file | `vault policy fmt <file>` locally and re-commit |
| `vault-policy-validate: … failed validation` plus a `policy` error from Vault | Unknown capability (e.g. `"frobnicate"`), unknown stanza, malformed `path` block | Fix the HCL; valid capabilities are `read`, `list`, `create`, `update`, `delete`, `patch`, `sudo`, `deny` |
| `vault-roles-validate: ERROR: role 'X' references policy 'Y' but vault/policies/Y.hcl does not exist` | A role's `policy:` field does not match any file basename in `vault/policies/` | Either add the missing policy HCL or fix the typo in `roles.yaml` |
| `vault-roles-validate: ERROR: role entry missing required field 'Z'` | A role in `roles.yaml` is missing one of `name`, `policy`, `namespace`, `job_id` | Add the field; all four are required |
| P11 `secret-scan: detected potential secret …` on a `.hcl` file | A literal token/password was pasted into a policy | Policies must name KV paths, not carry secret values; move the literal into KV (S2.2) and have the policy grant `read` on the path |
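
The two `vault-roles-validate` rows above can be reproduced locally with a pure-function version of the check. This is a sketch that mirrors the pipeline's logic, not a replacement for it: `check_roles` is a hypothetical helper, the caller is assumed to have already parsed `roles.yaml` into a list and collected the policy basenames, and the role names in the usage example are made up.

```python
from typing import Iterable, List, Set

REQUIRED_FIELDS = ("name", "policy", "namespace", "job_id")


def check_roles(roles: Iterable[object], existing_policies: Set[str]) -> List[str]:
    """Return one error string per problem found; an empty list means OK."""
    errors: List[str] = []
    for role in roles:
        if not isinstance(role, dict):
            errors.append(f"role entry is not a mapping: {role!r}")
            continue
        # Every role must carry all four fields with a non-empty value.
        for field in REQUIRED_FIELDS:
            if role.get(field) in (None, ""):
                errors.append(f"role entry missing required field '{field}': {role}")
        # The policy name must match a vault/policies/<name>.hcl basename.
        policy = role.get("policy")
        if policy and policy not in existing_policies:
            errors.append(
                f"role '{role.get('name')}' references policy '{policy}' "
                f"but vault/policies/{policy}.hcl does not exist"
            )
    return errors
```

For example, `check_roles([{"name": "demo", "policy": "demo", "namespace": "default"}], {"demo"})` returns a single error naming the missing `job_id` field. Returning a list rather than exiting on the first problem keeps the aggregate-then-fail behaviour of the CI step.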
|
||||||
|
|
||||||
## What this directory does NOT own
|
## What this directory does NOT own
|
||||||
|
|
||||||
- **Attaching policies to Nomad jobs.** That's S2.4 (#882) via the
|
- **Attaching policies to Nomad jobs.** That's S2.4 (#882) via the
|
||||||
|
|
@ -125,4 +180,3 @@ becomes necessary.
|
||||||
name in `vault { role = "..." }` is what binds the policy.
|
name in `vault { role = "..." }` is what binds the policy.
|
||||||
- **Writing the secret values themselves.** That's S2.2 (#880) via
|
- **Writing the secret values themselves.** That's S2.2 (#880) via
|
||||||
`tools/vault-import.sh`.
|
`tools/vault-import.sh`.
|
||||||
- **CI policy fmt + validate + roles.yaml check.** That's S2.6 (#884).
|
|
||||||
|
|
|
||||||