fix: [nomad-step-2] S2.3 — vault-nomad-auth.sh (enable JWT auth + roles + nomad workload identity) (#881)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful

Wires Nomad → Vault via workload identity so jobs can exchange their
short-lived JWT for a Vault token carrying the policies in
vault/policies/ — no shared VAULT_TOKEN in job env.

- `lib/init/nomad/vault-nomad-auth.sh` — idempotent script: enable jwt
  auth at path `jwt-nomad`, config JWKS/algs, apply roles, install
  server.hcl + SIGHUP nomad on change.
- `tools/vault-apply-roles.sh` — companion sync script (S2.1 sibling);
  reads vault/roles.yaml and upserts each Vault role under
  auth/jwt-nomad/role/<name> with created/updated/unchanged semantics.
- `vault/roles.yaml` — declarative role→policy→bound_claims map; one
  entry per vault/policies/*.hcl. Keeps S2.1 policies and S2.3 role
  bindings visible side-by-side at review time.
- `nomad/server.hcl` — adds vault stanza (enabled, address,
  default_identity.aud=["vault.io"], ttl=1h).
- `lib/hvault.sh` — new `hvault_get_or_empty` helper shared between
  vault-apply-policies.sh, vault-apply-roles.sh, and vault-nomad-auth.sh;
  reads a Vault endpoint and distinguishes 200 / 404 / other.
- `vault/policies/AGENTS.md` — extends S2.1 docs with JWT-auth role
  naming convention, token shape, and the "add new service" flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Claude 2026-04-16 16:44:22 +00:00
parent 88e49b9e9d
commit 8efef9f1bb
7 changed files with 776 additions and 35 deletions

View file

@ -103,37 +103,6 @@ fi
hvault_token_lookup >/dev/null \
|| die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
# ── Helper: fetch the on-server policy text, or empty if absent ──────────────
# Echoes the current policy content on stdout. A 404 (policy does not exist
# yet) is a non-error — we print nothing and exit 0 so the caller can treat
# the empty string as "needs create". Any other non-2xx is a hard failure.
#
# Uses a subshell + EXIT trap (not RETURN) for tmpfile cleanup: the RETURN
# trap does NOT fire on set-e abort, so if jq below tripped errexit the
# tmpfile would leak. Subshell exit propagates via the function's last-
# command exit status.
fetch_current_policy() {
local name="$1"
(
local tmp http_code
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
http_code="$(curl -sS -o "$tmp" -w '%{http_code}' \
-H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/sys/policies/acl/${name}")" \
|| { printf '[vault-apply] ERROR: curl failed for policy %s\n' "$name" >&2; exit 1; }
case "$http_code" in
200) jq -r '.data.policy // ""' < "$tmp" ;;
404) printf '' ;; # absent — caller treats as "create"
*)
printf '[vault-apply] ERROR: HTTP %s fetching policy %s:\n' "$http_code" "$name" >&2
cat "$tmp" >&2
exit 1
;;
esac
)
}
# ── Apply each policy, reporting created/updated/unchanged ───────────────────
log "syncing ${#POLICY_FILES[@]} polic(y|ies) from ${POLICIES_DIR}"
@ -141,8 +110,17 @@ for f in "${POLICY_FILES[@]}"; do
name="$(basename "$f" .hcl)"
desired="$(cat "$f")"
current="$(fetch_current_policy "$name")" \
# hvault_get_or_empty returns the raw JSON body on 200 or empty on 404.
# Extract the .data.policy field here (jq on "" yields "", so the
# empty-string-means-create branch below still works).
raw="$(hvault_get_or_empty "sys/policies/acl/${name}")" \
|| die "failed to read existing policy: ${name}"
if [ -n "$raw" ]; then
current="$(printf '%s' "$raw" | jq -r '.data.policy // ""')" \
|| die "failed to parse policy response: ${name}"
else
current=""
fi
if [ -z "$current" ]; then
hvault_policy_apply "$name" "$f" \