fix: [nomad-step-2] S2.3 — vault-nomad-auth.sh (enable JWT auth + roles + nomad workload identity) (#881)
All checks were successful
Wires Nomad → Vault via workload identity so jobs can exchange their short-lived JWT for a Vault token carrying the policies in vault/policies/ — no shared VAULT_TOKEN in job env.

- `lib/init/nomad/vault-nomad-auth.sh` — idempotent script: enable jwt auth at path `jwt-nomad`, config JWKS/algs, apply roles, install server.hcl + SIGHUP nomad on change.
- `tools/vault-apply-roles.sh` — companion sync script (S2.1 sibling); reads vault/roles.yaml and upserts each Vault role under auth/jwt-nomad/role/<name> with created/updated/unchanged semantics.
- `vault/roles.yaml` — declarative role→policy→bound_claims map; one entry per vault/policies/*.hcl. Keeps S2.1 policies and S2.3 role bindings visible side-by-side at review time.
- `nomad/server.hcl` — adds vault stanza (enabled, address, default_identity.aud=["vault.io"], ttl=1h).
- `lib/hvault.sh` — new `hvault_get_or_empty` helper shared between vault-apply-policies.sh, vault-apply-roles.sh, and vault-nomad-auth.sh; reads a Vault endpoint and distinguishes 200 / 404 / other.
- `vault/policies/AGENTS.md` — extends S2.1 docs with JWT-auth role naming convention, token shape, and the "add new service" flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 88e49b9e9d
commit 8efef9f1bb
7 changed files with 776 additions and 35 deletions
@@ -55,12 +55,73 @@ validation.
4. The CI fmt + validate step lands in S2.6 (#884). Until then
   `vault policy fmt <file>` locally is the fastest sanity check.

## JWT-auth roles (S2.3)

Policies are inert until a Vault token carrying them is minted. In this
migration that mint path is JWT auth — Nomad jobs exchange their
workload-identity JWT for a Vault token via
`auth/jwt-nomad/role/<name>` → `token_policies = ["<policy>"]`. The
role bindings live in [`../roles.yaml`](../roles.yaml); the script that
enables the auth method + writes the config + applies roles is
[`lib/init/nomad/vault-nomad-auth.sh`](../../lib/init/nomad/vault-nomad-auth.sh).
The applier is [`tools/vault-apply-roles.sh`](../../tools/vault-apply-roles.sh).
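
As a rough illustration of the exchange described above, the sketch below builds the request body for Vault's JWT login endpoint on the `jwt-nomad` mount. In practice Nomad performs this exchange on the task's behalf. The function names, the `VAULT_ADDR` fallback, the `NOMAD_JWT` placeholder, and the use of `curl` are assumptions for illustration and are not taken from this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only the mount path `jwt-nomad` and the role name
# come from this document.

# Build the JSON body for POST /v1/auth/jwt-nomad/login.
# Assumes the role name and JWT contain no characters that need JSON escaping.
jwt_login_payload() {
  local role="$1" jwt="$2"
  printf '{"role":"%s","jwt":"%s"}' "$role" "$jwt"
}

# Send the login request and print Vault's raw JSON response.
# The VAULT_ADDR fallback is an assumption for a local dev server.
jwt_login() {
  local role="$1" jwt="$2"
  curl --silent --show-error --fail \
    --request POST \
    --data "$(jwt_login_payload "$role" "$jwt")" \
    "${VAULT_ADDR:-http://127.0.0.1:8200}/v1/auth/jwt-nomad/login"
}

# Example (needs a reachable Vault and a real workload-identity JWT;
# NOMAD_JWT is a placeholder variable name):
#   jwt_login service-forgejo "$NOMAD_JWT"
```

If the login succeeds, the token in the response should carry the policy listed in the role's `token_policies`.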

### Role → policy naming convention

Role name == policy name, 1:1. `vault/roles.yaml` carries one entry per
`vault/policies/*.hcl` file:

```yaml
roles:
  - name: service-forgejo      # Vault role
    policy: service-forgejo    # ACL policy attached to minted tokens
    namespace: default         # bound_claims.nomad_namespace
    job_id: forgejo            # bound_claims.nomad_job_id
```

The role name is what jobspecs reference via `vault { role = "..." }` —
keep it identical to the policy basename so an S2.1↔S2.3 drift (new
policy without a role, or vice versa) shows up in one directory review,
not as a runtime "permission denied" at job placement.

`bound_claims.nomad_job_id` is the actual `job "..."` name in the
jobspec, which may differ from the policy name (e.g. policy
`service-forgejo` binds to job `forgejo`). Update it when each bot's or
runner's jobspec lands.
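
The 1:1 convention can also be checked mechanically. The sketch below lists names that appear only as a role or only as a policy file. It assumes the `- name: <role>` line format shown above; the function names are hypothetical, and this is not the S2.6 CI check.

```shell
#!/usr/bin/env bash
# Sketch: report roles without a policy file and policy files without a role.

# Print role names from a roles.yaml-style file, one per line, sorted.
list_role_names() {
  awk '$1 == "-" && $2 == "name:" { print $3 }' "$1" | sort
}

# Print policy basenames (without .hcl) from a policies directory, sorted.
list_policy_names() {
  local f
  for f in "$1"/*.hcl; do
    [ -e "$f" ] || continue
    basename "$f" .hcl
  done | sort
}

# Print names present on only one side; return 1 if any drift is found.
check_role_policy_drift() {
  local roles_file="$1" policies_dir="$2" tmp_roles tmp_policies drift
  tmp_roles="$(mktemp)"
  tmp_policies="$(mktemp)"
  list_role_names "$roles_file" > "$tmp_roles"
  list_policy_names "$policies_dir" > "$tmp_policies"
  # comm -3 drops the common lines: column 1 is role-only, column 2 is policy-only.
  drift="$(comm -3 "$tmp_roles" "$tmp_policies")"
  rm -f "$tmp_roles" "$tmp_policies"
  if [ -n "$drift" ]; then
    printf 'drift (unindented: role only, indented: policy only):\n%s\n' "$drift"
    return 1
  fi
}

# Example, run from the repository root:
#   check_role_policy_drift vault/roles.yaml vault/policies
```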

### Adding a new service

1. Write `vault/policies/<name>.hcl` using the naming-table family that
   fits (`service-`, `bot-`, `runner-`, or standalone).
2. Add a matching entry to `vault/roles.yaml` with all four fields
   (`name`, `policy`, `namespace`, `job_id`).
3. Apply both — either in one shot via `lib/init/nomad/vault-nomad-auth.sh`
   (policies → roles → nomad SIGHUP), or granularly via
   `tools/vault-apply-policies.sh` + `tools/vault-apply-roles.sh`.
4. Reference the role in the consuming jobspec's `vault { role = "<name>" }`.
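
Before running step 3, it may help to confirm that the new entry really has all four fields. The following is a minimal sketch of such a check, assuming the line-oriented layout of `vault/roles.yaml`; `validate_roles_file` is a hypothetical name, and the real applier may validate differently.

```shell
#!/usr/bin/env bash
# Sketch: verify that every entry in a roles.yaml-style file carries
# policy, namespace and job_id in addition to its name.
validate_roles_file() {
  awk '
    # Report the entry collected so far if any required field is empty.
    function check_entry() {
      if (name == "") return
      if (policy == "" || ns == "" || job == "") {
        printf "role %s: missing field(s)\n", name
        bad = 1
      }
    }
    { sub(/[ \t]*#.*$/, "") }            # strip comments (whole-line or trailing)
    $1 == "-" && $2 == "name:" {         # a new entry starts here
      check_entry()
      name = $3; policy = ns = job = ""
      next
    }
    $1 == "policy:"    { policy = $2 }
    $1 == "namespace:" { ns = $2 }
    $1 == "job_id:"    { job = $2 }
    END { check_entry(); exit (bad ? 1 : 0) }
  ' "$1"
}

# Example, run from the repository root:
#   validate_roles_file vault/roles.yaml && echo "roles.yaml looks complete"
```

The sketch does not flag an entry whose `name` value is itself empty; a stricter check would need to handle that case separately.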

### Token shape

All roles share the same token shape, hardcoded in
`tools/vault-apply-roles.sh`:

| Field | Value |
|---|---|
| `bound_audiences` | `["vault.io"]` — matches `default_identity.aud` in `nomad/server.hcl` |
| `token_type` | `service` — auto-revoked when the task exits |
| `token_ttl` | `1h` |
| `token_max_ttl` | `24h` |

Bumping any of these is a knowing, repo-wide change. Per-role overrides
would let one service's tokens outlive the others — add a field to
`vault/roles.yaml` and the applier at the same time if that ever
becomes necessary.
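
To make the shared shape concrete, here is a sketch of how a role body could be rendered from one `roles.yaml` entry. The four token-shape values and the two bound claims come from this document. The `role_type` and `user_claim` values, the JSON layout, and the function name are assumptions, and the real `tools/vault-apply-roles.sh` may build its payload differently.

```shell
#!/usr/bin/env bash
# Sketch: render a JSON body for auth/jwt-nomad/role/<name>.
# Assumes the arguments contain no characters that need JSON escaping.
render_role_payload() {
  local policy="$1" namespace="$2" job_id="$3"
  cat <<EOF
{
  "role_type": "jwt",
  "bound_audiences": ["vault.io"],
  "user_claim": "nomad_job_id",
  "bound_claims": {
    "nomad_namespace": "${namespace}",
    "nomad_job_id": "${job_id}"
  },
  "token_policies": ["${policy}"],
  "token_type": "service",
  "token_ttl": "1h",
  "token_max_ttl": "24h"
}
EOF
}

# Example (needs a reachable Vault and a token allowed to write roles):
#   render_role_payload service-forgejo default forgejo \
#     | vault write auth/jwt-nomad/role/service-forgejo -
```

Keeping the constants in one function mirrors the point above: changing a TTL here changes it for every role at once.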

## What this directory does NOT own

- **Attaching policies to Nomad jobs.** That's S2.4 (#882) via the
  jobspec `template { vault { policies = […] } }` stanza — the role
  name in `vault { role = "..." }` is what binds the policy.
- **Writing the secret values themselves.** That's S2.2 (#880) via
  `tools/vault-import.sh`.
- **CI policy fmt + validate + roles.yaml check.** That's S2.6 (#884).

vault/roles.yaml
Normal file, 150 lines added
@@ -0,0 +1,150 @@
# =============================================================================
# vault/roles.yaml — Vault JWT-auth role bindings for Nomad workload identity
#
# Part of the Nomad+Vault migration (S2.3, issue #881). One entry per
# vault/policies/*.hcl policy. Each entry pairs:
#
#   - the Vault role name (what a Nomad job references via
#     `vault { role = "..." }` in its jobspec), with
#   - the ACL policy attached to tokens it mints, and
#   - the bound claims that gate which Nomad workloads may authenticate
#     through that role (prevents a jobspec named "woodpecker" from
#     asking for role "service-forgejo").
#
# The source of truth for *what* secrets each role's token can read is
# vault/policies/<policy>.hcl. This file only wires role→policy→claims.
# Keeping the two side-by-side in the repo means an S2.1↔S2.3 drift
# (new policy without a role, or vice versa) shows up in one directory
# review, not as a runtime "permission denied" at job placement.
#
# All roles share the same constants (hardcoded in tools/vault-apply-roles.sh):
#   - bound_audiences = ["vault.io"]  — Nomad's default workload-identity aud
#   - token_type      = "service"     — revoked when task exits
#   - token_ttl       = "1h"          — token lifetime
#   - token_max_ttl   = "24h"         — hard cap across renewals
#
# Format (strict — parsed line-by-line by tools/vault-apply-roles.sh with
# awk; keep the "- name:" prefix + two-space nested indent exactly as
# shown below):
#
#   roles:
#     - name: <vault-role-name>      # path: auth/jwt-nomad/role/<name>
#       policy: <acl-policy-name>    # must match vault/policies/<name>.hcl
#       namespace: <nomad-namespace> # bound_claims.nomad_namespace
#       job_id: <nomad-job-id>       # bound_claims.nomad_job_id
#
# All four fields are required. Comments (#) and blank lines are ignored.
#
# Adding a new role:
#   1. Land the companion vault/policies/<name>.hcl in S2.1 style.
#   2. Add a block here with all four fields.
#   3. Run tools/vault-apply-roles.sh to upsert it.
#   4. Re-run to confirm "role <name> unchanged".
# =============================================================================
roles:
  # ── Long-running services (nomad/jobs/<name>.hcl) ──────────────────────────
  # The jobspec's nomad job name is the bound job_id, e.g. `job "forgejo"`
  # in nomad/jobs/forgejo.hcl → job_id: forgejo. The policy name stays
  # `service-<name>` so the directory layout under vault/policies/ groups
  # platform services under a single prefix.
  - name: service-forgejo
    policy: service-forgejo
    namespace: default
    job_id: forgejo

  - name: service-woodpecker
    policy: service-woodpecker
    namespace: default
    job_id: woodpecker

  # ── Per-agent bots (nomad/jobs/bot-<role>.hcl — land in later steps) ───────
  # job_id placeholders match the policy name 1:1 until each bot's jobspec
  # lands. When a bot's jobspec is added under nomad/jobs/, update the
  # corresponding job_id here to match the jobspec's `job "<name>"` — and
  # CI's S2.6 roles.yaml check will confirm the pairing.
  - name: bot-dev
    policy: bot-dev
    namespace: default
    job_id: bot-dev

  - name: bot-dev-qwen
    policy: bot-dev-qwen
    namespace: default
    job_id: bot-dev-qwen

  - name: bot-review
    policy: bot-review
    namespace: default
    job_id: bot-review

  - name: bot-gardener
    policy: bot-gardener
    namespace: default
    job_id: bot-gardener

  - name: bot-planner
    policy: bot-planner
    namespace: default
    job_id: bot-planner

  - name: bot-predictor
    policy: bot-predictor
    namespace: default
    job_id: bot-predictor

  - name: bot-supervisor
    policy: bot-supervisor
    namespace: default
    job_id: bot-supervisor

  - name: bot-architect
    policy: bot-architect
    namespace: default
    job_id: bot-architect

  - name: bot-vault
    policy: bot-vault
    namespace: default
    job_id: bot-vault

  # ── Edge dispatcher ────────────────────────────────────────────────────────
  - name: dispatcher
    policy: dispatcher
    namespace: default
    job_id: dispatcher

  # ── Per-secret runner roles ────────────────────────────────────────────────
  # vault-runner (Step 5) composes runner-<NAME> policies onto each
  # ephemeral dispatch token based on the action TOML's `secrets = [...]`.
  # The per-dispatch runner jobspec job_id follows the same `runner-<NAME>`
  # convention (one jobspec per secret, minted per dispatch) so the bound
  # claim matches the role name directly.
  - name: runner-GITHUB_TOKEN
    policy: runner-GITHUB_TOKEN
    namespace: default
    job_id: runner-GITHUB_TOKEN

  - name: runner-CODEBERG_TOKEN
    policy: runner-CODEBERG_TOKEN
    namespace: default
    job_id: runner-CODEBERG_TOKEN

  - name: runner-CLAWHUB_TOKEN
    policy: runner-CLAWHUB_TOKEN
    namespace: default
    job_id: runner-CLAWHUB_TOKEN

  - name: runner-DEPLOY_KEY
    policy: runner-DEPLOY_KEY
    namespace: default
    job_id: runner-DEPLOY_KEY

  - name: runner-NPM_TOKEN
    policy: runner-NPM_TOKEN
    namespace: default
    job_id: runner-NPM_TOKEN

  - name: runner-DOCKER_HUB_TOKEN
    policy: runner-DOCKER_HUB_TOKEN
    namespace: default
    job_id: runner-DOCKER_HUB_TOKEN
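
The file header asks for a re-run that reports "role <name> unchanged". One way such created/updated/unchanged reporting could be derived is sketched below. It assumes the current and desired role definitions have already been normalized to the same canonical string form; the function name is hypothetical, and the comparison logic in `tools/vault-apply-roles.sh` is not shown in this diff.

```shell
#!/usr/bin/env bash
# Sketch: classify an upsert by comparing the role as read back from Vault
# with the role we intend to write.
#   $1 = current definition (empty string if the role does not exist yet)
#   $2 = desired definition
classify_role_change() {
  local current="$1" desired="$2"
  if [ -z "$current" ]; then
    echo created
  elif [ "$current" = "$desired" ]; then
    echo unchanged
  else
    echo updated
  fi
}

# Example with placeholder definitions:
#   classify_role_change "" 'policy=bot-dev'   # prints: created
```

Only the `created` and `updated` outcomes would need a write, which is what makes a second run a cheap confirmation.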