fix: [nomad-step-2] S2.3 — vault-nomad-auth.sh (enable JWT auth + roles + nomad workload identity) (#881)
All checks were successful
All checks were successful
Wires Nomad → Vault via workload identity so jobs can exchange their short-lived JWT for a Vault token carrying the policies in vault/policies/ — no shared VAULT_TOKEN in job env. - `lib/init/nomad/vault-nomad-auth.sh` — idempotent script: enable jwt auth at path `jwt-nomad`, config JWKS/algs, apply roles, install server.hcl + SIGHUP nomad on change. - `tools/vault-apply-roles.sh` — companion sync script (S2.1 sibling); reads vault/roles.yaml and upserts each Vault role under auth/jwt-nomad/role/<name> with created/updated/unchanged semantics. - `vault/roles.yaml` — declarative role→policy→bound_claims map; one entry per vault/policies/*.hcl. Keeps S2.1 policies and S2.3 role bindings visible side-by-side at review time. - `nomad/server.hcl` — adds vault stanza (enabled, address, default_identity.aud=["vault.io"], ttl=1h). - `lib/hvault.sh` — new `hvault_get_or_empty` helper shared between vault-apply-policies.sh, vault-apply-roles.sh, and vault-nomad-auth.sh; reads a Vault endpoint and distinguishes 200 / 404 / other. - `vault/policies/AGENTS.md` — extends S2.1 docs with JWT-auth role naming convention, token shape, and the "add new service" flow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
88e49b9e9d
commit
8efef9f1bb
7 changed files with 776 additions and 35 deletions
|
|
@ -55,12 +55,73 @@ validation.
|
|||
4. The CI fmt + validate step lands in S2.6 (#884). Until then
|
||||
`vault policy fmt <file>` locally is the fastest sanity check.
|
||||
|
||||
## JWT-auth roles (S2.3)
|
||||
|
||||
Policies are inert until a Vault token carrying them is minted. In this
|
||||
migration that mint path is JWT auth — Nomad jobs exchange their
|
||||
workload-identity JWT for a Vault token via
|
||||
`auth/jwt-nomad/role/<name>` → `token_policies = ["<policy>"]`. The
|
||||
role bindings live in [`../roles.yaml`](../roles.yaml); the script that
|
||||
enables the auth method + writes the config + applies roles is
|
||||
[`lib/init/nomad/vault-nomad-auth.sh`](../../lib/init/nomad/vault-nomad-auth.sh).
|
||||
The applier is [`tools/vault-apply-roles.sh`](../../tools/vault-apply-roles.sh).
|
||||
|
||||
### Role → policy naming convention
|
||||
|
||||
Role name == policy name, 1:1. `vault/roles.yaml` carries one entry per
|
||||
`vault/policies/*.hcl` file:
|
||||
|
||||
```yaml
|
||||
roles:
|
||||
- name: service-forgejo # Vault role
|
||||
policy: service-forgejo # ACL policy attached to minted tokens
|
||||
namespace: default # bound_claims.nomad_namespace
|
||||
job_id: forgejo # bound_claims.nomad_job_id
|
||||
```
|
||||
|
||||
The role name is what jobspecs reference via `vault { role = "..." }` —
|
||||
keep it identical to the policy basename so an S2.1↔S2.3 drift (new
|
||||
policy without a role, or vice versa) shows up in one directory review,
|
||||
not as a runtime "permission denied" at job placement.
|
||||
|
||||
`bound_claims.nomad_job_id` is the actual `job "..."` name in the
|
||||
jobspec, which may differ from the policy name (e.g. policy
|
||||
`service-forgejo` binds to job `forgejo`). Update it when each bot's or
|
||||
runner's jobspec lands.
|
||||
|
||||
### Adding a new service
|
||||
|
||||
1. Write `vault/policies/<name>.hcl` using the naming-table family that
|
||||
fits (`service-`, `bot-`, `runner-`, or standalone).
|
||||
2. Add a matching entry to `vault/roles.yaml` with all four fields
|
||||
(`name`, `policy`, `namespace`, `job_id`).
|
||||
3. Apply both — either in one shot via `lib/init/nomad/vault-nomad-auth.sh`
|
||||
(policies → roles → nomad SIGHUP), or granularly via
|
||||
`tools/vault-apply-policies.sh` + `tools/vault-apply-roles.sh`.
|
||||
4. Reference the role in the consuming jobspec's `vault { role = "<name>" }`.
|
||||
|
||||
### Token shape
|
||||
|
||||
All roles share the same token shape, hardcoded in
|
||||
`tools/vault-apply-roles.sh`:
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| `bound_audiences` | `["vault.io"]` — matches `default_identity.aud` in `nomad/server.hcl` |
|
||||
| `token_type` | `service` — auto-revoked when the task exits |
|
||||
| `token_ttl` | `1h` |
|
||||
| `token_max_ttl` | `24h` |
|
||||
|
||||
Bumping any of these is a knowing, repo-wide change. Per-role overrides
|
||||
would let one service's tokens outlive the others — add a field to
|
||||
`vault/roles.yaml` and the applier at the same time if that ever
|
||||
becomes necessary.
|
||||
|
||||
## What this directory does NOT own
|
||||
|
||||
- **Attaching policies to Nomad jobs.** That's S2.4 (#882) via the
|
||||
jobspec `template { vault { policies = […] } }` stanza.
|
||||
- **Enabling JWT auth + Nomad workload identity roles.** That's S2.3
|
||||
(#881).
|
||||
jobspec `template { vault { policies = […] } }` stanza — the role
|
||||
name in `vault { role = "..." }` is what binds the policy.
|
||||
- **Writing the secret values themselves.** That's S2.2 (#880) via
|
||||
`tools/vault-import.sh`.
|
||||
- **CI policy fmt + validate + roles.yaml check.** That's S2.6 (#884).
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue