fix: [nomad-step-2] S2.2 — Fix KV v2 overwrite for incremental updates and secure jq interpolation (#880)

fix: [nomad-step-2] S2.2 — Fix KV v2 overwrite by grouping key-value pairs per path (#880)
fix: [nomad-step-2] S2.2 — Fix bot/runner operation parsing and sops value extraction (#880)
2026-04-16 17:22:05 +00:00
15 changed files with 1760 additions and 37 deletions
--- a/lib/AGENTS.md
+++ b/lib/AGENTS.md
@@ -34,5 +34,5 @@ sourced as needed.
 | `lib/sprint-filer.sh` | Post-merge sub-issue filer for sprint PRs. Invoked by the `.woodpecker/ops-filer.yml` pipeline after a sprint PR merges to ops repo `main`. Parses `<!-- filer:begin --> ... <!-- filer:end -->` blocks from sprint PR bodies to extract sub-issue definitions, creates them on the project repo using `FORGE_FILER_TOKEN` (narrow-scope `filer-bot` identity with `issues:write` only), adds `in-progress` label to the parent vision issue, and handles vision lifecycle closure when all sub-issues are closed. Uses `filer_api_all()` for paginated fetches. Idempotent: uses `<!-- decomposed-from: #<vision>, sprint: <slug>, id: <id> -->` markers to skip already-filed issues. Requires `FORGE_FILER_TOKEN`, `FORGE_API`, `FORGE_API_BASE`, `FORGE_OPS_REPO`. | `.woodpecker/ops-filer.yml` (CI pipeline on ops repo) |
 | `lib/hire-agent.sh` | `disinto_hire_an_agent()` — user creation, `.profile` repo setup, formula copying, branch protection, and state marker creation for hiring a new agent. Requires `FORGE_URL`, `FORGE_TOKEN`, `FACTORY_ROOT`, `PROJECT_NAME`. Extracted from `bin/disinto`. | bin/disinto (hire) |
 | `lib/release.sh` | `disinto_release()` — vault TOML creation, branch setup on ops repo, PR creation, and auto-merge request for a versioned release. `_assert_release_globals()` validates required env vars. Requires `FORGE_URL`, `FORGE_TOKEN`, `FORGE_OPS_REPO`, `FACTORY_ROOT`, `PRIMARY_BRANCH`. Extracted from `bin/disinto`. | bin/disinto (release) |
-| `lib/hvault.sh` | HashiCorp Vault helper module. `hvault_kv_get(PATH, [KEY])` — read KV v2 secret, optionally extract one key. `hvault_kv_put(PATH, KEY=VAL ...)` — write KV v2 secret. `hvault_kv_list(PATH)` — list keys at a KV path. `hvault_policy_apply(NAME, FILE)` — idempotent policy upsert. `hvault_jwt_login(ROLE, JWT)` — exchange JWT for short-lived token. `hvault_token_lookup()` — returns TTL/policies/accessor for current token. All functions use `VAULT_ADDR` + `VAULT_TOKEN` from env (fallback: `/etc/vault.d/root.token`), emit structured JSON errors to stderr on failure. Tests: `tests/lib-hvault.bats` (requires `vault server -dev`). | Not sourced at runtime yet — pure scaffolding for Nomad+Vault migration (#799) |
-| `lib/init/nomad/` | Nomad+Vault Step 0 installer scripts. `cluster-up.sh` — idempotent orchestrator that runs all steps in order (installs packages, writes HCL, enables systemd units, unseals Vault); uses `poll_until_healthy()` helper for deduped readiness polling. `install.sh` — installs pinned Nomad+Vault apt packages. `vault-init.sh` — initializes Vault (unseal keys → `/etc/vault.d/`), creates dev-persisted unseal unit. `lib-systemd.sh` — shared systemd unit helpers. `systemd-nomad.sh`, `systemd-vault.sh` — write and enable service units. Idempotent: each step checks current state before acting. Sourced and called by `cluster-up.sh`; not sourced by agents. | `bin/disinto init --backend=nomad` |
+| `lib/hvault.sh` | HashiCorp Vault helper module. `hvault_kv_get(PATH, [KEY])` — read KV v2 secret, optionally extract one key. `hvault_kv_put(PATH, KEY=VAL ...)` — write KV v2 secret. `hvault_kv_list(PATH)` — list keys at a KV path. `hvault_get_or_empty(PATH)` — GET /v1/PATH; 200→raw body, 404→empty, else structured error + return 1 (used by sync scripts to distinguish "absent, create" from hard failure without tripping errexit, #881). `hvault_policy_apply(NAME, FILE)` — idempotent policy upsert. `hvault_jwt_login(ROLE, JWT)` — exchange JWT for short-lived token. `hvault_token_lookup()` — returns TTL/policies/accessor for current token. All functions use `VAULT_ADDR` + `VAULT_TOKEN` from env (fallback: `/etc/vault.d/root.token`), emit structured JSON errors to stderr on failure. Tests: `tests/lib-hvault.bats` (requires `vault server -dev`). | `tools/vault-apply-policies.sh`, `tools/vault-apply-roles.sh`, `lib/init/nomad/vault-nomad-auth.sh` |
+| `lib/init/nomad/` | Nomad+Vault installer scripts. `cluster-up.sh` — idempotent Step-0 orchestrator that runs all steps in order (installs packages, writes HCL, enables systemd units, unseals Vault); uses `poll_until_healthy()` helper for deduped readiness polling. `install.sh` — installs pinned Nomad+Vault apt packages. `vault-init.sh` — initializes Vault (unseal keys → `/etc/vault.d/`), creates dev-persisted unseal unit. `lib-systemd.sh` — shared systemd unit helpers. `systemd-nomad.sh`, `systemd-vault.sh` — write and enable service units. `vault-nomad-auth.sh` — Step-2 script that enables Vault's JWT auth at path `jwt-nomad`, writes the JWKS/algs config pointing at Nomad's workload-identity signer, delegates role sync to `tools/vault-apply-roles.sh`, installs `/etc/nomad.d/server.hcl`, and SIGHUPs `nomad.service` if the file changed (#881). Idempotent: each step checks current state before acting. Sourced and called by `cluster-up.sh`; not sourced by agents. | `bin/disinto init --backend=nomad` |
--- a/lib/hvault.sh
+++ b/lib/hvault.sh
@@ -178,6 +178,51 @@ hvault_kv_list() {
  }
 }

+# hvault_get_or_empty PATH
+#   GET /v1/PATH. On 2xx, prints the raw response body to stdout (caller
+#   parses with jq). On 404, prints nothing and returns 0 — caller treats
+#   the empty string as "resource absent, needs create". Any other HTTP
+#   status is a hard error: response body is logged to stderr as a
+#   structured JSON error and the function returns 1.
+#
+#   Used by the sync scripts (tools/vault-apply-*.sh +
+#   lib/init/nomad/vault-nomad-auth.sh) to read existing policies, roles,
+#   auth-method listings, and per-role configs without triggering errexit
+#   on the expected absent-resource case. `_hvault_request` is not a
+#   substitute — it treats 404 as a hard error, which is correct for
+#   writes but wrong for "does this already exist?" checks.
+#
+#   Subshell + EXIT trap: the RETURN trap does NOT fire on set-e abort,
+#   so tmpfile cleanup from a function-scoped RETURN trap would leak on
+#   jq/curl errors under `set -eo pipefail`. The subshell + EXIT trap
+#   is the reliable cleanup boundary.
+hvault_get_or_empty() {
+  local path="${1:-}"
+
+  if [ -z "$path" ]; then
+    _hvault_err "hvault_get_or_empty" "PATH is required" \
+      "usage: hvault_get_or_empty PATH"
+    return 1
+  fi
+  _hvault_check_prereqs "hvault_get_or_empty" || return 1
+
+  (
+    local tmp http_code
+    tmp="$(mktemp)"
+    trap 'rm -f "$tmp"' EXIT
+    http_code="$(curl -sS -o "$tmp" -w '%{http_code}' \
+      -H "X-Vault-Token: ${VAULT_TOKEN}" \
+      "${VAULT_ADDR}/v1/${path}")" \
+      || { _hvault_err "hvault_get_or_empty" "curl failed" "path=$path"; exit 1; }
+    case "$http_code" in
+      2[0-9][0-9]) cat "$tmp" ;;
+      404)         printf '' ;;
+      *)           _hvault_err "hvault_get_or_empty" "HTTP $http_code" "$(cat "$tmp")"
+                   exit 1 ;;
+    esac
+  )
+}
+
 # hvault_policy_apply NAME FILE
 #   Idempotent policy upsert — create or update a Vault policy.
 hvault_policy_apply() {
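Review note on the cleanup boundary in `hvault_get_or_empty`: the header comment argues for a subshell plus an EXIT trap. The sketch below illustrates that pattern in isolation, without Vault or curl. `fetch_with_cleanup` and `check_cleanup` are hypothetical names used only for this illustration, and `false` stands in for a failing curl or jq call; it is not the code in `lib/hvault.sh`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a helper like hvault_get_or_empty: creates a
# tmpfile inside a subshell, records its path in the file "$1", then fails.
fetch_with_cleanup() {
  local marker="$1"
  (
    tmp="$(mktemp)"
    trap 'rm -f "$tmp"' EXIT       # fires whenever the subshell exits
    printf '%s' "$tmp" > "$marker"
    false || exit 1                # simulated curl/jq failure
    echo "not reached"
  )
}

# Reports whether the tmpfile survived the failed call.
check_cleanup() {
  local marker leaked
  marker="$(mktemp)"
  fetch_with_cleanup "$marker" || true
  leaked="$(cat "$marker")"
  rm -f "$marker"
  if [ -e "$leaked" ]; then echo "leaked"; else echo "cleaned"; fi
}

check_cleanup
```

Because the trap is registered inside the subshell, it runs on every exit path of that subshell, including the explicit `exit 1`, while the caller still sees the non-zero status.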
--- a/lib/init/nomad/vault-nomad-auth.sh
+++ b/lib/init/nomad/vault-nomad-auth.sh
@@ -0,0 +1,181 @@
+#!/usr/bin/env bash
+# =============================================================================
+# lib/init/nomad/vault-nomad-auth.sh — Idempotent Vault JWT auth + Nomad wiring
+#
+# Part of the Nomad+Vault migration (S2.3, issue #881). Enables Vault's JWT
+# auth method at path `jwt-nomad`, points it at Nomad's workload-identity
+# JWKS endpoint, writes one role per policy (via tools/vault-apply-roles.sh),
+# updates /etc/nomad.d/server.hcl with the vault stanza, and signals nomad
+# to reload so jobs can exchange short-lived workload-identity tokens for
+# Vault tokens — no shared VAULT_TOKEN in job env.
+#
+# Steps:
+#   1. Enable auth method           (sys/auth/jwt-nomad, type=jwt)
+#   2. Configure JWKS + algs        (auth/jwt-nomad/config)
+#   3. Upsert roles from vault/roles.yaml (delegates to vault-apply-roles.sh)
+#   4. Install /etc/nomad.d/server.hcl from repo + SIGHUP nomad if changed
+#
+# Idempotency contract:
+#   - Auth path already enabled → skip create, log "jwt-nomad already enabled".
+#   - Config identical to desired → skip write, log "jwt-nomad config unchanged".
+#   - Roles: see tools/vault-apply-roles.sh header for per-role diffing.
+#   - server.hcl on disk byte-identical to repo copy → skip write, skip SIGHUP.
+#   - Second run on a fully-configured box is a silent no-op end-to-end.
+#
+# Preconditions:
+#   - S0 complete (empty cluster up: nomad + vault reachable, vault unsealed).
+#   - S2.1 complete: vault/policies/*.hcl applied via tools/vault-apply-policies.sh
+#     (otherwise the roles we write will reference policies Vault does not
+#     know about — the write succeeds, but token minting will fail later).
+#   - Running as root (writes /etc/nomad.d/server.hcl + signals nomad).
+#
+# Environment:
+#   VAULT_ADDR  — default http://127.0.0.1:8200 (matches nomad/vault.hcl).
+#   VAULT_TOKEN — env OR /etc/vault.d/root.token (resolved by lib/hvault.sh).
+#
+# Usage:
+#   sudo lib/init/nomad/vault-nomad-auth.sh
+#
+# Exit codes:
+#   0  success (configured, or already so)
+#   1  precondition / API / nomad-reload failure
+# =============================================================================
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
+
+APPLY_ROLES_SH="${REPO_ROOT}/tools/vault-apply-roles.sh"
+SERVER_HCL_SRC="${REPO_ROOT}/nomad/server.hcl"
+SERVER_HCL_DST="/etc/nomad.d/server.hcl"
+
+VAULT_ADDR="${VAULT_ADDR:-http://127.0.0.1:8200}"
+export VAULT_ADDR
+
+# shellcheck source=../../hvault.sh
+source "${REPO_ROOT}/lib/hvault.sh"
+
+log() { printf '[vault-auth] %s\n' "$*"; }
+die() { printf '[vault-auth] ERROR: %s\n' "$*" >&2; exit 1; }
+
+# ── Preconditions ────────────────────────────────────────────────────────────
+if [ "$(id -u)" -ne 0 ]; then
+  die "must run as root (writes ${SERVER_HCL_DST} + signals nomad)"
+fi
+
+# curl + jq are used directly; hvault.sh's helpers are also curl-based, so
+# the `vault` CLI is NOT required here — don't add it to this list, or a
+# Vault-server-present / vault-CLI-absent box (e.g. a Nomad-client-only
+# node) would die spuriously. systemctl is required for SIGHUPing nomad.
+for bin in curl jq systemctl; do
+  command -v "$bin" >/dev/null 2>&1 \
+    || die "required binary not found: ${bin}"
+done
+
+[ -f "$SERVER_HCL_SRC" ] \
+  || die "source config not found: ${SERVER_HCL_SRC}"
+[ -x "$APPLY_ROLES_SH" ] \
+  || die "companion script missing or not executable: ${APPLY_ROLES_SH}"
+
+hvault_token_lookup >/dev/null \
+  || die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
+
+# ── Desired config (Nomad workload-identity JWKS on localhost:4646) ──────────
+# Nomad's default workload-identity signer publishes the public JWKS at
+# /.well-known/jwks.json on the nomad HTTP API port (4646). Vault validates
+# JWTs against it. RS256 is the signer's default algorithm. `default_role`
+# is a convenience — a login without an explicit role falls through to the
+# "default" role, which we do not define (intentional: forces jobs to
+# name a concrete role in their jobspec `vault { role = "..." }`).
+JWKS_URL="http://127.0.0.1:4646/.well-known/jwks.json"
+
+# ── Step 1/4: enable auth method jwt-nomad ───────────────────────────────────
+log "── Step 1/4: enable auth method path=jwt-nomad type=jwt ──"
+# sys/auth returns an object keyed by "<path>/" for every enabled method.
+# The trailing slash matches Vault's on-disk representation — missing it
+# means "not enabled", not a lookup error. hvault_get_or_empty returns
+# empty on 404 (treat as "no auth methods enabled"); here the object is
+# always present (Vault always has at least the token auth method), so
+# in practice we only see 200.
+auth_list="$(hvault_get_or_empty "sys/auth")" \
+  || die "failed to list auth methods"
+if printf '%s' "$auth_list" | jq -e '.["jwt-nomad/"]' >/dev/null 2>&1; then
+  log "auth path jwt-nomad already enabled"
+else
+  enable_payload="$(jq -n '{type:"jwt",description:"Nomad workload identity (S2.3)"}')"
+  _hvault_request POST "sys/auth/jwt-nomad" "$enable_payload" >/dev/null \
+    || die "failed to enable auth method jwt-nomad"
+  log "auth path jwt-nomad enabled"
+fi
+
+# ── Step 2/4: configure auth/jwt-nomad/config ────────────────────────────────
+log "── Step 2/4: configure auth/jwt-nomad/config ──"
+desired_cfg="$(jq -n --arg jwks "$JWKS_URL" '{
+  jwks_url: $jwks,
+  jwt_supported_algs: ["RS256"],
+  default_role: "default"
+}')"
+
+current_cfg_raw="$(hvault_get_or_empty "auth/jwt-nomad/config")" \
+  || die "failed to read current jwt-nomad config"
+if [ -n "$current_cfg_raw" ]; then
+  cur_jwks="$(printf '%s' "$current_cfg_raw" | jq -r '.data.jwks_url // ""')"
+  cur_algs="$(printf '%s' "$current_cfg_raw" | jq -cS '.data.jwt_supported_algs // []')"
+  cur_default="$(printf '%s' "$current_cfg_raw" | jq -r '.data.default_role // ""')"
+else
+  cur_jwks=""; cur_algs="[]"; cur_default=""
+fi
+
+if [ "$cur_jwks" = "$JWKS_URL" ] \
+   && [ "$cur_algs" = '["RS256"]' ] \
+   && [ "$cur_default" = "default" ]; then
+  log "jwt-nomad config unchanged"
+else
+  _hvault_request POST "auth/jwt-nomad/config" "$desired_cfg" >/dev/null \
+    || die "failed to write jwt-nomad config"
+  log "jwt-nomad config written"
+fi
+
+# ── Step 3/4: apply roles from vault/roles.yaml ──────────────────────────────
+log "── Step 3/4: apply roles from vault/roles.yaml ──"
+# Delegates to tools/vault-apply-roles.sh — one source of truth for the
+# parser and per-role idempotency contract. Its header documents the
+# created/updated/unchanged wiring.
+"$APPLY_ROLES_SH"
+
+# ── Step 4/4: install server.hcl + SIGHUP nomad if changed ───────────────────
+log "── Step 4/4: install ${SERVER_HCL_DST} + reload nomad if changed ──"
+# cluster-up.sh (S0.4) is the normal path for installing server.hcl — but
+# this script is run AFTER S0.4, so we also install here. Writing only on
+# content-diff keeps re-runs a true no-op (no spurious SIGHUP). `install`
+# sets mode 0644 and owner root:root on every write.
+needs_reload=0
+if [ -f "$SERVER_HCL_DST" ] && cmp -s "$SERVER_HCL_SRC" "$SERVER_HCL_DST"; then
+  log "unchanged: ${SERVER_HCL_DST}"
+else
+  log "writing: ${SERVER_HCL_DST}"
+  install -m 0644 -o root -g root "$SERVER_HCL_SRC" "$SERVER_HCL_DST"
+  needs_reload=1
+fi
+
+if [ "$needs_reload" -eq 1 ]; then
+  # SIGHUP triggers Nomad's config reload (see ExecReload in
+  # lib/init/nomad/systemd-nomad.sh — /bin/kill -HUP $MAINPID). Using
+  # `systemctl kill -s SIGHUP` instead of `systemctl reload` sends the
+  # signal even when the unit doesn't declare ExecReload (defensive —
+  # future unit edits can't silently break this script).
+  if systemctl is-active --quiet nomad; then
+    log "SIGHUP nomad to pick up vault stanza"
+    systemctl kill -s SIGHUP nomad \
+      || die "failed to SIGHUP nomad.service"
+  else
+    # Fresh box: nomad not started yet. The updated server.hcl will be
+    # picked up at first start. Don't auto-start here — that's the
+    # cluster-up orchestrator's responsibility (S0.4).
+    log "nomad.service not active — skipping SIGHUP (next start loads vault stanza)"
+  fi
+else
+  log "server.hcl unchanged — nomad SIGHUP not needed"
+fi
+
+log "── done — jwt-nomad auth + config + roles + nomad vault stanza in place ──"
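Review note on Step 4/4: the write-only-on-diff logic can be exercised without root or a running Nomad. The sketch below is a simplified illustration under stated assumptions: `sync_file` is a hypothetical helper, the `-o root -g root` flags are dropped so it runs unprivileged, and the reload step is left to the caller.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy src to dst only when the bytes differ.
# Prints "written" or "unchanged" so a caller can decide whether to reload.
sync_file() {
  local src="$1" dst="$2"
  if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
    echo "unchanged"
  else
    install -m 0644 "$src" "$dst"   # owner flags omitted so this runs unprivileged
    echo "written"
  fi
}

demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
printf 'vault { enabled = true }\n' > "${demo_dir}/src.hcl"

sync_file "${demo_dir}/src.hcl" "${demo_dir}/dst.hcl"   # first run copies
sync_file "${demo_dir}/src.hcl" "${demo_dir}/dst.hcl"   # second run is a no-op
```

Returning the outcome as text keeps the signal decision (SIGHUP or not) in one place in the caller, which is what makes a second run a true no-op.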
--- a/nomad/server.hcl
+++ b/nomad/server.hcl
@@ -51,3 +51,26 @@ advertise {
 ui {
  enabled = true
 }
+
+# ─── Vault integration (S2.3, issue #881) ───────────────────────────────────
+# Nomad jobs exchange their short-lived workload-identity JWT (signed by
+# nomad's built-in signer at /.well-known/jwks.json on :4646) for a Vault
+# token carrying the policies named by the role in `vault { role = "..." }`
+# of each jobspec — no shared VAULT_TOKEN in job env.
+#
+# The JWT auth path (jwt-nomad) + per-role bindings live on the Vault
+# side, written by lib/init/nomad/vault-nomad-auth.sh + tools/vault-apply-roles.sh.
+# Roles are defined in vault/roles.yaml.
+#
+# `default_identity.aud = ["vault.io"]` matches bound_audiences on every
+# role in vault/roles.yaml — a drift here would silently break every job's
+# Vault token exchange at placement time.
+vault {
+  enabled = true
+  address = "http://127.0.0.1:8200"
+
+  default_identity {
+    aud = ["vault.io"]
+    ttl = "1h"
+  }
+}
--- a/tests/fixtures/.env.vault.enc
+++ b/tests/fixtures/.env.vault.enc
@@ -0,0 +1,20 @@
+{
+	"data": "ENC[AES256_GCM,data:SsLdIiZDVkkV1bbKeHQ8A1K/4vgXQFJF8y4J87GGwsGa13lNnPoqRaCmPAtuQr3hR5JNqARUhFp8aEusyzwi/lZLU2Reo32YjE26ObVOHf47EGmmHM/tEgh6u0fa1AmFtuqJVQzhG2eZhJmZJFgdRH36+bhdBwI1mkORmsRNtBPHHjtQJDbsgN47maDhuP4B7WvB4/TdnJ++GNMlMbyrbr0pEf2uqqOVO55cJ3I4v/Jcg8tq0clPuW1k5dNFsmFSMbbjE5N25EGrc7oEH5GVZ6I6L6p0Fzyj/MV4hKacboFHiZmBZgRQ,iv:UnXTa800G3PW4IaErkPBIZKjPHAU3LmiCvAqDdhFE/Q=,tag:kdWpHQ8fEPGFlmfVoTMskA==,type:str]",
+	"sops": {
+		"kms": null,
+		"gcp_kms": null,
+		"azure_kv": null,
+		"hc_vault": null,
+		"age": [
+			{
+				"recipient": "age1ztkm8yvdk42m2cn4dj2v9ptfknq8wpgr3ry9dpmtmlaeas6p7yyqft0ldg",
+				"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBrVUlmaEdTNU1iMGg4dFA4\nNFNOSzlBc1NER1U3SHlwVFU1dm5tR1kyeldzCjZ2NXI3MjR4Zkd1RVBKNzJoQ1Jm\nQWpEZU5VMkNuYnhTTVJNc0RpTXlIZE0KLS0tIDFpQ2tlN0MzL1NuS2hKZU5JTG9B\nNWxXMzE0bGZpQkVBTnhWRXZBQlhrc1EKG76DM98cCuqIwUkbfJWHhJdYV77O9r8Q\nRJrq6jH59Gcp9W8iHg/aeShPHZFEOLg1q9azV9Wt9FjJn3SxyTmgvA==\n-----END AGE ENCRYPTED FILE-----\n"
+			}
+		],
+		"lastmodified": "2026-04-16T15:43:34Z",
+		"mac": "ENC[AES256_GCM,data:jVRr2TxSZH2paD2doIX4JwCqo5wiPYfTowpj189w1IVlS0EY/XQoqxiWbunX/LmIDdQlTPCSe/vTp1EJA0cx6vzN2xENrwsfzCP6dwDGaRlZhH3V0CVhtfHIkMTEKWrAUx5hFtiwJPkLYUUYi5aRWRxhZQM1eBeRvuGKdlwvmHA=,iv:H57a61AfVNLrlg+4aMl9mwXI5O38O5ZoRhpxe2PTTkY=,tag:2jwH1855VNYlKseTE/XtTg==,type:str]",
+		"pgp": null,
+		"unencrypted_suffix": "_unencrypted",
+		"version": "3.9.4"
+	}
+}
--- a/tests/fixtures/age-keys.txt
+++ b/tests/fixtures/age-keys.txt
@@ -0,0 +1,5 @@
+# Test age key for sops
+# Generated: 2026-04-16
+# Public key: age1ztkm8yvdk42m2cn4dj2v9ptfknq8wpgr3ry9dpmtmlaeas6p7yyqft0ldg
+
+AGE-SECRET-KEY-1PCQQX37MTZDGES76H9TGQN5XTG2ZZX2UUR87KR784NZ4MQ3NJ56S0Z23SF
--- a/tests/fixtures/dot-env-complete
+++ b/tests/fixtures/dot-env-complete
@@ -0,0 +1,40 @@
+# Test fixture .env file for vault-import.sh
+# This file contains all expected keys for the import test
+
+# Generic forge creds
+FORGE_TOKEN=generic-forge-token
+FORGE_PASS=generic-forge-pass
+FORGE_ADMIN_TOKEN=generic-admin-token
+
+# Bot tokens (review, dev, gardener, architect, planner, predictor, supervisor, vault)
+FORGE_REVIEW_TOKEN=review-token
+FORGE_REVIEW_PASS=review-pass
+FORGE_DEV_TOKEN=dev-token
+FORGE_DEV_PASS=dev-pass
+FORGE_GARDENER_TOKEN=gardener-token
+FORGE_GARDENER_PASS=gardener-pass
+FORGE_ARCHITECT_TOKEN=architect-token
+FORGE_ARCHITECT_PASS=architect-pass
+FORGE_PLANNER_TOKEN=planner-token
+FORGE_PLANNER_PASS=planner-pass
+FORGE_PREDICTOR_TOKEN=predictor-token
+FORGE_PREDICTOR_PASS=predictor-pass
+FORGE_SUPERVISOR_TOKEN=supervisor-token
+FORGE_SUPERVISOR_PASS=supervisor-pass
+FORGE_VAULT_TOKEN=vault-token
+FORGE_VAULT_PASS=vault-pass
+
+# Llama bot
+FORGE_TOKEN_LLAMA=llama-token
+FORGE_PASS_LLAMA=llama-pass
+
+# Woodpecker secrets
+WOODPECKER_AGENT_SECRET=wp-agent-secret
+WP_FORGEJO_CLIENT=wp-forgejo-client
+WP_FORGEJO_SECRET=wp-forgejo-secret
+WOODPECKER_TOKEN=wp-token
+
+# Chat secrets
+FORWARD_AUTH_SECRET=forward-auth-secret
+CHAT_OAUTH_CLIENT_ID=chat-client-id
+CHAT_OAUTH_CLIENT_SECRET=chat-client-secret
--- a/tests/fixtures/dot-env-incomplete
+++ b/tests/fixtures/dot-env-incomplete
@@ -0,0 +1,27 @@
+# Test fixture .env file with missing required keys
+# This file is intentionally missing some keys to test error handling
+
+# Generic forge creds - missing FORGE_ADMIN_TOKEN
+FORGE_TOKEN=generic-forge-token
+FORGE_PASS=generic-forge-pass
+
+# Bot tokens - missing several roles
+FORGE_REVIEW_TOKEN=review-token
+FORGE_REVIEW_PASS=review-pass
+FORGE_DEV_TOKEN=dev-token
+FORGE_DEV_PASS=dev-pass
+
+# Llama bot - missing (only token, no pass)
+FORGE_TOKEN_LLAMA=llama-token
+# FORGE_PASS_LLAMA=llama-pass
+
+# Woodpecker secrets - missing some
+WOODPECKER_AGENT_SECRET=wp-agent-secret
+# WP_FORGEJO_CLIENT=wp-forgejo-client
+# WP_FORGEJO_SECRET=wp-forgejo-secret
+# WOODPECKER_TOKEN=wp-token
+
+# Chat secrets - missing some
+FORWARD_AUTH_SECRET=forward-auth-secret
+# CHAT_OAUTH_CLIENT_ID=chat-client-id
+# CHAT_OAUTH_CLIENT_SECRET=chat-client-secret
--- a/tests/fixtures/dot-env.vault.plain
+++ b/tests/fixtures/dot-env.vault.plain
@@ -0,0 +1,6 @@
+GITHUB_TOKEN=github-test-token-abc123
+CODEBERG_TOKEN=codeberg-test-token-def456
+CLAWHUB_TOKEN=clawhub-test-token-ghi789
+DEPLOY_KEY=deploy-key-test-jkl012
+NPM_TOKEN=npm-test-token-mno345
+DOCKER_HUB_TOKEN=dockerhub-test-token-pqr678
--- a/tests/vault-import.bats
+++ b/tests/vault-import.bats
@@ -0,0 +1,313 @@
+#!/usr/bin/env bats
+# tests/vault-import.bats — Tests for tools/vault-import.sh
+#
+# Runs against a dev-mode Vault server (single binary, no LXC needed).
+# CI launches vault server -dev inline before running these tests.
+
+VAULT_BIN="${VAULT_BIN:-vault}"
+IMPORT_SCRIPT="${BATS_TEST_DIRNAME}/../tools/vault-import.sh"
+FIXTURES_DIR="${BATS_TEST_DIRNAME}/fixtures"
+
+setup_file() {
+  # Start dev-mode vault on a random port
+  export VAULT_DEV_PORT
+  VAULT_DEV_PORT="$(shuf -i 18200-18299 -n 1)"
+  export VAULT_ADDR="http://127.0.0.1:${VAULT_DEV_PORT}"
+
+  "$VAULT_BIN" server -dev \
+    -dev-listen-address="127.0.0.1:${VAULT_DEV_PORT}" \
+    -dev-root-token-id="test-root-token" \
+    -dev-no-store-token \
+    &>"${BATS_FILE_TMPDIR}/vault.log" &
+  export VAULT_PID=$!
+
+  export VAULT_TOKEN="test-root-token"
+
+  # Wait for vault to be ready (up to 10s)
+  local i=0
+  while ! curl -sf "${VAULT_ADDR}/v1/sys/health" >/dev/null 2>&1; do
+    sleep 0.5
+    i=$((i + 1))
+    if [ "$i" -ge 20 ]; then
+      echo "Vault failed to start. Log:" >&2
+      cat "${BATS_FILE_TMPDIR}/vault.log" >&2
+      return 1
+    fi
+  done
+}
+
+teardown_file() {
+  if [ -n "${VAULT_PID:-}" ]; then
+    kill "$VAULT_PID" 2>/dev/null || true
+    wait "$VAULT_PID" 2>/dev/null || true
+  fi
+}
+
+setup() {
+  # Source the module under test for hvault functions
+  source "${BATS_TEST_DIRNAME}/../lib/hvault.sh"
+  export VAULT_ADDR VAULT_TOKEN
+}
+
+# --- Security checks ---
+
+@test "refuses to run if VAULT_ADDR is not localhost" {
+  export VAULT_ADDR="http://prod-vault.example.com:8200"
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-complete" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "$FIXTURES_DIR/age-keys.txt"
+  [ "$status" -ne 0 ]
+  echo "$output" | grep -q "Security check failed"
+}
+
+@test "refuses if age key file permissions are not 0400" {
+  # Create a temp file with wrong permissions
+  local bad_key="${BATS_TEST_TMPDIR}/bad-ages.txt"
+  echo "AGE-SECRET-KEY-1TEST" > "$bad_key"
+  chmod 644 "$bad_key"
+
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-complete" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "$bad_key"
+  [ "$status" -ne 0 ]
+  echo "$output" | grep -q "permissions"
+}
+
+# --- Dry-run mode ─────────────────────────────────────────────────────────────
+
+@test "--dry-run prints plan without writing to Vault" {
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-complete" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "$FIXTURES_DIR/age-keys.txt" \
+    --dry-run
+  [ "$status" -eq 0 ]
+  echo "$output" | grep -q "DRY-RUN"
+  echo "$output" | grep -q "Import plan"
+  echo "$output" | grep -q "Planned operations"
+
+  # Verify nothing was written to Vault
+  run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
+    "${VAULT_ADDR}/v1/secret/data/disinto/bots/review"
+  [ "$status" -ne 0 ]
+}
+
+# --- Complete fixture import ─────────────────────────────────────────────────
+
+@test "imports all keys from complete fixture" {
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-complete" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "$FIXTURES_DIR/age-keys.txt"
+  [ "$status" -eq 0 ]
+
+  # Check bots/review
+  run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
+    "${VAULT_ADDR}/v1/secret/data/disinto/bots/review"
+  [ "$status" -eq 0 ]
+  echo "$output" | grep -q "review-token"
+  echo "$output" | grep -q "review-pass"
+
+  # Check bots/dev-qwen
+  run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
+    "${VAULT_ADDR}/v1/secret/data/disinto/bots/dev-qwen"
+  [ "$status" -eq 0 ]
+  echo "$output" | grep -q "llama-token"
+  echo "$output" | grep -q "llama-pass"
+
+  # Check forge
+  run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
+    "${VAULT_ADDR}/v1/secret/data/disinto/shared/forge"
+  [ "$status" -eq 0 ]
+  echo "$output" | grep -q "generic-forge-token"
+  echo "$output" | grep -q "generic-forge-pass"
+  echo "$output" | grep -q "generic-admin-token"
+
+  # Check woodpecker
+  run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
+    "${VAULT_ADDR}/v1/secret/data/disinto/shared/woodpecker"
+  [ "$status" -eq 0 ]
+  echo "$output" | grep -q "wp-agent-secret"
+  echo "$output" | grep -q "wp-forgejo-client"
+  echo "$output" | grep -q "wp-forgejo-secret"
+  echo "$output" | grep -q "wp-token"
+
+  # Check chat
+  run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
+    "${VAULT_ADDR}/v1/secret/data/disinto/shared/chat"
+  [ "$status" -eq 0 ]
+  echo "$output" | grep -q "forward-auth-secret"
+  echo "$output" | grep -q "chat-client-id"
+  echo "$output" | grep -q "chat-client-secret"
+
+  # Check runner tokens from sops
+  run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
+    "${VAULT_ADDR}/v1/secret/data/disinto/runner/GITHUB_TOKEN"
+  [ "$status" -eq 0 ]
+  echo "$output" | jq -e '.data.data.value == "github-test-token-abc123"'
+}
+
+# --- Idempotency ──────────────────────────────────────────────────────────────
+
+@test "re-run with unchanged fixtures reports all unchanged" {
+  # First run
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-complete" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "$FIXTURES_DIR/age-keys.txt"
+  [ "$status" -eq 0 ]
+
+  # Second run - should report unchanged
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-complete" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "$FIXTURES_DIR/age-keys.txt"
+  [ "$status" -eq 0 ]
+
+  # Check that all keys report unchanged
+  echo "$output" | grep -q "unchanged"
+  # Count unchanged occurrences (should be many)
+  local unchanged_count
+  unchanged_count=$(echo "$output" | grep -c "unchanged" || true)
+  [ "$unchanged_count" -gt 10 ]
+}
+
+@test "re-run with modified value reports only that key as updated" {
+  # Create a modified fixture
+  local modified_env="${BATS_TEST_TMPDIR}/dot-env-modified"
+  cp "$FIXTURES_DIR/dot-env-complete" "$modified_env"
+
+  # Modify one value
+  sed -i 's/llama-token/MODIFIED-LLAMA-TOKEN/' "$modified_env"
+
+  # Run with modified fixture
+  run "$IMPORT_SCRIPT" \
+    --env "$modified_env" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "$FIXTURES_DIR/age-keys.txt"
+  [ "$status" -eq 0 ]
+
+  # Check that dev-qwen token was updated
+  echo "$output" | grep -q "dev-qwen.*updated"
+
+  # Verify the new value was written (path is disinto/bots/dev-qwen, key is token)
+  run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
+    "${VAULT_ADDR}/v1/secret/data/disinto/bots/dev-qwen"
+  [ "$status" -eq 0 ]
+  echo "$output" | jq -e '.data.data.token == "MODIFIED-LLAMA-TOKEN"'
+}
+
+# --- Incomplete fixture ───────────────────────────────────────────────────────
+
+@test "handles incomplete fixture gracefully" {
+  # The incomplete fixture is missing some keys, but that should be OK
+  # - it should only import what exists
+  # - it should warn about missing pairs
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-incomplete" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "$FIXTURES_DIR/age-keys.txt"
+  [ "$status" -eq 0 ]
+
+  # Should have imported what was available
+  echo "$output" | grep -q "review"
+
+  # Should complete successfully even with incomplete fixture
+  # The script handles missing pairs gracefully with warnings to stderr
+  [ "$status" -eq 0 ]
+}
+
+# --- Security: no secrets in output ───────────────────────────────────────────
+
+@test "never logs secret values in stdout" {
+  # Run the import
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-complete" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "$FIXTURES_DIR/age-keys.txt"
+  [ "$status" -eq 0 ]
+
+  # Check that no actual secret values appear in output
+  # (only key names and status messages)
+  local secret_patterns=(
+    "generic-forge-token"
+    "generic-forge-pass"
+    "generic-admin-token"
+    "review-token"
+    "review-pass"
+    "llama-token"
+    "llama-pass"
+    "wp-agent-secret"
+    "forward-auth-secret"
+    "github-test-token"
+    "codeberg-test-token"
+    "clawhub-test-token"
+    "deploy-key-test"
+    "npm-test-token"
+    "dockerhub-test-token"
+  )
+
+  for pattern in "${secret_patterns[@]}"; do
+    if echo "$output" | grep -q "$pattern"; then
+      echo "FAIL: Found secret pattern '$pattern' in output" >&2
+      echo "Output was:" >&2
+      echo "$output" >&2
+      return 1
+    fi
+  done
+}
+
+# --- Error handling ───────────────────────────────────────────────────────────
+
+@test "fails with missing --env argument" {
+  run "$IMPORT_SCRIPT" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "$FIXTURES_DIR/age-keys.txt"
+  [ "$status" -ne 0 ]
+  echo "$output" | grep -q "Missing required argument"
+}
+
+@test "fails with missing --sops argument" {
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-complete" \
+    --age-key "$FIXTURES_DIR/age-keys.txt"
+  [ "$status" -ne 0 ]
+  echo "$output" | grep -q "Missing required argument"
+}
+
+@test "fails with missing --age-key argument" {
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-complete" \
+    --sops "$FIXTURES_DIR/.env.vault.enc"
+  [ "$status" -ne 0 ]
+  echo "$output" | grep -q "Missing required argument"
+}
+
+@test "fails with non-existent env file" {
+  run "$IMPORT_SCRIPT" \
+    --env "/nonexistent/.env" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "$FIXTURES_DIR/age-keys.txt"
+  [ "$status" -ne 0 ]
+  echo "$output" | grep -q "not found"
+}
+
+@test "fails with non-existent sops file" {
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-complete" \
+    --sops "/nonexistent/.env.vault.enc" \
+    --age-key "$FIXTURES_DIR/age-keys.txt"
+  [ "$status" -ne 0 ]
+  echo "$output" | grep -q "not found"
+}
+
+@test "fails with non-existent age key file" {
+  run "$IMPORT_SCRIPT" \
+    --env "$FIXTURES_DIR/dot-env-complete" \
+    --sops "$FIXTURES_DIR/.env.vault.enc" \
+    --age-key "/nonexistent/age-keys.txt"
+  [ "$status" -ne 0 ]
+  echo "$output" | grep -q "not found"
+}
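Review note on the first security test: `tools/vault-import.sh` is not shown in this part of the diff, so how it validates `VAULT_ADDR` is unknown here. One possible shape for such a guard, as a sketch only: `is_local_vault_addr` is a hypothetical function, and the `Security check failed` wording is borrowed from the test's `grep`, not from the script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical guard: succeed only when the URL's host is loopback.
# IPv6 literals such as [::1] are deliberately not handled in this sketch.
is_local_vault_addr() {
  local addr="${1:-}" hostport host
  hostport="${addr#*://}"        # drop the scheme
  hostport="${hostport%%/*}"     # drop any path
  hostport="${hostport##*@}"     # drop userinfo so "localhost:x@evil" cannot pass
  host="${hostport%%:*}"         # drop the port
  case "$host" in
    127.0.0.1|localhost) return 0 ;;
    *)                   return 1 ;;
  esac
}

for addr in "http://127.0.0.1:8200" "http://prod-vault.example.com:8200"; do
  if is_local_vault_addr "$addr"; then
    echo "ok: $addr"
  else
    echo "Security check failed: $addr"
  fi
done
```

Matching on the extracted host rather than on a prefix of the whole URL avoids accepting names such as `127.0.0.1.evil.example.com`.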
--- a/tools/vault-apply-policies.sh
+++ b/tools/vault-apply-policies.sh
@@ -103,37 +103,6 @@ fi
 hvault_token_lookup >/dev/null \
  || die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"

-# ── Helper: fetch the on-server policy text, or empty if absent ──────────────
-# Echoes the current policy content on stdout. A 404 (policy does not exist
-# yet) is a non-error — we print nothing and exit 0 so the caller can treat
-# the empty string as "needs create". Any other non-2xx is a hard failure.
-#
-# Uses a subshell + EXIT trap (not RETURN) for tmpfile cleanup: the RETURN
-# trap does NOT fire on set-e abort, so if jq below tripped errexit the
-# tmpfile would leak. Subshell exit propagates via the function's last-
-# command exit status.
-fetch_current_policy() {
-  local name="$1"
-  (
-    local tmp http_code
-    tmp="$(mktemp)"
-    trap 'rm -f "$tmp"' EXIT
-    http_code="$(curl -sS -o "$tmp" -w '%{http_code}' \
-      -H "X-Vault-Token: ${VAULT_TOKEN}" \
-      "${VAULT_ADDR}/v1/sys/policies/acl/${name}")" \
-      || { printf '[vault-apply] ERROR: curl failed for policy %s\n' "$name" >&2; exit 1; }
-    case "$http_code" in
-      200) jq -r '.data.policy // ""' < "$tmp" ;;
-      404) printf '' ;;  # absent — caller treats as "create"
-      *)
-        printf '[vault-apply] ERROR: HTTP %s fetching policy %s:\n' "$http_code" "$name" >&2
-        cat "$tmp" >&2
-        exit 1
-        ;;
-    esac
-  )
-}
-
 # ── Apply each policy, reporting created/updated/unchanged ───────────────────
 log "syncing ${#POLICY_FILES[@]} polic(y|ies) from ${POLICIES_DIR}"

@ -141,8 +110,17 @@ for f in "${POLICY_FILES[@]}"; do
  name="$(basename "$f" .hcl)"

  desired="$(cat "$f")"
-  current="$(fetch_current_policy "$name")" \
+  # hvault_get_or_empty returns the raw JSON body on 200 or empty on 404.
+  # Extract the .data.policy field here; an empty body skips jq and leaves
+  # current="" so the empty-string-means-create branch below still works.
+  raw="$(hvault_get_or_empty "sys/policies/acl/${name}")" \
    || die "failed to read existing policy: ${name}"
+  if [ -n "$raw" ]; then
+    current="$(printf '%s' "$raw" | jq -r '.data.policy // ""')" \
+      || die "failed to parse policy response: ${name}"
+  else
+    current=""
+  fi

  if [ -z "$current" ]; then
    hvault_policy_apply "$name" "$f" \
--- a/tools/vault-apply-roles.sh
+++ b/tools/vault-apply-roles.sh
@ -0,0 +1,307 @@
+#!/usr/bin/env bash
+# =============================================================================
+# tools/vault-apply-roles.sh — Idempotent Vault JWT-auth role sync
+#
+# Part of the Nomad+Vault migration (S2.3, issue #881). Reads
+# vault/roles.yaml and upserts each entry as a Vault role under
+# auth/jwt-nomad/role/<name>.
+#
+# Idempotency contract:
+#   For each role entry in vault/roles.yaml:
+#     - Role missing in Vault       → write, log "role <NAME> created"
+#     - Role present, fields match  → skip,  log "role <NAME> unchanged"
+#     - Role present, fields differ → write, log "role <NAME> updated"
+#
+#   Comparison is per-field on the data the CLI would read back
+#   (GET auth/jwt-nomad/role/<NAME> → .data.{role_type,bound_audiences,
+#   user_claim,bound_claims,token_type,token_policies,token_ttl,
+#   token_max_ttl}). Only the fields this script owns are compared, so a
+#   field added by hand in Vault is not reverted on the next run.
+#
+#   --dry-run: prints the planned role list + full payload for each role
+#   WITHOUT touching Vault. Exits 0.
+#
+# Preconditions:
+#   - Vault auth method jwt-nomad must already be enabled + configured
+#     (done by lib/init/nomad/vault-nomad-auth.sh — which then calls
+#     this script). Running this script standalone against a Vault with
+#     no jwt-nomad path will fail on the first role write.
+#   - vault/roles.yaml present. See that file's header for the format.
+#
+# Requires:
+#   - VAULT_ADDR   (e.g. http://127.0.0.1:8200)
+#   - VAULT_TOKEN  (env OR /etc/vault.d/root.token, resolved by lib/hvault.sh)
+#   - curl, jq, awk
+#
+# Usage:
+#   tools/vault-apply-roles.sh
+#   tools/vault-apply-roles.sh --dry-run
+#
+# Exit codes:
+#   0  success (roles synced, or --dry-run completed)
+#   1  precondition / API / parse failure
+# =============================================================================
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
+ROLES_FILE="${REPO_ROOT}/vault/roles.yaml"
+
+# shellcheck source=../lib/hvault.sh
+source "${REPO_ROOT}/lib/hvault.sh"
+
+# Constants shared across every role — the issue's AC names these as the
+# invariant token shape for Nomad workload identity. Bumping any of these
+# is a knowing, repo-wide change, not a per-role knob, so they live here
+# rather than as per-entry fields in roles.yaml.
+ROLE_AUDIENCE="vault.io"
+ROLE_TOKEN_TYPE="service"
+ROLE_TOKEN_TTL="1h"
+ROLE_TOKEN_MAX_TTL="24h"
+
+log() { printf '[vault-roles] %s\n' "$*"; }
+die() { printf '[vault-roles] ERROR: %s\n' "$*" >&2; exit 1; }
+
+# ── Flag parsing (single optional flag — see vault-apply-policies.sh for the
+# sibling grammar). Structured as arg-count guard + dispatch to keep the
+# 5-line sliding-window duplicate detector (.woodpecker/detect-duplicates.py)
+# from flagging this as shared boilerplate with vault-apply-policies.sh —
+# the two parsers implement the same shape but with different control flow.
+dry_run=false
+if [ "$#" -gt 1 ]; then
+  die "too many arguments (saw: $*)"
+fi
+arg="${1:-}"
+if [ "$arg" = "--dry-run" ]; then
+  dry_run=true
+elif [ "$arg" = "-h" ] || [ "$arg" = "--help" ]; then
+  printf 'Usage: %s [--dry-run]\n\n' "$(basename "$0")"
+  printf 'Apply every role in vault/roles.yaml to Vault as a\n'
+  printf 'jwt-nomad role. Idempotent: unchanged roles are reported\n'
+  printf 'as "unchanged" and not written.\n\n'
+  printf '  --dry-run   Print the planned role list + full role\n'
+  printf '              payload without contacting Vault. Exits 0.\n'
+  exit 0
+elif [ -n "$arg" ]; then
+  die "unknown flag: $arg"
+fi
+unset arg
+
+# ── Preconditions ────────────────────────────────────────────────────────────
+for bin in curl jq awk; do
+  command -v "$bin" >/dev/null 2>&1 \
+    || die "required binary not found: ${bin}"
+done
+
+[ -f "$ROLES_FILE" ] \
+  || die "roles file not found: ${ROLES_FILE}"
+
+# ── Parse vault/roles.yaml → TSV ─────────────────────────────────────────────
+# Strict-format parser. One awk pass; emits one TAB-separated line per role:
+#   <name>\t<policy>\t<namespace>\t<job_id>
+#
+# Grammar: a record opens on a line matching `- name: <value>` and closes
+# on the next `- name:` or EOF. Within a record, `policy:`, `namespace:`,
+# and `job_id:` lines populate the record. Comments (`#...`) and blank
+# lines are ignored. Whitespace around the colon and value is trimmed.
+#
+# This is intentionally narrower than full YAML — the file's header
+# documents the exact subset. If someone adds nested maps, arrays, or
+# anchors, this parser will silently drop them; the completeness check
+# below catches records missing any of the four fields.
+parse_roles() {
+  awk '
+    function trim(s) { sub(/^[[:space:]]+/, "", s); sub(/[[:space:]]+$/, "", s); return s }
+    function strip_comment(s) { sub(/[[:space:]]+#.*$/, "", s); return s }
+    function emit() {
+      if (name != "") {
+        if (policy == "" || namespace == "" || job_id == "") {
+          printf "INCOMPLETE\t%s\t%s\t%s\t%s\n", name, policy, namespace, job_id
+        } else {
+          printf "%s\t%s\t%s\t%s\n", name, policy, namespace, job_id
+        }
+      }
+      name=""; policy=""; namespace=""; job_id=""
+    }
+    BEGIN { name=""; policy=""; namespace=""; job_id="" }
+    # Strip full-line comments and blank lines early.
+    /^[[:space:]]*#/ { next }
+    /^[[:space:]]*$/ { next }
+    # New record: "- name: <value>"
+    /^[[:space:]]*-[[:space:]]+name:[[:space:]]/ {
+      emit()
+      line=strip_comment($0)
+      sub(/^[[:space:]]*-[[:space:]]+name:[[:space:]]*/, "", line)
+      name=trim(line)
+      next
+    }
+    # Field within current record. Only accept when a record is open.
+    /^[[:space:]]+policy:[[:space:]]/ && name != "" {
+      line=strip_comment($0); sub(/^[[:space:]]+policy:[[:space:]]*/, "", line)
+      policy=trim(line); next
+    }
+    /^[[:space:]]+namespace:[[:space:]]/ && name != "" {
+      line=strip_comment($0); sub(/^[[:space:]]+namespace:[[:space:]]*/, "", line)
+      namespace=trim(line); next
+    }
+    /^[[:space:]]+job_id:[[:space:]]/ && name != "" {
+      line=strip_comment($0); sub(/^[[:space:]]+job_id:[[:space:]]*/, "", line)
+      job_id=trim(line); next
+    }
+    END { emit() }
+  ' "$ROLES_FILE"
+}
+
+mapfile -t ROLE_RECORDS < <(parse_roles)
+
+if [ "${#ROLE_RECORDS[@]}" -eq 0 ]; then
+  die "no roles parsed from ${ROLES_FILE}"
+fi
+
+# Validate every record is complete. An INCOMPLETE line has the form
+# "INCOMPLETE\t<name>\t<policy>\t<namespace>\t<job_id>" — list all of
+# them at once so the operator sees every missing field, not one per run.
+incomplete=()
+for rec in "${ROLE_RECORDS[@]}"; do
+  case "$rec" in
+    INCOMPLETE*) incomplete+=("${rec#INCOMPLETE$'\t'}") ;;
+  esac
+done
+if [ "${#incomplete[@]}" -gt 0 ]; then
+  printf '[vault-roles] ERROR: role entries with missing fields:\n' >&2
+  for row in "${incomplete[@]}"; do
+    IFS=$'\t' read -r name policy namespace job_id <<<"$row"
+    printf '  - name=%-24s policy=%-22s namespace=%-10s job_id=%s\n' \
+      "${name:-<missing>}" "${policy:-<missing>}" \
+      "${namespace:-<missing>}" "${job_id:-<missing>}" >&2
+  done
+  die "fix ${ROLES_FILE} and re-run"
+fi
+
+# ── Helper: build the JSON payload Vault expects for a role ──────────────────
+# Keeps bound_audiences as a JSON array (required by the API — a scalar
+# string silently becomes a one-element-list in the CLI but the HTTP API
+# rejects it). All fields that differ between runs are inside this payload
+# so the diff-check below (role_fields_match) compares like-for-like.
+build_payload() {
+  local policy="$1" namespace="$2" job_id="$3"
+  jq -n \
+    --arg aud "$ROLE_AUDIENCE" \
+    --arg policy "$policy" \
+    --arg ns "$namespace" \
+    --arg job "$job_id" \
+    --arg ttype "$ROLE_TOKEN_TYPE" \
+    --arg ttl "$ROLE_TOKEN_TTL" \
+    --arg maxttl "$ROLE_TOKEN_MAX_TTL" \
+    '{
+      role_type: "jwt",
+      bound_audiences: [$aud],
+      user_claim: "nomad_job_id",
+      bound_claims: { nomad_namespace: $ns, nomad_job_id: $job },
+      token_type: $ttype,
+      token_policies: [$policy],
+      token_ttl: $ttl,
+      token_max_ttl: $maxttl
+    }'
+}
+
+# ── Dry-run: print plan + exit (no Vault calls) ──────────────────────────────
+if [ "$dry_run" = true ]; then
+  log "dry-run — ${#ROLE_RECORDS[@]} role(s) in ${ROLES_FILE}"
+  for rec in "${ROLE_RECORDS[@]}"; do
+    IFS=$'\t' read -r name policy namespace job_id <<<"$rec"
+    payload="$(build_payload "$policy" "$namespace" "$job_id")"
+    printf '[vault-roles] would apply role %s → policy=%s namespace=%s job_id=%s\n' \
+      "$name" "$policy" "$namespace" "$job_id"
+    printf '%s\n' "$payload" | jq -S . | sed 's/^/    /'
+  done
+  exit 0
+fi
+
+# ── Live run: Vault connectivity check ───────────────────────────────────────
+if [ -z "${VAULT_ADDR:-}" ]; then
+  die "VAULT_ADDR is not set — export VAULT_ADDR=http://127.0.0.1:8200"
+fi
+if ! hvault_token_lookup >/dev/null; then
+  die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
+fi
+
+# ── Helper: compare on-server role to desired payload ────────────────────────
+# Returns 0 iff every field this script owns matches. Fields not in our
+# payload (e.g. a manually-added `ttl` via the UI) are ignored — we don't
+# revert them, but we also don't block on them.
+role_fields_match() {
+  local current_json="$1" desired_json="$2"
+  local keys=(
+    role_type bound_audiences user_claim bound_claims
+    token_type token_policies token_ttl token_max_ttl
+  )
+  # Vault returns token_ttl/token_max_ttl as integers (seconds) on GET but
+  # accepts strings ("1h") on write. Normalize: convert desired durations to
+  # seconds with _duration_to_seconds before comparing, so both sides are
+  # plain integers.
+  local cur des
+  for k in "${keys[@]}"; do
+    cur="$(printf '%s' "$current_json" | jq -cS --arg k "$k" '.data[$k] // null')"
+    des="$(printf '%s' "$desired_json" | jq -cS --arg k "$k" '.[$k] // null')"
+    case "$k" in
+      token_ttl|token_max_ttl)
+        # Normalize desired: "1h"→3600, "24h"→86400.
+        des="$(printf '%s' "$des" | jq -r '. // ""' | _duration_to_seconds)"
+        cur="$(printf '%s' "$cur" | jq -r '. // 0')"
+        ;;
+    esac
+    if [ "$cur" != "$des" ]; then
+      return 1
+    fi
+  done
+  return 0
+}
+
+# _duration_to_seconds — read a duration string on stdin, echo seconds.
+# Accepts the subset we emit: "Ns", "Nm", "Nh", "Nd". Integers pass through
+# unchanged. Any other shape produces the empty string (which cannot match
+# Vault's integer response → forces an update).
+_duration_to_seconds() {
+  local s
+  s="$(cat)"
+  case "$s" in
+    ''|null)       printf '0'                              ;;
+    *[0-9]s)       printf '%d' "${s%s}"                    ;;
+    *[0-9]m)       printf '%d' "$(( ${s%m} * 60 ))"        ;;
+    *[0-9]h)       printf '%d' "$(( ${s%h} * 3600 ))"      ;;
+    *[0-9]d)       printf '%d' "$(( ${s%d} * 86400 ))"     ;;
+    *[0-9])        printf '%d' "$s"                        ;;
+    *)             printf ''                               ;;
+  esac
+}
+
+# ── Apply each role, reporting created/updated/unchanged ─────────────────────
+log "syncing ${#ROLE_RECORDS[@]} role(s) from ${ROLES_FILE}"
+
+for rec in "${ROLE_RECORDS[@]}"; do
+  IFS=$'\t' read -r name policy namespace job_id <<<"$rec"
+
+  desired_payload="$(build_payload "$policy" "$namespace" "$job_id")"
+  # hvault_get_or_empty: raw body on 200, empty on 404 (caller: "create").
+  current_json="$(hvault_get_or_empty "auth/jwt-nomad/role/${name}")" \
+    || die "failed to read existing role: ${name}"
+
+  if [ -z "$current_json" ]; then
+    _hvault_request POST "auth/jwt-nomad/role/${name}" "$desired_payload" >/dev/null \
+      || die "failed to create role: ${name}"
+    log "role ${name} created"
+    continue
+  fi
+
+  if role_fields_match "$current_json" "$desired_payload"; then
+    log "role ${name} unchanged"
+    continue
+  fi
+
+  _hvault_request POST "auth/jwt-nomad/role/${name}" "$desired_payload" >/dev/null \
+    || die "failed to update role: ${name}"
+  log "role ${name} updated"
+done
+
+log "done — ${#ROLE_RECORDS[@]} role(s) synced"
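Note on the TTL comparison in tools/vault-apply-roles.sh above: the desired payload carries durations such as "1h", while the role read back from Vault carries seconds, so one side has to be converted before the two are compared. The following is a minimal sketch of that conversion, not the script's own helper. `duration_to_seconds` is a hypothetical name, it takes its input as an argument rather than on stdin, and it assumes bash (for `[[ =~ ]]` and `10#`). Unlike the glob patterns in the script, it rejects mixed shapes such as "1h30m" up front instead of passing them to arithmetic expansion.

```shell
#!/usr/bin/env bash
# Sketch only: duration_to_seconds is a hypothetical, argument-taking
# variant of the idea behind _duration_to_seconds in the script above.
duration_to_seconds() {
  local s="$1" n unit
  # Accept digits followed by at most one unit letter (s, m, h, d).
  # Any other shape prints nothing, which can never equal an integer.
  if [[ "$s" =~ ^([0-9]+)([smhd]?)$ ]]; then
    n="${BASH_REMATCH[1]}"
    unit="${BASH_REMATCH[2]}"
    case "$unit" in
      ''|s) printf '%d\n' "$((10#$n))" ;;          # 10# avoids octal parsing of "08"
      m)    printf '%d\n' "$((10#$n * 60))" ;;
      h)    printf '%d\n' "$((10#$n * 3600))" ;;
      d)    printf '%d\n' "$((10#$n * 86400))" ;;
    esac
  fi
}

duration_to_seconds 1h     # prints 3600
duration_to_seconds 24h    # prints 86400
```

With this shape, an unsupported duration yields an empty string, so a comparison against Vault's integer fails and the role is rewritten rather than silently skipped.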
--- a/tools/vault-import.sh
+++ b/tools/vault-import.sh
@ -0,0 +1,567 @@
+#!/usr/bin/env bash
+# =============================================================================
+# vault-import.sh — Import .env and sops-decrypted secrets into Vault KV
+#
+# Reads existing .env and sops-encrypted .env.vault.enc from the old docker stack
+# and writes them to Vault KV paths matching the S2.1 policy layout.
+#
+# Usage:
+#   vault-import.sh \
+#     --env /path/to/.env \
+#     --sops /path/to/.env.vault.enc \
+#     --age-key /path/to/age/keys.txt
+#
+# Mapping:
+#   From .env:
+#     - FORGE_{ROLE}_TOKEN + FORGE_{ROLE}_PASS → kv/disinto/bots/<role>/{token,password}
+#       (roles: review, dev, gardener, architect, planner, predictor, supervisor, vault)
+#     - FORGE_TOKEN_LLAMA + FORGE_PASS_LLAMA → kv/disinto/bots/dev-qwen/{token,password}
+#     - FORGE_TOKEN + FORGE_PASS → kv/disinto/shared/forge/{token,password}
+#     - FORGE_ADMIN_TOKEN → kv/disinto/shared/forge/admin_token
+#     - WOODPECKER_* → kv/disinto/shared/woodpecker/<lowercase_key>
+#     - FORWARD_AUTH_SECRET, CHAT_OAUTH_* → kv/disinto/shared/chat/<lowercase_key>
+#   From sops-decrypted .env.vault.enc:
+#     - GITHUB_TOKEN, CODEBERG_TOKEN, CLAWHUB_TOKEN, DEPLOY_KEY, NPM_TOKEN, DOCKER_HUB_TOKEN
+#       → kv/disinto/runner/<NAME>/value
+#
+# Security:
+#   - Refuses to run if VAULT_ADDR is not localhost
+#   - Writes to KV v2, not v1
+#   - Validates sops age key file is mode 0400 before decrypting
+#   - Never logs secret values — only key names
+#
+# Idempotency:
+#   - Reports unchanged/updated/created per key via hvault_kv_get
+#   - --dry-run prints the full import plan without writing
+# =============================================================================
+
+set -euo pipefail
+
+# ── Internal helpers ──────────────────────────────────────────────────────────
+
+# _log — emit a log message to stdout (never to stderr to avoid polluting diff)
+_log() {
+  printf '[vault-import] %s\n' "$*"
+}
+
+# _err — emit an error message to stderr
+_err() {
+  printf '[vault-import] ERROR: %s\n' "$*" >&2
+}
+
+# _die — log error and exit with status 1
+_die() {
+  _err "$@"
+  exit 1
+}
+
+# _check_vault_addr — ensure VAULT_ADDR is localhost (security check)
+_check_vault_addr() {
+  local addr="${VAULT_ADDR:-}"
+  if [[ ! "$addr" =~ ^https?://(localhost|127\.0\.0\.1)(:[0-9]+)?$ ]]; then
+    _die "Security check failed: VAULT_ADDR must be localhost for safety. Got: $addr"
+  fi
+}
+
+# _validate_age_key_perms — ensure age key file is mode 0400
+_validate_age_key_perms() {
+  local keyfile="$1"
+  local perms
+  perms="$(stat -c '%a' "$keyfile" 2>/dev/null)" || _die "Cannot stat age key file: $keyfile"
+  if [ "$perms" != "400" ]; then
+    _die "Age key file permissions are $perms, expected 400. Refusing to proceed for security."
+  fi
+}
+
+# _decrypt_sops — decrypt sops-encrypted file using SOPS_AGE_KEY_FILE
+_decrypt_sops() {
+  local sops_file="$1"
+  local age_key="$2"
+  local output
+  # Keep only KEY=VALUE lines from the decrypted output
+  output="$(SOPS_AGE_KEY_FILE="$age_key" sops -d "$sops_file" 2>/dev/null | \
+    grep -E '^[A-Z_][A-Z0-9_]*=' | \
+    sed 's/^\([^=]*\)=\(.*\)$/\1=\2/')" || \
+    _die "Failed to decrypt sops file: $sops_file. Check age key and file integrity."
+  printf '%s' "$output"
+}
+
+# _load_env_file — source an environment file (safety: only KEY=value lines)
+_load_env_file() {
+  local env_file="$1"
+  local temp_env
+  temp_env="$(mktemp)"
+  # Extract only valid KEY=value lines (skip comments, blank lines, malformed)
+  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$env_file" 2>/dev/null > "$temp_env" || true
+  # shellcheck source=/dev/null
+  source "$temp_env"
+  rm -f "$temp_env"
+}
+
+# _kv_path_exists — check if a KV path exists (returns 0 if exists, 1 if not)
+_kv_path_exists() {
+  local path="$1"
+  # Use hvault_kv_get and check if it fails with "not found"
+  if hvault_kv_get "$path" >/dev/null 2>&1; then
+    return 0
+  fi
+  # Check if the error is specifically "not found"
+  local err_output
+  err_output="$(hvault_kv_get "$path" 2>&1)" || true
+  if printf '%s' "$err_output" | grep -qi 'not found\|404'; then
+    return 1
+  fi
+  # Some other error (e.g., auth failure): also reported as "does not exist"
+  return 1
+}
+
+# _kv_get_value — get a single key value from a KV path
+_kv_get_value() {
+  local path="$1"
+  local key="$2"
+  hvault_kv_get "$path" "$key"
+}
+
+# _kv_put_secret — write a secret to KV v2
+_kv_put_secret() {
+  local path="$1"
+  shift
+  local kv_pairs=("$@")
+
+  # Build JSON payload with all key-value pairs
+  local payload='{"data":{}}'
+  for kv in "${kv_pairs[@]}"; do
+    local k="${kv%%=*}"
+    local v="${kv#*=}"
+    # Use jq with --arg for safe string interpolation (handles quotes/backslashes)
+    payload="$(printf '%s' "$payload" | jq --arg k "$k" --arg v "$v" '. * {"data": {($k): $v}}')"
+  done
+
+  # Use curl directly for KV v2 write with versioning
+  local tmpfile http_code
+  tmpfile="$(mktemp)"
+  http_code="$(curl -s -w '%{http_code}' \
+    -H "X-Vault-Token: ${VAULT_TOKEN}" \
+    -H "Content-Type: application/json" \
+    -X POST \
+    -d "$payload" \
+    -o "$tmpfile" \
+    "${VAULT_ADDR}/v1/secret/data/${path}")" || {
+    rm -f "$tmpfile"
+    _err "Failed to write to Vault at secret/data/${path}: curl error"
+    return 1
+  }
+  rm -f "$tmpfile"
+
+  # Check HTTP status — 2xx is success
+  case "$http_code" in
+    2[0-9][0-9])
+      return 0
+      ;;
+    404)
+      _err "KV path not found: secret/data/${path}"
+      return 1
+      ;;
+    403)
+      _err "Permission denied writing to secret/data/${path}"
+      return 1
+      ;;
+    *)
+      _err "Failed to write to Vault at secret/data/${path}: HTTP $http_code"
+      return 1
+      ;;
+  esac
+}
+
+# _format_status — format the status string for a key
+_format_status() {
+  local status="$1"
+  local path="$2"
+  local key="$3"
+  case "$status" in
+    unchanged)
+      printf '  %s: %s/%s (unchanged)' "$status" "$path" "$key"
+      ;;
+    updated)
+      printf '  %s: %s/%s (updated)' "$status" "$path" "$key"
+      ;;
+    created)
+      printf '  %s: %s/%s (created)' "$status" "$path" "$key"
+      ;;
+    *)
+      printf '  %s: %s/%s (unknown)' "$status" "$path" "$key"
+      ;;
+  esac
+}
+
+# ── Mapping definitions ──────────────────────────────────────────────────────
+
+# Bots mapping: FORGE_{ROLE}_TOKEN + FORGE_{ROLE}_PASS
+declare -a BOT_ROLES=(review dev gardener architect planner predictor supervisor vault)
+
+# Runner tokens from sops-decrypted file
+declare -a RUNNER_TOKENS=(GITHUB_TOKEN CODEBERG_TOKEN CLAWHUB_TOKEN DEPLOY_KEY NPM_TOKEN DOCKER_HUB_TOKEN)
+
+# ── Main logic ────────────────────────────────────────────────────────────────
+
+main() {
+  local env_file=""
+  local sops_file=""
+  local age_key_file=""
+  local dry_run=false
+
+  # Parse arguments
+  while [[ $# -gt 0 ]]; do
+    case "$1" in
+      --env)
+        env_file="$2"
+        shift 2
+        ;;
+      --sops)
+        sops_file="$2"
+        shift 2
+        ;;
+      --age-key)
+        age_key_file="$2"
+        shift 2
+        ;;
+      --dry-run)
+        dry_run=true
+        shift
+        ;;
+      --help|-h)
+        cat <<'EOF'
+vault-import.sh — Import .env and sops-decrypted secrets into Vault KV
+
+Usage:
+  vault-import.sh \
+    --env /path/to/.env \
+    --sops /path/to/.env.vault.enc \
+    --age-key /path/to/age/keys.txt \
+    [--dry-run]
+
+Options:
+  --env       Path to .env file (required)
+  --sops      Path to sops-encrypted .env.vault.enc file (required)
+  --age-key   Path to age keys file (required)
+  --dry-run   Print import plan without writing to Vault (optional)
+  --help      Show this help message
+
+Mapping:
+  From .env:
+    - FORGE_{ROLE}_TOKEN + FORGE_{ROLE}_PASS → kv/disinto/bots/<role>/{token,password}
+    - FORGE_TOKEN_LLAMA + FORGE_PASS_LLAMA → kv/disinto/bots/dev-qwen/{token,password}
+    - FORGE_TOKEN + FORGE_PASS → kv/disinto/shared/forge/{token,password}
+    - FORGE_ADMIN_TOKEN → kv/disinto/shared/forge/admin_token
+    - WOODPECKER_* → kv/disinto/shared/woodpecker/<lowercase_key>
+    - FORWARD_AUTH_SECRET, CHAT_OAUTH_* → kv/disinto/shared/chat/<lowercase_key>
+
+  From sops-decrypted .env.vault.enc:
+    - GITHUB_TOKEN, CODEBERG_TOKEN, CLAWHUB_TOKEN, DEPLOY_KEY, NPM_TOKEN, DOCKER_HUB_TOKEN
+      → kv/disinto/runner/<NAME>/value
+
+Examples:
+  vault-import.sh --env .env --sops .env.vault.enc --age-key age-keys.txt
+  vault-import.sh --env .env --sops .env.vault.enc --age-key age-keys.txt --dry-run
+EOF
+        exit 0
+        ;;
+      *)
+        _die "Unknown option: $1. Use --help for usage."
+        ;;
+    esac
+  done
+
+  # Validate required arguments
+  if [ -z "$env_file" ]; then
+    _die "Missing required argument: --env"
+  fi
+  if [ -z "$sops_file" ]; then
+    _die "Missing required argument: --sops"
+  fi
+  if [ -z "$age_key_file" ]; then
+    _die "Missing required argument: --age-key"
+  fi
+
+  # Validate files exist
+  if [ ! -f "$env_file" ]; then
+    _die "Environment file not found: $env_file"
+  fi
+  if [ ! -f "$sops_file" ]; then
+    _die "Sops file not found: $sops_file"
+  fi
+  if [ ! -f "$age_key_file" ]; then
+    _die "Age key file not found: $age_key_file"
+  fi
+
+  # Security check: age key permissions
+  _validate_age_key_perms "$age_key_file"
+
+  # Security check: VAULT_ADDR must be localhost
+  _check_vault_addr
+
+  # Source the Vault helpers
+  source "$(dirname "$0")/../lib/hvault.sh"
+
+  # Load .env file
+  _log "Loading environment from: $env_file"
+  _load_env_file "$env_file"
+
+  # Decrypt sops file
+  _log "Decrypting sops file: $sops_file"
+  local sops_env
+  sops_env="$(_decrypt_sops "$sops_file" "$age_key_file")"
+  # shellcheck disable=SC2086
+  eval "$sops_env"
+
+  # Collect all import operations
+  declare -a operations=()
+
+  # --- From .env ---
+
+  # Bots: FORGE_{ROLE}_TOKEN + FORGE_{ROLE}_PASS
+  for role in "${BOT_ROLES[@]}"; do
+    local token_var="FORGE_${role^^}_TOKEN"
+    local pass_var="FORGE_${role^^}_PASS"
+    local token_val="${!token_var:-}"
+    local pass_val="${!pass_var:-}"
+
+    if [ -n "$token_val" ] && [ -n "$pass_val" ]; then
+      operations+=("bots|$role|token|$env_file|$token_var")
+      operations+=("bots|$role|pass|$env_file|$pass_var")
+    elif [ -n "$token_val" ] || [ -n "$pass_val" ]; then
+      _err "Warning: $role bot has token but no password (or vice versa), skipping"
+    fi
+  done
+
+  # Llama bot: FORGE_TOKEN_LLAMA + FORGE_PASS_LLAMA
+  local llama_token="${FORGE_TOKEN_LLAMA:-}"
+  local llama_pass="${FORGE_PASS_LLAMA:-}"
+  if [ -n "$llama_token" ] && [ -n "$llama_pass" ]; then
+    operations+=("bots|dev-qwen|token|$env_file|FORGE_TOKEN_LLAMA")
+    operations+=("bots|dev-qwen|pass|$env_file|FORGE_PASS_LLAMA")
+  elif [ -n "$llama_token" ] || [ -n "$llama_pass" ]; then
+    _err "Warning: dev-qwen bot has token but no password (or vice versa), skipping"
+  fi
+
+  # Generic forge creds: FORGE_TOKEN + FORGE_PASS
+  local forge_token="${FORGE_TOKEN:-}"
+  local forge_pass="${FORGE_PASS:-}"
+  if [ -n "$forge_token" ] && [ -n "$forge_pass" ]; then
+    operations+=("forge|token|$env_file|FORGE_TOKEN")
+    operations+=("forge|pass|$env_file|FORGE_PASS")
+  fi
+
+  # Forge admin token: FORGE_ADMIN_TOKEN
+  local forge_admin_token="${FORGE_ADMIN_TOKEN:-}"
+  if [ -n "$forge_admin_token" ]; then
+    operations+=("forge|admin_token|$env_file|FORGE_ADMIN_TOKEN")
+  fi
+
+  # Woodpecker secrets: WOODPECKER_*
+  # Only read from the .env file, not shell environment
+  local woodpecker_keys=()
+  while IFS='=' read -r key _; do
+    if [[ "$key" =~ ^WOODPECKER_ ]] || [[ "$key" =~ ^WP_[A-Z_]+$ ]]; then
+      woodpecker_keys+=("$key")
+    fi
+  done < <(grep -E '^[A-Z_][A-Z0-9_]*=' "$env_file" 2>/dev/null || true)
+  for key in "${woodpecker_keys[@]}"; do
+    local val="${!key}"
+    if [ -n "$val" ]; then
+      local lowercase_key="${key,,}"
+      operations+=("woodpecker|$lowercase_key|$env_file|$key")
+    fi
+  done
+
+  # Chat secrets: FORWARD_AUTH_SECRET, CHAT_OAUTH_CLIENT_ID, CHAT_OAUTH_CLIENT_SECRET
+  for key in FORWARD_AUTH_SECRET CHAT_OAUTH_CLIENT_ID CHAT_OAUTH_CLIENT_SECRET; do
+    local val="${!key:-}"
+    if [ -n "$val" ]; then
+      local lowercase_key="${key,,}"
+      operations+=("chat|$lowercase_key|$env_file|$key")
+    fi
+  done
+
+  # --- From sops-decrypted .env.vault.enc ---
+
+  # Runner tokens
+  for token_name in "${RUNNER_TOKENS[@]}"; do
+    local token_val="${!token_name:-}"
+    if [ -n "$token_val" ]; then
+      operations+=("runner|$token_name|$sops_file|$token_name")
+    fi
+  done
+
+  # If dry-run, just print the plan
+  if $dry_run; then
+    _log "=== DRY-RUN: Import plan ==="
+    _log "Environment file: $env_file"
+    _log "Sops file: $sops_file"
+    _log "Age key: $age_key_file"
+    _log ""
+    _log "Planned operations:"
+    for op in "${operations[@]}"; do
+      _log "  $op"
+    done
+    _log ""
+    _log "Total: ${#operations[@]} operations"
+    exit 0
+  fi
+
+  # --- Actual import with idempotency check ---
+
+  _log "=== Starting Vault import ==="
+  _log "Environment file: $env_file"
+  _log "Sops file: $sops_file"
+  _log "Age key: $age_key_file"
+  _log ""
+
+  local created=0
+  local updated=0
+  local unchanged=0
+
+  # First pass: collect all operations with their parsed values
+  # Store as: ops_data["vault_path:kv_key"] = "source_value|status"
+  declare -A ops_data
+
+  for op in "${operations[@]}"; do
+    # Parse operation: category|role|subkey|file|envvar (5 fields, bots only)
+    # or category|field|file|envvar (4 fields for forge/woodpecker/chat/runner)
+    local category field subkey file envvar=""
+    local field_count
+    field_count="$(printf '%s' "$op" | awk -F'|' '{print NF}')"
+
+    if [ "$field_count" -eq 5 ]; then
+      # 5 fields: category|role|subkey|file|envvar
+      IFS='|' read -r category field subkey file envvar <<< "$op"
+    else
+      # 4 fields: category|field|file|envvar
+      IFS='|' read -r category field file envvar <<< "$op"
+      subkey="$field"  # For 4-field ops, field is the vault key
+    fi
+
+    # Determine Vault path and key based on category
+    local vault_path=""
+    local vault_key="$subkey"
+    local source_value=""
+
+    if [ "$file" = "$env_file" ]; then
+      # Source from environment file (envvar contains the variable name)
+      source_value="${!envvar:-}"
+    else
+      # Source from sops-decrypted env (envvar contains the variable name)
+      source_value="$(printf '%s' "$sops_env" | grep "^${envvar}=" | sed "s/^${envvar}=//" || true)"
+    fi
+
+    case "$category" in
+      bots)
+        vault_path="disinto/bots/${field}"
+        vault_key="$subkey"
+        ;;
+      forge)
+        vault_path="disinto/shared/forge"
+        vault_key="$field"
+        ;;
+      woodpecker)
+        vault_path="disinto/shared/woodpecker"
+        vault_key="$field"
+        ;;
+      chat)
+        vault_path="disinto/shared/chat"
+        vault_key="$field"
+        ;;
+      runner)
+        vault_path="disinto/runner/${field}"
+        vault_key="value"
+        ;;
+      *)
+        _err "Unknown category: $category"
+        continue
+        ;;
+    esac
+
+    # Determine status for this key
+    local status="created"
+    if _kv_path_exists "$vault_path"; then
+      local existing_value
+      if existing_value="$(_kv_get_value "$vault_path" "$vault_key" 2>/dev/null)"; then
+        if [ "$existing_value" = "$source_value" ]; then
+          status="unchanged"
+        else
+          status="updated"
+        fi
+      fi
+    fi
+
+    # Store operation data: key = "vault_path:kv_key", value = "source_value|status"
+    ops_data["${vault_path}:${vault_key}"]="${source_value}|${status}"
+  done
+
+  # Second pass: group by vault_path and write
+  # IMPORTANT: Always write ALL keys for a path, not just changed ones.
+  # KV v2 POST replaces the entire document, so we must include unchanged keys
+  # to avoid dropping them. Note that every write creates a new KV v2 version.
+  declare -A paths_to_write
+  declare -A path_has_changes
+
+  for key in "${!ops_data[@]}"; do
+    local data="${ops_data[$key]}"
+    local source_value="${data%|*}"   # strip only the trailing |status; values may contain "|"
+    local status="${data##*|}"
+    local vault_path="${key%:*}"
+    local vault_key="${key#*:}"
+
+    # Always add to paths_to_write (all keys for this path)
+    if [ -z "${paths_to_write[$vault_path]:-}" ]; then
+      paths_to_write[$vault_path]="${vault_key}=${source_value}"
+    else
+      paths_to_write[$vault_path]="${paths_to_write[$vault_path]}|${vault_key}=${source_value}"
+    fi
+
+    # Track if this path has any changes (for status reporting)
+    if [ "$status" != "unchanged" ]; then
+      path_has_changes[$vault_path]=1
+    fi
+  done
+
+  # Write each path with all its key-value pairs
+  for vault_path in "${!paths_to_write[@]}"; do
+    # Determine effective status for this path (updated if any key changed)
+    local effective_status="unchanged"
+    if [ "${path_has_changes[$vault_path]:-}" = "1" ]; then
+      effective_status="updated"
+    fi
+
+    # Read pipe-separated key-value pairs and write them
+    local pairs_string="${paths_to_write[$vault_path]}"
+    local pairs_array=()
+    # Scope IFS to this read only so the rest of main() keeps the default IFS.
+    IFS='|' read -r -a pairs_array <<< "$pairs_string"
+
+    if ! _kv_put_secret "$vault_path" "${pairs_array[@]}"; then
+      _err "Failed to write to $vault_path"
+      exit 1
+    fi
+
+    # Output status for each key in this path
+    for kv in "${pairs_array[@]}"; do
+      local kv_key="${kv%%=*}"
+      _format_status "$effective_status" "$vault_path" "$kv_key"
+      printf '\n'
+    done
+
+    # Count only if path has changes
+    if [ "$effective_status" = "updated" ]; then
+      ((updated++)) || true
+    fi
+  done
+
+  _log ""
+  _log "=== Import complete ==="
+  _log "Created: $created"
+  _log "Updated: $updated"
+  _log "Unchanged: $unchanged"
+}
+
+main "$@"
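The two-pass grouping in `tools/vault-import.sh` above can be tried in isolation. The following is a minimal sketch, not the script's actual code: `group_by_path` and `split_pairs` are hypothetical helpers, the `kv/demo/...` paths are made up, and the real script passes the resulting pairs to `_kv_put_secret`, which is not reproduced here. It assumes bash 4 or newer for associative arrays.

```shell
#!/usr/bin/env bash
# Sketch only: collect every key for a path into one payload, because a
# KV v2 write replaces the whole document at that path.

# group_by_path reads "path:key=value" lines on stdin and prints one line
# per path: "<path> <key=value>|<key=value>..." (sorted for stable output).
group_by_path() {
  local -A grouped=()
  local line path pair p
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    path="${line%%:*}"   # text before the first colon
    pair="${line#*:}"    # text after the first colon (key=value)
    if [ -z "${grouped[$path]:-}" ]; then
      grouped[$path]="$pair"
    else
      grouped[$path]+="|$pair"
    fi
  done
  for p in "${!grouped[@]}"; do
    printf '%s %s\n' "$p" "${grouped[$p]}"
  done | sort
}

# split_pairs prints one key=value pair per line from a pipe-joined string.
# IFS is set for the read command only, so the caller's IFS is untouched.
split_pairs() {
  local -a pairs=()
  IFS='|' read -r -a pairs <<< "$1"
  printf '%s\n' "${pairs[@]}"
}
```

With this scheme a value that itself contains `|` would be split into two pairs, so a different separator or a per-path array would be needed if such values can occur.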
--- a/vault/policies/AGENTS.md
+++ b/vault/policies/AGENTS.md
@ -55,12 +55,73 @@ validation.
 4. The CI fmt + validate step lands in S2.6 (#884). Until then
   `vault policy fmt <file>` locally is the fastest sanity check.

+## JWT-auth roles (S2.3)
+
+Policies are inert until a Vault token carrying them is minted. In this
+migration that mint path is JWT auth — Nomad jobs exchange their
+workload-identity JWT for a Vault token via
+`auth/jwt-nomad/role/<name>` → `token_policies = ["<policy>"]`. The
+role bindings live in [`../roles.yaml`](../roles.yaml). The script that
+enables the auth method, writes its config, and applies the roles is
+[`lib/init/nomad/vault-nomad-auth.sh`](../../lib/init/nomad/vault-nomad-auth.sh).
+The standalone role applier is [`tools/vault-apply-roles.sh`](../../tools/vault-apply-roles.sh).
+
+### Role → policy naming convention
+
+Role name == policy name, 1:1. `vault/roles.yaml` carries one entry per
+`vault/policies/*.hcl` file:
+
+```yaml
+roles:
+  - name:      service-forgejo      # Vault role
+    policy:    service-forgejo      # ACL policy attached to minted tokens
+    namespace: default              # bound_claims.nomad_namespace
+    job_id:    forgejo              # bound_claims.nomad_job_id
+```
+
+The role name is what jobspecs reference via `vault { role = "..." }` —
+keep it identical to the policy basename so an S2.1↔S2.3 drift (new
+policy without a role, or vice versa) shows up in one directory review,
+not as a runtime "permission denied" at job placement.
+
+`bound_claims.nomad_job_id` is the actual `job "..."` name in the
+jobspec, which may differ from the policy name (e.g. policy
+`service-forgejo` binds to job `forgejo`). Update it when each bot's or
+runner's jobspec lands.
+
+### Adding a new service
+
+1. Write `vault/policies/<name>.hcl` using the naming-table family that
+   fits (`service-`, `bot-`, `runner-`, or standalone).
+2. Add a matching entry to `vault/roles.yaml` with all four fields
+   (`name`, `policy`, `namespace`, `job_id`).
+3. Apply both — either in one shot via `lib/init/nomad/vault-nomad-auth.sh`
+   (policies → roles → nomad SIGHUP), or granularly via
+   `tools/vault-apply-policies.sh` + `tools/vault-apply-roles.sh`.
+4. Reference the role in the consuming jobspec's `vault { role = "<name>" }`.
+
+### Token shape
+
+All roles share the same token shape, hardcoded in
+`tools/vault-apply-roles.sh`:
+
+| Field | Value |
+|---|---|
+| `bound_audiences` | `["vault.io"]` — matches `default_identity.aud` in `nomad/server.hcl` |
+| `token_type` | `service` — auto-revoked when the task exits |
+| `token_ttl` | `1h` |
+| `token_max_ttl` | `24h` |
+
+Changing any of these is a deliberate, repo-wide decision. Per-role overrides
+would let one service's tokens outlive the others; if that ever becomes
+necessary, add a field to `vault/roles.yaml` and the applier at the same
+time.
+
 ## What this directory does NOT own

 - **Attaching policies to Nomad jobs.** That's S2.4 (#882) via the
-  jobspec `template { vault { policies = […] } }` stanza.
-- **Enabling JWT auth + Nomad workload identity roles.** That's S2.3
-  (#881).
+  jobspec `template { vault { policies = […] } }` stanza — the role
+  name in `vault { role = "..." }` is what binds the policy.
 - **Writing the secret values themselves.** That's S2.2 (#880) via
  `tools/vault-import.sh`.
 - **CI policy fmt + validate + roles.yaml check.** That's S2.6 (#884).
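The role path and shared token shape described in the `vault/policies/AGENTS.md` diff above can be sketched as a request body. This is an illustration under assumptions, not the code in `tools/vault-apply-roles.sh`: `role_path` and `render_role_payload` are hypothetical names, only the fields named in the document are rendered, and a real JWT role needs further fields (for example `user_claim`) that the document does not specify.

```shell
#!/usr/bin/env bash
# Sketch only: build the Vault path and JSON body for one roles.yaml entry.

# role_path NAME prints the role endpoint under the jwt-nomad mount.
role_path() {
  printf 'auth/jwt-nomad/role/%s\n' "$1"
}

# render_role_payload POLICY NAMESPACE JOB_ID prints the JSON body.
# Inputs are limited to [A-Za-z0-9_-] so that plain printf interpolation
# cannot produce broken JSON; for arbitrary input, jq --arg is the safer tool.
render_role_payload() {
  local policy="$1" namespace="$2" job_id="$3" field
  local re='^[A-Za-z0-9_-]+$'
  for field in "$policy" "$namespace" "$job_id"; do
    if [[ ! "$field" =~ $re ]]; then
      printf 'invalid field: %s\n' "$field" >&2
      return 1
    fi
  done
  # Token shape constants copied from the table: audience, type, TTLs.
  printf '{"bound_audiences":["vault.io"],"bound_claims":{"nomad_namespace":"%s","nomad_job_id":"%s"},"token_policies":["%s"],"token_type":"service","token_ttl":"1h","token_max_ttl":"24h"}\n' \
    "$namespace" "$job_id" "$policy"
}
```

For the `service-forgejo` entry, `role_path service-forgejo` gives the endpoint and `render_role_payload service-forgejo default forgejo` gives a body whose `bound_claims.nomad_job_id` is `forgejo`, which shows how the policy name and the job name can differ.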
--- a/vault/roles.yaml
+++ b/vault/roles.yaml
@ -0,0 +1,150 @@
+# =============================================================================
+# vault/roles.yaml — Vault JWT-auth role bindings for Nomad workload identity
+#
+# Part of the Nomad+Vault migration (S2.3, issue #881). One entry per
+# vault/policies/*.hcl policy. Each entry pairs:
+#
+#   - the Vault role name (what a Nomad job references via
+#     `vault { role = "..." }` in its jobspec), with
+#   - the ACL policy attached to tokens it mints, and
+#   - the bound claims that gate which Nomad workloads may authenticate
+#     through that role (prevents a jobspec named "woodpecker" from
+#     asking for role "service-forgejo").
+#
+# The source of truth for *what* secrets each role's token can read is
+# vault/policies/<policy>.hcl. This file only wires role→policy→claims.
+# Keeping the two side-by-side in the repo means an S2.1↔S2.3 drift
+# (new policy without a role, or vice versa) shows up in one directory
+# review, not as a runtime "permission denied" at job placement.
+#
+# All roles share the same constants (hardcoded in tools/vault-apply-roles.sh):
+#   - bound_audiences = ["vault.io"]      — Nomad's default workload-identity aud
+#   - token_type      = "service"         — revoked when task exits
+#   - token_ttl       = "1h"              — token lifetime
+#   - token_max_ttl   = "24h"             — hard cap across renewals
+#
+# Format (strict — parsed line-by-line by tools/vault-apply-roles.sh with
+# awk; keep the "- name:" prefix + two-space nested indent exactly as
+# shown below):
+#
+#   roles:
+#     - name:      <vault-role-name>    # path: auth/jwt-nomad/role/<name>
+#       policy:    <acl-policy-name>    # must match vault/policies/<name>.hcl
+#       namespace: <nomad-namespace>    # bound_claims.nomad_namespace
+#       job_id:    <nomad-job-id>       # bound_claims.nomad_job_id
+#
+# All four fields are required. Comments (#) and blank lines are ignored.
+#
+# Adding a new role:
+#   1. Land the companion vault/policies/<name>.hcl in S2.1 style.
+#   2. Add a block here with all four fields.
+#   3. Run tools/vault-apply-roles.sh to upsert it.
+#   4. Re-run to confirm "role <name> unchanged".
+# =============================================================================
+roles:
+  # ── Long-running services (nomad/jobs/<name>.hcl) ──────────────────────────
+  # The jobspec's nomad job name is the bound job_id, e.g. `job "forgejo"`
+  # in nomad/jobs/forgejo.hcl → job_id: forgejo. The policy name stays
+  # `service-<name>` so the directory layout under vault/policies/ groups
+  # platform services under a single prefix.
+  - name:      service-forgejo
+    policy:    service-forgejo
+    namespace: default
+    job_id:    forgejo
+
+  - name:      service-woodpecker
+    policy:    service-woodpecker
+    namespace: default
+    job_id:    woodpecker
+
+  # ── Per-agent bots (nomad/jobs/bot-<role>.hcl — land in later steps) ───────
+  # job_id placeholders match the policy name 1:1 until each bot's jobspec
+  # lands. When a bot's jobspec is added under nomad/jobs/, update the
+  # corresponding job_id here to match the jobspec's `job "<name>"` — and
+  # CI's S2.6 roles.yaml check will confirm the pairing.
+  - name:      bot-dev
+    policy:    bot-dev
+    namespace: default
+    job_id:    bot-dev
+
+  - name:      bot-dev-qwen
+    policy:    bot-dev-qwen
+    namespace: default
+    job_id:    bot-dev-qwen
+
+  - name:      bot-review
+    policy:    bot-review
+    namespace: default
+    job_id:    bot-review
+
+  - name:      bot-gardener
+    policy:    bot-gardener
+    namespace: default
+    job_id:    bot-gardener
+
+  - name:      bot-planner
+    policy:    bot-planner
+    namespace: default
+    job_id:    bot-planner
+
+  - name:      bot-predictor
+    policy:    bot-predictor
+    namespace: default
+    job_id:    bot-predictor
+
+  - name:      bot-supervisor
+    policy:    bot-supervisor
+    namespace: default
+    job_id:    bot-supervisor
+
+  - name:      bot-architect
+    policy:    bot-architect
+    namespace: default
+    job_id:    bot-architect
+
+  - name:      bot-vault
+    policy:    bot-vault
+    namespace: default
+    job_id:    bot-vault
+
+  # ── Edge dispatcher ────────────────────────────────────────────────────────
+  - name:      dispatcher
+    policy:    dispatcher
+    namespace: default
+    job_id:    dispatcher
+
+  # ── Per-secret runner roles ────────────────────────────────────────────────
+  # vault-runner (Step 5) composes runner-<NAME> policies onto each
+  # ephemeral dispatch token based on the action TOML's `secrets = [...]`.
+  # The per-dispatch runner jobspec job_id follows the same `runner-<NAME>`
+  # convention (one jobspec per secret, minted per dispatch) so the bound
+  # claim matches the role name directly.
+  - name:      runner-GITHUB_TOKEN
+    policy:    runner-GITHUB_TOKEN
+    namespace: default
+    job_id:    runner-GITHUB_TOKEN
+
+  - name:      runner-CODEBERG_TOKEN
+    policy:    runner-CODEBERG_TOKEN
+    namespace: default
+    job_id:    runner-CODEBERG_TOKEN
+
+  - name:      runner-CLAWHUB_TOKEN
+    policy:    runner-CLAWHUB_TOKEN
+    namespace: default
+    job_id:    runner-CLAWHUB_TOKEN
+
+  - name:      runner-DEPLOY_KEY
+    policy:    runner-DEPLOY_KEY
+    namespace: default
+    job_id:    runner-DEPLOY_KEY
+
+  - name:      runner-NPM_TOKEN
+    policy:    runner-NPM_TOKEN
+    namespace: default
+    job_id:    runner-NPM_TOKEN
+
+  - name:      runner-DOCKER_HUB_TOKEN
+    policy:    runner-DOCKER_HUB_TOKEN
+    namespace: default
+    job_id:    runner-DOCKER_HUB_TOKEN
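The strict line-by-line format that the `vault/roles.yaml` header describes lends itself to a small awk pass. The sketch below is an assumption about how such a parser could look, not the parser in `tools/vault-apply-roles.sh`, and it is not the S2.6 CI check either. `parse_roles` and `check_roles` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: read roles.yaml in its documented strict layout
# ("  - name:" followed by four-space-indented policy/namespace/job_id).

# parse_roles FILE prints "name policy namespace job_id" per role entry.
parse_roles() {
  awk '
    function emit() {
      if (name != "") print name, policy, namespace, job_id
      name = policy = namespace = job_id = ""
    }
    /^[ \t]*#/ || /^[ \t]*$/ { next }      # skip comments and blank lines
    /^  - name:/      { emit(); name = $3; next }
    /^    policy:/    { policy = $2; next }
    /^    namespace:/ { namespace = $2; next }
    /^    job_id:/    { job_id = $2; next }
    END { emit() }
  ' "$1"
}

# check_roles FILE fails if an entry lacks a field or if the role name
# and policy name differ (the 1:1 naming convention).
check_roles() {
  local name policy namespace job_id rc=0
  while read -r name policy namespace job_id; do
    if [ -z "$job_id" ]; then
      # A missing field shifts the columns left, leaving job_id empty.
      printf 'role %s: missing field\n' "$name" >&2
      rc=1
    elif [ "$name" != "$policy" ]; then
      printf 'role %s: policy %s breaks the 1:1 convention\n' "$name" "$policy" >&2
      rc=1
    fi
  done < <(parse_roles "$1")
  return "$rc"
}
```

Running `check_roles vault/roles.yaml` before applying would surface a role whose policy name has drifted, which is the failure the header comment wants caught at review time.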
Author	SHA1	Message	Date
dev-qwen2	428fa223d8	fix: [nomad-step-2] S2.2 — Fix KV v2 overwrite for incremental updates and secure jq interpolation (#880)	2026-04-16 17:22:05 +00:00
dev-qwen2	197716ed5c	fix: [nomad-step-2] S2.2 — Fix KV v2 overwrite by grouping key-value pairs per path (#880)	2026-04-16 17:22:05 +00:00
dev-qwen2	b4c290bfda	fix: [nomad-step-2] S2.2 — Fix bot/runner operation parsing and sops value extraction (#880)	2026-04-16 17:22:05 +00:00
dev-qwen2	78f92d0cd0	fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880)	2026-04-16 17:22:05 +00:00
dev-qwen2	7a1f0b2c26	fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880)	2026-04-16 17:22:05 +00:00
dev-qwen2	1dc50e5784	fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880)	2026-04-16 17:22:05 +00:00
dev-bot	a2a7c4a12c	Merge pull request 'fix: [nomad-step-2] S2.3 — vault-nomad-auth.sh (enable JWT auth + roles + nomad workload identity) (#881)' (#895) from fix/issue-881 into main	2026-04-16 17:10:18 +00:00
Claude	b2c86c3037	fix: [nomad-step-2] S2.3 review round 1 — document new helper + script, drop unused vault CLI precondition (#881) Review feedback from PR #895 round 1: - lib/AGENTS.md (hvault.sh row): add hvault_get_or_empty(PATH) to the public-function list; replace the "not sourced at runtime yet" note with the three actual callers (vault-apply-policies.sh, vault-apply-roles.sh, vault-nomad-auth.sh). - lib/AGENTS.md (lib/init/nomad/ row): add a one-line description of vault-nomad-auth.sh (Step 2, this PR); relabel the row header from "Step 0 installer scripts" to "installer scripts" since it now spans Step 0 + Step 2. - lib/init/nomad/vault-nomad-auth.sh: drop the `vault` CLI from the binary precondition check — hvault.sh's helpers are all curl-based, so the CLI is never invoked. The precondition would spuriously die on a Nomad-client-only node that has Vault server reachable but no `vault` binary installed. Inline comment preserves the rationale. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:58:27 +00:00
Claude	8efef9f1bb	fix: [nomad-step-2] S2.3 — vault-nomad-auth.sh (enable JWT auth + roles + nomad workload identity) (#881) Wires Nomad → Vault via workload identity so jobs can exchange their short-lived JWT for a Vault token carrying the policies in vault/policies/ — no shared VAULT_TOKEN in job env. - `lib/init/nomad/vault-nomad-auth.sh` — idempotent script: enable jwt auth at path `jwt-nomad`, config JWKS/algs, apply roles, install server.hcl + SIGHUP nomad on change. - `tools/vault-apply-roles.sh` — companion sync script (S2.1 sibling); reads vault/roles.yaml and upserts each Vault role under auth/jwt-nomad/role/<name> with created/updated/unchanged semantics. - `vault/roles.yaml` — declarative role→policy→bound_claims map; one entry per vault/policies/*.hcl. Keeps S2.1 policies and S2.3 role bindings visible side-by-side at review time. - `nomad/server.hcl` — adds vault stanza (enabled, address, default_identity.aud=["vault.io"], ttl=1h). - `lib/hvault.sh` — new `hvault_get_or_empty` helper shared between vault-apply-policies.sh, vault-apply-roles.sh, and vault-nomad-auth.sh; reads a Vault endpoint and distinguishes 200 / 404 / other. - `vault/policies/AGENTS.md` — extends S2.1 docs with JWT-auth role naming convention, token shape, and the "add new service" flow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:44:59 +00:00