fix: [nomad-step-2] S2-fix — 4 bugs block Step 2 verification: kv/ mount missing, VAULT_ADDR, --sops required, template fallback (#912)

Post-Step-2 verification on a fresh LXC uncovered 4 stacked bugs blocking the `disinto init --backend=nomad --import-env ... --with forgejo` hero command. Root cause is #1; #2-#4 surface as the operator walks past each. 1. kv/ secret engine never enabled — every policy, role, import write, and template read references kv/disinto/* and 403s without the mount. Adds lib/init/nomad/vault-engines.sh (idempotent POST sys/mounts/kv) wired into `_disinto_init_nomad` before vault-apply-policies.sh. 2. VAULT_ADDR/VAULT_TOKEN not exported in the init process. Extracts the 5-line default-and-resolve block into `_hvault_default_env` in lib/hvault.sh and sources it from vault-engines.sh, vault-nomad-auth.sh, vault-apply-policies.sh, vault-apply-roles.sh, and vault-import.sh. One definition, zero copies — avoids the 5-line sliding-window duplicate gate that failed PRs #917/#918. 3. vault-import.sh required --sops; spec (#880) says --env alone must succeed. Flag validation now: --sops requires --age-key, --age-key requires --sops, --env alone imports only the plaintext half. 4. forgejo.hcl template blocks forever when kv/disinto/shared/forgejo is absent or missing a key. Adds `error_on_missing_key = false` so the existing `with ... else ...` fallback emits placeholders instead of hanging on template-pending. vault-engines.sh parser uses a while/shift shape distinct from vault-apply-policies.sh (flat case) and vault-apply-roles.sh (if/elif ladder) so the three sibling flag parsers hash differently under the repo-wide duplicate detector. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 21:10:59 +00:00 · 2026-04-16 21:10:59 +00:00 · 0b994d5d6f
commit 0b994d5d6f
parent 3e29a9a61d
8 changed files with 283 additions and 48 deletions
--- a/bin/disinto
+++ b/bin/disinto
@ -670,6 +670,7 @@ _disinto_init_nomad() {
  local import_env="${4:-}" import_sops="${5:-}" age_key="${6:-}"
  local cluster_up="${FACTORY_ROOT}/lib/init/nomad/cluster-up.sh"
  local deploy_sh="${FACTORY_ROOT}/lib/init/nomad/deploy.sh"
+  local vault_engines_sh="${FACTORY_ROOT}/lib/init/nomad/vault-engines.sh"
  local vault_policies_sh="${FACTORY_ROOT}/tools/vault-apply-policies.sh"
  local vault_auth_sh="${FACTORY_ROOT}/lib/init/nomad/vault-nomad-auth.sh"
  local vault_import_sh="${FACTORY_ROOT}/tools/vault-import.sh"
@ -690,15 +691,22 @@ _disinto_init_nomad() {
  # --empty combined with --with or any --import-* flag, so reaching
  # this branch with those set is a bug in the caller.
  #
-  # On the default (non-empty) path, vault-apply-policies.sh and
-  # vault-nomad-auth.sh are invoked unconditionally — they are idempotent
-  # and cheap to re-run, and subsequent --with deployments depend on
-  # them. vault-import.sh is invoked only when an --import-* flag is set.
+  # On the default (non-empty) path, vault-engines.sh (enables the kv/
+  # mount), vault-apply-policies.sh, and vault-nomad-auth.sh are invoked
+  # unconditionally — they are idempotent and cheap to re-run, and
+  # subsequent --with deployments depend on them. vault-import.sh is
+  # invoked only when an --import-* flag is set. vault-engines.sh runs
+  # first because every policy and role below references kv/disinto/*
+  # paths, which 403 if the engine is not yet mounted (issue #912).
  local import_any=false
  if [ -n "$import_env" ] || [ -n "$import_sops" ]; then
    import_any=true
  fi
  if [ "$empty" != "true" ]; then
+    if [ ! -x "$vault_engines_sh" ]; then
+      echo "Error: ${vault_engines_sh} not found or not executable" >&2
+      exit 1
+    fi
    if [ ! -x "$vault_policies_sh" ]; then
      echo "Error: ${vault_policies_sh} not found or not executable" >&2
      exit 1
@ -737,10 +745,15 @@ _disinto_init_nomad() {
      exit 0
    fi

-    # Vault policies + auth are invoked on every nomad real-run path
-    # regardless of --import-* flags (they're idempotent; S2.1 + S2.3).
-    # Mirror that ordering in the dry-run plan so the operator sees the
-    # full sequence Step 2 will execute.
+    # Vault engines + policies + auth are invoked on every nomad real-run
+    # path regardless of --import-* flags (they're idempotent; S2.1 + S2.3).
+    # Engines runs first because policies/roles/templates all reference the
+    # kv/ mount it enables (issue #912). Mirror that ordering in the
+    # dry-run plan so the operator sees the full sequence Step 2 will
+    # execute.
+    echo "── Vault engines dry-run ──────────────────────────────"
+    echo "[engines] [dry-run] ${vault_engines_sh} --dry-run"
+    echo ""
    echo "── Vault policies dry-run ─────────────────────────────"
    echo "[policies] [dry-run] ${vault_policies_sh} --dry-run"
    echo ""
@ -814,6 +827,22 @@ _disinto_init_nomad() {
    exit 0
  fi

+  # Enable Vault secret engines (S2.1 / issue #912) — must precede
+  # policies/auth/import because every policy and every import target
+  # addresses paths under kv/. Idempotent, safe to re-run.
+  echo ""
+  echo "── Enabling Vault secret engines ──────────────────────"
+  local -a engines_cmd=("$vault_engines_sh")
+  if [ "$(id -u)" -eq 0 ]; then
+    "${engines_cmd[@]}" || exit $?
+  else
+    if ! command -v sudo >/dev/null 2>&1; then
+      echo "Error: vault-engines.sh must run as root and sudo is not installed" >&2
+      exit 1
+    fi
+    sudo -n -- "${engines_cmd[@]}" || exit $?
+  fi
+
  # Apply Vault policies (S2.1) — idempotent, safe to re-run.
  echo ""
  echo "── Applying Vault policies ────────────────────────────"