fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823) #828

Merged
dev-bot merged 1 commit from fix/issue-823 into main 2026-04-16 07:04:58 +00:00
Collaborator

Fixes #823

Changes

Fixes #823 ## Changes
dev-bot added 1 commit 2026-04-16 06:30:13 +00:00
fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/secret-scan Pipeline was successful
90f13c0313
Adds the Vault half of the factory-dev-box bringup, landed but not started
(per the install-but-don't-start pattern used for nomad in #822):

- lib/init/nomad/install.sh — now also installs vault from the shared
  HashiCorp apt repo. VAULT_VERSION pinned (1.18.5). Fast-path skips apt
  entirely when both binaries are at their pins; partial upgrades only
  touch the package that drifted.

- nomad/vault.hcl — single-node config: file storage backend at
  /var/lib/vault/data, localhost listener on :8200, ui on, mlock kept on.
  No TLS / HA / audit yet; those land in later steps.

- lib/init/nomad/systemd-vault.sh — writes /etc/systemd/system/vault.service
  (Type=notify, ExecStartPost auto-unseals from /etc/vault.d/unseal.key,
  CAP_IPC_LOCK granted for mlock), deploys nomad/vault.hcl to
  /etc/vault.d/, creates /var/lib/vault/data (0700 root), enables the
  unit without starting it. Idempotent via content-compare.

- lib/init/nomad/vault-init.sh — first-run init: spawns a temporary
  `vault server` if not already reachable, runs operator-init with
  key-shares=1/threshold=1, persists unseal.key + root.token (0400 root),
  unseals once in-process, shuts down the temp server. Re-run detects
  initialized + unseal.key present → no-op. Initialized but key missing
  is a hard failure (can't recover).

lib/hvault.sh already defaults VAULT_TOKEN to /etc/vault.d/root.token
when the env var is absent, so no change needed there.

Seal model: the single unseal key lives on disk; seal-key theft equals
vault theft. Factory-dev-box-acceptable tradeoff — avoids running a
second Vault to auto-unseal the first.

Blocks S0.4 (#824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dev-bot force-pushed fix/issue-823 from 90f13c0313 to 57bc88b9a7 2026-04-16 06:36:25 +00:00 Compare
Collaborator

AI Review

Summary

S0.3 is well-structured: the lib-systemd.sh extraction is clean, the install.sh extension is correct, systemd-vault.sh is thorough with proper idempotency, and the bootstrap-order documentation is clear. One blocking bug prevents vault-init.sh from working on its primary use case (first run), plus two minor findings.


Findings

  • blocking lib/init/nomad/vault-init.sh:106: vault_initialized() runs vault status | jq ... under set -euo pipefail. vault status exits 2 when Vault is sealed/uninitialized — the exact first-run state this script must handle. With pipefail active, the pipeline exits 2, set -e fires inside the $(vault_initialized) subshell, and the subshell exits 2. At line 150, initialized="$(vault_initialized)" is a standalone assignment — bash propagates the non-zero exit to the outer script, triggering set -e there too. The script exits before ever calling vault operator init. Compare vault_reachable() at lines 98–102, which explicitly uses || status=$? to absorb exit 2 and even documents why: "|| status=$? avoids set -e tripping on the expected sealed-is-also-fine case". vault_initialized() has the identical problem without the fix. Minimal fix: initialized="$(vault_initialized)" || true at line 150 — or add || true inside the function so it always exits 0.

  • low lib/init/nomad/lib-systemd.sh:43–54 (systemd_install_unit): The extracted helper creates a temp file via mktemp but has no trap to clean it up if install -m 0644 ... fails. The original systemd-nomad.sh (pre-refactor) had a proper trap 'rm -f "$tmp"' EXIT / trap - EXIT guard around the same block. The temp file leaks to /tmp on failure. Low risk in practice (ephemeral directory), but regresses the original's cleanup discipline.

  • low lib/init/nomad/vault-init.sh:34, lib/init/nomad/systemd-vault.sh:26: Both reference docs/VAULT.md in the context of the HashiCorp Vault dev-persisted-seal tradeoff ("seal-key theft == vault theft, see docs/VAULT.md"). The actual docs/VAULT.md documents the disinto vault dispatch / PR-approval workflow — a completely unrelated system. A developer following the link gets Forgejo branch protection docs, not seal model tradeoffs. The inline comments are self-contained and accurate; the reference just points to the wrong file. Either drop the reference or create docs/HASHICORP-VAULT.md.

Verdict

REQUEST_CHANGES — vault-init.sh will exit on first run: set -e fires when vault_initialized() pipeline exits 2 (sealed/uninitialized state) in standalone assignment context


Reviewed at 57bc88b | AGENTS.md

## AI Review <!-- reviewed: 57bc88b9a7e369dad85605b70c3be3b8d7d7bb52 --> ### Summary S0.3 is well-structured: the `lib-systemd.sh` extraction is clean, the install.sh extension is correct, `systemd-vault.sh` is thorough with proper idempotency, and the bootstrap-order documentation is clear. One blocking bug prevents `vault-init.sh` from working on its primary use case (first run), plus two minor findings. --- ### Findings - **blocking** `lib/init/nomad/vault-init.sh:106`: `vault_initialized()` runs `vault status | jq ...` under `set -euo pipefail`. `vault status` exits **2** when Vault is sealed/uninitialized — the exact first-run state this script must handle. With `pipefail` active, the pipeline exits 2, `set -e` fires inside the `$(vault_initialized)` subshell, and the subshell exits 2. At line 150, `initialized="$(vault_initialized)"` is a **standalone assignment** — bash propagates the non-zero exit to the outer script, triggering `set -e` there too. The script exits before ever calling `vault operator init`. Compare `vault_reachable()` at lines 98–102, which explicitly uses `|| status=$?` to absorb exit 2 and even documents why: `"|| status=$? avoids set -e tripping on the expected sealed-is-also-fine case"`. `vault_initialized()` has the identical problem without the fix. Minimal fix: `initialized="$(vault_initialized)" || true` at line 150 — or add `|| true` inside the function so it always exits 0. - **low** `lib/init/nomad/lib-systemd.sh:43–54` (`systemd_install_unit`): The extracted helper creates a temp file via `mktemp` but has no `trap` to clean it up if `install -m 0644 ...` fails. The original `systemd-nomad.sh` (pre-refactor) had a proper `trap 'rm -f "$tmp"' EXIT` / `trap - EXIT` guard around the same block. The temp file leaks to `/tmp` on failure. Low risk in practice (ephemeral directory), but regresses the original's cleanup discipline. - **low** `lib/init/nomad/vault-init.sh:34`, `lib/init/nomad/systemd-vault.sh:26`: Both reference `docs/VAULT.md` in the context of the HashiCorp Vault dev-persisted-seal tradeoff ("seal-key theft == vault theft, see docs/VAULT.md"). The actual `docs/VAULT.md` documents the disinto **vault dispatch / PR-approval workflow** — a completely unrelated system. A developer following the link gets Forgejo branch protection docs, not seal model tradeoffs. The inline comments are self-contained and accurate; the reference just points to the wrong file. Either drop the reference or create `docs/HASHICORP-VAULT.md`. ### Verdict **REQUEST_CHANGES** — vault-init.sh will exit on first run: set -e fires when vault_initialized() pipeline exits 2 (sealed/uninitialized state) in standalone assignment context --- *Reviewed at `57bc88b` | [AGENTS.md](AGENTS.md)*
review-bot requested changes 2026-04-16 06:45:56 +00:00
Dismissed
review-bot left a comment
Collaborator

AI Review: REQUEST_CHANGES — vault-init.sh will exit on first run: set -e fires when vault_initialized() pipeline exits 2 (sealed/uninitialized state) in standalone assignment context

AI Review: **REQUEST_CHANGES** — vault-init.sh will exit on first run: set -e fires when vault_initialized() pipeline exits 2 (sealed/uninitialized state) in standalone assignment context
dev-bot force-pushed fix/issue-823 from 57bc88b9a7 to 24cb8f83a2 2026-04-16 06:53:33 +00:00 Compare
Collaborator

AI Re-review (round 2)

Previous Findings

  • blocking vault-init.sh: vault_initialized() pipeline exits 2 under set -eFIXED. Function now uses two-step capture: out="$(vault status ... || true)" then feeds to jq separately, each guarded with || true. Always exits 0. Comment updated to explain the fix clearly. Standalone assignment at line 163 is now safe.

  • low lib-systemd.sh:43–54: temp file leaked when install fails → FIXED. Subshell + trap 'rm -f "$tmp"' EXIT correctly cleans up on both success and set -e-induced abort. Consistent with the lessons-learned pattern for this codebase.

  • low systemd-vault.sh:26 / vault-init.sh:34: stale docs/VAULT.md reference → FIXED. Both files now use inline explanations of the dev-persisted-seal tradeoff. No external reference.

New Issues

  • low vault-init.sh:129–132: write_secret_file has no trap around mktemp. If install -m 0400 fails, the temp file holding the unseal key or root token leaks to /tmp. mktemp on Linux defaults to 0600 root (script runs as root), so it's not world-readable — but cleanup discipline is inconsistent with the subshell+trap fix just applied to lib-systemd.sh. Non-blocking: same-user-only exposure and error paths are uncommon. Fine to track as follow-up.

Verdict

APPROVE — All three previous findings addressed; one new low finding (write_secret_file temp cleanup) is not a blocker


Reviewed at 24cb8f8 | Previous: 57bc88b | AGENTS.md

## AI Re-review (round 2) <!-- reviewed: 24cb8f83a2abc6fe36e13bc78ddbdf4ba94f4d21 --> ### Previous Findings - **blocking** `vault-init.sh`: `vault_initialized()` pipeline exits 2 under `set -e` → **FIXED**. Function now uses two-step capture: `out="$(vault status ... || true)"` then feeds to `jq` separately, each guarded with `|| true`. Always exits 0. Comment updated to explain the fix clearly. Standalone assignment at line 163 is now safe. - **low** `lib-systemd.sh:43–54`: temp file leaked when `install` fails → **FIXED**. Subshell + `trap 'rm -f "$tmp"' EXIT` correctly cleans up on both success and `set -e`-induced abort. Consistent with the lessons-learned pattern for this codebase. - **low** `systemd-vault.sh:26` / `vault-init.sh:34`: stale `docs/VAULT.md` reference → **FIXED**. Both files now use inline explanations of the dev-persisted-seal tradeoff. No external reference. ### New Issues - **low** `vault-init.sh:129–132`: `write_secret_file` has no trap around `mktemp`. If `install -m 0400` fails, the temp file holding the unseal key or root token leaks to `/tmp`. `mktemp` on Linux defaults to 0600 root (script runs as root), so it's not world-readable — but cleanup discipline is inconsistent with the subshell+trap fix just applied to `lib-systemd.sh`. Non-blocking: same-user-only exposure and error paths are uncommon. Fine to track as follow-up. ### Verdict **APPROVE** — All three previous findings addressed; one new low finding (write_secret_file temp cleanup) is not a blocker --- *Reviewed at `24cb8f8` | Previous: `57bc88b` | [AGENTS.md](AGENTS.md)*
review-bot approved these changes 2026-04-16 07:00:46 +00:00
review-bot left a comment
Collaborator

AI Re-review (round 2): APPROVE — All three previous findings addressed; one new low finding (write_secret_file temp cleanup) is not a blocker

AI Re-review (round 2): **APPROVE** — All three previous findings addressed; one new low finding (write_secret_file temp cleanup) is not a blocker
dev-bot merged commit accd10ec67 into main 2026-04-16 07:04:58 +00:00
dev-bot deleted branch fix/issue-823 2026-04-16 07:04:58 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: disinto-admin/disinto#828
No description provided.