fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823)

Adds the Vault half of the factory-dev-box bringup, landed but not started
(per the install-but-don't-start pattern used for nomad in #822):

- lib/init/nomad/install.sh — now also installs vault from the shared
  HashiCorp apt repo. VAULT_VERSION pinned (1.18.5). Fast-path skips apt
  entirely when both binaries are at their pins; partial upgrades only
  touch the package that drifted.

- nomad/vault.hcl — single-node config: file storage backend at
  /var/lib/vault/data, localhost listener on :8200, ui on, mlock kept on.
  No TLS / HA / audit yet; those land in later steps.

- lib/init/nomad/systemd-vault.sh — writes /etc/systemd/system/vault.service
  (Type=notify, ExecStartPost auto-unseals from /etc/vault.d/unseal.key,
  CAP_IPC_LOCK granted for mlock), deploys nomad/vault.hcl to
  /etc/vault.d/, creates /var/lib/vault/data (0700 root), enables the
  unit without starting it. Idempotent via content-compare.

- lib/init/nomad/vault-init.sh — first-run init: spawns a temporary
  `vault server` if not already reachable, runs operator-init with
  key-shares=1/threshold=1, persists unseal.key + root.token (0400 root),
  unseals once in-process, shuts down the temp server. Re-run detects
  initialized + unseal.key present → no-op. Initialized but key missing
  is a hard failure (can't recover).
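
The three re-run outcomes described for vault-init.sh can be sketched as a small decision function. This is a minimal illustration, not the code in vault-init.sh: `vault_init_plan` is a hypothetical name, the "initialized" flag is passed in as a string rather than queried from a running Vault, and "key present" is assumed to mean the key file exists as a regular file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from vault-init.sh).
# Args:   $1 = "true" if Vault reports it is initialized, anything else otherwise
#         $2 = path to the persisted unseal key
# Prints: init | noop | fail
vault_init_plan() {
  local initialized="$1"
  local key_file="$2"

  if [ "$initialized" != "true" ]; then
    # Fresh Vault: run the first-time init and persist the key and token.
    echo init
  elif [ -f "$key_file" ]; then
    # Already initialized and the key is on disk: nothing to do.
    echo noop
  else
    # Initialized but the key is gone: the data cannot be unsealed again.
    echo fail
  fi
}
```

A caller would map `fail` to its `die()` and `noop` to an early exit, so a second run of the script leaves an existing key untouched.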

lib/hvault.sh already defaults VAULT_TOKEN to /etc/vault.d/root.token
when the env var is absent, so no change needed there.
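
For readers unfamiliar with that defaulting, here is a minimal sketch of the behaviour described. It is an assumption about the shape of the logic, not a copy of lib/hvault.sh: `hvault_default_token` is a hypothetical name, and the token path is a parameter only so the sketch can be run without root.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the defaulting described above (not lib/hvault.sh).
# Args: $1 = token file, defaulting to the path named in the commit message.
hvault_default_token() {
  local token_file="${1:-/etc/vault.d/root.token}"

  # Only fall back to the file when the env var is unset or empty,
  # so an explicit VAULT_TOKEN from the caller always wins.
  if [ -z "${VAULT_TOKEN:-}" ] && [ -r "$token_file" ]; then
    VAULT_TOKEN="$(cat "$token_file")"
    export VAULT_TOKEN
  fi
}
```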

Seal model: the single unseal key lives on disk; seal-key theft equals
vault theft. Factory-dev-box-acceptable tradeoff — avoids running a
second Vault to auto-unseal the first.

Blocks S0.4 (#824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude 2026-04-16 06:29:55 +00:00
parent 75bec43c4a
commit 57bc88b9a7
6 changed files with 519 additions and 68 deletions

@@ -0,0 +1,70 @@
#!/usr/bin/env bash
# =============================================================================
# lib/init/nomad/lib-systemd.sh — Shared idempotent systemd-unit installer
#
# Sourced by lib/init/nomad/systemd-nomad.sh and lib/init/nomad/systemd-vault.sh
# (and any future sibling) to collapse the "write unit if content differs,
# daemon-reload, enable (never start)" boilerplate.
#
# Install-but-don't-start is the invariant this helper enforces — mid-migration
# installers land files and enable units; the orchestrator (S0.4) starts them.
#
# Public API (sourced into caller scope):
#
#   systemd_require_preconditions UNIT_PATH
#     Asserts the caller is uid 0 and `systemctl` is on $PATH. Calls the
#     caller's die() with a UNIT_PATH-scoped message on failure.
#
#   systemd_install_unit UNIT_PATH UNIT_NAME UNIT_CONTENT
#     Writes UNIT_CONTENT to UNIT_PATH (0644 root:root) only if on-disk
#     content differs. If written, runs `systemctl daemon-reload`. Then
#     enables UNIT_NAME (no-op if already enabled). Never starts the unit.
#
# Caller contract:
#   - Callers MUST define `log()` and `die()` before sourcing this file (we
#     call log() for status chatter and rely on the caller's error-handling
#     stance; `set -e` propagates install/cmp/systemctl failures).
# =============================================================================

# systemd_require_preconditions UNIT_PATH
systemd_require_preconditions() {
  local unit_path="$1"

  if [ "$(id -u)" -ne 0 ]; then
    die "must run as root (needs write access to ${unit_path})"
  fi

  command -v systemctl >/dev/null 2>&1 \
    || die "systemctl not found (systemd is required)"
}

# systemd_install_unit UNIT_PATH UNIT_NAME UNIT_CONTENT
systemd_install_unit() {
  local unit_path="$1"
  local unit_name="$2"
  local unit_content="$3"
  local needs_reload=0

  if [ ! -f "$unit_path" ] \
    || ! printf '%s\n' "$unit_content" | cmp -s - "$unit_path"; then
    log "writing unit → ${unit_path}"
    local tmp
    tmp="$(mktemp)"
    printf '%s\n' "$unit_content" > "$tmp"
    install -m 0644 -o root -g root "$tmp" "$unit_path"
    rm -f "$tmp"
    needs_reload=1
  else
    log "unit file already up to date"
  fi

  if [ "$needs_reload" -eq 1 ]; then
    log "systemctl daemon-reload"
    systemctl daemon-reload
  fi

  if systemctl is-enabled --quiet "$unit_name" 2>/dev/null; then
    log "${unit_name} already enabled"
  else
    log "systemctl enable ${unit_name}"
    systemctl enable "$unit_name" >/dev/null
  fi
}
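
The compare-then-write step in `systemd_install_unit` is what makes re-runs idempotent. The sketch below isolates that step so it can be tried without root or systemd. `write_if_changed` is a hypothetical name that does not exist in this library, and it writes the file directly instead of going through `mktemp` and `install`.

```shell
#!/usr/bin/env bash
# Hypothetical, root-free variant of the compare-then-write step.
# Args:    $1 = destination path, $2 = desired content
# Returns: 0 if the file was (re)written, 1 if it was already up to date.
write_if_changed() {
  local path="$1"
  local content="$2"

  # cmp -s compares stdin ("-") with the file byte for byte and stays silent;
  # the trailing newline from printf matches what gets written below.
  if [ -f "$path" ] && printf '%s\n' "$content" | cmp -s - "$path"; then
    return 1
  fi

  printf '%s\n' "$content" > "$path"
  return 0
}
```

Because the function returns 1 for the "up to date" case, callers running under `set -e` should invoke it inside an `if` so that a no-op is not treated as a failure. In the library itself the same information is carried by `needs_reload`, which decides whether `systemctl daemon-reload` runs.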