fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823)

Adds the Vault half of the factory-dev-box bringup, landed but not started
(per the install-but-don't-start pattern used for nomad in #822):

- lib/init/nomad/install.sh — now also installs vault from the shared
  HashiCorp apt repo. VAULT_VERSION pinned (1.18.5). Fast-path skips apt
  entirely when both binaries are at their pins; partial upgrades only
  touch the package that drifted.

- nomad/vault.hcl — single-node config: file storage backend at
  /var/lib/vault/data, localhost listener on :8200, ui on, mlock kept on.
  No TLS / HA / audit yet; those land in later steps.

- lib/init/nomad/systemd-vault.sh — writes /etc/systemd/system/vault.service
  (Type=notify, ExecStartPost auto-unseals from /etc/vault.d/unseal.key,
  CAP_IPC_LOCK granted for mlock), deploys nomad/vault.hcl to
  /etc/vault.d/, creates /var/lib/vault/data (0700 root), enables the
  unit without starting it. Idempotent via content-compare.

- lib/init/nomad/vault-init.sh — first-run init: spawns a temporary
  `vault server` if not already reachable, runs operator-init with
  key-shares=1/threshold=1, persists unseal.key + root.token (0400 root),
  unseals once in-process, shuts down the temp server. Re-run detects
  initialized + unseal.key present → no-op. Initialized but key missing
  is a hard failure (can't recover).
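
The re-run behaviour described for vault-init.sh can be sketched as a small decision function. This is an illustration under assumptions, not the script's actual code: `decide_init_action` and its two arguments are hypothetical names, and the caller is assumed to have already worked out whether Vault reports itself as initialized.

```shell
# Hypothetical helper: decide what a first-run init script should do.
#   $1 = "true" or "false" (does Vault report itself as initialized?)
#   $2 = path to the persisted unseal key
# Prints one of: noop, fail, init.
decide_init_action() {
  local initialized="$1" key_file="$2"
  if [ "$initialized" = "true" ]; then
    if [ -s "$key_file" ]; then
      echo "noop"   # initialized and key on disk: nothing to do
    else
      echo "fail"   # initialized but key missing: cannot be recovered
    fi
  else
    echo "init"     # fresh Vault: run operator init and persist the key
  fi
}
```

A caller would branch on the printed word, for example `case "$(decide_init_action "$initialized" /etc/vault.d/unseal.key)" in ... esac`, and exit non-zero on `fail`.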

lib/hvault.sh already defaults VAULT_TOKEN to /etc/vault.d/root.token
when the env var is absent, so no change needed there.
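
For readers without lib/hvault.sh open, the fallback it describes might look roughly like the sketch below. The function name `resolve_vault_token` is hypothetical and the real implementation is not part of this commit; only the default path /etc/vault.d/root.token comes from the message above.

```shell
# Hypothetical sketch of an env-var-then-file token lookup.
#   $1 = token file (assumed default: /etc/vault.d/root.token)
# Prints the token, or returns 1 if none can be found.
resolve_vault_token() {
  local token_file="${1:-/etc/vault.d/root.token}"
  if [ -n "${VAULT_TOKEN:-}" ]; then
    # An explicit environment variable always wins.
    printf '%s\n' "$VAULT_TOKEN"
  elif [ -r "$token_file" ]; then
    # Fall back to the first line of the persisted token file.
    head -n 1 "$token_file"
  else
    return 1
  fi
}
```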

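Seal model: the single unseal key lives on disk on the same host as the
data it protects, so anyone who can read both /etc/vault.d/unseal.key and
/var/lib/vault/data can unseal and read the vault. This is an acceptable
tradeoff for a factory dev box: it avoids running a second Vault to
auto-unseal the first.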

Blocks S0.4 (#824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude 2026-04-16 06:29:55 +00:00
parent 75bec43c4a
commit 57bc88b9a7
6 changed files with 519 additions and 68 deletions

@@ -33,13 +33,11 @@ NOMAD_DATA_DIR="/var/lib/nomad"
 log() { printf '[systemd-nomad] %s\n' "$*"; }
 die() { printf '[systemd-nomad] ERROR: %s\n' "$*" >&2; exit 1; }
-# ── Preconditions ────────────────────────────────────────────────────────────
-if [ "$(id -u)" -ne 0 ]; then
-  die "must run as root (needs write access to ${UNIT_PATH})"
-fi
+# shellcheck source=lib-systemd.sh
+. "$(dirname "${BASH_SOURCE[0]}")/lib-systemd.sh"
-command -v systemctl >/dev/null 2>&1 \
-  || die "systemctl not found (systemd is required)"
+# ── Preconditions ────────────────────────────────────────────────────────────
+systemd_require_preconditions "$UNIT_PATH"
 NOMAD_BIN="$(command -v nomad 2>/dev/null || true)"
 [ -n "$NOMAD_BIN" ] \
@@ -98,33 +96,7 @@ for d in "$NOMAD_CONFIG_DIR" "$NOMAD_DATA_DIR"; do
   fi
 done
-# ── Install unit file only if content differs ────────────────────────────────
-needs_reload=0
-if [ ! -f "$UNIT_PATH" ] \
-  || ! printf '%s\n' "$DESIRED_UNIT" | cmp -s - "$UNIT_PATH"; then
-  log "writing unit → ${UNIT_PATH}"
-  tmp="$(mktemp)"
-  trap 'rm -f "$tmp"' EXIT
-  printf '%s\n' "$DESIRED_UNIT" > "$tmp"
-  install -m 0644 -o root -g root "$tmp" "$UNIT_PATH"
-  rm -f "$tmp"
-  trap - EXIT
-  needs_reload=1
-else
-  log "unit file already up to date"
-fi
-# ── Reload + enable ──────────────────────────────────────────────────────────
-if [ "$needs_reload" -eq 1 ]; then
-  log "systemctl daemon-reload"
-  systemctl daemon-reload
-fi
-if systemctl is-enabled --quiet nomad.service 2>/dev/null; then
-  log "nomad.service already enabled"
-else
-  log "systemctl enable nomad"
-  systemctl enable nomad.service >/dev/null
-fi
+# ── Install + reload + enable (shared with systemd-vault.sh via lib-systemd) ─
+systemd_install_unit "$UNIT_PATH" "nomad.service" "$DESIRED_UNIT"
 log "done — unit installed and enabled (NOT started; S0.4 brings the cluster up)"
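
The content-compare step that this hunk moves into lib-systemd.sh can be illustrated with a standalone sketch. It is modelled on the lines removed above, not on the real `systemd_install_unit`, whose body is not shown here. `write_if_changed` is a hypothetical name, and the root ownership flags and the `systemctl` calls are left out so the sketch can run unprivileged.

```shell
# Hypothetical stand-in for the "write only if content differs" step.
#   $1 = destination path, $2 = desired file content
# Prints "written" when the file was created or replaced, "unchanged" otherwise.
write_if_changed() {
  local path="$1" desired="$2" tmp
  # cmp -s compares stdin ("-") with the existing file without printing anything.
  if [ -f "$path" ] && printf '%s\n' "$desired" | cmp -s - "$path"; then
    echo "unchanged"
    return 0
  fi
  tmp="$(mktemp)"
  printf '%s\n' "$desired" > "$tmp"
  install -m 0644 "$tmp" "$path"   # copy into place with the final mode set
  rm -f "$tmp"
  echo "written"
}
```

A caller would run `systemctl daemon-reload` only when the function prints `written`, which is what keeps repeated runs from touching systemd when nothing changed.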