Compare commits

1 commit

Author: Claude
SHA1: 90f13c0313
fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/secret-scan Pipeline was successful
Adds the Vault half of the factory-dev-box bringup, landed but not started
(per the install-but-don't-start pattern used for nomad in #822):

- lib/init/nomad/install.sh — now also installs vault from the shared
  HashiCorp apt repo. VAULT_VERSION pinned (1.18.5). Fast-path skips apt
  entirely when both binaries are at their pins; partial upgrades only
  touch the package that drifted.

- nomad/vault.hcl — single-node config: file storage backend at
  /var/lib/vault/data, localhost listener on :8200, ui on, mlock kept on.
  No TLS / HA / audit yet; those land in later steps.

- lib/init/nomad/systemd-vault.sh — writes /etc/systemd/system/vault.service
  (Type=notify, ExecStartPost auto-unseals from /etc/vault.d/unseal.key,
  CAP_IPC_LOCK granted for mlock), deploys nomad/vault.hcl to
  /etc/vault.d/, creates /var/lib/vault/data (0700 root), enables the
  unit without starting it. Idempotent via content-compare.

- lib/init/nomad/vault-init.sh — first-run init: spawns a temporary
  `vault server` if not already reachable, runs operator-init with
  key-shares=1/threshold=1, persists unseal.key + root.token (0400 root),
  unseals once in-process, shuts down the temp server. Re-run detects
  initialized + unseal.key present → no-op. Initialized but key missing
  is a hard failure (can't recover).
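The install.sh fast path described above (skip apt when both binaries sit at their pins, otherwise touch only the drifted package) can be sketched as follows. This is a hypothetical illustration: `installed_version` and `drifted_packages` are invented names, and the parsing assumes the first line of `vault version` / `nomad version` reads like `Vault v1.18.5 (...)`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of install.sh's fast path; these helper names are
# illustrative and do not come from lib/init/nomad/install.sh.

# installed_version BIN
#   Prints the bare version of BIN (e.g. "1.18.5"), or nothing if BIN is not
#   on $PATH. Assumes the first line of `BIN version` reads like
#   "Vault v1.18.5 (...)" or "Nomad v1.9.3".
installed_version() {
  local bin="$1" first ver
  command -v "$bin" >/dev/null 2>&1 || return 0
  first="$("$bin" version 2>/dev/null | head -n 1)"
  read -r _ ver _ <<<"$first"
  [ -n "$ver" ] && printf '%s\n' "${ver#v}"
  return 0
}

# drifted_packages NAME INSTALLED PINNED [NAME INSTALLED PINNED ...]
#   Prints each NAME whose installed version differs from its pin. An empty
#   INSTALLED means "not installed", which also counts as drift.
drifted_packages() {
  while [ "$#" -ge 3 ]; do
    if [ "$2" != "$3" ]; then
      printf '%s\n' "$1"
    fi
    shift 3
  done
}
```

A caller would pass `vault "$(installed_version vault)" "$VAULT_VERSION"` plus the matching Nomad triple, skip apt entirely when the output is empty, and otherwise install only the packages listed.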
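For the nomad/vault.hcl bullet, here is one plausible rendering of that single-node config, wrapped in a shell function in the same spirit as the `DESIRED_UNIT` strings the installers compare. It is an assumption-laden sketch, not the committed file: it reads "localhost" as `127.0.0.1`, and it spells out `disable_mlock = false` even though that is Vault's default.

```shell
#!/usr/bin/env bash
# Hypothetical render of nomad/vault.hcl matching the bullet above; the real
# file in the repo may differ in layout and extra settings.
render_vault_hcl() {
  cat <<'EOF'
# Single-node dev config: file storage, localhost only, no TLS/HA/audit yet.
storage "file" {
  path = "/var/lib/vault/data"
}

listener "tcp" {
  address     = "127.0.0.1:8200"
  tls_disable = true
}

ui = true

# mlock stays enabled; the systemd unit grants CAP_IPC_LOCK for it.
disable_mlock = false
EOF
}
```

Keeping TLS off is only tolerable because the listener is bound to loopback; the bullet notes that TLS, HA, and audit arrive in later steps.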
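The vault.service unit from the systemd-vault.sh bullet might look roughly like this sketch. It is hypothetical: the `Description`, `Restart`, and `LimitMEMLOCK` lines and the exact unseal command are assumptions, not the committed unit. Note that `$$` is systemd's escape for a literal `$`, so the shell in `ExecStartPost` receives `$(cat ...)` unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical render of /etc/systemd/system/vault.service; the committed unit
# written by lib/init/nomad/systemd-vault.sh may differ.
render_vault_unit() {
  local vault_bin="${1:-/usr/bin/vault}"
  # Unquoted heredoc: ${vault_bin} expands; \$\$ becomes $$, which systemd
  # turns back into a single $ before /bin/sh runs the command.
  cat <<EOF
[Unit]
Description=HashiCorp Vault (factory dev box)
Wants=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/vault.d/vault.hcl

[Service]
Type=notify
Environment=VAULT_ADDR=http://127.0.0.1:8200
ExecStart=${vault_bin} server -config=/etc/vault.d/vault.hcl
# Runs once the server reports ready; unseals with the on-disk key.
ExecStartPost=/bin/sh -c '${vault_bin} operator unseal "\$\$(cat /etc/vault.d/unseal.key)" >/dev/null'
AmbientCapabilities=CAP_IPC_LOCK
LimitMEMLOCK=infinity
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

Passing the key as a command-line argument briefly exposes it in the process list. On a single-user dev box that is in line with the seal model this commit already accepts.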
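The vault-init.sh re-run rules (not initialized: init; initialized with unseal.key: no-op; initialized without it: hard failure) fit into a small decision function. The sketch below is hypothetical; all three helper names are invented. It relies on `vault operator init -status`, which exits 0 when Vault is initialized and 2 when it is not.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of vault-init.sh's state handling; helper names are
# illustrative, not taken from lib/init/nomad/vault-init.sh.

# vault_initialized
#   Prints "true"/"false" from the exit code of `vault operator init -status`
#   (0 = initialized, 2 = not initialized); returns 1 on any other code.
vault_initialized() {
  local rc=0
  vault operator init -status >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) printf 'true\n' ;;
    2) printf 'false\n' ;;
    *) return 1 ;;
  esac
}

# init_action INITIALIZED KEY_FILE
#   not initialized            -> init
#   initialized + key present  -> noop
#   initialized + key missing  -> fail (the key cannot be recovered)
init_action() {
  local initialized="$1" key_file="$2"
  if [ "$initialized" != "true" ]; then
    printf 'init\n'
  elif [ -s "$key_file" ]; then
    printf 'noop\n'
  else
    printf 'fail\n'
  fi
}

# persist_secret CONTENT DEST
#   Writes CONTENT to DEST with mode 0400. The temp file is created next to
#   DEST (mktemp creates it 0600), so the final mv is an atomic rename.
persist_secret() {
  local content="$1" dest="$2" tmp
  tmp="$(mktemp "${dest}.XXXXXX")" || return 1
  if printf '%s\n' "$content" > "$tmp" && chmod 0400 "$tmp" && mv -f "$tmp" "$dest"; then
    return 0
  fi
  rm -f "$tmp"
  return 1
}
```

On the `init` branch, the script would run `vault operator init -key-shares=1 -key-threshold=1 -format=json`. It would then pass the first entry of the JSON `unseal_keys_b64` array and the `root_token` field to `persist_secret` for /etc/vault.d/unseal.key and /etc/vault.d/root.token.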

lib/hvault.sh already defaults VAULT_TOKEN to /etc/vault.d/root.token
when the env var is absent, so no change needed there.
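As a minimal sketch of that fallback behaviour (a hypothetical function, not the actual lib/hvault.sh code), an explicitly set `VAULT_TOKEN` always wins and the root.token file is read only when the variable is unset or empty:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the VAULT_TOKEN fallback; not the real lib/hvault.sh.

# load_vault_token [TOKEN_FILE]
#   Exports VAULT_TOKEN from TOKEN_FILE (default /etc/vault.d/root.token)
#   only when VAULT_TOKEN is unset or empty.
load_vault_token() {
  local token_file="${1:-/etc/vault.d/root.token}"
  if [ -z "${VAULT_TOKEN:-}" ] && [ -r "$token_file" ]; then
    VAULT_TOKEN="$(<"$token_file")"   # $(<file) also drops the trailing newline
    export VAULT_TOKEN
  fi
  return 0
}
```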

Seal model: the single unseal key lives on disk, so stealing the unseal key
is equivalent to stealing the vault. That is an acceptable tradeoff for a
factory dev box: it avoids running a second Vault just to auto-unseal the first.

Blocks S0.4 (#824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Date: 2026-04-16 06:29:55 +00:00
3 changed files with 68 additions and 82 deletions

lib/init/nomad/lib-systemd.sh (deleted)

@@ -1,70 +0,0 @@
-#!/usr/bin/env bash
-# =============================================================================
-# lib/init/nomad/lib-systemd.sh — Shared idempotent systemd-unit installer
-#
-# Sourced by lib/init/nomad/systemd-nomad.sh and lib/init/nomad/systemd-vault.sh
-# (and any future sibling) to collapse the "write unit if content differs,
-# daemon-reload, enable (never start)" boilerplate.
-#
-# Install-but-don't-start is the invariant this helper enforces — mid-migration
-# installers land files and enable units; the orchestrator (S0.4) starts them.
-#
-# Public API (sourced into caller scope):
-#
-#   systemd_require_preconditions UNIT_PATH
-#     Asserts the caller is uid 0 and `systemctl` is on $PATH. Calls the
-#     caller's die() with a UNIT_PATH-scoped message on failure.
-#
-#   systemd_install_unit UNIT_PATH UNIT_NAME UNIT_CONTENT
-#     Writes UNIT_CONTENT to UNIT_PATH (0644 root:root) only if on-disk
-#     content differs. If written, runs `systemctl daemon-reload`. Then
-#     enables UNIT_NAME (no-op if already enabled). Never starts the unit.
-#
-# Caller contract:
-#   - Callers MUST define `log()` and `die()` before sourcing this file (we
-#     call log() for status chatter and rely on the caller's error-handling
-#     stance; `set -e` propagates install/cmp/systemctl failures).
-# =============================================================================
-
-# systemd_require_preconditions UNIT_PATH
-systemd_require_preconditions() {
-  local unit_path="$1"
-  if [ "$(id -u)" -ne 0 ]; then
-    die "must run as root (needs write access to ${unit_path})"
-  fi
-  command -v systemctl >/dev/null 2>&1 \
-    || die "systemctl not found (systemd is required)"
-}
-
-# systemd_install_unit UNIT_PATH UNIT_NAME UNIT_CONTENT
-systemd_install_unit() {
-  local unit_path="$1"
-  local unit_name="$2"
-  local unit_content="$3"
-  local needs_reload=0
-
-  if [ ! -f "$unit_path" ] \
-     || ! printf '%s\n' "$unit_content" | cmp -s - "$unit_path"; then
-    log "writing unit → ${unit_path}"
-    local tmp
-    tmp="$(mktemp)"
-    printf '%s\n' "$unit_content" > "$tmp"
-    install -m 0644 -o root -g root "$tmp" "$unit_path"
-    rm -f "$tmp"
-    needs_reload=1
-  else
-    log "unit file already up to date"
-  fi
-
-  if [ "$needs_reload" -eq 1 ]; then
-    log "systemctl daemon-reload"
-    systemctl daemon-reload
-  fi
-
-  if systemctl is-enabled --quiet "$unit_name" 2>/dev/null; then
-    log "${unit_name} already enabled"
-  else
-    log "systemctl enable ${unit_name}"
-    systemctl enable "$unit_name" >/dev/null
-  fi
-}
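The "write only if content differs" step in `systemd_install_unit` can be isolated into a helper that also cleans up after itself. In the function above, `rm -f "$tmp"` never runs if `install` fails under `set -e`, which is the case the inlined versions cover with `trap 'rm -f "$tmp"' EXIT`. The sketch below is hypothetical (`write_if_changed` is not in the repo). It swaps `install -o root -g root` for a same-directory `mv`, so the file ends up owned by whoever runs it, which is root for these installers.

```shell
#!/usr/bin/env bash
# Hypothetical helper isolating systemd_install_unit's content-compare step.

# write_if_changed CONTENT PATH [MODE]
#   Returns 0 if PATH was (re)written, 1 if it already held CONTENT, and
#   2 if writing failed. The temp file is a dotfile in PATH's directory, so
#   the final mv is an atomic rename; it is removed if any step fails.
write_if_changed() {
  local content="$1" path="$2" mode="${3:-0644}" tmp
  if [ -f "$path" ] && printf '%s\n' "$content" | cmp -s - "$path"; then
    return 1
  fi
  tmp="$(mktemp "$(dirname "$path")/.$(basename "$path").XXXXXX")" || return 2
  if printf '%s\n' "$content" > "$tmp" && chmod "$mode" "$tmp" && mv -f "$tmp" "$path"; then
    return 0
  fi
  rm -f "$tmp"
  return 2
}
```

Because "unchanged" and "failed" are both nonzero, a caller should capture the code explicitly, e.g. `rc=0; write_if_changed "$DESIRED_UNIT" "$UNIT_PATH" || rc=$?`. It then runs `systemctl daemon-reload` only for 0 and calls `die` for 2.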

lib/init/nomad/systemd-nomad.sh

@@ -33,11 +33,13 @@ NOMAD_DATA_DIR="/var/lib/nomad"
 log() { printf '[systemd-nomad] %s\n' "$*"; }
 die() { printf '[systemd-nomad] ERROR: %s\n' "$*" >&2; exit 1; }
-# shellcheck source=lib-systemd.sh
-. "$(dirname "${BASH_SOURCE[0]}")/lib-systemd.sh"
 # ── Preconditions ────────────────────────────────────────────────────────────
-systemd_require_preconditions "$UNIT_PATH"
+if [ "$(id -u)" -ne 0 ]; then
+  die "must run as root (needs write access to ${UNIT_PATH})"
+fi
+command -v systemctl >/dev/null 2>&1 \
+  || die "systemctl not found (systemd is required)"
 NOMAD_BIN="$(command -v nomad 2>/dev/null || true)"
 [ -n "$NOMAD_BIN" ] \
@@ -96,7 +98,33 @@ for d in "$NOMAD_CONFIG_DIR" "$NOMAD_DATA_DIR"; do
   fi
 done
-# ── Install + reload + enable (shared with systemd-vault.sh via lib-systemd) ─
-systemd_install_unit "$UNIT_PATH" "nomad.service" "$DESIRED_UNIT"
+# ── Install unit file only if content differs ────────────────────────────────
+needs_reload=0
+if [ ! -f "$UNIT_PATH" ] \
+   || ! printf '%s\n' "$DESIRED_UNIT" | cmp -s - "$UNIT_PATH"; then
+  log "writing unit → ${UNIT_PATH}"
+  tmp="$(mktemp)"
+  trap 'rm -f "$tmp"' EXIT
+  printf '%s\n' "$DESIRED_UNIT" > "$tmp"
+  install -m 0644 -o root -g root "$tmp" "$UNIT_PATH"
+  rm -f "$tmp"
+  trap - EXIT
+  needs_reload=1
+else
+  log "unit file already up to date"
+fi
+# ── Reload + enable ──────────────────────────────────────────────────────────
+if [ "$needs_reload" -eq 1 ]; then
+  log "systemctl daemon-reload"
+  systemctl daemon-reload
+fi
+if systemctl is-enabled --quiet nomad.service 2>/dev/null; then
+  log "nomad.service already enabled"
+else
+  log "systemctl enable nomad"
+  systemctl enable nomad.service >/dev/null
+fi
 log "done — unit installed and enabled (NOT started; S0.4 brings the cluster up)"

lib/init/nomad/systemd-vault.sh

@@ -56,11 +56,13 @@ VAULT_HCL_SRC="${REPO_ROOT}/nomad/vault.hcl"
 log() { printf '[systemd-vault] %s\n' "$*"; }
 die() { printf '[systemd-vault] ERROR: %s\n' "$*" >&2; exit 1; }
-# shellcheck source=lib-systemd.sh
-. "${SCRIPT_DIR}/lib-systemd.sh"
 # ── Preconditions ────────────────────────────────────────────────────────────
-systemd_require_preconditions "$UNIT_PATH"
+if [ "$(id -u)" -ne 0 ]; then
+  die "must run as root (needs write access to ${UNIT_PATH})"
+fi
+command -v systemctl >/dev/null 2>&1 \
+  || die "systemctl not found (systemd is required)"
 VAULT_BIN="$(command -v vault 2>/dev/null || true)"
 [ -n "$VAULT_BIN" ] \
@@ -144,7 +146,33 @@ else
   log "config already up to date"
 fi
-# ── Install + reload + enable (shared with systemd-nomad.sh via lib-systemd) ─
-systemd_install_unit "$UNIT_PATH" "vault.service" "$DESIRED_UNIT"
+# ── Install unit file only if content differs ────────────────────────────────
+needs_reload=0
+if [ ! -f "$UNIT_PATH" ] \
+   || ! printf '%s\n' "$DESIRED_UNIT" | cmp -s - "$UNIT_PATH"; then
+  log "writing unit → ${UNIT_PATH}"
+  tmp="$(mktemp)"
+  trap 'rm -f "$tmp"' EXIT
+  printf '%s\n' "$DESIRED_UNIT" > "$tmp"
+  install -m 0644 -o root -g root "$tmp" "$UNIT_PATH"
+  rm -f "$tmp"
+  trap - EXIT
+  needs_reload=1
+else
+  log "unit file already up to date"
+fi
+# ── Reload + enable ──────────────────────────────────────────────────────────
+if [ "$needs_reload" -eq 1 ]; then
+  log "systemctl daemon-reload"
+  systemctl daemon-reload
+fi
+if systemctl is-enabled --quiet vault.service 2>/dev/null; then
+  log "vault.service already enabled"
+else
+  log "systemctl enable vault"
+  systemctl enable vault.service >/dev/null
+fi
 log "done — unit+config installed and enabled (NOT started; vault-init.sh next)"
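Both installers end with the unit enabled but deliberately not started. A post-install check for that install-but-don't-start invariant could look like the hypothetical sketch below. `assert_installed_not_started` and the `SYSTEMCTL` override are invented for illustration; the override exists only so the check can be exercised without systemd. It relies on `systemctl is-enabled` printing `enabled` and `systemctl is-active` printing a state such as `inactive` or `active`.

```shell
#!/usr/bin/env bash
# Hypothetical post-install check; SYSTEMCTL is an illustrative override knob.

# assert_installed_not_started UNIT
#   Returns 0 if UNIT is enabled but not running (or starting), else 1.
assert_installed_not_started() {
  local unit="$1" sc="${SYSTEMCTL:-systemctl}" enabled active
  # is-active exits nonzero for inactive units; we only want its output.
  enabled="$("$sc" is-enabled "$unit" 2>/dev/null || true)"
  active="$("$sc" is-active "$unit" 2>/dev/null || true)"
  if [ "$enabled" != "enabled" ]; then
    printf '%s: expected enabled, got %s\n' "$unit" "${enabled:-unknown}" >&2
    return 1
  fi
  if [ "$active" = "active" ] || [ "$active" = "activating" ]; then
    printf '%s: expected not started, got %s\n' "$unit" "$active" >&2
    return 1
  fi
  return 0
}
```

After running both installers, `assert_installed_not_started nomad.service && assert_installed_not_started vault.service` would confirm that nothing was started ahead of vault-init.sh and S0.4.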