Addresses review #907 blocker: docs/nomad-migration.md claimed --empty "skips policies/auth/import/deploy" but _disinto_init_nomad had no $empty gate around those blocks. Operators reaching the "cluster-only escape hatch" would still invoke vault-apply-policies.sh and vault-nomad-auth.sh, contradicting the runbook.

Changes:
- _disinto_init_nomad: exit 0 immediately after cluster-up when --empty is set, in both dry-run and real-run branches. Only the cluster-up plan appears; no policies, no auth, no import, no deploy. Matches the docs.
- disinto_init: reject --empty combined with any --import-* flag. --empty discards the import step, so the combination silently does nothing (worse failure mode than a clear error up front). Symmetric to the existing --empty vs --with check.
- Pre-flight existence check for policies/auth scripts now runs unconditionally on the non-empty path (previously gated on --import-*), matching the unconditional invocation. Import-script check stays gated on --import-*.

Non-blocking observation also addressed: the pre-flight guard comment + actual predicate were inconsistent ("unconditionally invoke policies+auth" but only checked on import). Now the predicate matches: [ "$empty" != "true" ] gates policies/auth, and an inner --import-* guard gates the import script.

Tests (+3):
- --empty --dry-run shows no S2.x sections (negative assertions)
- --empty --import-env rejected
- --empty --import-sops --age-key rejected

30/30 nomad tests pass; shellcheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nomad+Vault migration — cutover-day runbook
disinto init --backend=nomad is the single entry-point that turns a fresh
LXC (with the disinto repo cloned) into a running Nomad+Vault cluster with
policies applied, JWT workload-identity auth configured, secrets imported
from the old docker stack, and services deployed.
Cutover-day invocation
On the new LXC, as root (or an operator with NOPASSWD sudo):
# Copy the plaintext .env + sops-encrypted .env.vault.enc + age keyfile
# from the old box first (out of band — SSH, USB, whatever your ops
# procedure allows). Then:
sudo ./bin/disinto init \
--backend=nomad \
--import-env /tmp/.env \
--import-sops /tmp/.env.vault.enc \
--age-key /tmp/keys.txt \
--with forgejo
This runs, in order:
1. lib/init/nomad/cluster-up.sh (S0): installs Nomad + Vault binaries, writes /etc/nomad.d/*, initializes Vault, starts both services, waits for the Nomad node to become ready.
2. tools/vault-apply-policies.sh (S2.1): syncs every vault/policies/*.hcl into Vault as an ACL policy. Idempotent.
3. lib/init/nomad/vault-nomad-auth.sh (S2.3): enables Vault's JWT auth method at jwt-nomad, points it at Nomad's JWKS, writes one role per policy, reloads Nomad so jobs can exchange workload-identity tokens for Vault tokens. Idempotent.
4. tools/vault-import.sh (S2.2): reads /tmp/.env and the sops-decrypted /tmp/.env.vault.enc, writes them to the KV paths matching the S2.1 policy layout (kv/disinto/bots/*, kv/disinto/shared/*, kv/disinto/runner/*). Idempotent (overwrites KV v2 data in place).
5. lib/init/nomad/deploy.sh forgejo (S1): validates + runs the nomad/jobs/forgejo.hcl jobspec. Forgejo reads its admin creds from Vault via the template stanza (S2.4).
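The ordering above, together with the --empty and --import-* gates, can be sketched as a small plan printer. This is only an illustration: plan_steps, its argument order, and the one-deploy-call-per-service loop are assumptions, not the actual code in bin/disinto.

```shell
# Hypothetical sketch: print the steps in the order the runbook lists them.
# usage: plan_steps EMPTY HAS_IMPORT WITH_SERVICES
#   EMPTY          "true" if --empty was given
#   HAS_IMPORT     "true" if any --import-* flag was given
#   WITH_SERVICES  comma-separated list from --with (may be empty)
plan_steps() {
  local empty="$1" has_import="$2" with="$3"
  local svc
  local IFS=','

  echo "S0 lib/init/nomad/cluster-up.sh"

  # --empty is the cluster-only escape hatch: nothing after cluster-up runs.
  if [ "$empty" = "true" ]; then
    return 0
  fi

  # Policies and auth always run on the non-empty path.
  echo "S2.1 tools/vault-apply-policies.sh"
  echo "S2.3 lib/init/nomad/vault-nomad-auth.sh"

  # The import step only runs when an --import-* flag was given.
  if [ "$has_import" = "true" ]; then
    echo "S2.2 tools/vault-import.sh"
  fi

  # Assumption: one deploy.sh invocation per service named in --with.
  for svc in $with; do
    echo "S1 lib/init/nomad/deploy.sh $svc"
  done
}
```

Calling plan_steps false true forgejo prints the five steps in the order listed above; plan_steps true false "" prints only the S0 line.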
Flag summary
| Flag | Meaning |
|---|---|
| --backend=nomad | Switch the init dispatcher to the Nomad+Vault path (instead of docker compose). |
| --empty | Bring the cluster up, skip policies/auth/import/deploy. Escape hatch for debugging. |
| --with forgejo[,…] | Deploy these services after the cluster is up. |
| --import-env PATH | Plaintext .env from the old stack. Optional. |
| --import-sops PATH | Sops-encrypted .env.vault.enc from the old stack. Requires --age-key. |
| --age-key PATH | Age keyfile used to decrypt --import-sops. Requires --import-sops. |
| --dry-run | Print the full plan (cluster-up + policies + auth + import + deploy) and exit. Touches nothing. |
Flag validation
- --import-sops without --age-key → error.
- --age-key without --import-sops → error.
- --import-env alone (no sops) → OK (imports just the plaintext .env).
- --backend=docker with any --import-* flag → error.
- --empty with any --import-* flag → error (mutually exclusive: --empty skips the import step, so pairing them silently discards the import intent).
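As a rough illustration of these rules, the checks could be expressed as one function that runs before any work starts. The function name validate_init_flags, its positional arguments, and the error wording are all hypothetical; the real checks live in disinto_init and may be structured differently.

```shell
# Hypothetical sketch of the flag-validation rules listed above.
# usage: validate_init_flags BACKEND EMPTY IMPORT_ENV IMPORT_SOPS AGE_KEY
# Empty strings mean "flag not given". Returns 0 if the combination is valid,
# otherwise prints an (illustrative) error to stderr and returns 1.
validate_init_flags() {
  local backend="$1" empty="$2" import_env="$3" import_sops="$4" age_key="$5"
  local any_import=false

  # --age-key counts as one of the three --import-* flags here.
  if [ -n "$import_env" ] || [ -n "$import_sops" ] || [ -n "$age_key" ]; then
    any_import=true
  fi

  if [ -n "$import_sops" ] && [ -z "$age_key" ]; then
    echo "error: --import-sops requires --age-key" >&2
    return 1
  fi
  if [ -n "$age_key" ] && [ -z "$import_sops" ]; then
    echo "error: --age-key requires --import-sops" >&2
    return 1
  fi
  if [ "$backend" = "docker" ] && [ "$any_import" = "true" ]; then
    echo "error: --import-* flags require --backend=nomad" >&2
    return 1
  fi
  if [ "$empty" = "true" ] && [ "$any_import" = "true" ]; then
    echo "error: --empty and --import-* are mutually exclusive" >&2
    return 1
  fi
  return 0
}
```

Failing up front keeps a bad combination from reaching cluster-up, so a rejected invocation leaves the box untouched.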
Idempotency
Every layer is idempotent by design. Re-running the same command on an already-provisioned box is a no-op at every step:
- Cluster-up: second run detects running nomad/vault systemd units and state files, skips re-init.
- Policies: byte-for-byte compare against on-server policy text; "unchanged" for every untouched file.
- Auth: skips auth-method create if jwt-nomad/ already enabled, skips config write if the JWKS + algs match, skips server.hcl write if the file on disk is identical to the repo copy.
- Import: KV v2 writes overwrite in place (same path, same keys, same values → no new version).
- Deploy: nomad job run is declarative; same jobspec → no new allocation.
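The compare-before-write pattern behind the policies step can be sketched as follows. This is a simplified illustration, not the logic of tools/vault-apply-policies.sh: policy_status is a hypothetical helper, and the on-server text is passed in as a string so the comparison can be tried without a running Vault. Note that shell command substitution strips trailing newlines, so this sketch is slightly looser than a true byte-for-byte compare.

```shell
# Hypothetical helper: report whether a local policy file matches the policy
# text currently on the server.
# usage: policy_status LOCAL_FILE SERVER_TEXT
# Prints "unchanged" when the two match, "update" when a write is needed.
policy_status() {
  local file="$1" server_text="$2" local_text

  # $(...) drops trailing newlines, so files that differ only in trailing
  # newlines compare as equal here.
  local_text="$(cat "$file")"

  if [ "$local_text" = "$server_text" ]; then
    echo "unchanged"
  else
    echo "update"
  fi
}
```

In a real run the server text would presumably come from something like vault policy read NAME, and an "update" result would be followed by vault policy write NAME FILE; skipping the write when nothing changed is what makes a second run a no-op.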
Dry-run
./bin/disinto init --backend=nomad \
--import-env /tmp/.env \
--import-sops /tmp/.env.vault.enc \
--age-key /tmp/keys.txt \
--with forgejo \
--dry-run
Prints the five-section plan — cluster-up, policies, auth, import,
deploy — with every path and every argv that would be executed. No
network, no sudo, no state mutation. See
tests/disinto-init-nomad.bats for the exact output shape.
No-import path
If you already have kv/disinto/* seeded by other means (manual
vault kv put, a replica, etc.), omit all three --import-* flags.
disinto init --backend=nomad --with forgejo still applies policies,
configures auth, and deploys — but skips the import step with:
[import] no --import-env/--import-sops — skipping; set them or seed kv/disinto/* manually before deploying secret-dependent services
Forgejo's template stanza will fail to render (and thus the allocation will stall) until those KV paths exist — so either import them or seed them first.
Secret hygiene
- Never log a secret value. The CLI only prints paths (--import-env, --age-key) and KV paths (kv/disinto/bots/review/token), never the values themselves. tools/vault-import.sh is the only thing that reads the values, and it pipes them directly into Vault's HTTP API.
- The age keyfile must be mode 0400: vault-import.sh refuses to source a keyfile with looser permissions.
- VAULT_ADDR must be localhost during import: the import tool refuses to run against a remote Vault, preventing accidental exposure.
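The two refusals in this list can be approximated with a pair of guard functions. These are a minimal sketch under stated assumptions, not the checks in tools/vault-import.sh: the function names are hypothetical, the keyfile check accepts only mode 400 exactly, and "localhost" is taken to mean the hosts 127.0.0.1 or localhost with an optional numeric port.

```shell
# Hypothetical guard: succeed only if the keyfile's mode is exactly 0400.
# usage: check_keyfile_mode PATH
check_keyfile_mode() {
  local mode
  # GNU stat first; fall back to the BSD form if -c is not supported.
  mode="$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1")" || return 1
  [ "$mode" = "400" ]
}

# Hypothetical guard: succeed only if the address points at the local host.
# usage: check_vault_addr_local ADDR
check_vault_addr_local() {
  local hostport host port
  case "$1" in
    http://*|https://*) hostport="${1#*://}" ;;
    *) return 1 ;;
  esac
  hostport="${hostport%/}"            # tolerate one trailing slash
  host="${hostport%%:*}"
  case "$host" in
    127.0.0.1|localhost) ;;
    *) return 1 ;;
  esac
  [ "$host" = "$hostport" ] && return 0   # no port given
  # Anything after the colon must be a plain port number; this also rejects
  # userinfo tricks such as http://127.0.0.1:8200@remote.example.
  port="${hostport#*:}"
  case "$port" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

A caller would run both guards before reading any secret, for example check_keyfile_mode /tmp/keys.txt && check_vault_addr_local "$VAULT_ADDR", and abort on the first failure.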