disinto/docs/nomad-migration.md

<!-- last-reviewed: (new file, S2.5 #883) -->
# Nomad+Vault migration — cutover-day runbook

`disinto init --backend=nomad` is the single entry-point that turns a fresh
LXC (with the disinto repo cloned) into a running Nomad+Vault cluster with
policies applied, JWT workload-identity auth configured, secrets imported
from the old docker stack, and services deployed.

## Cutover-day invocation

On the new LXC, as root (or an operator with NOPASSWD sudo):

```bash
# Copy the plaintext .env + sops-encrypted .env.vault.enc + age keyfile
# from the old box first (out of band — SSH, USB, whatever your ops
# procedure allows). Then:

sudo ./bin/disinto init \
  --backend=nomad \
  --import-env   /tmp/.env \
  --import-sops  /tmp/.env.vault.enc \
  --age-key      /tmp/keys.txt \
  --with         forgejo
```

This runs, in order:

1. **`lib/init/nomad/cluster-up.sh`** (S0) — installs Nomad + Vault
   binaries, writes `/etc/nomad.d/*`, initializes Vault, starts both
   services, waits for the Nomad node to become ready.
2. **`tools/vault-apply-policies.sh`** (S2.1) — syncs every
   `vault/policies/*.hcl` into Vault as an ACL policy. Idempotent.
3. **`lib/init/nomad/vault-nomad-auth.sh`** (S2.3) — enables Vault's
   JWT auth method at `jwt-nomad`, points it at Nomad's JWKS, writes
   one role per policy, reloads Nomad so jobs can exchange
   workload-identity tokens for Vault tokens. Idempotent.
4. **`tools/vault-import.sh`** (S2.2) — reads `/tmp/.env` and the
   sops-decrypted `/tmp/.env.vault.enc`, writes them to the KV paths
   matching the S2.1 policy layout (`kv/disinto/bots/*`, `kv/disinto/shared/*`,
   `kv/disinto/runner/*`). Idempotent (overwrites KV v2 data in place).
5. **`lib/init/nomad/deploy.sh forgejo`** (S1) — validates + runs the
   `nomad/jobs/forgejo.hcl` jobspec. Forgejo reads its admin creds from
   Vault via the `template` stanza (S2.4).

## Flag summary

| Flag | Meaning |
|---|---|
| `--backend=nomad` | Switch the init dispatcher to the Nomad+Vault path (instead of docker compose). |
| `--empty` | Bring the cluster up, skip policies/auth/import/deploy. Escape hatch for debugging. |
| `--with forgejo[,…]` | Deploy these services after the cluster is up. |
| `--import-env PATH` | Plaintext `.env` from the old stack. Optional. |
| `--import-sops PATH` | Sops-encrypted `.env.vault.enc` from the old stack. Requires `--age-key`. |
| `--age-key PATH` | Age keyfile used to decrypt `--import-sops`. Requires `--import-sops`. |
| `--dry-run` | Print the full plan (cluster-up + policies + auth + import + deploy) and exit. Touches nothing. |

### Flag validation

- `--import-sops` without `--age-key` → error.
- `--age-key` without `--import-sops` → error.
- `--import-env` alone (no sops) → OK (imports just the plaintext `.env`).
- `--backend=docker` with any `--import-*` flag → error.
- `--empty` with any `--import-*` flag → error (mutually exclusive: `--empty`
  skips the import step, so pairing them silently discards the import
  intent).

## Idempotency

Every layer is idempotent by design. Re-running the same command on an
already-provisioned box is a no-op at every step:

- **Cluster-up:** second run detects running `nomad`/`vault` systemd
  units and state files, skips re-init.
- **Policies:** byte-for-byte compare against on-server policy text;
  "unchanged" for every untouched file.
- **Auth:** skips auth-method create if `jwt-nomad/` already enabled,
  skips config write if the JWKS + algs match, skips server.hcl write if
  the file on disk is identical to the repo copy.
- **Import:** KV v2 writes overwrite in place (same path, same keys,
  same values → no new version).
- **Deploy:** `nomad job run` is declarative; same jobspec → no new
  allocation.

## Dry-run

```bash
./bin/disinto init --backend=nomad \
  --import-env /tmp/.env \
  --import-sops /tmp/.env.vault.enc \
  --age-key /tmp/keys.txt \
  --with forgejo \
  --dry-run
```

Prints the five-section plan — cluster-up, policies, auth, import,
deploy — with every path and every argv that would be executed. No
network, no sudo, no state mutation. See
`tests/disinto-init-nomad.bats` for the exact output shape.

## No-import path

If you already have `kv/disinto/*` seeded by other means (manual
`vault kv put`, a replica, etc.), omit all three `--import-*` flags.
`disinto init --backend=nomad --with forgejo` still applies policies,
configures auth, and deploys — but skips the import step with:

```
[import] no --import-env/--import-sops — skipping; set them or seed kv/disinto/* manually before deploying secret-dependent services
```

Forgejo's template stanza will fail to render (and thus the allocation
will stall) until those KV paths exist — so either import them or seed
them first.

## Secret hygiene

- Never log a secret value. The CLI only prints paths (`--import-env`,
  `--age-key`) and KV *paths* (`kv/disinto/bots/review/token`), never
  the values themselves. `tools/vault-import.sh` is the only thing that
  reads the values, and it pipes them directly into Vault's HTTP API.
- The age keyfile must be mode 0400 — `vault-import.sh` refuses to
  source a keyfile with looser permissions.
- `VAULT_ADDR` must be localhost during import — the import tool
  refuses to run against a remote Vault, preventing accidental exposure.
fix: [nomad-step-2] S2.5 — bin/disinto init --import-env / --import-sops / --age-key wire-up (#883) Wire the Step-2 building blocks (import, auth, policies) into `disinto init --backend=nomad` so a single command on a fresh LXC provisions cluster + policies + auth + imports secrets + deploys services. Adds three flags to `disinto init --backend=nomad`: --import-env PATH plaintext .env from old stack --import-sops PATH sops-encrypted .env.vault.enc (requires --age-key) --age-key PATH age keyfile to decrypt --import-sops Flow: cluster-up.sh → vault-apply-policies.sh → vault-nomad-auth.sh → (optional) vault-import.sh → deploy.sh. Policies + auth run on every nomad real-run path (idempotent); import runs only when --import-* is set; all layers safe to re-run. Flag validation: --import-sops without --age-key → error --age-key without --import-sops → error --import-env alone (no sops) → OK --backend=docker + any --import-* → error Dry-run prints a five-section plan (cluster-up + policies + auth + import + deploy) with every argv that would be executed; touches nothing, logs no secret values. Dry-run output prints one line per --import-* flag that is actually set — not in an if/elif chain — so all three paths appear when all three flags are passed. Prior attempts regressed this invariant. Tests: tests/disinto-init-nomad.bats +10 cases covering flag validation, dry-run plan shape (each flag prints its own path), policies+auth always-on (without --import-*), and --flag=value form. Docs: docs/nomad-migration.md new file — cutover-day runbook with invocation shape, flag summary, idempotency contract, dry-run, and secret-hygiene notes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-04-16 19:04:04 +00:00			`<!-- last-reviewed: (new file, S2.5 #883) -->`
			`# Nomad+Vault migration — cutover-day runbook`

			`disinto init --backend=nomad` is the single entry-point that turns a fresh
			`LXC (with the disinto repo cloned) into a running Nomad+Vault cluster with`
			`policies applied, JWT workload-identity auth configured, secrets imported`
			`from the old docker stack, and services deployed.`

			`## Cutover-day invocation`

			`On the new LXC, as root (or an operator with NOPASSWD sudo):`

			```bash
			`# Copy the plaintext .env + sops-encrypted .env.vault.enc + age keyfile`
			`# from the old box first (out of band — SSH, USB, whatever your ops`
			`# procedure allows). Then:`

			`sudo ./bin/disinto init \`
			`--backend=nomad \`
			`--import-env /tmp/.env \`
			`--import-sops /tmp/.env.vault.enc \`
			`--age-key /tmp/keys.txt \`
			`--with forgejo`
			```

			`This runs, in order:`

			1. `lib/init/nomad/cluster-up.sh` (S0) — installs Nomad + Vault
			binaries, writes `/etc/nomad.d/*`, initializes Vault, starts both
			`services, waits for the Nomad node to become ready.`
			2. `tools/vault-apply-policies.sh` (S2.1) — syncs every
			`vault/policies/*.hcl` into Vault as an ACL policy. Idempotent.
			3. `lib/init/nomad/vault-nomad-auth.sh` (S2.3) — enables Vault's
			JWT auth method at `jwt-nomad`, points it at Nomad's JWKS, writes
			`one role per policy, reloads Nomad so jobs can exchange`
			`workload-identity tokens for Vault tokens. Idempotent.`
			4. `tools/vault-import.sh` (S2.2) — reads `/tmp/.env` and the
			sops-decrypted `/tmp/.env.vault.enc`, writes them to the KV paths
			matching the S2.1 policy layout (`kv/disinto/bots/`, `kv/disinto/shared/`,
			`kv/disinto/runner/*`). Idempotent (overwrites KV v2 data in place).
			5. `lib/init/nomad/deploy.sh forgejo` (S1) — validates + runs the
			`nomad/jobs/forgejo.hcl` jobspec. Forgejo reads its admin creds from
			Vault via the `template` stanza (S2.4).

			`## Flag summary`

			`\| Flag \| Meaning \|`
			`\|---\|---\|`
			\| `--backend=nomad` \| Switch the init dispatcher to the Nomad+Vault path (instead of docker compose). \|
			\| `--empty` \| Bring the cluster up, skip policies/auth/import/deploy. Escape hatch for debugging. \|
			\| `--with forgejo[,…]` \| Deploy these services after the cluster is up. \|
			\| `--import-env PATH` \| Plaintext `.env` from the old stack. Optional. \|
			\| `--import-sops PATH` \| Sops-encrypted `.env.vault.enc` from the old stack. Requires `--age-key`. \|
			\| `--age-key PATH` \| Age keyfile used to decrypt `--import-sops`. Requires `--import-sops`. \|
			\| `--dry-run` \| Print the full plan (cluster-up + policies + auth + import + deploy) and exit. Touches nothing. \|

			`### Flag validation`

			- `--import-sops` without `--age-key` → error.
			- `--age-key` without `--import-sops` → error.
			- `--import-env` alone (no sops) → OK (imports just the plaintext `.env`).
			- `--backend=docker` with any `--import-*` flag → error.
fix: [nomad-step-2] S2.5 review — gate policies/auth/import on --empty; reject --empty + --import-* (#883) Addresses review #907 blocker: docs/nomad-migration.md claimed --empty "skips policies/auth/import/deploy" but _disinto_init_nomad had no $empty gate around those blocks — operators reaching the "cluster-only escape hatch" would still invoke vault-apply-policies.sh and vault-nomad-auth.sh, contradicting the runbook. Changes: - _disinto_init_nomad: exit 0 immediately after cluster-up when --empty is set, in both dry-run and real-run branches. Only the cluster-up plan appears; no policies, no auth, no import, no deploy. Matches the docs. - disinto_init: reject --empty combined with any --import-* flag. --empty discards the import step, so the combination silently does nothing (worse failure mode than a clear error up front). Symmetric to the existing --empty vs --with check. - Pre-flight existence check for policies/auth scripts now runs unconditionally on the non-empty path (previously gated on --import-), matching the unconditional invocation. Import-script check stays gated on --import-. Non-blocking observation also addressed: the pre-flight guard comment + actual predicate were inconsistent ("unconditionally invoke policies+auth" but only checked on import). Now the predicate matches: [ "$empty" != "true" ] gates policies/auth, and an inner --import-* guard gates the import script. Tests (+3): - --empty --dry-run shows no S2.x sections (negative assertions) - --empty --import-env rejected - --empty --import-sops --age-key rejected 30/30 nomad tests pass; shellcheck clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-04-16 19:25:27 +00:00			- `--empty` with any `--import-*` flag → error (mutually exclusive: `--empty`
			`skips the import step, so pairing them silently discards the import`
			`intent).`
fix: [nomad-step-2] S2.5 — bin/disinto init --import-env / --import-sops / --age-key wire-up (#883) Wire the Step-2 building blocks (import, auth, policies) into `disinto init --backend=nomad` so a single command on a fresh LXC provisions cluster + policies + auth + imports secrets + deploys services. Adds three flags to `disinto init --backend=nomad`: --import-env PATH plaintext .env from old stack --import-sops PATH sops-encrypted .env.vault.enc (requires --age-key) --age-key PATH age keyfile to decrypt --import-sops Flow: cluster-up.sh → vault-apply-policies.sh → vault-nomad-auth.sh → (optional) vault-import.sh → deploy.sh. Policies + auth run on every nomad real-run path (idempotent); import runs only when --import-* is set; all layers safe to re-run. Flag validation: --import-sops without --age-key → error --age-key without --import-sops → error --import-env alone (no sops) → OK --backend=docker + any --import-* → error Dry-run prints a five-section plan (cluster-up + policies + auth + import + deploy) with every argv that would be executed; touches nothing, logs no secret values. Dry-run output prints one line per --import-* flag that is actually set — not in an if/elif chain — so all three paths appear when all three flags are passed. Prior attempts regressed this invariant. Tests: tests/disinto-init-nomad.bats +10 cases covering flag validation, dry-run plan shape (each flag prints its own path), policies+auth always-on (without --import-*), and --flag=value form. Docs: docs/nomad-migration.md new file — cutover-day runbook with invocation shape, flag summary, idempotency contract, dry-run, and secret-hygiene notes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-04-16 19:04:04 +00:00
			`## Idempotency`

			`Every layer is idempotent by design. Re-running the same command on an`
			`already-provisioned box is a no-op at every step:`

			- Cluster-up: second run detects running `nomad`/`vault` systemd
			`units and state files, skips re-init.`
			`- Policies: byte-for-byte compare against on-server policy text;`
			`"unchanged" for every untouched file.`
			- Auth: skips auth-method create if `jwt-nomad/` already enabled,
			`skips config write if the JWKS + algs match, skips server.hcl write if`
			`the file on disk is identical to the repo copy.`
			`- Import: KV v2 writes overwrite in place (same path, same keys,`
			`same values → no new version).`
			- Deploy: `nomad job run` is declarative; same jobspec → no new
			`allocation.`

			`## Dry-run`

			```bash
			`./bin/disinto init --backend=nomad \`
			`--import-env /tmp/.env \`
			`--import-sops /tmp/.env.vault.enc \`
			`--age-key /tmp/keys.txt \`
			`--with forgejo \`
			`--dry-run`
			```

			`Prints the five-section plan — cluster-up, policies, auth, import,`
			`deploy — with every path and every argv that would be executed. No`
			`network, no sudo, no state mutation. See`
			`tests/disinto-init-nomad.bats` for the exact output shape.

			`## No-import path`

			If you already have `kv/disinto/*` seeded by other means (manual
			`vault kv put`, a replica, etc.), omit all three `--import-*` flags.
			`disinto init --backend=nomad --with forgejo` still applies policies,
			`configures auth, and deploys — but skips the import step with:`

			```
			`[import] no --import-env/--import-sops — skipping; set them or seed kv/disinto/* manually before deploying secret-dependent services`
			```

			`Forgejo's template stanza will fail to render (and thus the allocation`
			`will stall) until those KV paths exist — so either import them or seed`
			`them first.`

			`## Secret hygiene`

			- Never log a secret value. The CLI only prints paths (`--import-env`,
			`--age-key`) and KV paths (`kv/disinto/bots/review/token`), never
			the values themselves. `tools/vault-import.sh` is the only thing that
			`reads the values, and it pipes them directly into Vault's HTTP API.`
			- The age keyfile must be mode 0400 — `vault-import.sh` refuses to
			`source a keyfile with looser permissions.`
			- `VAULT_ADDR` must be localhost during import — the import tool
			`refuses to run against a remote Vault, preventing accidental exposure.`