`tools/vault-seed-forgejo.sh` existed and worked, but `bin/disinto init
--backend=nomad --with forgejo` never invoked it, so a fresh LXC with an
empty Vault hit `Template Missing: vault.read(kv/data/disinto/shared/
forgejo)` and the forgejo alloc timed out inside deploy.sh's 240s
healthy_deadline — operator had to run the seeder + `nomad alloc
restart` by hand to recover.
In `_disinto_init_nomad`, after `vault-import.sh` (or its skip branch)
and before `deploy.sh`, iterate `--with <svc>` and auto-invoke
`tools/vault-seed-<svc>.sh` when the file exists + is executable.
Services without a seeder are silently skipped — Step 3+ services
(woodpecker, chat, etc.) can ship their own seeder without touching
`bin/disinto`. VAULT_ADDR is passed explicitly because cluster-up.sh
writes the profile.d export during this same init run (current shell
hasn't sourced it yet) and `vault-seed-forgejo.sh` — unlike its
sibling vault-* scripts — requires the caller to set VAULT_ADDR
instead of defaulting it via `_hvault_default_env`. Mirror the loop in
the --dry-run plan so the operator-visible plan matches the real run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses review #907 blocker: docs/nomad-migration.md claimed
--empty "skips policies/auth/import/deploy" but _disinto_init_nomad
had no $empty gate around those blocks — operators reaching the
"cluster-only escape hatch" would still invoke vault-apply-policies.sh
and vault-nomad-auth.sh, contradicting the runbook.
Changes:
- _disinto_init_nomad: exit 0 immediately after cluster-up when
--empty is set, in both dry-run and real-run branches. Only the
cluster-up plan appears; no policies, no auth, no import, no
deploy. Matches the docs.
- disinto_init: reject --empty combined with any --import-* flag.
--empty discards the import step, so the combination silently
does nothing (worse failure mode than a clear error up front).
Symmetric to the existing --empty vs --with check.
- Pre-flight existence check for policies/auth scripts now runs
unconditionally on the non-empty path (previously gated on
--import-*), matching the unconditional invocation. Import-script
check stays gated on --import-*.
Non-blocking observation also addressed: the pre-flight guard
comment + actual predicate were inconsistent ("unconditionally
invoke policies+auth" but only checked on import). Now the
predicate matches: [ "$empty" != "true" ] gates policies/auth,
and an inner --import-* guard gates the import script.
Tests (+3):
- --empty --dry-run shows no S2.x sections (negative assertions)
- --empty --import-env rejected
- --empty --import-sops --age-key rejected
30/30 nomad tests pass; shellcheck clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire the Step-2 building blocks (import, auth, policies) into
`disinto init --backend=nomad` so a single command on a fresh LXC
provisions cluster + policies + auth + imports secrets + deploys
services.
Adds three flags to `disinto init --backend=nomad`:
--import-env PATH plaintext .env from old stack
--import-sops PATH sops-encrypted .env.vault.enc (requires --age-key)
--age-key PATH age keyfile to decrypt --import-sops
Flow: cluster-up.sh → vault-apply-policies.sh → vault-nomad-auth.sh →
(optional) vault-import.sh → deploy.sh. Policies + auth run on every
nomad real-run path (idempotent); import runs only when --import-* is
set; all layers safe to re-run.
Flag validation:
--import-sops without --age-key → error
--age-key without --import-sops → error
--import-env alone (no sops) → OK
--backend=docker + any --import-* → error
Dry-run prints a five-section plan (cluster-up + policies + auth +
import + deploy) with every argv that would be executed; touches
nothing, logs no secret values.
Dry-run output prints one line per --import-* flag that is actually
set — not in an if/elif chain — so all three paths appear when all
three flags are passed. Prior attempts regressed this invariant.
Tests:
tests/disinto-init-nomad.bats +10 cases covering flag validation,
dry-run plan shape (each flag prints its own path), policies+auth
always-on (without --import-*), and --flag=value form.
Docs: docs/nomad-migration.md new file — cutover-day runbook with
invocation shape, flag summary, idempotency contract, dry-run, and
secret-hygiene notes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nomad's docker task driver reports Healthy=false without a running
dockerd. On the factory dev box docker was pre-installed so Step 0's
cluster-up passed silently, but a fresh ubuntu:24.04 LXC hit "missing
drivers" placement failures the moment Step 1 tried to deploy forgejo
(the first docker-driver consumer).
Fix install.sh to also install docker.io + enable --now docker.service
when absent, and add a poll for the nomad self-node's docker driver
Detected+Healthy before declaring Step 8 done — otherwise the race
between dockerd startup and nomad driver fingerprinting lets the node
reach "ready" while docker is still unhealthy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Why: disinto_init() consumed $1 as repo_url before the argparse loop ran,
so `disinto init --backend=nomad --empty` had --backend=nomad swallowed
into repo_url, backend stayed at its "docker" default, and the --empty
validation then produced the nonsense "--empty is only valid with
--backend=nomad" error — flagged during S0.1 end-to-end verification on
a fresh LXC. nomad backend takes no positional anyway; the LXC already
has the repo cloned by the operator.
Change: only consume $1 as repo_url if it doesn't start with "--", then
defer the "repo URL required" check to after argparse (so the docker
path still errors with a helpful message on a missing positional, not
"Unknown option: --backend=docker").
Verified acceptance criteria:
1. init --backend=nomad --empty → dispatches to nomad
2. init --backend=nomad --empty --dry-run → 9-step plan, exit 0
3. init <repo-url> → docker path unchanged
4. init → "repo URL required"
5. init --backend=docker → "repo URL required"
(not "Unknown option")
6. shellcheck clean
Tests: 4 new regression cases in tests/disinto-init-nomad.bats covering
flag-first nomad invocation (both --flag=value and --flag value forms),
no-args docker default, and --backend=docker missing-positional error
path. Full suite: 10/10 pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The bin/disinto flag loop has separate cases for `--backend value`
(space-separated) and `--backend=value`; a regression in either would
silently route to the docker default path. Per the "stub-first dispatch"
lesson, silent misrouting during a migration is the worst failure mode —
covering both forms closes that gap.
Also triggers a retry of the smoke-init pipeline step, which hit a known
Forgejo branch-indexing flake on pipeline #913 (same flake cleared on
retry for PR #829 pipelines #906 → #908); unrelated to the nomad-validate
changes, which went all-green in #913.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Locks in static validation for every Nomad+Vault artifact before it can
merge. Four fail-closed steps in .woodpecker/nomad-validate.yml, gated
to PRs touching nomad/, lib/init/nomad/, or bin/disinto:
1. nomad config validate nomad/server.hcl nomad/client.hcl
2. vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener
3. shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto
4. bats tests/disinto-init-nomad.bats — dispatcher smoke tests
bin/disinto picks up pre-existing SC2120 warnings on three passthrough
wrappers (generate_agent_docker, generate_caddyfile, generate_staging_index);
annotated with shellcheck disable=SC2120 so the new pipeline is clean
without narrowing the warning for future code.
Pinned image versions (hashicorp/nomad:1.9.5, hashicorp/vault:1.18.5)
match lib/init/nomad/install.sh — bump both or neither.
nomad/AGENTS.md documents the stack layout, how to add a jobspec in
Step 1, how CI validates it, and the two-place version pinning rule.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>