fix: [nomad-step-0] S0.4 — disinto init --backend=nomad --empty orchestrator (cluster-up) (#824) #829
Reference: disinto-admin/disinto#829
Fixes #824
Changes
Wires S0.1–S0.3 into a single idempotent bring-up script and replaces the S0.1 stub in _disinto_init_nomad so `disinto init --backend=nomad --empty` produces a running empty single-node cluster on a fresh box.

lib/init/nomad/cluster-up.sh (new):

1. install.sh (nomad + vault binaries)
2. systemd-nomad.sh (unit + enable, not started)
3. systemd-vault.sh (unit + vault.hcl + enable)
4. host-volume dirs under /srv/disinto/* (matching nomad/client.hcl)
5. /etc/nomad.d/{server,client}.hcl (content-compare before write)
6. vault-init.sh (first-run init + unseal + persist keys)
7. systemctl start vault (poll until unsealed; fail-fast on is-failed)
8. systemctl start nomad (poll until ≥1 node ready)
9. /etc/profile.d/disinto-nomad.sh (VAULT_ADDR + NOMAD_ADDR for interactive shells)

Re-running on a healthy box is a no-op: each sub-step is itself idempotent, and steps 7/8 fast-path when already active + healthy. `--dry-run` prints the full step list and exits 0.

bin/disinto:

- _disinto_init_nomad: replaces the S0.1 stub. Invokes cluster-up.sh directly (as root) or via `sudo -n` otherwise. Both `--empty` and the default (no flag) call cluster-up.sh today; Step 1 will branch on $empty to gate job deployment. `--dry-run` forwards through.
- disinto_init: adds `--empty` flag parsing; rejects `--empty` combined with `--backend=docker` explicitly instead of silently ignoring it.
- usage: documents `--empty` and drops the "stub, S0.1" annotation from `--backend`.

Closes #824.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CI duplicate-detection flagged the in-line vault + nomad polling loops in cluster-up.sh as matching a 5-line window in vault-init.sh (the `ready=1 / break / fi / sleep 1 / done` boilerplate). Extracts the repeated pattern into three helpers at the top of the file:

- nomad_has_ready_node: wrapper so poll_until_healthy can take a bare command name.
- _die_with_service_status: shared "log + dump systemctl status + die" path (factored out of the two callsites + the timeout branch).
- poll_until_healthy: ticks once per second up to TIMEOUT, fail-fasts on systemd "failed" state, and returns 0 on first successful check.

Step 7 (vault unseal) and Step 8 (nomad ready node) each collapse from ~15 lines of explicit for-loop bookkeeping to a one-line call. No behavioural change: same tick cadence, same fail-fast, same status dump on timeout. Local detect-duplicates.py run against main confirms no new duplicates introduced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

AI Review
Summary
S0.4 wires together the S0.1–S0.3 building blocks (install, systemd units, vault-init) into an idempotent 9-step cluster orchestrator, and upgrades `_disinto_init_nomad` from a loud-failing stub to a real dispatcher. The scope is tight, the docker default path is byte-for-byte unchanged, and the implementation is solid.
Correctness
- (cluster-up.sh:105–128): `_disinto_init_nomad` forwards `--dry-run` without `sudo`; `cluster-up.sh` exits inside the dry-run block (line 105) before the root precondition check (line 134). Non-root dry-runs work as intended.
- `poll_until_healthy` `set -e` safety (cluster-up.sh:230–244): in `systemctl is-failed --quiet "$svc" && _die_with_service_status`, the `&&` puts `is-failed` in a conditional context, so a non-failed (non-zero) exit doesn't abort under `set -e`. Correct.
- `vault_is_unsealed` jq fallback (cluster-up.sh:187–196): `jq ... || init=""` correctly handles jq failure; the final `[ "$init" = "true" ] && [ "$sealed" = "false" ]` returns non-zero for any missing/wrong value. Sound.
- `vault-init.sh` spawns a temp server, persists keys, then its EXIT trap stops the temp server before returning, leaving port 8200 free for step 7's `systemctl start vault`. Ordering is sound.
- (cluster-up.sh:328–333): Correctly applies the RETURN-trap-under-set-e lesson: cleanup fires on any exit, including errexit-abort.
- Host-volume dirs match the `nomad/client.hcl` host_volume declarations exactly (7 dirs, same paths). Good.
- `--empty` guard (bin/disinto:737–741): Explicitly rejected for `--backend=docker` with a clear error message. Correct.
- `sudo -n -- "${cmd[@]}" || rc=$?` and the dry-run path's `exit $?` correctly propagate `cluster-up.sh`'s exit code.
Security
No issues. No secrets echoed. No `eval`. All variables quoted. `sudo -n` correctly prevents hanging TTY prompts in factory runs. No external actions (no vault dispatch needed for local systemd/init operations).
Architecture / Documentation
AGENTS.md documents no nomad-init behavior — this is new capability, not a behavior change to existing documented behavior. No documentation update required. Stub-first dispatch pattern from S0.1 is correctly retired now that the real implementation lands.
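The `set -e` point raised under Correctness can be illustrated with a minimal sketch of a polling helper. This is not the code from cluster-up.sh: the names `poll_until`, `die`, and `POLL_INTERVAL` are hypothetical, and the "failed" check is passed in as a command so the sketch runs without systemd. It assumes only the behaviour described in this PR: tick up to a timeout, fail fast when the service is reported failed, and return 0 on the first successful check.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the PR's "log + dump status + die" path;
# the status dump is omitted here.
die() { echo "error: $*" >&2; exit 1; }

# poll_until <timeout_ticks> <failed_cmd> <check_cmd>
# Hypothetical signature, not the one used in cluster-up.sh.
poll_until() {
  local timeout="$1" failed_cmd="$2" check_cmd="$3" i
  for ((i = 0; i < timeout; i++)); do
    # In `a && b`, a non-zero exit from `a` does not trigger errexit,
    # so "not failed" (non-zero) simply falls through to the check.
    "$failed_cmd" && die "service entered failed state"
    if "$check_cmd"; then
      return 0
    fi
    # POLL_INTERVAL is an assumption added for testability; default 1s.
    sleep "${POLL_INTERVAL:-1}"
  done
  die "timed out after ${timeout} ticks"
}
```

A caller on a systemd host might pass small wrappers, for example a function that runs `systemctl is-failed --quiet vault` as the failed check and another that inspects `vault status` as the health check.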
Verdict
APPROVE — S0.4 orchestrator is correct, idempotent, and follows all project conventions; no blockers found.
Reviewed at
`481175e` | AGENTS.md
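For readers unfamiliar with the "content-compare before write" approach named in step 5 of the change description, the following is a minimal sketch of one way to make a config install idempotent. The helper name `install_if_changed` and the file mode are assumptions for illustration; the actual logic in cluster-up.sh is not shown on this page and may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# install_if_changed <src> <dst>
# Hypothetical helper: copies src to dst only when dst is missing or its
# bytes differ, so a re-run on an up-to-date box leaves the file untouched.
install_if_changed() {
  local src="$1" dst="$2"
  if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
    echo "unchanged: $dst"
    return 0
  fi
  # Mode 0644 is an assumption, not taken from the PR.
  install -m 0644 "$src" "$dst"
  echo "wrote: $dst"
}
```

Skipping the write when the content already matches keeps the file's mtime stable, which makes a repeat run easy to verify as a no-op.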