disinto

Author	SHA1	Message	Date
Claude	8efef9f1bb	fix: [nomad-step-2] S2.3 — vault-nomad-auth.sh (enable JWT auth + roles + nomad workload identity) (#881) Wires Nomad → Vault via workload identity so jobs can exchange their short-lived JWT for a Vault token carrying the policies in vault/policies/ — no shared VAULT_TOKEN in job env. - `lib/init/nomad/vault-nomad-auth.sh` — idempotent script: enable jwt auth at path `jwt-nomad`, config JWKS/algs, apply roles, install server.hcl + SIGHUP nomad on change. - `tools/vault-apply-roles.sh` — companion sync script (S2.1 sibling); reads vault/roles.yaml and upserts each Vault role under auth/jwt-nomad/role/<name> with created/updated/unchanged semantics. - `vault/roles.yaml` — declarative role→policy→bound_claims map; one entry per vault/policies/*.hcl. Keeps S2.1 policies and S2.3 role bindings visible side-by-side at review time. - `nomad/server.hcl` — adds vault stanza (enabled, address, default_identity.aud=["vault.io"], ttl=1h). - `lib/hvault.sh` — new `hvault_get_or_empty` helper shared between vault-apply-policies.sh, vault-apply-roles.sh, and vault-nomad-auth.sh; reads a Vault endpoint and distinguishes 200 / 404 / other. - `vault/policies/AGENTS.md` — extends S2.1 docs with JWT-auth role naming convention, token shape, and the "add new service" flow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:44:59 +00:00
Agent	3734920c0c	fix: [nomad-step-1] deploy.sh-fix — correct jq selectors for deployment status; add deployment ID retry	2026-04-16 15:43:07 +00:00
Agent	dee05d21f8	fix: [nomad-step-1] deploy.sh-fix — poll deployment status not alloc status; bump timeout 120→240s (#878)	2026-04-16 15:29:41 +00:00
Claude	b77bae9c2a	fix: [nomad-step-0] S0.2-fix — install.sh must also install docker daemon (block step 1 placement) (#871) Nomad's docker task driver reports Healthy=false without a running dockerd. On the factory dev box docker was pre-installed so Step 0's cluster-up passed silently, but a fresh ubuntu:24.04 LXC hit "missing drivers" placement failures the moment Step 1 tried to deploy forgejo (the first docker-driver consumer). Fix install.sh to also install docker.io + enable --now docker.service when absent, and add a poll for the nomad self-node's docker driver Detected+Healthy before declaring Step 8 done — otherwise the race between dockerd startup and nomad driver fingerprinting lets the node reach "ready" while docker is still unhealthy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:05:24 +00:00
Agent	6734887a0a	fix: [nomad-step-1] S1.2 — add lib/init/nomad/deploy.sh (dependency-ordered nomad job run + wait) (#841)	2026-04-16 10:36:38 +00:00
Claude	481175e043	fix: dedupe cluster-up.sh polling via poll_until_healthy helper (#824) CI duplicate-detection flagged the in-line vault + nomad polling loops in cluster-up.sh as matching a 5-line window in vault-init.sh (the `ready=1 / break / fi / sleep 1 / done` boilerplate). Extracts the repeated pattern into three helpers at the top of the file: - nomad_has_ready_node wrapper so poll_until_healthy can take a bare command name. - _die_with_service_status shared "log + dump systemctl status + die" path (factored out of the two callsites + the timeout branch). - poll_until_healthy ticks once per second up to TIMEOUT, fail-fasts on systemd "failed" state, and returns 0 on first successful check. Step 7 (vault unseal) and Step 8 (nomad ready node) each collapse from ~15 lines of explicit for-loop bookkeeping to a one-line call. No behavioural change: same tick cadence, same fail-fast, same status dump on timeout. Local detect-duplicates.py run against main confirms no new duplicates introduced. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:26:54 +00:00
Claude	d2c6b33271	fix: [nomad-step-0] S0.4 — disinto init --backend=nomad --empty orchestrator (cluster-up) (#824) Wires S0.1–S0.3 into a single idempotent bring-up script and replaces the S0.1 stub in _disinto_init_nomad so `disinto init --backend=nomad --empty` produces a running empty single-node cluster on a fresh box. lib/init/nomad/cluster-up.sh (new): 1. install.sh (nomad + vault binaries) 2. systemd-nomad.sh (unit + enable, not started) 3. systemd-vault.sh (unit + vault.hcl + enable) 4. host-volume dirs under /srv/disinto/* (matching nomad/client.hcl) 5. /etc/nomad.d/{server,client}.hcl (content-compare before write) 6. vault-init.sh (first-run init + unseal + persist keys) 7. systemctl start vault (poll until unsealed; fail-fast on is-failed) 8. systemctl start nomad (poll until ≥1 node ready) 9. /etc/profile.d/disinto-nomad.sh (VAULT_ADDR + NOMAD_ADDR for interactive shells) Re-running on a healthy box is a no-op — each sub-step is itself idempotent and steps 7/8 fast-path when already active + healthy. `--dry-run` prints the full step list and exits 0. bin/disinto: - _disinto_init_nomad: replaces the S0.1 stub. Invokes cluster-up.sh directly (as root) or via `sudo -n` otherwise. Both `--empty` and the default (no flag) call cluster-up.sh today; Step 1 will branch on $empty to gate job deployment. --dry-run forwards through. - disinto_init: adds `--empty` flag parsing; rejects `--empty` combined with `--backend=docker` explicitly instead of silently ignoring it. - usage: documents `--empty` and drops the "stub, S0.1" annotation from --backend. Closes #824. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:22:15 +00:00
Claude	24cb8f83a2	fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823) Adds the Vault half of the factory-dev-box bringup, landed but not started (per the install-but-don't-start pattern used for nomad in #822): - lib/init/nomad/install.sh — now also installs vault from the shared HashiCorp apt repo. VAULT_VERSION pinned (1.18.5). Fast-path skips apt entirely when both binaries are at their pins; partial upgrades only touch the package that drifted. - nomad/vault.hcl — single-node config: file storage backend at /var/lib/vault/data, localhost listener on :8200, ui on, mlock kept on. No TLS / HA / audit yet; those land in later steps. - lib/init/nomad/systemd-vault.sh — writes /etc/systemd/system/vault.service (Type=notify, ExecStartPost auto-unseals from /etc/vault.d/unseal.key, CAP_IPC_LOCK granted for mlock), deploys nomad/vault.hcl to /etc/vault.d/, creates /var/lib/vault/data (0700 root), enables the unit without starting it. Idempotent via content-compare. - lib/init/nomad/vault-init.sh — first-run init: spawns a temporary `vault server` if not already reachable, runs operator-init with key-shares=1/threshold=1, persists unseal.key + root.token (0400 root), unseals once in-process, shuts down the temp server. Re-run detects initialized + unseal.key present → no-op. Initialized but key missing is a hard failure (can't recover). lib/hvault.sh already defaults VAULT_TOKEN to /etc/vault.d/root.token when the env var is absent, so no change needed there. Seal model: the single unseal key lives on disk; seal-key theft equals vault theft. Factory-dev-box-acceptable tradeoff — avoids running a second Vault to auto-unseal the first. Blocks S0.4 (#824). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 06:53:27 +00:00
Claude	06ead3a19d	fix: [nomad-step-0] S0.2 — install nomad + systemd unit + nomad/server.hcl/client.hcl (#822) Lands the Nomad install + baseline HCL config for the single-node factory dev box. Nothing is wired into `disinto init` yet — S0.4 does that. - lib/init/nomad/install.sh: idempotent apt install pinned to NOMAD_VERSION (default 1.9.5). Adds HashiCorp apt keyring and sources list only if absent; fast-paths when the pinned version is already installed. - lib/init/nomad/systemd-nomad.sh: writes /etc/systemd/system/nomad.service (rewrites only when content differs), creates /etc/nomad.d and /var/lib/nomad, runs `systemctl enable nomad` WITHOUT starting. - nomad/server.hcl: single-node combined server+client role. bootstrap_expect=1, localhost bind, default ports pinned explicitly, UI enabled. No TLS/ACL — factory dev box baseline. - nomad/client.hcl: Docker task driver (allow_privileged=false, volumes enabled) and host_volume pre-wiring for forgejo-data, woodpecker-data, agent-data, project-repos, caddy-data, chat-history, ops-repo under /srv/disinto/. Verified: `nomad config validate nomad/.hcl` reports "Configuration is valid!" (with expected TLS/bootstrap warnings for a dev box). Shellcheck clean across the repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 06:04:02 +00:00
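
Commit 481175e043 describes poll_until_healthy as ticking once per second up to a timeout, failing fast on a systemd "failed" state, and returning 0 on the first successful check. The sketch below illustrates that shape only and is not the code in cluster-up.sh: the argument order, the return codes 1 and 2, and the POLL_INTERVAL override are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a poll-until-healthy helper.
# Usage: poll_until_healthy CHECK_CMD TIMEOUT_TICKS [FAILED_CMD]
#   CHECK_CMD   bare command name; exit 0 means "healthy"
#   FAILED_CMD  bare command name; exit 0 means "unit has failed, stop now"
# Returns 0 on first healthy check, 2 on fail-fast, 1 on timeout.
# (Names, argument order and return codes are assumptions.)
poll_until_healthy() {
  local check="$1" timeout="$2" is_failed="${3:-false}"
  local tick=0
  while [ "$tick" -lt "$timeout" ]; do
    if "$check"; then
      return 0          # first successful check wins
    fi
    if "$is_failed"; then
      return 2          # fail fast instead of waiting out the timeout
    fi
    sleep "${POLL_INTERVAL:-1}"   # one tick per second by default
    tick=$((tick + 1))
  done
  return 1              # timed out
}

# Example wiring (commented out; the wrapper name is hypothetical):
#   nomad_unit_failed() { systemctl is-failed --quiet nomad; }
#   poll_until_healthy nomad_has_ready_node 60 nomad_unit_failed
```

Taking bare command names keeps the call sites to one line each, which matches the commit's stated reason for adding the nomad_has_ready_node wrapper.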

9 commits
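
Several of the commits above (06ead3a19d, 24cb8f83a2, d2c6b33271) rely on a "content-compare before write" pattern so that re-running a step on a healthy box is a no-op. A minimal sketch of that idea follows, assuming a hypothetical helper named write_if_changed and a 0644 file mode; neither comes from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of idempotent config deployment via content-compare.
# Usage: write_if_changed SRC DST
# Prints "unchanged" when DST already matches SRC byte for byte,
# otherwise installs SRC over DST and prints "written".
# (Helper name, output strings and file mode are assumptions.)
write_if_changed() {
  local src="$1" dst="$2"
  if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
    echo "unchanged"
    return 0
  fi
  # Only report success if the copy itself succeeded.
  install -m 0644 "$src" "$dst" && echo "written"
}

# Example wiring (commented out), restarting only when something changed:
#   if [ "$(write_if_changed nomad/server.hcl /etc/nomad.d/server.hcl)" = "written" ]; then
#     systemctl reload nomad
#   fi
```

Reporting whether a write happened lets the caller skip reloads and restarts when nothing changed, which is what makes a second run cheap.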