Commit graph

8 commits

Author SHA1 Message Date
Agent
719fdaeac4 fix: [nomad-step-1] S1.3 — wire --with forgejo into bin/disinto init --backend=nomad (#842) 2026-04-16 12:19:51 +00:00
Claude
93018b3db6 fix: [nomad-step-1] S1.4 — extend Woodpecker CI to nomad job validate nomad/jobs/*.hcl (#843)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Step 2 of .woodpecker/nomad-validate.yml previously ran
`nomad job validate` against a single explicit path
(nomad/jobs/forgejo.nomad.hcl, wired up during the S1.1 review). Replace
that with a POSIX-sh loop over nomad/jobs/*.nomad.hcl so every jobspec
gets CI coverage automatically — no "edit the pipeline" step to forget
when the next jobspec (woodpecker, caddy, agents, …) lands.

Why reverse S1.1's explicit-line approach: the "no-ad-hoc-steps"
principle that drove the explicit list was about keeping step *classes*
enumerated, not about re-listing every file of the same class. Globbing
over `*.nomad.hcl` still encodes a single class ("jobspec validation")
and is strictly stricter — a dropped jobspec can't silently bypass CI
because someone forgot to add its line. The `.nomad.hcl` suffix (set as
convention by S1.1 review) is what keeps non-jobspec HCL out of this
loop.

Implementation notes:
- `[ -f "$f" ] || continue` guards the no-match case. POSIX sh has no
  nullglob, so an empty jobs/ dir would otherwise leave the literal
  glob in $f and fail nomad job validate with "no such file". Not
  reachable today (forgejo.nomad.hcl exists), but keeps the step safe
  against any transient empty state during future refactors.
- `set -e` inside the block ensures the first failing jobspec aborts
  (default Woodpecker behavior, but explicit is cheap).
- Loop echoes the file being validated so CI logs point at the
  specific jobspec on failure.

Docs (nomad/AGENTS.md):
- "How CI validates these files" now lists all *five* steps (the S1.1
  review added step 2 but didn't update the doc; fixed in passing).
- Step 2 is documented with explicit scope: what offline validate
  catches (unknown stanzas, missing required fields, wrong value
  types, bad driver config) and what it does NOT catch (cross-file
  host_volume name resolution against client.hcl — that's a
  scheduling-time check; image reachability).
- "Adding a jobspec" step 4 updated: no pipeline edit required as
  long as the file follows the `*.nomad.hcl` naming convention. The
  suffix is now documented as load-bearing in step 1.
- Step 2 of the "Adding a jobspec" checklist cross-links the
  host_volume scheduling-time check, so contributors know the
  paired-write rule (client.hcl + cluster-up.sh) is the real
  guardrail for that class of drift.

Acceptance criteria:
- Broken jobspec (typo in stanza, missing required field) fails step
  2 with nomad's error message — covered by the loop over every file.
- Fixed jobspec passes — standard validate behavior.
- Step 1 (nomad config validate) untouched.
- No .sh changes, so no shellcheck impact; manual shellcheck pass
  shown clean.
- Trigger path `nomad/**` already covers `nomad/jobs/**` (confirmed,
  no change needed to `when:` block).

Refs: #843 (S1.4), #825 (S0.5 base pipeline), #840 (S1.1 first jobspec)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:32:08 +00:00
Claude
db64f2fdae fix: address review — rename forgejo.nomad.hcl + wire nomad job validate CI
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Two blockers from the #844 review:

1. Rename nomad/jobs/forgejo.hcl → nomad/jobs/forgejo.nomad.hcl to match
   the convention documented in nomad/AGENTS.md:38 (*.nomad.hcl suffix).
   First jobspec sets the pattern for all future ones; keeps any glob-
   based tooling over nomad/jobs/*.nomad.hcl working.
2. Add a dedicated `nomad-job-validate` step to .woodpecker/nomad-validate.yml.
   `nomad config validate` (step 1) parses agent configs only — it rejects
   jobspec HCL as "unknown block 'job'". `nomad job validate` is the
   correct offline validator for jobspec HCL. Per the Hashicorp docs it
   does not require a running agent (exit 0 clean, 1 on syntax/semantic
   error). New jobspecs will add an explicit line alongside forgejo's,
   matching step 1's enumeration pattern and this file's "no-ad-hoc-steps"
   principle.

Also updated the file header comment and the pipeline's top-of-file step
index to reflect the new step ordering (2. nomad-job-validate inserted;
old 2-4 renumbered to 3-5).

Refs: #840 (S1.1), PR #844
2026-04-16 10:11:34 +00:00
Claude
2ad4bdc624 fix: [nomad-step-1] S1.1 — add nomad/jobs/forgejo.hcl (service job, host_volume, port 3000) (#840)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
First Nomad jobspec to land under nomad/jobs/ as part of the Nomad+Vault
migration. Proves the docker driver + host_volume plumbing wired up in
Step 0 (client.hcl) by defining a real factory service:

  - job type=service, datacenters=["dc1"], 1 group × 1 task
  - docker driver, image pinned to codeberg.org/forgejo/forgejo:11.0
    (matches docker-compose.yml)
  - network port "http" static=3000, to=3000 (same host:port as compose,
    so agents/woodpecker/caddy reach forgejo unchanged across cutover)
  - mounts the forgejo-data host_volume from nomad/client.hcl at /data
  - non-secret env subset from docker-compose's forgejo service (DB
    type, ROOT_URL, HTTP_PORT, INSTALL_LOCK, DISABLE_REGISTRATION,
    webhook allow-list); OAuth/secret env vars land in Step 2 via Vault
  - Nomad-native service discovery (provider="nomad", no Consul) with
    HTTP check on /api/v1/version (10s interval, 3s timeout). No
    initial_status override — Nomad waits for first probe to pass.
  - restart: 3 attempts / 5m / 15s delay / mode=delay
  - resources: cpu=300 memory=512 baseline

No changes to docker-compose.yml — the docker stack remains the
factory's runtime until cutover. CI integration (`nomad job validate`)
is tracked by #843.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:55:35 +00:00
Claude
e9c144a511 chore: gardener housekeeping 2026-04-16
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline failed
2026-04-16 08:38:31 +00:00
Claude
5150f8c486 fix: [nomad-step-0] S0.5 — Woodpecker CI validation for nomad/vault artifacts (#825)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline failed
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline failed
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
Locks in static validation for every Nomad+Vault artifact before it can
merge. Four fail-closed steps in .woodpecker/nomad-validate.yml, gated
to PRs touching nomad/, lib/init/nomad/, or bin/disinto:

  1. nomad config validate nomad/server.hcl nomad/client.hcl
  2. vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener
  3. shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto
  4. bats tests/disinto-init-nomad.bats — dispatcher smoke tests

bin/disinto picks up pre-existing SC2120 warnings on three passthrough
wrappers (generate_agent_docker, generate_caddyfile, generate_staging_index);
annotated with shellcheck disable=SC2120 so the new pipeline is clean
without narrowing the warning for future code.

Pinned image versions (hashicorp/nomad:1.9.5, hashicorp/vault:1.18.5)
match lib/init/nomad/install.sh — bump both or neither.

nomad/AGENTS.md documents the stack layout, how to add a jobspec in
Step 1, how CI validates it, and the two-place version pinning rule.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:54:06 +00:00
Claude
24cb8f83a2 fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Adds the Vault half of the factory-dev-box bringup, landed but not started
(per the install-but-don't-start pattern used for nomad in #822):

- lib/init/nomad/install.sh — now also installs vault from the shared
  HashiCorp apt repo. VAULT_VERSION pinned (1.18.5). Fast-path skips apt
  entirely when both binaries are at their pins; partial upgrades only
  touch the package that drifted.

- nomad/vault.hcl — single-node config: file storage backend at
  /var/lib/vault/data, localhost listener on :8200, ui on, mlock kept on.
  No TLS / HA / audit yet; those land in later steps.

- lib/init/nomad/systemd-vault.sh — writes /etc/systemd/system/vault.service
  (Type=notify, ExecStartPost auto-unseals from /etc/vault.d/unseal.key,
  CAP_IPC_LOCK granted for mlock), deploys nomad/vault.hcl to
  /etc/vault.d/, creates /var/lib/vault/data (0700 root), enables the
  unit without starting it. Idempotent via content-compare.

- lib/init/nomad/vault-init.sh — first-run init: spawns a temporary
  `vault server` if not already reachable, runs operator-init with
  key-shares=1/threshold=1, persists unseal.key + root.token (0400 root),
  unseals once in-process, shuts down the temp server. Re-run detects
  initialized + unseal.key present → no-op. Initialized but key missing
  is a hard failure (can't recover).

lib/hvault.sh already defaults VAULT_TOKEN to /etc/vault.d/root.token
when the env var is absent, so no change needed there.

Seal model: the single unseal key lives on disk; seal-key theft equals
vault theft. Factory-dev-box-acceptable tradeoff — avoids running a
second Vault to auto-unseal the first.

Blocks S0.4 (#824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 06:53:27 +00:00
Claude
06ead3a19d fix: [nomad-step-0] S0.2 — install nomad + systemd unit + nomad/server.hcl/client.hcl (#822)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Lands the Nomad install + baseline HCL config for the single-node factory
dev box. Nothing is wired into `disinto init` yet — S0.4 does that.

- lib/init/nomad/install.sh: idempotent apt install pinned to
  NOMAD_VERSION (default 1.9.5). Adds HashiCorp apt keyring and sources
  list only if absent; fast-paths when the pinned version is already
  installed.
- lib/init/nomad/systemd-nomad.sh: writes /etc/systemd/system/nomad.service
  (rewrites only when content differs), creates /etc/nomad.d and
  /var/lib/nomad, runs `systemctl enable nomad` WITHOUT starting.
- nomad/server.hcl: single-node combined server+client role. bootstrap_expect=1,
  localhost bind, default ports pinned explicitly, UI enabled. No TLS/ACL —
  factory dev box baseline.
- nomad/client.hcl: Docker task driver (allow_privileged=false, volumes
  enabled) and host_volume pre-wiring for forgejo-data, woodpecker-data,
  agent-data, project-repos, caddy-data, chat-history, ops-repo under
  /srv/disinto/*.

Verified: `nomad config validate nomad/*.hcl` reports "Configuration is
valid!" (with expected TLS/bootstrap warnings for a dev box). Shellcheck
clean across the repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 06:04:02 +00:00