diff --git a/.woodpecker/nomad-validate.yml b/.woodpecker/nomad-validate.yml
index d5828e9..83946c3 100644
--- a/.woodpecker/nomad-validate.yml
+++ b/.woodpecker/nomad-validate.yml
@@ -14,10 +14,8 @@
 # .woodpecker/nomad-validate.yml — the pipeline definition
 #
 # Steps (all fail-closed — any error blocks merge):
-#   1. nomad-config-validate   — `nomad config validate` on server + client HCL
-#   2. nomad-job-validate      — `nomad job validate` looped over every
-#                                nomad/jobs/*.nomad.hcl (new jobspecs get
-#                                CI coverage automatically)
+#   1. nomad-config-validate   — `nomad config validate` on server + client HCL
+#   2. nomad-job-validate      — `nomad job validate` on every nomad/jobs/*.nomad.hcl
 #   3. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl
 #   4. shellcheck-nomad        — shellcheck the cluster-up + install scripts + disinto
 #   5. bats-init-nomad         — `disinto init --backend=nomad --dry-run` smoke tests
@@ -67,35 +65,14 @@ steps:
   #   with "unknown block 'job'", and vice versa. Hence two separate steps.
   #
   #   Validation is offline: no running Nomad server is required (exit 0 on
-  #   valid HCL, 1 on syntax/semantic error). The CLI takes a single path
-  #   argument so we loop over every `*.nomad.hcl` file under nomad/jobs/ —
-  #   that way a new jobspec PR gets CI coverage automatically (no separate
-  #   "edit the pipeline" step to forget). The `.nomad.hcl` suffix is the
-  #   naming convention documented in nomad/AGENTS.md; anything else in
-  #   nomad/jobs/ is deliberately not validated by this step.
-  #
-  #   `[ -f "$f" ]` guards against the no-match case: POSIX sh does not
-  #   nullglob, so an empty jobs/ directory would leave the literal glob in
-  #   "$f" and fail. Today forgejo.nomad.hcl exists, but the guard keeps the
-  #   step safe during any future transient empty state.
-  #
-  #   Scope note: offline validate catches jobspec-level errors (unknown
-  #   stanzas, missing required fields, wrong value types, invalid driver
-  #   config). It does NOT resolve cross-file references like host_volume
-  #   source names against nomad/client.hcl — that mismatch surfaces at
-  #   scheduling time on the live cluster, not here. The paired-write rule
-  #   in nomad/AGENTS.md ("add to both client.hcl and cluster-up.sh") is the
-  #   primary guardrail for that class of drift.
+  #   valid HCL, 1 on syntax/semantic error). One invocation per file — the
+  #   CLI takes a single path argument. New jobspecs get explicit lines here
+  #   so bringing one up is a conscious CI edit, matching step 1's pattern
+  #   and this file's "no-ad-hoc-steps" principle.
   - name: nomad-job-validate
     image: hashicorp/nomad:1.9.5
     commands:
-      - |
-        set -e
-        for f in nomad/jobs/*.nomad.hcl; do
-          [ -f "$f" ] || continue
-          echo "validating jobspec: $f"
-          nomad job validate "$f"
-        done
+      - nomad job validate nomad/jobs/forgejo.nomad.hcl

   # ── 3. Vault HCL syntax check ────────────────────────────────────────────
   #   `vault operator diagnose` loads the config and runs a suite of checks.
diff --git a/nomad/AGENTS.md b/nomad/AGENTS.md
index d80780f..ef7a43b 100644
--- a/nomad/AGENTS.md
+++ b/nomad/AGENTS.md
@@ -35,69 +35,41 @@
 it owns.

 ## Adding a jobspec (Step 1 and later)

-1. Drop a file in `nomad/jobs/<name>.nomad.hcl`. The `.nomad.hcl`
-   suffix is load-bearing: `.woodpecker/nomad-validate.yml` globs on
-   exactly that suffix to auto-pick up new jobspecs (see step 2 in
-   "How CI validates these files" below). Anything else in
-   `nomad/jobs/` is silently skipped by CI.
+1. Drop a file in `nomad/jobs/<name>.nomad.hcl`.
 2. If it needs persistent state, reference a `host_volume` already
    declared in `client.hcl` — *don't* add ad-hoc host paths in the
    jobspec. If a new volume is needed, add it to **both**:
    - `nomad/client.hcl` — the `host_volume "<name>" { path = … }` block
    - `lib/init/nomad/cluster-up.sh` — the `HOST_VOLUME_DIRS` array
    The two must stay in sync or nomad fingerprinting will fail and the
-   node stays in "initializing". Note that offline `nomad job validate`
-   will NOT catch a typo in the jobspec's `source = "..."` against the
-   client.hcl host_volume list (see step 2 below) — the scheduler
-   rejects the mismatch at placement time instead.
+   node stays in "initializing".
 3. Pin image tags — `image = "forgejo/forgejo:1.22.5"`, not `:latest`.
-4. No pipeline edit required — step 2 of `nomad-validate.yml` globs
-   over `nomad/jobs/*.nomad.hcl` and validates every match. Just make
-   sure the existing `nomad/**` trigger path still covers your file
-   (it does for anything under `nomad/jobs/`).
+4. Add the jobspec path to `.woodpecker/nomad-validate.yml`'s trigger
+   list so CI validates it.

 ## How CI validates these files

-`.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`
-(including `nomad/jobs/`), `lib/init/nomad/`, or `bin/disinto`. Five
-fail-closed steps:
+`.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`,
+`lib/init/nomad/`, or `bin/disinto`. Four fail-closed steps:

 1. **`nomad config validate nomad/server.hcl nomad/client.hcl`** —
    parses the HCL, fails on unknown blocks, bad port ranges, invalid
-   driver config. Vault HCL is excluded (different tool). Jobspecs are
-   excluded too — agent-config and jobspec are disjoint HCL grammars;
-   running this step on a jobspec rejects it with "unknown block 'job'".
-2. **`nomad job validate nomad/jobs/*.nomad.hcl`** (loop, one call per file)
-   — parses each jobspec's HCL, fails on unknown stanzas, missing
-   required fields, wrong value types, invalid driver config. Runs
-   offline (no Nomad server needed) so CI exit 0 ≠ "this will schedule
-   successfully"; it means "the HCL itself is well-formed". What this
-   step does NOT catch:
-   - cross-file references (`source = "forgejo-data"` typo against the
-     `host_volume` list in `client.hcl`) — that's a scheduling-time
-     check on the live cluster, not validate-time.
-   - image reachability — `image = "codeberg.org/forgejo/forgejo:11.0"`
-     is accepted even if the registry is down or the tag is wrong.
-   New jobspecs are picked up automatically by the glob — no pipeline
-   edit needed as long as the file is named `<name>.nomad.hcl`.
-3. **`vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener`**
+   driver config. Vault HCL is excluded (different tool).
+2. **`vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener`**
    — Vault's equivalent syntax + schema check. `-skip=storage/listener`
    disables the runtime checks (CI containers don't have
-   `/var/lib/vault/data` or port 8200). Exit 2 (advisory warnings only,
-   e.g. TLS-disabled listener) is tolerated; exit 1 blocks merge.
-4. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`**
+   `/var/lib/vault/data` or port 8200).
+3. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`**
    — all init/dispatcher shell clean. `bin/disinto` has no `.sh`
    extension so the repo-wide shellcheck in `.woodpecker/ci.yml` skips
    it — this is the one place it gets checked.
-5. **`bats tests/disinto-init-nomad.bats`**
+4. **`bats tests/disinto-init-nomad.bats`**
    — exercises the dispatcher: `disinto init --backend=nomad --dry-run`,
    `… --empty --dry-run`, and the `--backend=docker` regression guard.

 If a PR breaks `nomad/server.hcl` (e.g. typo in a block name), step 1
-fails with a clear error; if it breaks a jobspec (e.g. misspells
-`task` as `tsak`, or adds a `volume` stanza without a `source`), step
-2 fails instead. The fix makes it pass. PRs that don't touch any of
-the trigger paths skip this pipeline entirely.
+fails with a clear error; the fix makes it pass. PRs that don't touch
+any of the trigger paths skip this pipeline entirely.

 ## Version pinning
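For reference, the nullglob guard in the removed loop can be sketched standalone. This is a minimal illustration, not the pipeline itself: a temp directory stands in for nomad/jobs/, and `printf` stands in for the real `nomad job validate "$f"` call.

```shell
#!/bin/sh
set -e

# Hypothetical stand-in for nomad/jobs/.
jobs_dir=$(mktemp -d)

# Empty directory: POSIX sh does not nullglob, so the unmatched
# pattern is left literally in "$f"; without the [ -f ] guard the
# step would try to validate the string "$jobs_dir/*.nomad.hcl".
validated_empty=""
for f in "$jobs_dir"/*.nomad.hcl; do
  [ -f "$f" ] || continue
  validated_empty="$validated_empty $f"
done

# One jobspec present: the guard passes and the file is picked up,
# which is how new jobspecs got CI coverage automatically.
touch "$jobs_dir/forgejo.nomad.hcl"
validated=""
for f in "$jobs_dir"/*.nomad.hcl; do
  [ -f "$f" ] || continue
  printf 'validating jobspec: %s\n' "$f"  # real step: nomad job validate "$f"
  validated="$validated $f"
done

rm -rf "$jobs_dir"
```

The guard makes the loop a no-op on an empty directory instead of a hard failure, which is the "transient empty state" safety the removed comment described.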