fix: [nomad-step-1] S1.4 — extend Woodpecker CI to nomad job validate nomad/jobs/*.hcl (#843)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful

Step 2 of .woodpecker/nomad-validate.yml previously ran
`nomad job validate` against a single explicit path
(nomad/jobs/forgejo.nomad.hcl, wired up during the S1.1 review). Replace
that with a POSIX-sh loop over nomad/jobs/*.nomad.hcl so every jobspec
gets CI coverage automatically — no "edit the pipeline" step to forget
when the next jobspec (woodpecker, caddy, agents, …) lands.

Why reverse S1.1's explicit-line approach: the "no-ad-hoc-steps"
principle that drove the explicit list was about keeping step *classes*
enumerated, not about re-listing every file of the same class. Globbing
over `*.nomad.hcl` still encodes a single class ("jobspec validation")
and is strictly stricter — a dropped jobspec can't silently bypass CI
because someone forgot to add its line. The `.nomad.hcl` suffix (set as
convention by S1.1 review) is what keeps non-jobspec HCL out of this
loop.

Implementation notes:
- `[ -f "$f" ] || continue` guards the no-match case. POSIX sh has no
  nullglob, so an empty jobs/ dir would otherwise leave the literal
  glob in $f and fail nomad job validate with "no such file". Not
  reachable today (forgejo.nomad.hcl exists), but keeps the step safe
  against any transient empty state during future refactors.
- `set -e` inside the block ensures the first failing jobspec aborts
  (default Woodpecker behavior, but explicit is cheap).
- Loop echoes the file being validated so CI logs point at the
  specific jobspec on failure.

Docs (nomad/AGENTS.md):
- "How CI validates these files" now lists all *five* steps (the S1.1
  review added step 2 but didn't update the doc; fixed in passing).
- Step 2 is documented with explicit scope: what offline validate
  catches (unknown stanzas, missing required fields, wrong value
  types, bad driver config) and what it does NOT catch (cross-file
  host_volume name resolution against client.hcl — that's a
  scheduling-time check; image reachability).
- "Adding a jobspec" step 4 updated: no pipeline edit required as
  long as the file follows the `*.nomad.hcl` naming convention. The
  suffix is now documented as load-bearing in step 1.
- Step 2 of the "Adding a jobspec" checklist cross-links the
  host_volume scheduling-time check, so contributors know the
  paired-write rule (client.hcl + cluster-up.sh) is the real
  guardrail for that class of drift.

Acceptance criteria:
- Broken jobspec (typo in stanza, missing required field) fails step
  2 with nomad's error message — covered by the loop over every file.
- Fixed jobspec passes — standard validate behavior.
- Step 1 (nomad config validate) untouched.
- No .sh changes, so no shellcheck impact; manual shellcheck pass
  shown clean.
- Trigger path `nomad/**` already covers `nomad/jobs/**` (confirmed,
  no change needed to `when:` block).

Refs: #843 (S1.4), #825 (S0.5 base pipeline), #840 (S1.1 first jobspec)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Claude 2026-04-16 10:32:08 +00:00
parent 24dfa6c8d5
commit 93018b3db6
2 changed files with 71 additions and 20 deletions

View file

@ -14,8 +14,10 @@
# .woodpecker/nomad-validate.yml — the pipeline definition
#
# Steps (all fail-closed — any error blocks merge):
# 1. nomad-config-validate — `nomad config validate` on server + client HCL
# 2. nomad-job-validate — `nomad job validate` on every nomad/jobs/*.nomad.hcl
# 1. nomad-config-validate — `nomad config validate` on server + client HCL
# 2. nomad-job-validate — `nomad job validate` looped over every
# nomad/jobs/*.nomad.hcl (new jobspecs get
# CI coverage automatically)
# 3. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl
# 4. shellcheck-nomad — shellcheck the cluster-up + install scripts + disinto
# 5. bats-init-nomad — `disinto init --backend=nomad --dry-run` smoke tests
@ -65,14 +67,35 @@ steps:
# with "unknown block 'job'", and vice versa. Hence two separate steps.
#
# Validation is offline: no running Nomad server is required (exit 0 on
# valid HCL, 1 on syntax/semantic error). One invocation per file — the
# CLI takes a single path argument. New jobspecs get explicit lines here
# so bringing one up is a conscious CI edit, matching step 1's pattern
# and this file's "no-ad-hoc-steps" principle.
# valid HCL, 1 on syntax/semantic error). The CLI takes a single path
# argument so we loop over every `*.nomad.hcl` file under nomad/jobs/ —
# that way a new jobspec PR gets CI coverage automatically (no separate
# "edit the pipeline" step to forget). The `.nomad.hcl` suffix is the
# naming convention documented in nomad/AGENTS.md; anything else in
# nomad/jobs/ is deliberately not validated by this step.
#
# `[ -f "$f" ]` guards against the no-match case: POSIX sh does not
# nullglob, so an empty jobs/ directory would leave the literal glob in
# "$f" and fail. Today forgejo.nomad.hcl exists, but the guard keeps the
# step safe during any future transient empty state.
#
# Scope note: offline validate catches jobspec-level errors (unknown
# stanzas, missing required fields, wrong value types, invalid driver
# config). It does NOT resolve cross-file references like host_volume
# source names against nomad/client.hcl — that mismatch surfaces at
# scheduling time on the live cluster, not here. The paired-write rule
# in nomad/AGENTS.md ("add to both client.hcl and cluster-up.sh") is the
# primary guardrail for that class of drift.
- name: nomad-job-validate
image: hashicorp/nomad:1.9.5
commands:
- nomad job validate nomad/jobs/forgejo.nomad.hcl
- |
set -e
for f in nomad/jobs/*.nomad.hcl; do
[ -f "$f" ] || continue
echo "validating jobspec: $f"
nomad job validate "$f"
done
# ── 3. Vault HCL syntax check ────────────────────────────────────────────
# `vault operator diagnose` loads the config and runs a suite of checks.