fix: [nomad-step-1] S1.4 — extend Woodpecker CI to nomad job validate nomad/jobs/*.hcl (#843)
All checks were successful
All checks were successful
Step 2 of .woodpecker/nomad-validate.yml previously ran
`nomad job validate` against a single explicit path
(nomad/jobs/forgejo.nomad.hcl, wired up during the S1.1 review). Replace
that with a POSIX-sh loop over nomad/jobs/*.nomad.hcl so every jobspec
gets CI coverage automatically — no "edit the pipeline" step to forget
when the next jobspec (woodpecker, caddy, agents, …) lands.
Why reverse S1.1's explicit-line approach: the "no-ad-hoc-steps"
principle that drove the explicit list was about keeping step *classes*
enumerated, not about re-listing every file of the same class. Globbing
over `*.nomad.hcl` still encodes a single class ("jobspec validation")
and is strictly stricter — a dropped jobspec can't silently bypass CI
because someone forgot to add its line. The `.nomad.hcl` suffix (set as
convention by S1.1 review) is what keeps non-jobspec HCL out of this
loop.
Implementation notes:
- `[ -f "$f" ] || continue` guards the no-match case. POSIX sh has no
nullglob, so an empty jobs/ dir would otherwise leave the literal
glob in $f and fail nomad job validate with "no such file". Not
reachable today (forgejo.nomad.hcl exists), but keeps the step safe
against any transient empty state during future refactors.
- `set -e` inside the block ensures the first failing jobspec aborts
(default Woodpecker behavior, but explicit is cheap).
- Loop echoes the file being validated so CI logs point at the
specific jobspec on failure.
Docs (nomad/AGENTS.md):
- "How CI validates these files" now lists all *five* steps (the S1.1
review added step 2 but didn't update the doc; fixed in passing).
- Step 2 is documented with explicit scope: what offline validate
catches (unknown stanzas, missing required fields, wrong value
types, bad driver config) and what it does NOT catch (cross-file
host_volume name resolution against client.hcl — that's a
scheduling-time check; image reachability).
- "Adding a jobspec" step 4 updated: no pipeline edit required as
long as the file follows the `*.nomad.hcl` naming convention. The
suffix is now documented as load-bearing in step 1.
- Step 2 of the "Adding a jobspec" checklist cross-links the
host_volume scheduling-time check, so contributors know the
paired-write rule (client.hcl + cluster-up.sh) is the real
guardrail for that class of drift.
Acceptance criteria:
- Broken jobspec (typo in stanza, missing required field) fails step
2 with nomad's error message — covered by the loop over every file.
- Fixed jobspec passes — standard validate behavior.
- Step 1 (nomad config validate) untouched.
- No .sh changes, so no shellcheck impact; manual shellcheck pass
shown clean.
- Trigger path `nomad/**` already covers `nomad/jobs/**` (confirmed,
no change needed to `when:` block).
Refs: #843 (S1.4), #825 (S0.5 base pipeline), #840 (S1.1 first jobspec)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
24dfa6c8d5
commit
93018b3db6
2 changed files with 71 additions and 20 deletions
|
|
@ -14,8 +14,10 @@
|
|||
# .woodpecker/nomad-validate.yml — the pipeline definition
|
||||
#
|
||||
# Steps (all fail-closed — any error blocks merge):
|
||||
# 1. nomad-config-validate — `nomad config validate` on server + client HCL
|
||||
# 2. nomad-job-validate — `nomad job validate` on every nomad/jobs/*.nomad.hcl
|
||||
# 1. nomad-config-validate — `nomad config validate` on server + client HCL
|
||||
# 2. nomad-job-validate — `nomad job validate` looped over every
|
||||
# nomad/jobs/*.nomad.hcl (new jobspecs get
|
||||
# CI coverage automatically)
|
||||
# 3. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl
|
||||
# 4. shellcheck-nomad — shellcheck the cluster-up + install scripts + disinto
|
||||
# 5. bats-init-nomad — `disinto init --backend=nomad --dry-run` smoke tests
|
||||
|
|
@ -65,14 +67,35 @@ steps:
|
|||
# with "unknown block 'job'", and vice versa. Hence two separate steps.
|
||||
#
|
||||
# Validation is offline: no running Nomad server is required (exit 0 on
|
||||
# valid HCL, 1 on syntax/semantic error). One invocation per file — the
|
||||
# CLI takes a single path argument. New jobspecs get explicit lines here
|
||||
# so bringing one up is a conscious CI edit, matching step 1's pattern
|
||||
# and this file's "no-ad-hoc-steps" principle.
|
||||
# valid HCL, 1 on syntax/semantic error). The CLI takes a single path
|
||||
# argument so we loop over every `*.nomad.hcl` file under nomad/jobs/ —
|
||||
# that way a new jobspec PR gets CI coverage automatically (no separate
|
||||
# "edit the pipeline" step to forget). The `.nomad.hcl` suffix is the
|
||||
# naming convention documented in nomad/AGENTS.md; anything else in
|
||||
# nomad/jobs/ is deliberately not validated by this step.
|
||||
#
|
||||
# `[ -f "$f" ]` guards against the no-match case: POSIX sh does not
|
||||
# nullglob, so an empty jobs/ directory would leave the literal glob in
|
||||
# "$f" and fail. Today forgejo.nomad.hcl exists, but the guard keeps the
|
||||
# step safe during any future transient empty state.
|
||||
#
|
||||
# Scope note: offline validate catches jobspec-level errors (unknown
|
||||
# stanzas, missing required fields, wrong value types, invalid driver
|
||||
# config). It does NOT resolve cross-file references like host_volume
|
||||
# source names against nomad/client.hcl — that mismatch surfaces at
|
||||
# scheduling time on the live cluster, not here. The paired-write rule
|
||||
# in nomad/AGENTS.md ("add to both client.hcl and cluster-up.sh") is the
|
||||
# primary guardrail for that class of drift.
|
||||
- name: nomad-job-validate
|
||||
image: hashicorp/nomad:1.9.5
|
||||
commands:
|
||||
- nomad job validate nomad/jobs/forgejo.nomad.hcl
|
||||
- |
|
||||
set -e
|
||||
for f in nomad/jobs/*.nomad.hcl; do
|
||||
[ -f "$f" ] || continue
|
||||
echo "validating jobspec: $f"
|
||||
nomad job validate "$f"
|
||||
done
|
||||
|
||||
# ── 3. Vault HCL syntax check ────────────────────────────────────────────
|
||||
# `vault operator diagnose` loads the config and runs a suite of checks.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue