Merge pull request 'fix: [nomad-step-1] S1.4 — extend Woodpecker CI to nomad job validate nomad/jobs/*.hcl (#843)' (#857) from fix/issue-843 into main
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/push/nomad-validate Pipeline was successful

This commit is contained in:
dev-qwen2 2026-04-16 10:38:34 +00:00
commit 5324a6b119
2 changed files with 71 additions and 20 deletions

View file

@ -15,7 +15,9 @@
# #
# Steps (all fail-closed — any error blocks merge): # Steps (all fail-closed — any error blocks merge):
# 1. nomad-config-validate — `nomad config validate` on server + client HCL # 1. nomad-config-validate — `nomad config validate` on server + client HCL
# 2. nomad-job-validate — `nomad job validate` on every nomad/jobs/*.nomad.hcl # 2. nomad-job-validate — `nomad job validate` looped over every
# nomad/jobs/*.nomad.hcl (new jobspecs get
# CI coverage automatically)
# 3. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl # 3. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl
# 4. shellcheck-nomad — shellcheck the cluster-up + install scripts + disinto # 4. shellcheck-nomad — shellcheck the cluster-up + install scripts + disinto
# 5. bats-init-nomad — `disinto init --backend=nomad --dry-run` smoke tests # 5. bats-init-nomad — `disinto init --backend=nomad --dry-run` smoke tests
@ -65,14 +67,35 @@ steps:
# with "unknown block 'job'", and vice versa. Hence two separate steps. # with "unknown block 'job'", and vice versa. Hence two separate steps.
# #
# Validation is offline: no running Nomad server is required (exit 0 on # Validation is offline: no running Nomad server is required (exit 0 on
# valid HCL, 1 on syntax/semantic error). One invocation per file — the # valid HCL, 1 on syntax/semantic error). The CLI takes a single path
# CLI takes a single path argument. New jobspecs get explicit lines here # argument so we loop over every `*.nomad.hcl` file under nomad/jobs/ —
# so bringing one up is a conscious CI edit, matching step 1's pattern # that way a new jobspec PR gets CI coverage automatically (no separate
# and this file's "no-ad-hoc-steps" principle. # "edit the pipeline" step to forget). The `.nomad.hcl` suffix is the
# naming convention documented in nomad/AGENTS.md; anything else in
# nomad/jobs/ is deliberately not validated by this step.
#
# `[ -f "$f" ]` guards against the no-match case: POSIX sh does not
# nullglob, so an empty jobs/ directory would leave the literal glob in
# "$f" and fail. Today forgejo.nomad.hcl exists, but the guard keeps the
# step safe during any future transient empty state.
#
# Scope note: offline validate catches jobspec-level errors (unknown
# stanzas, missing required fields, wrong value types, invalid driver
# config). It does NOT resolve cross-file references like host_volume
# source names against nomad/client.hcl — that mismatch surfaces at
# scheduling time on the live cluster, not here. The paired-write rule
# in nomad/AGENTS.md ("add to both client.hcl and cluster-up.sh") is the
# primary guardrail for that class of drift.
- name: nomad-job-validate - name: nomad-job-validate
image: hashicorp/nomad:1.9.5 image: hashicorp/nomad:1.9.5
commands: commands:
- nomad job validate nomad/jobs/forgejo.nomad.hcl - |
set -e
for f in nomad/jobs/*.nomad.hcl; do
[ -f "$f" ] || continue
echo "validating jobspec: $f"
nomad job validate "$f"
done
# ── 3. Vault HCL syntax check ──────────────────────────────────────────── # ── 3. Vault HCL syntax check ────────────────────────────────────────────
# `vault operator diagnose` loads the config and runs a suite of checks. # `vault operator diagnose` loads the config and runs a suite of checks.

View file

@ -35,41 +35,69 @@ it owns.
## Adding a jobspec (Step 1 and later) ## Adding a jobspec (Step 1 and later)
1. Drop a file in `nomad/jobs/<service>.nomad.hcl`. 1. Drop a file in `nomad/jobs/<service>.nomad.hcl`. The `.nomad.hcl`
suffix is load-bearing: `.woodpecker/nomad-validate.yml` globs on
exactly that suffix to auto-pick up new jobspecs (see step 2 in
"How CI validates these files" below). Anything else in
`nomad/jobs/` is silently skipped by CI.
2. If it needs persistent state, reference a `host_volume` already 2. If it needs persistent state, reference a `host_volume` already
declared in `client.hcl`*don't* add ad-hoc host paths in the declared in `client.hcl`*don't* add ad-hoc host paths in the
jobspec. If a new volume is needed, add it to **both**: jobspec. If a new volume is needed, add it to **both**:
- `nomad/client.hcl` — the `host_volume "<name>" { path = … }` block - `nomad/client.hcl` — the `host_volume "<name>" { path = … }` block
- `lib/init/nomad/cluster-up.sh` — the `HOST_VOLUME_DIRS` array - `lib/init/nomad/cluster-up.sh` — the `HOST_VOLUME_DIRS` array
The two must stay in sync or nomad fingerprinting will fail and the The two must stay in sync or nomad fingerprinting will fail and the
node stays in "initializing". node stays in "initializing". Note that offline `nomad job validate`
will NOT catch a typo in the jobspec's `source = "..."` against the
client.hcl host_volume list (see step 2 below) — the scheduler
rejects the mismatch at placement time instead.
3. Pin image tags — `image = "forgejo/forgejo:1.22.5"`, not `:latest`. 3. Pin image tags — `image = "forgejo/forgejo:1.22.5"`, not `:latest`.
4. Add the jobspec path to `.woodpecker/nomad-validate.yml`'s trigger 4. No pipeline edit required — step 2 of `nomad-validate.yml` globs
list so CI validates it. over `nomad/jobs/*.nomad.hcl` and validates every match. Just make
sure the existing `nomad/**` trigger path still covers your file
(it does for anything under `nomad/jobs/`).
## How CI validates these files ## How CI validates these files
`.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`, `.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`
`lib/init/nomad/`, or `bin/disinto`. Four fail-closed steps: (including `nomad/jobs/`), `lib/init/nomad/`, or `bin/disinto`. Five
fail-closed steps:
1. **`nomad config validate nomad/server.hcl nomad/client.hcl`** 1. **`nomad config validate nomad/server.hcl nomad/client.hcl`**
— parses the HCL, fails on unknown blocks, bad port ranges, invalid — parses the HCL, fails on unknown blocks, bad port ranges, invalid
driver config. Vault HCL is excluded (different tool). driver config. Vault HCL is excluded (different tool). Jobspecs are
2. **`vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener`** excluded too — agent-config and jobspec are disjoint HCL grammars;
running this step on a jobspec rejects it with "unknown block 'job'".
2. **`nomad job validate nomad/jobs/*.nomad.hcl`** (loop, one call per file)
— parses each jobspec's HCL, fails on unknown stanzas, missing
required fields, wrong value types, invalid driver config. Runs
offline (no Nomad server needed) so CI exit 0 ≠ "this will schedule
successfully"; it means "the HCL itself is well-formed". What this
step does NOT catch:
- cross-file references (`source = "forgejo-data"` typo against the
`host_volume` list in `client.hcl`) — that's a scheduling-time
check on the live cluster, not validate-time.
- image reachability — `image = "codeberg.org/forgejo/forgejo:11.0"`
is accepted even if the registry is down or the tag is wrong.
New jobspecs are picked up automatically by the glob — no pipeline
edit needed as long as the file is named `<name>.nomad.hcl`.
3. **`vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener`**
— Vault's equivalent syntax + schema check. `-skip=storage/listener` — Vault's equivalent syntax + schema check. `-skip=storage/listener`
disables the runtime checks (CI containers don't have disables the runtime checks (CI containers don't have
`/var/lib/vault/data` or port 8200). `/var/lib/vault/data` or port 8200). Exit 2 (advisory warnings only,
3. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`** e.g. TLS-disabled listener) is tolerated; exit 1 blocks merge.
4. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`**
— all init/dispatcher shell clean. `bin/disinto` has no `.sh` — all init/dispatcher shell clean. `bin/disinto` has no `.sh`
extension so the repo-wide shellcheck in `.woodpecker/ci.yml` skips extension so the repo-wide shellcheck in `.woodpecker/ci.yml` skips
it — this is the one place it gets checked. it — this is the one place it gets checked.
4. **`bats tests/disinto-init-nomad.bats`** 5. **`bats tests/disinto-init-nomad.bats`**
— exercises the dispatcher: `disinto init --backend=nomad --dry-run`, — exercises the dispatcher: `disinto init --backend=nomad --dry-run`,
`… --empty --dry-run`, and the `--backend=docker` regression guard. `… --empty --dry-run`, and the `--backend=docker` regression guard.
If a PR breaks `nomad/server.hcl` (e.g. typo in a block name), step 1 If a PR breaks `nomad/server.hcl` (e.g. typo in a block name), step 1
fails with a clear error; the fix makes it pass. PRs that don't touch fails with a clear error; if it breaks a jobspec (e.g. misspells
any of the trigger paths skip this pipeline entirely. `task` as `tsak`, or adds a `volume` stanza without a `source`), step
2 fails instead. The fix makes it pass. PRs that don't touch any of
the trigger paths skip this pipeline entirely.
## Version pinning ## Version pinning