fix: [nomad-step-1] S1.4 — extend Woodpecker CI to nomad job validate nomad/jobs/*.hcl (#843) #857
2 changed files with 71 additions and 20 deletions
|
|
@ -14,8 +14,10 @@
|
||||||
# .woodpecker/nomad-validate.yml — the pipeline definition
|
# .woodpecker/nomad-validate.yml — the pipeline definition
|
||||||
#
|
#
|
||||||
# Steps (all fail-closed — any error blocks merge):
|
# Steps (all fail-closed — any error blocks merge):
|
||||||
# 1. nomad-config-validate — `nomad config validate` on server + client HCL
|
# 1. nomad-config-validate — `nomad config validate` on server + client HCL
|
||||||
# 2. nomad-job-validate — `nomad job validate` on every nomad/jobs/*.nomad.hcl
|
# 2. nomad-job-validate — `nomad job validate` looped over every
|
||||||
|
# nomad/jobs/*.nomad.hcl (new jobspecs get
|
||||||
|
# CI coverage automatically)
|
||||||
# 3. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl
|
# 3. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl
|
||||||
# 4. shellcheck-nomad — shellcheck the cluster-up + install scripts + disinto
|
# 4. shellcheck-nomad — shellcheck the cluster-up + install scripts + disinto
|
||||||
# 5. bats-init-nomad — `disinto init --backend=nomad --dry-run` smoke tests
|
# 5. bats-init-nomad — `disinto init --backend=nomad --dry-run` smoke tests
|
||||||
|
|
@ -65,14 +67,35 @@ steps:
|
||||||
# with "unknown block 'job'", and vice versa. Hence two separate steps.
|
# with "unknown block 'job'", and vice versa. Hence two separate steps.
|
||||||
#
|
#
|
||||||
# Validation is offline: no running Nomad server is required (exit 0 on
|
# Validation is offline: no running Nomad server is required (exit 0 on
|
||||||
# valid HCL, 1 on syntax/semantic error). One invocation per file — the
|
# valid HCL, 1 on syntax/semantic error). The CLI takes a single path
|
||||||
# CLI takes a single path argument. New jobspecs get explicit lines here
|
# argument so we loop over every `*.nomad.hcl` file under nomad/jobs/ —
|
||||||
# so bringing one up is a conscious CI edit, matching step 1's pattern
|
# that way a new jobspec PR gets CI coverage automatically (no separate
|
||||||
# and this file's "no-ad-hoc-steps" principle.
|
# "edit the pipeline" step to forget). The `.nomad.hcl` suffix is the
|
||||||
|
# naming convention documented in nomad/AGENTS.md; anything else in
|
||||||
|
# nomad/jobs/ is deliberately not validated by this step.
|
||||||
|
#
|
||||||
|
# `[ -f "$f" ]` guards against the no-match case: POSIX sh has no
|
||||||
|
# nullglob option, so an empty jobs/ directory would leave the literal glob in
|
||||||
|
# "$f" and fail. Today forgejo.nomad.hcl exists, but the guard keeps the
|
||||||
|
# step safe during any future transient empty state.
|
||||||
|
#
|
||||||
|
# Scope note: offline validate catches jobspec-level errors (unknown
|
||||||
|
# stanzas, missing required fields, wrong value types, invalid driver
|
||||||
|
# config). It does NOT resolve cross-file references like host_volume
|
||||||
|
# source names against nomad/client.hcl — that mismatch surfaces at
|
||||||
|
# scheduling time on the live cluster, not here. The paired-write rule
|
||||||
|
# in nomad/AGENTS.md ("add to both client.hcl and cluster-up.sh") is the
|
||||||
|
# primary guardrail for that class of drift.
|
||||||
- name: nomad-job-validate
|
- name: nomad-job-validate
|
||||||
image: hashicorp/nomad:1.9.5
|
image: hashicorp/nomad:1.9.5
|
||||||
commands:
|
commands:
|
||||||
- nomad job validate nomad/jobs/forgejo.nomad.hcl
|
- |
|
||||||
|
set -e
|
||||||
|
for f in nomad/jobs/*.nomad.hcl; do
|
||||||
|
[ -f "$f" ] || continue
|
||||||
|
echo "validating jobspec: $f"
|
||||||
|
nomad job validate "$f"
|
||||||
|
done
|
||||||
|
|
||||||
# ── 3. Vault HCL syntax check ────────────────────────────────────────────
|
# ── 3. Vault HCL syntax check ────────────────────────────────────────────
|
||||||
# `vault operator diagnose` loads the config and runs a suite of checks.
|
# `vault operator diagnose` loads the config and runs a suite of checks.
|
||||||
|
|
|
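The no-match guard in the `nomad-job-validate` loop can be tried in isolation. The sketch below is illustrative only: it assumes a POSIX `sh` with `mktemp -d` available, and the `count_jobspecs` helper and the temporary file names are made up for the demonstration, not part of the pipeline.

```shell
#!/bin/sh
# Sketch: why the `[ -f "$f" ] || continue` guard matters in POSIX sh.
# Directory and file names are throwaway examples.
set -e

# count_jobspecs DIR: prints how many *.nomad.hcl files the guarded loop visits.
count_jobspecs() {
  count=0
  for f in "$1"/*.nomad.hcl; do
    # With no matches, $f holds the literal string "<dir>/*.nomad.hcl";
    # that is not a regular file, so the guard skips it.
    [ -f "$f" ] || continue
    count=$((count + 1))
  done
  echo "$count"
}

tmp=$(mktemp -d)
empty_count=$(count_jobspecs "$tmp")   # empty directory: glob stays literal
: > "$tmp/forgejo.nomad.hcl"           # matches the suffix convention
: > "$tmp/notes.txt"                   # wrong suffix: never matched by the glob
one_count=$(count_jobspecs "$tmp")
rm -rf "$tmp"

echo "empty dir: $empty_count, after adding one jobspec: $one_count"
```

In an empty directory the glob stays a literal string, which fails the `-f` test, so the loop body never runs. Files without the `.nomad.hcl` suffix are never matched at all.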
||||||
|
|
@ -35,41 +35,69 @@ it owns.
|
||||||
|
|
||||||
## Adding a jobspec (Step 1 and later)
|
## Adding a jobspec (Step 1 and later)
|
||||||
|
|
||||||
1. Drop a file in `nomad/jobs/<service>.nomad.hcl`.
|
1. Drop a file in `nomad/jobs/<service>.nomad.hcl`. The `.nomad.hcl`
|
||||||
|
suffix is load-bearing: `.woodpecker/nomad-validate.yml` globs on
|
||||||
|
exactly that suffix to auto-pick up new jobspecs (see step 2 in
|
||||||
|
"How CI validates these files" below). Anything else in
|
||||||
|
`nomad/jobs/` is silently skipped by CI.
|
||||||
2. If it needs persistent state, reference a `host_volume` already
|
2. If it needs persistent state, reference a `host_volume` already
|
||||||
declared in `client.hcl` — *don't* add ad-hoc host paths in the
|
declared in `client.hcl` — *don't* add ad-hoc host paths in the
|
||||||
jobspec. If a new volume is needed, add it to **both**:
|
jobspec. If a new volume is needed, add it to **both**:
|
||||||
- `nomad/client.hcl` — the `host_volume "<name>" { path = … }` block
|
- `nomad/client.hcl` — the `host_volume "<name>" { path = … }` block
|
||||||
- `lib/init/nomad/cluster-up.sh` — the `HOST_VOLUME_DIRS` array
|
- `lib/init/nomad/cluster-up.sh` — the `HOST_VOLUME_DIRS` array
|
||||||
The two must stay in sync or nomad fingerprinting will fail and the
|
The two must stay in sync or nomad fingerprinting will fail and the
|
||||||
node stays in "initializing".
|
node stays in "initializing". Note that offline `nomad job validate`
|
||||||
|
will NOT catch a typo in the jobspec's `source = "..."` against the
|
||||||
|
client.hcl host_volume list (see step 2 below) — the scheduler
|
||||||
|
rejects the mismatch at placement time instead.
|
||||||
3. Pin image tags — `image = "forgejo/forgejo:1.22.5"`, not `:latest`.
|
3. Pin image tags — `image = "forgejo/forgejo:1.22.5"`, not `:latest`.
|
||||||
4. Add the jobspec path to `.woodpecker/nomad-validate.yml`'s trigger
|
4. No pipeline edit required — step 2 of `nomad-validate.yml` globs
|
||||||
list so CI validates it.
|
over `nomad/jobs/*.nomad.hcl` and validates every match. Just make
|
||||||
|
sure the existing `nomad/**` trigger path still covers your file
|
||||||
|
(it does for anything under `nomad/jobs/`).
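Since offline validation does not cross-check `source` names against `client.hcl`, a text-level comparison is one possible way to spot that drift before deploy. The following is a rough sketch, not something this PR adds: the function names are hypothetical, it matches lines with `sed` rather than parsing HCL, and it would also flag `source = ...` lines in other stanzas such as `artifact` or `template`.

```shell
#!/bin/sh
# Hypothetical drift check: every `source = "<name>"` in a jobspec should
# match a `host_volume "<name>"` block in the client config.
# Text matching only; assumes one declaration per line and names without spaces.

# Print the names of all host_volume blocks in the client config ($1).
declared_volumes() {
  sed -n 's/^[[:space:]]*host_volume[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Print every quoted value assigned to `source` in the jobspec ($1).
referenced_sources() {
  sed -n 's/^[[:space:]]*source[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# check_sources CLIENT_HCL JOBSPEC: returns 1 if any source is undeclared.
check_sources() {
  missing=0
  for src in $(referenced_sources "$2"); do
    # -x: whole-line match, -F: treat the name as a fixed string.
    if ! declared_volumes "$1" | grep -qxF -- "$src"; then
      echo "undeclared host_volume source: $src ($2)"
      missing=1
    fi
  done
  return "$missing"
}
```

A call such as `check_sources nomad/client.hcl nomad/jobs/forgejo.nomad.hcl` would exit non-zero on a mismatch. It complements the paired-write rule rather than replacing it.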
|
||||||
|
|
||||||
## How CI validates these files
|
## How CI validates these files
|
||||||
|
|
||||||
`.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`,
|
`.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`
|
||||||
`lib/init/nomad/`, or `bin/disinto`. Four fail-closed steps:
|
(including `nomad/jobs/`), `lib/init/nomad/`, or `bin/disinto`. Five
|
||||||
|
fail-closed steps:
|
||||||
|
|
||||||
1. **`nomad config validate nomad/server.hcl nomad/client.hcl`**
|
1. **`nomad config validate nomad/server.hcl nomad/client.hcl`**
|
||||||
— parses the HCL, fails on unknown blocks, bad port ranges, invalid
|
— parses the HCL, fails on unknown blocks, bad port ranges, invalid
|
||||||
driver config. Vault HCL is excluded (different tool).
|
driver config. Vault HCL is excluded (different tool). Jobspecs are
|
||||||
2. **`vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener`**
|
excluded too — agent-config and jobspec are disjoint HCL grammars;
|
||||||
|
running this step on a jobspec rejects it with "unknown block 'job'".
|
||||||
|
2. **`nomad job validate nomad/jobs/*.nomad.hcl`** (loop, one call per file)
|
||||||
|
— parses each jobspec's HCL, fails on unknown stanzas, missing
|
||||||
|
required fields, wrong value types, invalid driver config. Runs
|
||||||
|
offline (no Nomad server needed) so CI exit 0 ≠ "this will schedule
|
||||||
|
successfully"; it means "the HCL itself is well-formed". What this
|
||||||
|
step does NOT catch:
|
||||||
|
- cross-file references (`source = "forgejo-data"` typo against the
|
||||||
|
`host_volume` list in `client.hcl`) — that's a scheduling-time
|
||||||
|
check on the live cluster, not validate-time.
|
||||||
|
- image reachability — `image = "codeberg.org/forgejo/forgejo:11.0"`
|
||||||
|
is accepted even if the registry is down or the tag is wrong.
|
||||||
|
New jobspecs are picked up automatically by the glob — no pipeline
|
||||||
|
edit needed as long as the file is named `<name>.nomad.hcl`.
|
||||||
|
3. **`vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener`**
|
||||||
— Vault's equivalent syntax + schema check. `-skip=storage/listener`
|
— Vault's equivalent syntax + schema check. `-skip=storage/listener`
|
||||||
disables the runtime checks (CI containers don't have
|
disables the runtime checks (CI containers don't have
|
||||||
`/var/lib/vault/data` or port 8200).
|
`/var/lib/vault/data` or port 8200). Exit 2 (advisory warnings only,
|
||||||
3. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`**
|
e.g. TLS-disabled listener) is tolerated; exit 1 blocks merge.
|
||||||
|
4. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`**
|
||||||
— all init/dispatcher shell clean. `bin/disinto` has no `.sh`
|
— all init/dispatcher shell clean. `bin/disinto` has no `.sh`
|
||||||
extension so the repo-wide shellcheck in `.woodpecker/ci.yml` skips
|
extension so the repo-wide shellcheck in `.woodpecker/ci.yml` skips
|
||||||
it — this is the one place it gets checked.
|
it — this is the one place it gets checked.
|
||||||
4. **`bats tests/disinto-init-nomad.bats`**
|
5. **`bats tests/disinto-init-nomad.bats`**
|
||||||
— exercises the dispatcher: `disinto init --backend=nomad --dry-run`,
|
— exercises the dispatcher: `disinto init --backend=nomad --dry-run`,
|
||||||
`… --empty --dry-run`, and the `--backend=docker` regression guard.
|
`… --empty --dry-run`, and the `--backend=docker` regression guard.
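The exit-code policy described for step 3 (0 and 2 pass, anything else fails) could be expressed with a small wrapper. This is a sketch of that policy only; `run_tolerating_warnings` is a hypothetical name, and the pipeline's actual step may implement it differently.

```shell
#!/bin/sh
# Hypothetical wrapper: treat exit 0 (clean) and exit 2 (warnings only)
# as success, and pass any other exit status through as a failure.
run_tolerating_warnings() {
  rc=0
  "$@" || rc=$?
  case "$rc" in
    0|2) return 0 ;;
    *)   return "$rc" ;;
  esac
}

# Example invocation (command line taken from step 3; needs the vault CLI):
#   run_tolerating_warnings vault operator diagnose \
#     -config=nomad/vault.hcl -skip=storage -skip=listener
```

Capturing the status with `|| rc=$?` keeps the wrapper safe under `set -e`, where a bare non-zero exit would abort the script before the `case` runs.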
|
||||||
|
|
||||||
If a PR breaks `nomad/server.hcl` (e.g. typo in a block name), step 1
|
If a PR breaks `nomad/server.hcl` (e.g. typo in a block name), step 1
|
||||||
fails with a clear error; the fix makes it pass. PRs that don't touch
|
fails with a clear error; if it breaks a jobspec (e.g. misspells
|
||||||
any of the trigger paths skip this pipeline entirely.
|
`task` as `tsak`, or adds a `volume` stanza without a `source`), step
|
||||||
|
2 fails instead. The fix makes it pass. PRs that don't touch any of
|
||||||
|
the trigger paths skip this pipeline entirely.
|
||||||
|
|
||||||
## Version pinning
|
## Version pinning
|
||||||
|
|
||||||
|
|
|
||||||