<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 -->
# nomad/ — Agent Instructions

Nomad + Vault HCL for the factory's single-node cluster. These files are
the source of truth that `lib/init/nomad/cluster-up.sh` copies onto a
factory box under `/etc/nomad.d/` and `/etc/vault.d/` at init time.

This directory is part of the **Nomad+Vault migration (Step 0)**; see
issues #821–#825 for the step breakdown. Jobspecs land in Step 1.
## What lives here

| File | Deployed to | Owns |
|---|---|---|
| `server.hcl` | `/etc/nomad.d/server.hcl` | agent role, bind, ports, `data_dir` (S0.2) |
| `client.hcl` | `/etc/nomad.d/client.hcl` | Docker driver config + `host_volume` declarations (S0.2) |
| `vault.hcl` | `/etc/vault.d/vault.hcl` | Vault storage, listener, UI, `disable_mlock` (S0.3) |

Nomad auto-merges every `*.hcl` under `-config=/etc/nomad.d/`, so the
split between `server.hcl` and `client.hcl` is for readability, not
semantics. The header at the top of each config documents which blocks
it owns.
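As a rough illustration of the copy step described above, here is a minimal POSIX sh sketch. It is not the real `cluster-up.sh`: the function name `deploy_configs` and its `DEST_ROOT` argument (which lets the routing be checked against a scratch directory instead of `/`) are hypothetical. What it does encode is the routing rule the table implies: because Nomad merges every `*.hcl` in its config directory, `vault.hcl` has to land in `/etc/vault.d/` and never beside the agent config.

```shell
#!/bin/sh
# Hypothetical sketch of the init-time copy step; not the real cluster-up.sh.
# deploy_configs SRC_DIR DEST_ROOT
#   SRC_DIR   directory holding server.hcl, client.hcl and vault.hcl
#   DEST_ROOT filesystem root to install under ("/" on a factory box)
deploy_configs() {
  src=$1
  root=$2
  mkdir -p "$root/etc/nomad.d" "$root/etc/vault.d" || return 1
  # Nomad loads and merges every *.hcl in its -config directory, so only
  # agent config goes here; a vault.hcl in this directory would be parsed
  # by Nomad as agent config.
  for f in server.hcl client.hcl; do
    cp "$src/$f" "$root/etc/nomad.d/$f" || return 1
  done
  # Vault reads its own directory.
  cp "$src/vault.hcl" "$root/etc/vault.d/vault.hcl"
}
```

With `/` as `DEST_ROOT` it writes exactly the paths listed in the table.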
## What does NOT live here yet

- **Jobspecs.** Step 0 brings up an *empty* cluster. Step 1 (and later)
  adds `*.hcl` job files for forgejo, woodpecker, agents, caddy,
  etc. When that lands, jobspecs will live in `nomad/jobs/`, and each
  will get its own header comment naming the `host_volume`s it consumes
  (`source = "forgejo-data"`, etc., declared in `client.hcl`).
- **TLS, ACLs, gossip encryption.** Deliberately absent in Step 0, since
  factory traffic stays on localhost. These land in later migration
  steps alongside multi-node support.
## Adding a jobspec (Step 1 and later)

1. Drop a file in `nomad/jobs/<service>.hcl`. The `.hcl` suffix is
   load-bearing: `.woodpecker/nomad-validate.yml` globs on exactly that
   suffix to pick up new jobspecs automatically (see step 2 in "How CI
   validates these files" below). Anything else in `nomad/jobs/` is
   silently skipped by CI.
2. If it needs persistent state, reference a `host_volume` already
   declared in `client.hcl`; *don't* add ad-hoc host paths in the
   jobspec. If a new volume is needed, add it to **both**:
   - `nomad/client.hcl`: the `host_volume "<name>" { path = … }` block
   - `lib/init/nomad/cluster-up.sh`: the `HOST_VOLUME_DIRS` array

   The two must stay in sync, or Nomad fingerprinting will fail and the
   node will stay in "initializing". Note that offline `nomad job validate`
   will NOT catch a typo in the jobspec's `source = "..."` against the
   `client.hcl` host_volume list (see step 2 of the CI section below);
   the scheduler rejects the mismatch at placement time instead.
3. Pin image tags: `image = "forgejo/forgejo:1.22.5"`, not `:latest`.
4. No pipeline edit is required: step 2 of `nomad-validate.yml` globs
   over `nomad/jobs/*.hcl` and validates every match. Just make sure
   the existing `nomad/**` trigger path still covers your file (it
   does for anything under `nomad/jobs/`).
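Offline `nomad job validate` checks neither the `source` cross-reference in item 2 nor the tag pinning in item 3, so a local pre-check can help. The POSIX sh sketch below is a hypothetical helper, not part of the pipeline. It assumes one attribute per line, as in the examples in this file, and that host-volume `source` attributes appear only inside group-level `volume "<label>"` blocks. Other blocks such as `template` also have a `source` attribute, which is why it tracks block context instead of grepping every `source =` line. A real HCL parser would be more robust than `sed`/`awk`.

```shell
#!/bin/sh
# Hypothetical pre-CI lint for jobspecs; not part of nomad-validate.yml.
# Assumes one HCL attribute per line.

# check_volume_sources CLIENT_HCL JOBSPEC
# Fails if a host volume "source" in JOBSPEC is not declared as
# host_volume "<name>" in CLIENT_HCL.
check_volume_sources() {
  client=$1
  job=$2
  declared=$(sed -n 's/^[[:space:]]*host_volume[[:space:]]*"\([^"]*\)".*/\1/p' "$client")
  # Only read "source" inside volume "<label>" blocks; template and
  # artifact blocks also have a "source" attribute.
  awk '
    /^[ \t]*volume[ \t]+"/ { in_vol = 1; next }
    in_vol && /^[ \t]*[}]/ { in_vol = 0; next }
    in_vol && /^[ \t]*source[ \t]*=/ { sub(/^[^"]*"/, ""); sub(/".*$/, ""); print }
  ' "$job" | {
    bad=0
    while IFS= read -r src; do
      if ! printf '%s\n' "$declared" | grep -qxF -- "$src"; then
        echo "$job: source \"$src\" is not a host_volume in $client" >&2
        bad=1
      fi
    done
    [ "$bad" -eq 0 ]
  }
}

# check_image_pins JOBSPEC
# Fails on images with no tag or with the :latest tag; digests are accepted.
check_image_pins() {
  sed -n 's/^[[:space:]]*image[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | {
    bad=0
    while IFS= read -r img; do
      name=${img##*/}            # drop registry/namespace, keep name[:tag]
      case $name in
        *@*) ;;                  # pinned by digest
        *:latest) echo "$1: $img uses :latest" >&2; bad=1 ;;
        *:*) ;;                  # explicit tag
        *) echo "$1: $img has no tag" >&2; bad=1 ;;
      esac
    done
    [ "$bad" -eq 0 ]
  }
}
```

For example, `check_volume_sources nomad/client.hcl nomad/jobs/<service>.hcl && check_image_pins nomad/jobs/<service>.hcl`; each function prints one line per problem to stderr and returns non-zero if it finds any.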
## How CI validates these files

`.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`
(including `nomad/jobs/`), `lib/init/nomad/`, or `bin/disinto`. It runs
five fail-closed steps:

1. **`nomad config validate nomad/server.hcl nomad/client.hcl`**
   parses the agent HCL and fails on unknown blocks, bad port ranges, or
   invalid driver config. Vault HCL is excluded (different tool).
   Jobspecs are excluded too: agent config and jobspecs are disjoint HCL
   grammars, and running this step on a jobspec rejects it with
   "unknown block 'job'".
2. **`nomad job validate nomad/jobs/*.hcl`** (a loop, one call per file)
   parses each jobspec's HCL and fails on unknown stanzas, missing
   required fields, wrong value types, or invalid driver config. It runs
   offline (no Nomad server needed), so a passing step does not mean
   "this will schedule successfully"; it means "the HCL itself is
   well-formed". What this step does NOT catch:
   - cross-file references (a `source = "forgejo-data"` typo against the
     `host_volume` list in `client.hcl`): that is a scheduling-time
     check on the live cluster, not a validate-time one.
   - image reachability: `image = "codeberg.org/forgejo/forgejo:11.0"`
     is accepted even if the registry is down or the tag is wrong.

   New jobspecs are picked up automatically by the glob; no pipeline
   edit is needed as long as the file is named `<name>.hcl`.
3. **`vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener`**
   is Vault's equivalent syntax and schema check. `-skip=storage` and
   `-skip=listener` disable the runtime checks (CI containers don't have
   `/var/lib/vault/data` or port 8200). Exit 2 (advisory warnings only,
   e.g. a TLS-disabled listener) is tolerated; exit 1 blocks merge.
4. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`**
   checks that all init/dispatcher shell is clean. `bin/disinto` has no
   `.sh` extension, so the repo-wide shellcheck in `.woodpecker/ci.yml`
   skips it; this is the one place it gets checked.
5. **`bats tests/disinto-init-nomad.bats`** exercises the dispatcher:
   `disinto init --backend=nomad --dry-run`, `… --empty --dry-run`,
   and the `--backend=docker` regression guard.

If a PR breaks `nomad/server.hcl` (e.g. a typo in a block name), step 1
fails with a clear error; if it breaks a jobspec (e.g. misspells `task`
as `tsak`, or adds a `volume` stanza without a `source`), step 2 fails
instead. Fixing the error makes the step pass. PRs that don't touch any
of the trigger paths skip this pipeline entirely.
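CI steps 2 and 3 each hinge on a small shell detail: the no-match behavior of a POSIX glob, and Vault's exit-code convention (exit 2 tolerated, exit 1 blocks merge). The POSIX sh sketch below reproduces both under stated assumptions; it is not the contents of `nomad-validate.yml`. The function names are hypothetical, and the `nomad` binary is a parameter so the logic can be exercised with a stub. It uses explicit `|| exit 1` instead of `set -e`, because `set -e` is ignored inside a function called from an `if` test, after `!`, or on the left of `&&`/`||`.

```shell
#!/bin/sh
# Hypothetical sketch of the logic in CI steps 2 and 3; not the pipeline file.

# validate_jobspecs DIR [NOMAD_CMD]
# Runs "NOMAD_CMD job validate FILE" once per DIR/*.hcl and stops at the
# first failure. NOMAD_CMD defaults to "nomad". The ( ) body is a subshell,
# so "exit 1" returns from the function without killing the caller.
validate_jobspecs() (
  dir=$1
  nomad_cmd=${2:-nomad}
  for f in "$dir"/*.hcl; do
    # POSIX sh has no nullglob: with no match, $f is the literal pattern.
    [ -f "$f" ] || continue
    echo "validating $f"   # so the CI log names the failing jobspec
    "$nomad_cmd" job validate "$f" || exit 1
  done
)

# diagnose_tolerating_warnings CMD [ARGS...]
# Runs CMD and maps its exit code the way step 3 does:
# 0 = clean, 2 = advisory warnings (tolerated), anything else fails.
diagnose_tolerating_warnings() {
  rc=0
  "$@" || rc=$?
  case $rc in
    0|2) return 0 ;;
    *) return "$rc" ;;
  esac
}
```

For step 3 the wrapper would be called as `diagnose_tolerating_warnings vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener`.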
## Version pinning

Nomad + Vault versions are pinned in **two** places; bumping one
without the other introduces drift:

- `lib/init/nomad/install.sh`: the apt-installed versions on factory
  boxes (`NOMAD_VERSION`, `VAULT_VERSION`).
- `.woodpecker/nomad-validate.yml`: the `hashicorp/nomad:…` and
  `hashicorp/vault:…` image tags used for static validation.

Bump both in the same PR. CI does not compare the two pins directly: it
fails only when the pinned image's `config validate` rejects the HCL,
which catches drift only when the image is stricter than the installed
runtime. Drift in the other direction (the image accepts syntax the
installed runtime would reject) passes CI and surfaces only at deploy
time.
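Because CI never compares the two pins, a direct check is cheap to add. The POSIX sh sketch below is hypothetical and not part of the pipeline. It assumes `install.sh` assigns plain `NOMAD_VERSION=...` and `VAULT_VERSION=...` lines (optionally quoted) and that the pipeline references the images as `hashicorp/nomad:<tag>` and `hashicorp/vault:<tag>`. If `install.sh` pins a full apt version with a numeric Debian revision such as `-1`, that suffix is dropped before comparing.

```shell
#!/bin/sh
# Hypothetical pin-drift check; not part of nomad-validate.yml.

# pinned_var FILE NAME: print the value of the first NAME=... assignment.
pinned_var() {
  sed -n "s/^[[:space:]]*$2=[\"']*\([^\"'[:space:]]*\).*/\1/p" "$1" | head -n 1
}

# image_tag FILE IMAGE: print the tag of the first IMAGE:<tag> reference.
image_tag() {
  sed -n "s|.*$2:\([^\"'[:space:]]*\).*|\1|p" "$1" | head -n 1
}

# check_pins INSTALL_SH PIPELINE_YML: fail if any pair of pins differs.
check_pins() {
  bad=0
  for pair in NOMAD_VERSION:hashicorp/nomad VAULT_VERSION:hashicorp/vault; do
    var=${pair%%:*}
    image=${pair#*:}
    installed=$(pinned_var "$1" "$var")
    installed=${installed%-[0-9]*}   # drop a Debian revision like "-1"
    tagged=$(image_tag "$2" "$image")
    if [ -z "$installed" ] || [ "$installed" != "$tagged" ]; then
      echo "pin drift: $var=${installed:-unset} vs $image:${tagged:-unset}" >&2
      bad=1
    fi
  done
  [ "$bad" -eq 0 ]
}
```

From the repo root: `check_pins lib/init/nomad/install.sh .woodpecker/nomad-validate.yml`. It prints one line per mismatched pair to stderr and returns non-zero on drift.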
## Related

- `lib/init/nomad/`: installer + systemd units + cluster-up orchestrator.
- `.woodpecker/nomad-validate.yml`: this directory's CI pipeline.
- The headers at the top of `server.hcl` / `client.hcl` / `vault.hcl`
  document the per-file ownership contract.