fix: [nomad-step-0] S0.5 — Woodpecker CI validation for nomad/vault artifacts (#825)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline failed
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline failed
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline failed
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline failed
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
Locks in static validation for every Nomad+Vault artifact before it can merge. Four fail-closed steps in .woodpecker/nomad-validate.yml, gated to PRs touching nomad/, lib/init/nomad/, or bin/disinto: 1. nomad config validate nomad/server.hcl nomad/client.hcl 2. vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener 3. shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto 4. bats tests/disinto-init-nomad.bats — dispatcher smoke tests bin/disinto picks up pre-existing SC2120 warnings on three passthrough wrappers (generate_agent_docker, generate_caddyfile, generate_staging_index); annotated with shellcheck disable=SC2120 so the new pipeline is clean without narrowing the warning for future code. Pinned image versions (hashicorp/nomad:1.9.5, hashicorp/vault:1.18.5) match lib/init/nomad/install.sh — bump both or neither. nomad/AGENTS.md documents the stack layout, how to add a jobspec in Step 1, how CI validates it, and the two-place version pinning rule. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
271ec9d8f5
commit
5150f8c486
4 changed files with 276 additions and 0 deletions
92
nomad/AGENTS.md
Normal file
92
nomad/AGENTS.md
Normal file
|
|
@ -0,0 +1,92 @@
|
|||
# nomad/ — Agent Instructions
|
||||
|
||||
Nomad + Vault HCL for the factory's single-node cluster. These files are
|
||||
the source of truth that `lib/init/nomad/cluster-up.sh` copies onto a
|
||||
factory box under `/etc/nomad.d/` and `/etc/vault.d/` at init time.
|
||||
|
||||
This directory is part of the **Nomad+Vault migration (Step 0)** —
|
||||
see issues #821–#825 for the step breakdown. Jobspecs land in Step 1.
|
||||
|
||||
## What lives here
|
||||
|
||||
| File | Deployed to | Owned by |
|
||||
|---|---|---|
|
||||
| `server.hcl` | `/etc/nomad.d/server.hcl` | agent role, bind, ports, `data_dir` (S0.2) |
|
||||
| `client.hcl` | `/etc/nomad.d/client.hcl` | Docker driver cfg + `host_volume` declarations (S0.2) |
|
||||
| `vault.hcl` | `/etc/vault.d/vault.hcl` | Vault storage, listener, UI, `disable_mlock` (S0.3) |
|
||||
|
||||
Nomad auto-merges every `*.hcl` under `-config=/etc/nomad.d/`, so the
|
||||
split between `server.hcl` and `client.hcl` is for readability, not
|
||||
semantics. The top-of-file header in each config documents which blocks
|
||||
it owns.
|
||||
|
||||
## What does NOT live here yet
|
||||
|
||||
- **Jobspecs.** Step 0 brings up an *empty* cluster. Step 1 (and later)
|
||||
adds `*.nomad.hcl` job files for forgejo, woodpecker, agents, caddy,
|
||||
etc. When that lands, jobspecs will live in `nomad/jobs/` and each
|
||||
will get its own header comment pointing to the `host_volume` names
|
||||
it consumes (`volume = "forgejo-data"`, etc. — declared in
|
||||
`client.hcl`).
|
||||
- **TLS, ACLs, gossip encryption.** Deliberately absent in Step 0 —
|
||||
factory traffic stays on localhost. These land in later migration
|
||||
steps alongside multi-node support.
|
||||
|
||||
## Adding a jobspec (Step 1 and later)
|
||||
|
||||
1. Drop a file in `nomad/jobs/<service>.nomad.hcl`.
|
||||
2. If it needs persistent state, reference a `host_volume` already
|
||||
declared in `client.hcl` — *don't* add ad-hoc host paths in the
|
||||
jobspec. If a new volume is needed, add it to **both**:
|
||||
- `nomad/client.hcl` — the `host_volume "<name>" { path = … }` block
|
||||
- `lib/init/nomad/cluster-up.sh` — the `HOST_VOLUME_DIRS` array
|
||||
The two must stay in sync or nomad fingerprinting will fail and the
|
||||
node stays in "initializing".
|
||||
3. Pin image tags — `image = "forgejo/forgejo:1.22.5"`, not `:latest`.
|
||||
4. Add the jobspec path to `.woodpecker/nomad-validate.yml`'s trigger
|
||||
list so CI validates it.
|
||||
|
||||
## How CI validates these files
|
||||
|
||||
`.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`,
|
||||
`lib/init/nomad/`, or `bin/disinto`. Four fail-closed steps:
|
||||
|
||||
1. **`nomad config validate nomad/server.hcl nomad/client.hcl`**
|
||||
— parses the HCL, fails on unknown blocks, bad port ranges, invalid
|
||||
driver config. Vault HCL is excluded (different tool).
|
||||
2. **`vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener`**
|
||||
— Vault's equivalent syntax + schema check. `-skip=storage/listener`
|
||||
disables the runtime checks (CI containers don't have
|
||||
`/var/lib/vault/data` or port 8200).
|
||||
3. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`**
|
||||
— all init/dispatcher shell clean. `bin/disinto` has no `.sh`
|
||||
extension so the repo-wide shellcheck in `.woodpecker/ci.yml` skips
|
||||
it — this is the one place it gets checked.
|
||||
4. **`bats tests/disinto-init-nomad.bats`**
|
||||
— exercises the dispatcher: `disinto init --backend=nomad --dry-run`,
|
||||
`… --empty --dry-run`, and the `--backend=docker` regression guard.
|
||||
|
||||
If a PR breaks `nomad/server.hcl` (e.g. typo in a block name), step 1
|
||||
fails with a clear error; the fix makes it pass. PRs that don't touch
|
||||
any of the trigger paths skip this pipeline entirely.
|
||||
|
||||
## Version pinning
|
||||
|
||||
Nomad + Vault versions are pinned in **two** places — bumping one
|
||||
without the other is a CI-caught drift:
|
||||
|
||||
- `lib/init/nomad/install.sh` — the apt-installed versions on factory
|
||||
boxes (`NOMAD_VERSION`, `VAULT_VERSION`).
|
||||
- `.woodpecker/nomad-validate.yml` — the `hashicorp/nomad:…` and
|
||||
`hashicorp/vault:…` image tags used for static validation.
|
||||
|
||||
Bump both in the same PR. The CI pipeline will fail if the pinned
|
||||
image's `config validate` rejects syntax the installed runtime would
|
||||
accept (or vice versa).
|
||||
|
||||
## Related
|
||||
|
||||
- `lib/init/nomad/` — installer + systemd units + cluster-up orchestrator.
|
||||
- `.woodpecker/nomad-validate.yml` — this directory's CI pipeline.
|
||||
- Top-of-file headers in `server.hcl` / `client.hcl` / `vault.hcl`
|
||||
document the per-file ownership contract.
|
||||
Loading…
Add table
Add a link
Reference in a new issue