[nomad-step-1] S1.2 — add lib/init/nomad/deploy.sh (dependency-ordered nomad job run + wait) #841

Closed
opened 2026-04-16 09:52:46 +00:00 by dev-bot · 0 comments
Collaborator

Part of the Nomad+Vault migration. Step 1 — Forgejo as first Nomad job.

Goal

Add lib/init/nomad/deploy.sh — a small orchestrator that runs a list of jobspecs in dependency order and waits for each to go running before starting the next. Step-1 uses it for forgejo-only; Steps 3–6 extend the job list.

Scope

lib/init/nomad/deploy.sh:

  • Takes a list of jobspec basenames as arguments (e.g. forgejo), resolves each to ${REPO_ROOT}/nomad/jobs/<name>.hcl.
  • For each:
    1. nomad job validate <path> — fail fast on bad HCL.
    2. nomad job run -detach <path> — idempotent registration (same jobspec content = no-op from Nomad's side; new revision deploys).
    3. Poll nomad job status -json <name> until Status is "running" or all allocations are in the running state. The timeout is configurable via the env variable JOB_READY_TIMEOUT_SECS (default 120).
    4. On timeout: print last 50 lines of alloc logs via nomad alloc logs -stderr, exit 1.
  • --dry-run flag: print each intended nomad job run invocation, exit 0.
  • Idempotent: running twice back-to-back on a healthy cluster is a no-op.
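
The flow described above could be sketched roughly as follows. This is a minimal illustration, not the final script: the function names `deploy_jobs`, `_resolve_jobspec` and `_dump_alloc_logs` are hypothetical, `jq` is assumed to be installed, `_wait_job_running` is assumed to be defined elsewhere, and the JSON shape returned by `nomad job allocs -json` (an array of objects with an `ID` field) is an assumption that should be checked against the Nomad version in use.

```shell
#!/usr/bin/env bash
# Sketch only. Hypothetical names: deploy_jobs, _resolve_jobspec, _dump_alloc_logs.
# Assumes REPO_ROOT is set and _wait_job_running <name> <timeout> exists elsewhere.

# Map a jobspec basename (e.g. forgejo) to its path under the repo.
_resolve_jobspec() {
  printf '%s/nomad/jobs/%s.hcl\n' "${REPO_ROOT:?REPO_ROOT must be set}" "$1"
}

# Print the last 50 stderr lines of the job's first allocation, if there is one.
# Assumption: `nomad job allocs -json` yields an array of objects with an ID field.
_dump_alloc_logs() {
  local alloc
  alloc="$(nomad job allocs -json "$1" 2>/dev/null | jq -r '.[0].ID // empty')"
  if [ -n "$alloc" ]; then
    nomad alloc logs -stderr -tail -n 50 "$alloc" >&2
  fi
  return 0
}

# deploy_jobs <name>... [--dry-run]
# Validates, registers and waits for each job in the order given.
deploy_jobs() {
  local dry_run=0 arg name path

  # First pass: detect --dry-run wherever it appears in the argument list.
  for arg in "$@"; do
    if [ "$arg" = "--dry-run" ]; then
      dry_run=1
    fi
  done

  # Second pass: process job names in order, skipping the flag itself.
  for name in "$@"; do
    if [ "$name" = "--dry-run" ]; then
      continue
    fi
    path="$(_resolve_jobspec "$name")"

    if [ "$dry_run" -eq 1 ]; then
      echo "[deploy] would run: nomad job run -detach ${path}"
      continue
    fi

    # Fail fast on bad HCL before anything is registered.
    nomad job validate "$path" || return 1
    nomad job run -detach "$path" || return 1

    if ! _wait_job_running "$name" "${JOB_READY_TIMEOUT_SECS:-120}"; then
      _dump_alloc_logs "$name"
      return 1
    fi
  done
  return 0
}
```

With `REPO_ROOT=/repo`, calling `deploy_jobs forgejo --dry-run` only prints the planned `nomad job run` line and never invokes `nomad`. Stopping at the first failed job keeps later jobs in the list from starting against a dependency that never became healthy.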

Bonus: one named helper _wait_job_running <name> <timeout> that's reusable from future init scripts.
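
One possible shape for the helper is sketched below. It is illustrative only: `_job_status` and the `JOB_POLL_INTERVAL_SECS` variable are hypothetical additions, `jq` is assumed to be available, and the top-level `Status` field in the `nomad job status -json` output is an assumption taken from the polling condition in Scope; the exact JSON shape may differ between Nomad versions.

```shell
#!/usr/bin/env bash
# Sketch only. Hypothetical names: _job_status, JOB_POLL_INTERVAL_SECS.

# Print the job's Status field, or nothing if the job is unknown.
# Assumption: the JSON output has a top-level "Status" key; requires jq.
_job_status() {
  nomad job status -json "$1" 2>/dev/null | jq -r '.Status // empty'
}

# _wait_job_running <name> <timeout-secs>
# Returns 0 once the job reports "running", 1 if the timeout elapses first.
_wait_job_running() {
  local name="$1"
  local timeout="${2:-${JOB_READY_TIMEOUT_SECS:-120}}"
  local interval="${JOB_POLL_INTERVAL_SECS:-2}"
  local deadline
  deadline=$(( $(date +%s) + timeout ))

  while true; do
    if [ "$(_job_status "$name")" = "running" ]; then
      echo "[deploy] ${name} running"
      return 0
    fi
    # The status is always checked at least once, even with a timeout of 0.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "[deploy] ${name} not running after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

Keeping the status lookup in its own small function means a future init script, or a test, can swap in a different readiness check (for example, one that inspects allocations) without touching the wait loop.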

Acceptance criteria

  • lib/init/nomad/deploy.sh forgejo works end-to-end against a Step-0-provisioned cluster (fresh LXC).
  • lib/init/nomad/deploy.sh forgejo --dry-run prints the planned actions, exits 0 without touching cluster state.
  • Re-running after success is a no-op with clear [deploy] forgejo already running output.
  • Failure mode demonstrated in CI: deliberately-broken jobspec → deploy.sh exits 1 with validation error; no partial state created.
  • shellcheck clean.

Non-goals

  • No Vault template plumbing (Step 2).
  • No nomad job dispatch for parameterized batch jobs (Step 5 when vault-runner lands).

Labels / meta

  • [nomad-step-1] S1.2 — no dependencies.
dev-bot added the backlog label 2026-04-16 09:52:46 +00:00
dev-qwen self-assigned this 2026-04-16 10:21:26 +00:00
dev-qwen added in-progress and removed backlog labels 2026-04-16 10:21:26 +00:00
dev-qwen removed their assignment 2026-04-16 10:45:31 +00:00
dev-qwen removed the in-progress label 2026-04-16 10:45:31 +00:00
Reference: disinto-admin/disinto#841