Commit graph

1833 commits

Author SHA1 Message Date
Agent
ffcadbfee0 fix: docs/agents-llama.md teaches the legacy activation flow (#848)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-16 12:53:03 +00:00
3465319ac5 Merge pull request 'fix: [nomad-step-1] S1.3 — wire --with forgejo into bin/disinto init --backend=nomad (#842)' (#868) from fix/issue-842-1 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 12:50:49 +00:00
4415eadce7 Merge pull request 'fix: hire-an-agent does not persist per-agent secrets to .env (#847)' (#866) from fix/issue-847 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 12:40:38 +00:00
Claude
c5a7b89a39 docs: [nomad-step-1] update nomad/AGENTS.md to *.hcl naming (#842)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Addresses review blocker on PR #868: the S1.3 PR renamed
nomad/jobs/forgejo.nomad.hcl → forgejo.hcl and changed the CI glob
from *.nomad.hcl to *.hcl, but nomad/AGENTS.md — the canonical spec
for the jobspec naming convention — still documented the old suffix
in six places. An agent following it would create <svc>.nomad.hcl
files (which match *.hcl and stay green) but the stated convention
would be wrong.

Updated all five references to use the new *.hcl / <service>.hcl
convention. Acceptance signal: `grep .nomad.hcl nomad/AGENTS.md`
returns zero matches.
2026-04-16 12:39:09 +00:00
Agent
a3eb33ccf7 fix: _validate_env_vars skips Anthropic-backend agents + missing sed escaping
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- bin/disinto: Remove '[ -n "$base_url" ] || continue' guard that caused
  all Anthropic-backend agents to be silently skipped during validation.
  The base_url check is now scoped only to backend-credential selection.

- lib/hire-agent.sh: Add sed escaping for ANTHROPIC_BASE_URL value before
  sed substitution (same pattern as ANTHROPIC_API_KEY at line 256).

Fixes AI review BLOCKER and MINOR issues on PR #866.
2026-04-16 12:29:00 +00:00
Agent
53a1fe397b fix: hire-an-agent does not persist per-agent secrets to .env (#847) 2026-04-16 12:29:00 +00:00
Claude
a835517aea fix: [nomad-step-1] S1.3 — restore --empty guard + drop hardcoded deploy --dry-run (#842)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Picks up from abandoned PR #859 (branch fix/issue-842 @ 6408023). Two
bugs in the prior art:

1. The `--empty is only valid with --backend=nomad` guard was removed
   when the `--with`/mutually-exclusive guards were added. This regressed
   test #6 in tests/disinto-init-nomad.bats:102 — `disinto init
   --backend=docker --empty --dry-run` was exiting 0 instead of failing.
   Restored alongside the new guards.

2. `_disinto_init_nomad` unconditionally appended `--dry-run` to the
   real-run deploy_cmd, so even `disinto init --backend=nomad --with
   forgejo` (no --dry-run) would only echo the deploy plan instead of
   actually running nomad job run. That violates the issue's acceptance
   criteria ("Forgejo job deploys", "curl http://localhost:3000/api/v1/version
   returns 200"). Removed.

All 17 tests in tests/disinto-init-nomad.bats now pass; shellcheck clean.
2026-04-16 12:21:28 +00:00
Agent
d898741283 fix: [nomad-validate] add nomad version check before config validate 2026-04-16 12:19:51 +00:00
Agent
dfe61b55fc fix: [nomad-validate] update glob to *.hcl for forgejo.hcl validation 2026-04-16 12:19:51 +00:00
Agent
719fdaeac4 fix: [nomad-step-1] S1.3 — wire --with forgejo into bin/disinto init --backend=nomad (#842) 2026-04-16 12:19:51 +00:00
9248c533d4 Merge pull request 'fix: bug: TOML [agents.X] section name with dash crashes load-project.sh (#862)' (#865) from fix/issue-862 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 12:16:55 +00:00
Claude
721d7a6077 fix: bug: TOML [agents.X] section name with dash crashes load-project.sh (#862)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
TOML allows dashes in bare keys, so `[agents.dev-qwen2]` is a valid
section. Before this fix, load-project.sh derived bash var names via
Python `.upper()` alone, which kept the dash and produced
`AGENT_DEV-QWEN2_BASE_URL` — an invalid shell identifier. Under
`set -euo pipefail` the subsequent `export` aborted the whole file,
silently taking the factory down on the N+1 run after a dashed agent
was hired via `disinto hire-an-agent`.

Normalize via `.upper().replace('-', '_')` to match the
`tr 'a-z-' 'A-Z_'` convention already used by hire-agent.sh (#834)
and generators.sh (#852). Also harden hire-agent.sh to reject invalid
agent names at hire time (before any Forgejo side effects), so
unparseable TOML sections never land on disk.

- `lib/load-project.sh` — dash-to-underscore in emitted shell var names
- `lib/hire-agent.sh` — validate agent name against
  `^[a-z]([a-z0-9]|-[a-z0-9])*$` up front
- `tests/lib-load-project.bats` — regression guard covering the parse
  path and the hire-time reject path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 11:55:59 +00:00
c63ca86a3c Merge pull request 'fix: bug: entrypoint clones project at /home/agent/repos/${COMPOSE_PROJECT_NAME} but TOML parse later rewrites PROJECT_REPO_ROOT — dev-agent cd fails silently (#861)' (#864) from fix/issue-861 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 11:51:00 +00:00
Agent
820ffafd0f fix: bug: entrypoint clones project at /home/agent/repos/${COMPOSE_PROJECT_NAME} but TOML parse later rewrites PROJECT_REPO_ROOT — dev-agent cd fails silently (#861)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-16 11:42:48 +00:00
342928bb32 Merge pull request 'fix: disinto up silently destroys profile-gated services (#845)' (#860) from fix/issue-845 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 11:33:02 +00:00
Claude
802a548783 fix: disinto up silently destroys profile-gated services (#845)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
TOML-driven agent services (emitted by `_generate_local_model_services`
for every `[agents.X]` entry) carried `profiles: ["agents-<name>"]`.
With `docker compose up -d --remove-orphans` and no `COMPOSE_PROFILES`
set, compose treated the hired agent container as an orphan and removed
it on every subsequent `disinto up` — silently killing dev-qwen and any
other TOML-declared local-model agent.

The profile gate was vestigial: the `[agents.X]` TOML entry is already
the activation gate — its presence is what drives emission of the
service block in the first place (#846). Drop the profile from emitted
services so they land in the default profile and survive `disinto up`.

Also update the "To start the agent, run" hint in `hire-an-agent` from
`docker compose --profile … up -d …` to `disinto up`, matching the new
activation model.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 11:22:29 +00:00
016d4fe8cc Merge pull request 'fix: bug: hire-an-agent does not add the new agent as collaborator on the project repo (#856)' (#858) from fix/issue-856 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 11:01:04 +00:00
Claude
9d5cbb4fa2 fix: bug: hire-an-agent does not add the new agent as collaborator on the project repo (#856)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
hire-an-agent now adds the new Forgejo user as a `write` collaborator on
`$FORGE_REPO` right after the token step, mirroring the collaborator setup
lib/forge-setup.sh applies to the canonical bot users. Without this, a
freshly hired agent's PATCH to assign itself an issue returned 403 Forbidden
and the dev-agent polled forever logging "claim lost to <none>".

issue_claim() now captures the PATCH HTTP status via `-w '%{http_code}'`
instead of swallowing failures with `curl -sf ... || return 1`. A 403 (or
any non-2xx) now surfaces a distinct log line naming the code — the missing
collaborator root cause would have been diagnosable in seconds instead of
minutes.

Also updates the lib-issue-claim bats mock to handle the new `-w` flag and
adds a regression test covering the HTTP-error log surfacing path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:47:53 +00:00
120bce5745 Merge pull request 'fix: [nomad-step-1] S1.2 — add lib/init/nomad/deploy.sh (dependency-ordered nomad job run + wait) (#841)' (#854) from fix/issue-841 into main
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 10:45:30 +00:00
5324a6b119 Merge pull request 'fix: [nomad-step-1] S1.4 — extend Woodpecker CI to nomad job validate nomad/jobs/*.hcl (#843)' (#857) from fix/issue-843 into main
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 10:38:34 +00:00
Agent
6734887a0a fix: [nomad-step-1] S1.2 — add lib/init/nomad/deploy.sh (dependency-ordered nomad job run + wait) (#841)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
2026-04-16 10:36:38 +00:00
Claude
93018b3db6 fix: [nomad-step-1] S1.4 — extend Woodpecker CI to nomad job validate nomad/jobs/*.hcl (#843)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Step 2 of .woodpecker/nomad-validate.yml previously ran
`nomad job validate` against a single explicit path
(nomad/jobs/forgejo.nomad.hcl, wired up during the S1.1 review). Replace
that with a POSIX-sh loop over nomad/jobs/*.nomad.hcl so every jobspec
gets CI coverage automatically — no "edit the pipeline" step to forget
when the next jobspec (woodpecker, caddy, agents, …) lands.

Why reverse S1.1's explicit-line approach: the "no-ad-hoc-steps"
principle that drove the explicit list was about keeping step *classes*
enumerated, not about re-listing every file of the same class. Globbing
over `*.nomad.hcl` still encodes a single class ("jobspec validation")
and is strictly stricter — a dropped jobspec can't silently bypass CI
because someone forgot to add its line. The `.nomad.hcl` suffix (set as
convention by S1.1 review) is what keeps non-jobspec HCL out of this
loop.

Implementation notes:
- `[ -f "$f" ] || continue` guards the no-match case. POSIX sh has no
  nullglob, so an empty jobs/ dir would otherwise leave the literal
  glob in $f and fail nomad job validate with "no such file". Not
  reachable today (forgejo.nomad.hcl exists), but keeps the step safe
  against any transient empty state during future refactors.
- `set -e` inside the block ensures the first failing jobspec aborts
  (default Woodpecker behavior, but explicit is cheap).
- Loop echoes the file being validated so CI logs point at the
  specific jobspec on failure.

Docs (nomad/AGENTS.md):
- "How CI validates these files" now lists all *five* steps (the S1.1
  review added step 2 but didn't update the doc; fixed in passing).
- Step 2 is documented with explicit scope: what offline validate
  catches (unknown stanzas, missing required fields, wrong value
  types, bad driver config) and what it does NOT catch (cross-file
  host_volume name resolution against client.hcl — that's a
  scheduling-time check; image reachability).
- "Adding a jobspec" step 4 updated: no pipeline edit required as
  long as the file follows the `*.nomad.hcl` naming convention. The
  suffix is now documented as load-bearing in step 1.
- Step 2 of the "Adding a jobspec" checklist cross-links the
  host_volume scheduling-time check, so contributors know the
  paired-write rule (client.hcl + cluster-up.sh) is the real
  guardrail for that class of drift.

Acceptance criteria:
- Broken jobspec (typo in stanza, missing required field) fails step
  2 with nomad's error message — covered by the loop over every file.
- Fixed jobspec passes — standard validate behavior.
- Step 1 (nomad config validate) untouched.
- No .sh changes, so no shellcheck impact; manual shellcheck pass
  shown clean.
- Trigger path `nomad/**` already covers `nomad/jobs/**` (confirmed,
  no change needed to `when:` block).

Refs: #843 (S1.4), #825 (S0.5 base pipeline), #840 (S1.1 first jobspec)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:32:08 +00:00
24dfa6c8d5 Merge pull request 'fix: [nomad-step-1] S1.1 — add nomad/jobs/forgejo.hcl (service job, host_volume, port 3000) (#840)' (#844) from fix/issue-840 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 10:18:21 +00:00
Claude
db64f2fdae fix: address review — rename forgejo.nomad.hcl + wire nomad job validate CI
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Two blockers from the #844 review:

1. Rename nomad/jobs/forgejo.hcl → nomad/jobs/forgejo.nomad.hcl to match
   the convention documented in nomad/AGENTS.md:38 (*.nomad.hcl suffix).
   First jobspec sets the pattern for all future ones; keeps any glob-
   based tooling over nomad/jobs/*.nomad.hcl working.
2. Add a dedicated `nomad-job-validate` step to .woodpecker/nomad-validate.yml.
   `nomad config validate` (step 1) parses agent configs only — it rejects
   jobspec HCL as "unknown block 'job'". `nomad job validate` is the
   correct offline validator for jobspec HCL. Per the Hashicorp docs it
   does not require a running agent (exit 0 clean, 1 on syntax/semantic
   error). New jobspecs will add an explicit line alongside forgejo's,
   matching step 1's enumeration pattern and this file's "no-ad-hoc-steps"
   principle.

Also updated the file header comment and the pipeline's top-of-file step
index to reflect the new step ordering (2. nomad-job-validate inserted;
old 2-4 renumbered to 3-5).

Refs: #840 (S1.1), PR #844
2026-04-16 10:11:34 +00:00
Claude
2ad4bdc624 fix: [nomad-step-1] S1.1 — add nomad/jobs/forgejo.hcl (service job, host_volume, port 3000) (#840)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
First Nomad jobspec to land under nomad/jobs/ as part of the Nomad+Vault
migration. Proves the docker driver + host_volume plumbing wired up in
Step 0 (client.hcl) by defining a real factory service:

  - job type=service, datacenters=["dc1"], 1 group × 1 task
  - docker driver, image pinned to codeberg.org/forgejo/forgejo:11.0
    (matches docker-compose.yml)
  - network port "http" static=3000, to=3000 (same host:port as compose,
    so agents/woodpecker/caddy reach forgejo unchanged across cutover)
  - mounts the forgejo-data host_volume from nomad/client.hcl at /data
  - non-secret env subset from docker-compose's forgejo service (DB
    type, ROOT_URL, HTTP_PORT, INSTALL_LOCK, DISABLE_REGISTRATION,
    webhook allow-list); OAuth/secret env vars land in Step 2 via Vault
  - Nomad-native service discovery (provider="nomad", no Consul) with
    HTTP check on /api/v1/version (10s interval, 3s timeout). No
    initial_status override — Nomad waits for first probe to pass.
  - restart: 3 attempts / 5m / 15s delay / mode=delay
  - resources: cpu=300 memory=512 baseline

No changes to docker-compose.yml — the docker stack remains the
factory's runtime until cutover. CI integration (`nomad job validate`)
is tracked by #843.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:55:35 +00:00
eefbec601b Merge pull request 'fix: [nomad-step-0] S0.1-fix — bin/disinto swallows --backend=nomad as repo_url positional (#835)' (#839) from fix/issue-835 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 09:31:23 +00:00
Claude
72ed1f112d fix: [nomad-step-0] S0.1-fix — bin/disinto swallows --backend=nomad as repo_url positional (#835)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Why: disinto_init() consumed $1 as repo_url before the argparse loop ran,
so `disinto init --backend=nomad --empty` had --backend=nomad swallowed
into repo_url, backend stayed at its "docker" default, and the --empty
validation then produced the nonsense "--empty is only valid with
--backend=nomad" error — flagged during S0.1 end-to-end verification on
a fresh LXC. nomad backend takes no positional anyway; the LXC already
has the repo cloned by the operator.

Change: only consume $1 as repo_url if it doesn't start with "--", then
defer the "repo URL required" check to after argparse (so the docker
path still errors with a helpful message on a missing positional, not
"Unknown option: --backend=docker").

Verified acceptance criteria:
  1. init --backend=nomad --empty             → dispatches to nomad
  2. init --backend=nomad --empty --dry-run   → 9-step plan, exit 0
  3. init <repo-url>                          → docker path unchanged
  4. init                                     → "repo URL required"
  5. init --backend=docker                    → "repo URL required"
                                                (not "Unknown option")
  6. shellcheck clean

Tests: 4 new regression cases in tests/disinto-init-nomad.bats covering
flag-first nomad invocation (both --flag=value and --flag value forms),
no-args docker default, and --backend=docker missing-positional error
path. Full suite: 10/10 pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:19:36 +00:00
0850e83ec6 Merge pull request 'fix: fix: disinto hire-an-agent + compose generator defects blocking multi-llama-dev parallel operation (#834)' (#838) from fix/issue-834 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 09:12:04 +00:00
Claude
43dc86d84c fix: fix: disinto hire-an-agent + compose generator defects blocking multi-llama-dev parallel operation (#834)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Hiring a second llama-backed dev agent (e.g. `dev-qwen2`) alongside
`dev-qwen` tripped four defects that prevented safe parallel operation.

Gap 1 — hire-agent keyed per-agent token as FORGE_<ROLE>_TOKEN, so two
dev-role agents overwrote each other's token in .env. Re-key by agent
name via `tr 'a-z-' 'A-Z_'`: FORGE_TOKEN_<AGENT_UPPER>.

Gap 2 — hire-agent generated a random FORGE_PASS but never wrote it to
.env. The container's git credential helper needs both token and pass
to push over HTTPS (#361). Persist FORGE_PASS_<AGENT_UPPER> with the
same update-in-place idempotency as the token.

Gap 3 — _generate_local_model_services hardcoded FORGE_TOKEN_LLAMA for
every local-model service, forcing all hired llama agents to share one
Forgejo identity. Derive USER_UPPER from the TOML's `forge_user` field
and emit \${FORGE_TOKEN_<USER_UPPER>:-} per service.

Gap 4 — every local-model service mounted the shared `project-repos`
volume, so concurrent llama devs collided on /_factory worktree and
state/.dev-active. Switch to per-agent `project-repos-<service_name>`
and emit the matching top-level volume. Also escape embedded newlines
in `$all_vols` before the sed insertion so multi-agent volume lists
don't unterminate the substitute command.

.env.example documents the new FORGE_TOKEN_<AGENT> / FORGE_PASS_<AGENT>
naming convention (and preserves the legacy FORGE_TOKEN_LLAMA path used
by the ENABLE_LLAMA_AGENT=1 singleton build).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:55:48 +00:00
311e1926bb Merge pull request 'chore: gardener housekeeping' (#837) from chore/gardener-20260416-0838 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 08:52:37 +00:00
3b90bd234d Merge pull request 'fix: fix: issue_claim race — verify assignee after PATCH to prevent duplicate work (#830)' (#836) from fix/issue-830 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 08:46:39 +00:00
Claude
6533f322e3 fix: add last-reviewed watermark SHA to secret-scan safe patterns
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-16 08:46:00 +00:00
Claude
e9c144a511 chore: gardener housekeeping 2026-04-16
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline failed
2026-04-16 08:38:31 +00:00
Claude
620515634a fix: issue_claim race — verify assignee after PATCH to prevent duplicate work (#830)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Forgejo's assignees PATCH is last-write-wins, so two dev agents polling
concurrently could both observe .assignee == null at the pre-check, both
PATCH, and the loser would silently "succeed" and proceed to implement
the same issue — colliding at the PR/branch stage.

Re-read the assignee after the PATCH and bail out if it isn't self.
Label writes are moved AFTER this verification so a losing claim leaves
no stray in-progress label to roll back.

Adds tests/lib-issue-claim.bats covering the three paths:
  - happy path (single agent, re-read confirms self)
  - lost race (re-read shows another agent — returns 1, no labels added)
  - pre-check skip (initial GET already shows another agent)

Prerequisite for the LLAMA_BOTS parametric refactor that will run N
dev containers against the same project.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:35:18 +00:00
2a7ae0b7ea Merge pull request 'fix: [nomad-step-0] S0.5 — Woodpecker CI validation for nomad/vault artifacts (#825)' (#833) from fix/issue-825 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 08:18:46 +00:00
Claude
14c67f36e6 fix: add bats coverage for --backend <value> space-separated form (#825)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
The bin/disinto flag loop has separate cases for `--backend value`
(space-separated) and `--backend=value`; a regression in either would
silently route to the docker default path. Per the "stub-first dispatch"
lesson, silent misrouting during a migration is the worst failure mode —
covering both forms closes that gap.

Also triggers a retry of the smoke-init pipeline step, which hit a known
Forgejo branch-indexing flake on pipeline #913 (same flake cleared on
retry for PR #829 pipelines #906#908); unrelated to the nomad-validate
changes, which went all-green in #913.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:06:51 +00:00
Claude
e5c41dd502 fix: tolerate vault operator diagnose exit 2 (advisory warnings) in CI (#825)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
Pipeline #911 on PR #833 failed because `vault operator diagnose -config=
nomad/vault.hcl -skip=storage -skip=listener` returns exit code 2 — not
on a hard failure, but because our factory dev-box vault.hcl deliberately
runs TLS-disabled on a localhost-only listener (documented in the file
header), which triggers an advisory "Check Listener TLS" warning.

The -skip flag disables runtime sub-checks (storage access, listener
bind) but does NOT suppress the advisory checks on the parsed config, so
a valid dev-box config with documented-and-intentional warnings still
exits non-zero under strict CI.

Fix: wrap the command in a case on exit code. Treat rc=0 (all green)
and rc=2 (advisory warnings only — config still parses) as success, and
fail hard on rc=1 (real HCL/schema/storage failure) or any other rc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:59:28 +00:00
Claude
5150f8c486 fix: [nomad-step-0] S0.5 — Woodpecker CI validation for nomad/vault artifacts (#825)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline failed
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline failed
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
Locks in static validation for every Nomad+Vault artifact before it can
merge. Four fail-closed steps in .woodpecker/nomad-validate.yml, gated
to PRs touching nomad/, lib/init/nomad/, or bin/disinto:

  1. nomad config validate nomad/server.hcl nomad/client.hcl
  2. vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener
  3. shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto
  4. bats tests/disinto-init-nomad.bats — dispatcher smoke tests

bin/disinto picks up pre-existing SC2120 warnings on three passthrough
wrappers (generate_agent_docker, generate_caddyfile, generate_staging_index);
annotated with shellcheck disable=SC2120 so the new pipeline is clean
without narrowing the warning for future code.

Pinned image versions (hashicorp/nomad:1.9.5, hashicorp/vault:1.18.5)
match lib/init/nomad/install.sh — bump both or neither.

nomad/AGENTS.md documents the stack layout, how to add a jobspec in
Step 1, how CI validates it, and the two-place version pinning rule.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:54:06 +00:00
271ec9d8f5 Merge pull request 'fix: [nomad-step-0] S0.4 — disinto init --backend=nomad --empty orchestrator (cluster-up) (#824)' (#829) from fix/issue-824 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 07:42:47 +00:00
Claude
481175e043 fix: dedupe cluster-up.sh polling via poll_until_healthy helper (#824)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
CI duplicate-detection flagged the in-line vault + nomad polling loops
in cluster-up.sh as matching a 5-line window in vault-init.sh (the
`ready=1 / break / fi / sleep 1 / done` boilerplate).

Extracts the repeated pattern into three helpers at the top of the
file:

  - nomad_has_ready_node       wrapper so poll_until_healthy can take a
                               bare command name.
  - _die_with_service_status   shared "log + dump systemctl status +
                               die" path (factored out of the two
                               callsites + the timeout branch).
  - poll_until_healthy         ticks once per second up to TIMEOUT,
                               fail-fasts on systemd "failed" state,
                               and returns 0 on first successful check.

Step 7 (vault unseal) and Step 8 (nomad ready node) each collapse from
~15 lines of explicit for-loop bookkeeping to a one-line call. No
behavioural change: same tick cadence, same fail-fast, same status
dump on timeout. Local detect-duplicates.py run against main confirms
no new duplicates introduced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:26:54 +00:00
Claude
d2c6b33271 fix: [nomad-step-0] S0.4 — disinto init --backend=nomad --empty orchestrator (cluster-up) (#824)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline failed
Wires S0.1–S0.3 into a single idempotent bring-up script and replaces
the S0.1 stub in _disinto_init_nomad so `disinto init --backend=nomad
--empty` produces a running empty single-node cluster on a fresh box.

lib/init/nomad/cluster-up.sh (new):
  1. install.sh                (nomad + vault binaries)
  2. systemd-nomad.sh          (unit + enable, not started)
  3. systemd-vault.sh          (unit + vault.hcl + enable)
  4. host-volume dirs under /srv/disinto/* (matching nomad/client.hcl)
  5. /etc/nomad.d/{server,client}.hcl (content-compare before write)
  6. vault-init.sh             (first-run init + unseal + persist keys)
  7. systemctl start vault     (poll until unsealed; fail-fast on
                                is-failed)
  8. systemctl start nomad     (poll until ≥1 node ready)
  9. /etc/profile.d/disinto-nomad.sh (VAULT_ADDR + NOMAD_ADDR for
                                      interactive shells)
  Re-running on a healthy box is a no-op — each sub-step is itself
  idempotent and steps 7/8 fast-path when already active + healthy.
  `--dry-run` prints the full step list and exits 0.

bin/disinto:
  - _disinto_init_nomad: replaces the S0.1 stub. Invokes cluster-up.sh
    directly (as root) or via `sudo -n` otherwise. Both `--empty` and
    the default (no flag) call cluster-up.sh today; Step 1 will branch
    on $empty to gate job deployment. --dry-run forwards through.
  - disinto_init: adds `--empty` flag parsing; rejects `--empty`
    combined with `--backend=docker` explicitly instead of silently
    ignoring it.
  - usage: documents `--empty` and drops the "stub, S0.1" annotation
    from --backend.

Closes #824.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:22:15 +00:00
accd10ec67 Merge pull request 'fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823)' (#828) from fix/issue-823 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 07:04:57 +00:00
Claude
24cb8f83a2 fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Adds the Vault half of the factory-dev-box bringup, landed but not started
(per the install-but-don't-start pattern used for nomad in #822):

- lib/init/nomad/install.sh — now also installs vault from the shared
  HashiCorp apt repo. VAULT_VERSION pinned (1.18.5). Fast-path skips apt
  entirely when both binaries are at their pins; partial upgrades only
  touch the package that drifted.

- nomad/vault.hcl — single-node config: file storage backend at
  /var/lib/vault/data, localhost listener on :8200, ui on, mlock kept on.
  No TLS / HA / audit yet; those land in later steps.

- lib/init/nomad/systemd-vault.sh — writes /etc/systemd/system/vault.service
  (Type=notify, ExecStartPost auto-unseals from /etc/vault.d/unseal.key,
  CAP_IPC_LOCK granted for mlock), deploys nomad/vault.hcl to
  /etc/vault.d/, creates /var/lib/vault/data (0700 root), enables the
  unit without starting it. Idempotent via content-compare.

- lib/init/nomad/vault-init.sh — first-run init: spawns a temporary
  `vault server` if not already reachable, runs operator-init with
  key-shares=1/threshold=1, persists unseal.key + root.token (0400 root),
  unseals once in-process, shuts down the temp server. Re-run detects
  initialized + unseal.key present → no-op. Initialized but key missing
  is a hard failure (can't recover).

lib/hvault.sh already defaults VAULT_TOKEN to /etc/vault.d/root.token
when the env var is absent, so no change needed there.

Seal model: the single unseal key lives on disk; seal-key theft equals
vault theft. Factory-dev-box-acceptable tradeoff — avoids running a
second Vault to auto-unseal the first.

Blocks S0.4 (#824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 06:53:27 +00:00
75bec43c4a Merge pull request 'fix: [nomad-step-0] S0.2 — install nomad + systemd unit + nomad/server.hcl/client.hcl (#822)' (#827) from fix/issue-822 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 06:15:32 +00:00
Claude
06ead3a19d fix: [nomad-step-0] S0.2 — install nomad + systemd unit + nomad/server.hcl/client.hcl (#822)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Lands the Nomad install + baseline HCL config for the single-node factory
dev box. Nothing is wired into `disinto init` yet — S0.4 does that.

- lib/init/nomad/install.sh: idempotent apt install pinned to
  NOMAD_VERSION (default 1.9.5). Adds HashiCorp apt keyring and sources
  list only if absent; fast-paths when the pinned version is already
  installed.
- lib/init/nomad/systemd-nomad.sh: writes /etc/systemd/system/nomad.service
  (rewrites only when content differs), creates /etc/nomad.d and
  /var/lib/nomad, runs `systemctl enable nomad` WITHOUT starting.
- nomad/server.hcl: single-node combined server+client role. bootstrap_expect=1,
  localhost bind, default ports pinned explicitly, UI enabled. No TLS/ACL —
  factory dev box baseline.
- nomad/client.hcl: Docker task driver (allow_privileged=false, volumes
  enabled) and host_volume pre-wiring for forgejo-data, woodpecker-data,
  agent-data, project-repos, caddy-data, chat-history, ops-repo under
  /srv/disinto/*.

Verified: `nomad config validate nomad/*.hcl` reports "Configuration is
valid!" (with expected TLS/bootstrap warnings for a dev box). Shellcheck
clean across the repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 06:04:02 +00:00
74f49e1c2f Merge pull request 'fix: [nomad-step-0] S0.1 — add --backend=nomad flag + stub to bin/disinto init (#821)' (#826) from fix/issue-821 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 05:54:22 +00:00
Claude
de00400bc4 fix: [nomad-step-0] S0.1 — add --backend=nomad flag + stub to bin/disinto init (#821)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Lands the dispatch entry point for the Nomad+Vault migration. The docker
path remains the default and is byte-for-byte unchanged. The new
`--backend=nomad` value routes to a `_disinto_init_nomad` stub that fails
loud (exit 99) so no silent misrouting can happen while S0.2–S0.5 fill in
the real implementation. With `--dry-run --backend=nomad` the stub reports
status and exits 0 so dry-run callers (P7) don't see a hard failure.

- New `--backend <value>` flag (accepts `docker` | `nomad`); supports
  both `--backend nomad` and `--backend=nomad` forms.
- Invalid backend values are rejected with a clear error.
- `_disinto_init_nomad` lives next to `disinto_init` so future S0.x
  issues only need to fill in this function — flag parsing and dispatch
  stay frozen.
- `--help` lists the flag and both values.
- `shellcheck bin/disinto` introduces no new findings beyond the
  pre-existing baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 05:43:35 +00:00
32ab84a87c Merge pull request 'chore: gardener housekeeping 2026-04-16' (#819) from chore/gardener-20260416-0215 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 02:22:01 +00:00
Claude
c236350e00 chore: gardener housekeeping 2026-04-16
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
- Bump AGENTS.md watermarks to HEAD (c363ee0) across all 9 per-directory files
- supervisor/AGENTS.md: document dual-container trigger (agents + edge) and SUPERVISOR_INTERVAL env var added by P1/#801
- lib/AGENTS.md: document agents-llama-all compose service (all 7 roles) added to generators.sh by P1/#801
- pending-actions.json: comment #623 (all deps now closed, ready for planner decomposition), comment #758 (needs human Forgejo admin action to unblock ops repo writes)
2026-04-16 02:15:38 +00:00
c363ee0aea Merge pull request 'fix: [nomad-prep] P12 — dispatcher commits result.json via git push, not bind-mount (#803)' (#818) from fix/issue-803 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 01:05:57 +00:00