Commit graph

1888 commits

Author SHA1 Message Date
29df502038 Merge pull request 'fix: vault-import.sh: pipe-separator in ops_data/paths_to_write silently truncates secret values containing | (#898)' (#913) from fix/issue-898 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 20:17:47 +00:00
6dcc36cc8d Merge pull request 'fix: fix: --build mode agents: service missing pull_policy: build (same root as #887) (#893)' (#911) from fix/issue-893 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 20:07:26 +00:00
Claude
27baf496db fix: vault-import.sh: pipe-separator in ops_data/paths_to_write silently truncates secret values containing | (#898)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Replace the `|`-delimited string accumulators with bash associative and
indexed arrays so any byte may appear in a secret value.

Two sites used `|` as a delimiter over data that includes user secrets:

1. ops_data["path:key"]="value|status" — extraction via `${data%%|*}`
   truncated values at the first `|` (silently corrupting writes).
2. paths_to_write["path"]="k1=v1|k2=v2|..." — split back via
   `IFS='|' read -ra` at write time, so a value containing `|` was
   shattered across kv pairs (silently misrouting writes).

Fix:

- Split ops_data into two assoc arrays (`ops_value`, `ops_status`) keyed
  on "vault_path:vault_key" — value and status are stored independently
  with no in-band delimiter. (`:` is safe because both vault_path and
  vault_key are identifier-safe.)
- Track distinct paths in `path_seen` and, for each path, collect its
  kv pairs into a fresh indexed `pairs_array` by filtering ops_value.
  `_kv_put_secret` already splits each entry on the first `=` only, so
  `=` and `|` inside values are both preserved.

Added a bats regression that imports values like `abc|xyz`, `p1|p2|p3`,
and `admin|with|pipes` and asserts they round-trip through Vault
unmodified. Values are single-quoted in the .env so they survive
`source` — the accumulator is what this test exercises.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 20:04:54 +00:00
dev-qwen2
9f67f79ecd fix: fix: --build mode agents: service missing pull_policy: build (same root as #887) (#893)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-16 19:53:57 +00:00
391aaa99a5 Merge pull request 'fix: lib/hvault.sh uses secret/ mount prefix but migration policies use kv/ — agents will get 403 (#890)' (#909) from fix/issue-890 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 19:49:21 +00:00
164851fc9b Merge pull request 'fix: [nomad-step-2] S2.5 — bin/disinto init --import-env / --import-sops / --age-key wire-up (#883)' (#907) from fix/issue-883-2 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 19:44:44 +00:00
dev-qwen2
5fd36e94bb fix: lib/hvault.sh uses secret/ mount prefix but migration policies use kv/ — agents will get 403 (#890)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Changes:
- Add VAULT_KV_MOUNT env var (default: kv) to make KV mount configurable
- Update hvault_kv_get to use ${VAULT_KV_MOUNT}/data/${path}
- Update hvault_kv_put to use ${VAULT_KV_MOUNT}/data/${path}
- Update hvault_kv_list to use ${VAULT_KV_MOUNT}/metadata/${path}
- Update tests to use kv/ paths instead of secret/

This ensures agents can read/write secrets using the same mount point
that the Nomad+Vault migration policies grant ACL for.
2026-04-16 19:32:36 +00:00
Claude
ece5d9b6cc fix: [nomad-step-2] S2.5 review — gate policies/auth/import on --empty; reject --empty + --import-* (#883)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Addresses review #907 blocker: docs/nomad-migration.md claimed
--empty "skips policies/auth/import/deploy" but _disinto_init_nomad
had no $empty gate around those blocks — operators reaching the
"cluster-only escape hatch" would still invoke vault-apply-policies.sh
and vault-nomad-auth.sh, contradicting the runbook.

Changes:
- _disinto_init_nomad: exit 0 immediately after cluster-up when
  --empty is set, in both dry-run and real-run branches. Only the
  cluster-up plan appears; no policies, no auth, no import, no
  deploy. Matches the docs.
- disinto_init: reject --empty combined with any --import-* flag.
  --empty discards the import step, so the combination silently
  does nothing (worse failure mode than a clear error up front).
  Symmetric to the existing --empty vs --with check.
- Pre-flight existence check for policies/auth scripts now runs
  unconditionally on the non-empty path (previously gated on
  --import-*), matching the unconditional invocation. Import-script
  check stays gated on --import-*.

Non-blocking observation also addressed: the pre-flight guard
comment + actual predicate were inconsistent ("unconditionally
invoke policies+auth" but only checked on import). Now the
predicate matches: [ "$empty" != "true" ] gates policies/auth,
and an inner --import-* guard gates the import script.

Tests (+3):
- --empty --dry-run shows no S2.x sections (negative assertions)
- --empty --import-env rejected
- --empty --import-sops --age-key rejected

30/30 nomad tests pass; shellcheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 19:25:32 +00:00
Claude
aa3782748d fix: [nomad-step-2] S2.5 — bin/disinto init --import-env / --import-sops / --age-key wire-up (#883)
Wire the Step-2 building blocks (import, auth, policies) into
`disinto init --backend=nomad` so a single command on a fresh LXC
provisions cluster + policies + auth + imports secrets + deploys
services.

Adds three flags to `disinto init --backend=nomad`:
  --import-env PATH   plaintext .env from old stack
  --import-sops PATH  sops-encrypted .env.vault.enc (requires --age-key)
  --age-key PATH      age keyfile to decrypt --import-sops

Flow: cluster-up.sh → vault-apply-policies.sh → vault-nomad-auth.sh →
(optional) vault-import.sh → deploy.sh. Policies + auth run on every
nomad real-run path (idempotent); import runs only when --import-* is
set; all layers safe to re-run.

Flag validation:
  --import-sops without --age-key → error
  --age-key without --import-sops → error
  --import-env alone (no sops)    → OK
  --backend=docker + any --import-* → error

Dry-run prints a five-section plan (cluster-up + policies + auth +
import + deploy) with every argv that would be executed; touches
nothing, logs no secret values.

Dry-run output prints one line per --import-* flag that is actually
set — not in an if/elif chain — so all three paths appear when all
three flags are passed. Prior attempts regressed this invariant.

Tests:
  tests/disinto-init-nomad.bats +10 cases covering flag validation,
  dry-run plan shape (each flag prints its own path), policies+auth
  always-on (without --import-*), and --flag=value form.

Docs: docs/nomad-migration.md new file — cutover-day runbook with
invocation shape, flag summary, idempotency contract, dry-run, and
secret-hygiene notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 19:25:32 +00:00
520f8f1be8 Merge pull request 'fix: Two parallel activation paths for llama agents (ENABLE_LLAMA_AGENT vs [agents.X] TOML) (#846)' (#906) from fix/issue-846 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 19:22:43 +00:00
d0062ec859 Merge pull request 'fix: fix: vault_request RETURN trap fires prematurely when vault-env.sh is sourced (#773)' (#904) from fix/issue-773 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 19:11:43 +00:00
dev-qwen2
e003829eaa fix: Remove agents-llama service references from docs and formulas (#846)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- AGENTS.md: Replace agents-llama and agents-llama-all rows with generic
  'Local-model agents' entry pointing to docs/agents-llama.md
- formulas/release.sh: Remove agents-llama from docker compose stop/up
  commands (line 181-182)
- formulas/release.toml: Remove agents-llama references from restart-agents
  step description (lines 192, 195, 206)

These changes complete the removal of the legacy ENABLE_LLAMA_AGENT activation
path. The release formula now only references the 'agents' service, which is
the only service that exists after disinto init regenerates docker-compose.yml
based on [agents.X] TOML sections.
2026-04-16 19:05:46 +00:00
dev-qwen2
28eb182487 fix: Two parallel activation paths for llama agents (ENABLE_LLAMA_AGENT vs [agents.X] TOML) (#846) 2026-04-16 19:05:46 +00:00
Agent
96870d9f30 fix: fix: vault_request RETURN trap fires prematurely when vault-env.sh is sourced (#773)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-16 19:02:47 +00:00
c77fb1dc53 Merge pull request 'fix: entrypoint: validate_projects_dir silently exits instead of logging FATAL under set -eo pipefail (#877)' (#905) from fix/issue-877 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 18:48:07 +00:00
Claude
bbaccd678d fix: entrypoint: validate_projects_dir silently exits instead of logging FATAL under set -eo pipefail (#877)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
`compgen -G ... | wc -l` under `set -eo pipefail` aborts the script on
the non-zero pipeline exit (compgen returns 1 on no match) before the
FATAL diagnostic branch can run. The container still fast-fails, but
operators saw no explanation.

Switch to the conditional `if ! compgen -G ... >/dev/null 2>&1; then`
pattern already used at the two other compgen call sites in this file
(bootstrap_factory_repo and the PROJECT_NAME parser). The count for the
success-path log is computed after we've confirmed at least one match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:36:42 +00:00
dd61d0d29e Merge pull request 'fix: [nomad-step-2] S2.6 — CI: vault policy fmt + validate + roles.yaml check (#884)' (#903) from fix/issue-884-1 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 18:27:34 +00:00
701872af61 Merge pull request 'chore: gardener housekeeping 2026-04-16' (#901) from chore/gardener-20260416-1810 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 18:17:32 +00:00
Claude
6e73c6dd1f fix: [nomad-step-2] S2.6 — CI: vault policy fmt + validate + roles.yaml check (#884)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Extend .woodpecker/nomad-validate.yml with three new fail-closed steps
that guard every artifact under vault/policies/ and vault/roles.yaml
before it can land:

  4. vault-policy-fmt      — cp+fmt+diff idempotence check (vault 1.18.5
                             has no `policy fmt -check` flag, so we
                             build the non-destructive check out of
                             `vault policy fmt` on a /tmp copy + diff
                             against the original)
  5. vault-policy-validate — HCL syntax + capability validation via
                             `vault policy write` against an inline
                             dev-mode Vault server (no offline
                             `policy validate` subcommand exists;
                             dev-mode writes are ephemeral so this is
                             a validator, not a deploy)
  6. vault-roles-validate  — yamllint + PyYAML-based role→policy
                             reference check (every role's `policy:`
                             field must match a vault/policies/*.hcl
                             basename; also checks the four required
                             fields name/policy/namespace/job_id)

Secret-scan coverage for vault/policies/*.hcl is already provided by
the P11 gate (.woodpecker/secret-scan.yml) via its `vault/**/*` trigger
path — this pipeline intentionally does NOT duplicate that gate to
avoid the inline-heredoc / YAML-parse failure mode that sank the prior
attempt at this issue (PR #896).

Trigger paths extended: `vault/policies/**` and `vault/roles.yaml`.
`lib/init/nomad/vault-*.sh` is already covered by the existing
`lib/init/nomad/**` glob.

Docs: nomad/AGENTS.md and vault/policies/AGENTS.md updated with the
policy lifecycle, the CI enforcement table, and the common failure
modes authors will see.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:15:03 +00:00
Claude
6d7e539c28 chore: gardener housekeeping 2026-04-16
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-16 18:10:18 +00:00
6bdbeb5bd2 Merge pull request 'fix: [nomad-step-2] S2.4 — forgejo.hcl reads admin creds from Vault via template stanza (#882)' (#897) from fix/issue-882 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 17:50:36 +00:00
8b287ebf9a Merge pull request 'fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880)' (#889) from fix/issue-880 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 17:39:05 +00:00
Claude
0bc6f9c3cd fix: shorten empty-Vault placeholders to dodge secret-scan TOKEN= pattern
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
The lib/secret-scan.sh `(SECRET|TOKEN|...)=<16+ non-space chars>`
rule flagged the long `INTERNAL_TOKEN=VAULT-EMPTY-run-tools-vault-
seed-forgejo-sh` placeholder as a plaintext secret, failing CI's
secret-scan workflow on every PR that touched nomad/jobs/forgejo.hcl.
Shorten both placeholders to `seed-me` (<16 chars) — still visible in
a `grep FORGEJO__security__` audit, still obviously broken. The
operator-facing fix pointer moves to the `# WARNING` comment line in
the rendered env and to a new block comment above the template stanza.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 17:33:15 +00:00
Claude
89e454d0c7 fix: [nomad-step-2] S2.4 — forgejo.hcl reads admin creds from Vault via template stanza (#882)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline failed
Upgrade nomad/jobs/forgejo.hcl to read SECRET_KEY + INTERNAL_TOKEN from
Vault via a template stanza using the service-forgejo role (S2.3).
Non-secret config (DB, ports, ROOT_URL, registration lockdown) stays
inline. An empty-Vault fallback (`with ... else ...`) renders visible
placeholder env vars so a fresh LXC still brings forgejo up — the
operator sees the warning instead of forgejo silently regenerating
SECRET_KEY on every restart.

Add tools/vault-seed-forgejo.sh — idempotent seeder that ensures the
kv/ mount is KV v2 and populates kv/data/disinto/shared/forgejo with
random secret_key (32B hex) + internal_token (64B hex) on a clean
install. Existing non-empty values are left untouched; partial paths
are filled in atomically. Parser shape is positional-arity case
dispatch to stay structurally distinct from the two sibling vault-*.sh
tools and avoid the 5-line sliding-window dup detector.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 17:25:44 +00:00
dev-qwen2
428fa223d8 fix: [nomad-step-2] S2.2 — Fix KV v2 overwrite for incremental updates and secure jq interpolation (#880)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-16 17:22:05 +00:00
dev-qwen2
197716ed5c fix: [nomad-step-2] S2.2 — Fix KV v2 overwrite by grouping key-value pairs per path (#880) 2026-04-16 17:22:05 +00:00
dev-qwen2
b4c290bfda fix: [nomad-step-2] S2.2 — Fix bot/runner operation parsing and sops value extraction (#880) 2026-04-16 17:22:05 +00:00
dev-qwen2
78f92d0cd0 fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880) 2026-04-16 17:22:05 +00:00
dev-qwen2
7a1f0b2c26 fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880) 2026-04-16 17:22:05 +00:00
dev-qwen2
1dc50e5784 fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880) 2026-04-16 17:22:05 +00:00
a2a7c4a12c Merge pull request 'fix: [nomad-step-2] S2.3 — vault-nomad-auth.sh (enable JWT auth + roles + nomad workload identity) (#881)' (#895) from fix/issue-881 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 17:10:18 +00:00
Claude
b2c86c3037 fix: [nomad-step-2] S2.3 review round 1 — document new helper + script, drop unused vault CLI precondition (#881)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Review feedback from PR #895 round 1:

- lib/AGENTS.md (hvault.sh row): add hvault_get_or_empty(PATH) to the
  public-function list; replace the "not sourced at runtime yet" note
  with the three actual callers (vault-apply-policies.sh,
  vault-apply-roles.sh, vault-nomad-auth.sh).
- lib/AGENTS.md (lib/init/nomad/ row): add a one-line description of
  vault-nomad-auth.sh (Step 2, this PR); relabel the row header from
  "Step 0 installer scripts" to "installer scripts" since it now spans
  Step 0 + Step 2.
- lib/init/nomad/vault-nomad-auth.sh: drop the `vault` CLI from the
  binary precondition check — hvault.sh's helpers are all curl-based,
  so the CLI is never invoked. The precondition would spuriously die on
  a Nomad-client-only node that has Vault server reachable but no
  `vault` binary installed. Inline comment preserves the rationale.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:58:27 +00:00
Claude
8efef9f1bb fix: [nomad-step-2] S2.3 — vault-nomad-auth.sh (enable JWT auth + roles + nomad workload identity) (#881)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Wires Nomad → Vault via workload identity so jobs can exchange their
short-lived JWT for a Vault token carrying the policies in
vault/policies/ — no shared VAULT_TOKEN in job env.

- `lib/init/nomad/vault-nomad-auth.sh` — idempotent script: enable jwt
  auth at path `jwt-nomad`, config JWKS/algs, apply roles, install
  server.hcl + SIGHUP nomad on change.
- `tools/vault-apply-roles.sh` — companion sync script (S2.1 sibling);
  reads vault/roles.yaml and upserts each Vault role under
  auth/jwt-nomad/role/<name> with created/updated/unchanged semantics.
- `vault/roles.yaml` — declarative role→policy→bound_claims map; one
  entry per vault/policies/*.hcl. Keeps S2.1 policies and S2.3 role
  bindings visible side-by-side at review time.
- `nomad/server.hcl` — adds vault stanza (enabled, address,
  default_identity.aud=["vault.io"], ttl=1h).
- `lib/hvault.sh` — new `hvault_get_or_empty` helper shared between
  vault-apply-policies.sh, vault-apply-roles.sh, and vault-nomad-auth.sh;
  reads a Vault endpoint and distinguishes 200 / 404 / other.
- `vault/policies/AGENTS.md` — extends S2.1 docs with JWT-auth role
  naming convention, token shape, and the "add new service" flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:44:59 +00:00
88e49b9e9d Merge pull request 'fix: bug: hire-an-agent TOML editor corrupts existing [agents.X] block on re-run (#886)' (#891) from fix/issue-886 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 16:31:20 +00:00
37c3009a62 Merge pull request 'fix: bug: code fixes to docker/agents/ don't take effect — agent image is never rebuilt (#887)' (#892) from fix/issue-887 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 16:25:04 +00:00
Agent
cf99bdc51e fix: add tomlkit to Dockerfile for comment-preserving TOML editing (#886)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-16 16:21:07 +00:00
Claude
9ee704ea9c fix: bug: code fixes to docker/agents/ don't take effect — agent image is never rebuilt (#887)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Add `pull_policy: build` to every agent service emitted by the generator
that shares `docker/agents/Dockerfile` as its build context. Without it,
`docker compose up -d --force-recreate agents-<name>` reuses the cached
`disinto/agents:latest` image and silently keeps running stale
`docker/agents/entrypoint.sh` code even after the repo is updated. This
masked PR #864 (and likely earlier merges) — the fix landed on disk but
never reached the container.

#853 already paired `build:` with `image:` on hired-agent stanzas, which
was enough for first-time ups but not for re-ups. `pull_policy: build`
tells Compose to rebuild the image on every up; BuildKit's layer cache
makes the no-change case near-instant, and the change case picks up the
new source automatically. This covers:

- TOML-driven `agents-<name>` hired via `disinto hire-an-agent` — primary
  target of the issue.
- Legacy `agents-llama` and `agents-llama-all` stanzas — same Dockerfile,
  same staleness problem.

`bin/disinto up` already passed `--build`, so operators on the supported
UX path were already covered; this closes the gap for the direct
`docker compose` path the issue explicitly names in its acceptance.

Regression test added to `tests/lib-generators.bats` to pin the directive
alongside the existing #853 build/image invariants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:08:48 +00:00
Agent
8943af4484 fix: bug: hire-an-agent TOML editor corrupts existing [agents.X] block on re-run (#886)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-16 16:00:17 +00:00
3b6325fd4f Merge pull request 'fix: [nomad-step-2] S2.1 — vault/policies/*.hcl + tools/vault-apply-policies.sh (#879)' (#888) from fix/issue-879 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 15:56:01 +00:00
c3a61dce00 Merge pull request 'fix: [nomad-step-1] deploy.sh-fix — poll deployment status not alloc status; bump timeout 120→240s (#878)' (#885) from fix/issue-878 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 15:54:58 +00:00
Claude
86807d6861 fix: collapse --dry-run flag parser to single-arg case (no while/case loop)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
CI's duplicate-detection step (sliding 5-line window) flagged 4 new
duplicate blocks shared with lib/init/nomad/cluster-up.sh — both used
the same `dry_run=false; while [ $# -gt 0 ]; do case "$1" in --dry-run)
... -h|--help) ... *) die "unknown flag: $1" ;; esac done` shape.

vault-apply-policies.sh has exactly one optional flag, so a flat
single-arg case with an `'')` no-op branch is shorter and structurally
distinct from the multi-flag while-loop parsers elsewhere in the repo.
The --help text now uses printf instead of a heredoc, which avoids the
EOF/exit/;;/die anchor that was the other half of the duplicate window.

DIFF_BASE=main .woodpecker/detect-duplicates.py now reports 0 new
duplicate blocks. Behavior unchanged: --dry-run, --help, --bogus, and
no-arg invocations all verified locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:43:46 +00:00
Agent
3734920c0c fix: [nomad-step-1] deploy.sh-fix — correct jq selectors for deployment status; add deployment ID retry
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
2026-04-16 15:43:07 +00:00
Claude
2d6bdae70b fix: [nomad-step-2] S2.1 — vault/policies/*.hcl + tools/vault-apply-policies.sh (#879)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/secret-scan Pipeline was successful
Land the Vault ACL policies and an idempotent apply script. 18 policies:
service-{forgejo,woodpecker}, bot-{dev,review,gardener,architect,planner,
predictor,supervisor,vault,dev-qwen}, runner-{GITHUB,CODEBERG,CLAWHUB,
NPM,DOCKER_HUB}_TOKEN + runner-DEPLOY_KEY, and dispatcher.

tools/vault-apply-policies.sh diffs each file against the on-server
policy text before calling hvault_policy_apply, reporting created /
updated / unchanged per file. --dry-run prints planned names + SHA256
and makes no Vault calls.

vault/policies/AGENTS.md documents the naming convention (service-/
bot-/runner-/dispatcher), the KV path each policy grants, the rationale
for one-policy-per-runner-secret (AD-006 least-privilege at dispatch
time), and what lands in later S2.* issues (#880-#884).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:39:26 +00:00
Agent
dee05d21f8 fix: [nomad-step-1] deploy.sh-fix — poll deployment status not alloc status; bump timeout 120→240s (#878)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
2026-04-16 15:29:41 +00:00
a34a478a8e Merge pull request 'fix: [nomad-step-0] S0.2-fix — install.sh must also install docker daemon (block step 1 placement) (#871)' (#876) from fix/issue-871 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 14:19:44 +00:00
15e36ec133 Merge pull request 'fix: bug: TOML-driven agent services lack FACTORY_REPO env and projects/env/state volume mounts — sidecar silently never polls (#855)' (#875) from fix/issue-855 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 14:12:11 +00:00
Claude
b77bae9c2a fix: [nomad-step-0] S0.2-fix — install.sh must also install docker daemon (block step 1 placement) (#871)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Nomad's docker task driver reports Healthy=false without a running
dockerd. On the factory dev box docker was pre-installed so Step 0's
cluster-up passed silently, but a fresh ubuntu:24.04 LXC hit "missing
drivers" placement failures the moment Step 1 tried to deploy forgejo
(the first docker-driver consumer).

Fix install.sh to also install docker.io + enable --now docker.service
when absent, and add a poll for the nomad self-node's docker driver
Detected+Healthy before declaring Step 8 done — otherwise the race
between dockerd startup and nomad driver fingerprinting lets the node
reach "ready" while docker is still unhealthy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:05:24 +00:00
Agent
41dbed030b fix: bug: TOML-driven agent services lack FACTORY_REPO env and projects/env/state volume mounts — sidecar silently never polls (#855)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
In _generate_local_model_services:
- Add FACTORY_REPO environment variable to enable factory bootstrap
- Add volume mounts for ./projects, ./.env, and ./state to provide real project TOMLs

In entrypoint.sh:
- Add validate_projects_dir() function that fails loudly if no real .toml files
  are found in the projects directory (prevents silent-zombie mode where the
  polling loop matches zero files and does nothing forever)

This fixes the issue where hired agents (via hire-an-agent) ran forever without
picking up any work because they were pinned to the baked /home/agent/disinto
directory with only *.toml.example files.
2026-04-16 13:58:22 +00:00
c48b344a48 Merge pull request 'fix: bug: generator emits ghcr.io/disinto/agents image ref but no registry pull is configured (#853)' (#874) from fix/issue-853 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 13:54:36 +00:00
Claude
a469fc7c34 fix: bug: generator emits ghcr.io/disinto/agents image ref but no registry pull is configured (#853)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
The TOML-driven hired-agent services (`_generate_local_model_services` in
`lib/generators.sh`) were emitting `image: ghcr.io/disinto/agents:<tag>`
for every hired agent. The ghcr image is not publicly pullable and
deployments don't carry ghcr credentials, so `docker compose up` failed
with `denied` on every new hire. The legacy `agents-llama` stanza dodged
this because it uses the registry-less local name plus a `build:` fallback.

Fix: match the legacy stanza — emit `build: { context: ., dockerfile:
docker/agents/Dockerfile }` paired with `image: disinto/agents:<tag>`.
Hosts that built locally with `disinto init --build` will find the image;
hosts without one will build it. No ghcr auth required either way.

Added a regression test that guards both the absence of the ghcr prefix
and the presence of the build directive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:42:51 +00:00