Commit graph

388 commits

Author SHA1 Message Date
Claude
9ee704ea9c fix: bug: code fixes to docker/agents/ don't take effect — agent image is never rebuilt (#887)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Add `pull_policy: build` to every agent service emitted by the generator
that shares `docker/agents/Dockerfile` as its build context. Without it,
`docker compose up -d --force-recreate agents-<name>` reuses the cached
`disinto/agents:latest` image and silently keeps running stale
`docker/agents/entrypoint.sh` code even after the repo is updated. This
masked PR #864 (and likely earlier merges) — the fix landed on disk but
never reached the container.

#853 already paired `build:` with `image:` on hired-agent stanzas, which
was enough for first-time ups but not for re-ups. `pull_policy: build`
tells Compose to rebuild the image on every up; BuildKit's layer cache
makes the no-change case near-instant, and the change case picks up the
new source automatically. This covers:

- TOML-driven `agents-<name>` hired via `disinto hire-an-agent` — primary
  target of the issue.
- Legacy `agents-llama` and `agents-llama-all` stanzas — same Dockerfile,
  same staleness problem.

`bin/disinto up` already passed `--build`, so operators on the supported
UX path were already covered; this closes the gap for the direct
`docker compose` path the issue explicitly names in its acceptance.

Regression test added to `tests/lib-generators.bats` to pin the directive
alongside the existing #853 build/image invariants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:08:48 +00:00
Agent
3734920c0c fix: [nomad-step-1] deploy.sh-fix — correct jq selectors for deployment status; add deployment ID retry
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
2026-04-16 15:43:07 +00:00
Agent
dee05d21f8 fix: [nomad-step-1] deploy.sh-fix — poll deployment status not alloc status; bump timeout 120→240s (#878)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
2026-04-16 15:29:41 +00:00
a34a478a8e Merge pull request 'fix: [nomad-step-0] S0.2-fix — install.sh must also install docker daemon (block step 1 placement) (#871)' (#876) from fix/issue-871 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 14:19:44 +00:00
Claude
b77bae9c2a fix: [nomad-step-0] S0.2-fix — install.sh must also install docker daemon (block step 1 placement) (#871)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Nomad's docker task driver reports Healthy=false without a running
dockerd. On the factory dev box docker was pre-installed so Step 0's
cluster-up passed silently, but a fresh ubuntu:24.04 LXC hit "missing
drivers" placement failures the moment Step 1 tried to deploy forgejo
(the first docker-driver consumer).

Fix install.sh to also install docker.io + enable --now docker.service
when absent, and add a poll for the nomad self-node's docker driver
Detected+Healthy before declaring Step 8 done — otherwise the race
between dockerd startup and nomad driver fingerprinting lets the node
reach "ready" while docker is still unhealthy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:05:24 +00:00
Agent
41dbed030b fix: bug: TOML-driven agent services lack FACTORY_REPO env and projects/env/state volume mounts — sidecar silently never polls (#855)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
In _generate_local_model_services:
- Add FACTORY_REPO environment variable to enable factory bootstrap
- Add volume mounts for ./projects, ./.env, and ./state to provide real project TOMLs

In entrypoint.sh:
- Add validate_projects_dir() function that fails loudly if no real .toml files
  are found in the projects directory (prevents silent-zombie mode where the
  polling loop matches zero files and does nothing forever)

This fixes the issue where hired agents (via hire-an-agent) ran forever without
picking up any work because they were pinned to the baked /home/agent/disinto
directory with only *.toml.example files.
2026-04-16 13:58:22 +00:00
Claude
a469fc7c34 fix: bug: generator emits ghcr.io/disinto/agents image ref but no registry pull is configured (#853)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
The TOML-driven hired-agent services (`_generate_local_model_services` in
`lib/generators.sh`) were emitting `image: ghcr.io/disinto/agents:<tag>`
for every hired agent. The ghcr image is not publicly pullable and
deployments don't carry ghcr credentials, so `docker compose up` failed
with `denied` on every new hire. The legacy `agents-llama` stanza dodged
this because it uses the registry-less local name plus a `build:` fallback.

Fix: match the legacy stanza — emit `build: { context: ., dockerfile:
docker/agents/Dockerfile }` paired with `image: disinto/agents:<tag>`.
Hosts that built locally with `disinto init --build` will find the image;
hosts without one will build it. No ghcr auth required either way.

Added a regression test that guards both the absence of the ghcr prefix
and the presence of the build directive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:42:51 +00:00
Claude
564e89e445 fix: bug: generator emits invalid env var name FORGE_BOT_USER_<service>^^ when service name contains hyphen (#852)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Acceptance items 1-4 landed previously: the primary compose emission
(FORGE_BOT_USER_*) was fixed in #849 by re-keying on forge_user via
`tr 'a-z-' 'A-Z_'`, and the load-project.sh AGENT_* Python emitter was
normalized via `.upper().replace('-', '_')` in #862. Together they
produce `FORGE_BOT_USER_DEV_QWEN2` and `AGENT_DEV_QWEN2_BASE_URL` for
`[agents.dev-qwen2]` with `forge_user = "dev-qwen2"`.

This patch closes acceptance item 5 — the defence-in-depth warn-and-skip
in load-project.sh's two export loops. Hire-agent's up-front reject is
the primary line of defence (a validated `^[a-z]([a-z0-9]|-[a-z0-9])*$`
agent name can't produce a bad identifier), but a hand-edited TOML can
still smuggle invalid keys through:

- `[mirrors] my-mirror = "…"` — the `MIRROR_<NAME>` emitter only
  upper-cases, so `MY-MIRROR` retains its dash and fails `export`.
- `[agents."weird name"]` — quoted TOML keys bypass the bare-key
  grammar entirely, so spaces and other disallowed shell chars reach
  the export loop unchanged.

Before this change, either case would abort load-project.sh under
`set -euo pipefail` — the exact failure mode the original #852
crash-loop was diagnosed from. Now each loop validates `$_key` against
`^[A-Za-z_][A-Za-z0-9_]*$` and warn-skips offenders so siblings still
load.

- `lib/load-project.sh` — regex guard + WARNING on stderr in both
  `_PROJECT_VARS` and `_AGENT_VARS` export loops.
- `tests/lib-load-project.bats` — two regressions: dashed mirror key,
  quoted agent section with space. Both assert (a) the load does not
  abort and (b) sane siblings still load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:23:18 +00:00
Claude
91fdb35111 fix: Generated compose emits FORGE_BOT_USER_LLAMA — legacy name, should derive from forge_user (#849)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Key `FORGE_BOT_USER_*` on `$user_upper` (forge_user normalized with
`tr 'a-z-' 'A-Z_'`) instead of `${service_name^^}`, matching the
`FORGE_TOKEN_<FORGE_USER>` / `FORGE_PASS_<FORGE_USER>` convention two
lines above in the same emitted block. For `[agents.llama]` with
`forge_user = "dev-qwen"` this emits `FORGE_BOT_USER_DEV_QWEN: "dev-qwen"`
instead of the legacy `FORGE_BOT_USER_LLAMA`.

No external consumers read `FORGE_BOT_USER_*` today (verified via grep),
so no fallback/deprecation shim is needed — this is purely a one-site
fix at the sole producer.

Adds `tests/lib-generators.bats` as a regression guard. Follows the
existing `tests/lib-*.bats` pattern (developer-run, not CI-wired).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:58:53 +00:00
Agent
a3eb33ccf7 fix: _validate_env_vars skips Anthropic-backend agents + missing sed escaping
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- bin/disinto: Remove '[ -n "$base_url" ] || continue' guard that caused
  all Anthropic-backend agents to be silently skipped during validation.
  The base_url check is now scoped only to backend-credential selection.

- lib/hire-agent.sh: Add sed escaping for ANTHROPIC_BASE_URL value before
  sed substitution (same pattern as ANTHROPIC_API_KEY at line 256).

Fixes AI review BLOCKER and MINOR issues on PR #866.
2026-04-16 12:29:00 +00:00
Agent
53a1fe397b fix: hire-an-agent does not persist per-agent secrets to .env (#847) 2026-04-16 12:29:00 +00:00
Claude
721d7a6077 fix: bug: TOML [agents.X] section name with dash crashes load-project.sh (#862)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
TOML allows dashes in bare keys, so `[agents.dev-qwen2]` is a valid
section. Before this fix, load-project.sh derived bash var names via
Python `.upper()` alone, which kept the dash and produced
`AGENT_DEV-QWEN2_BASE_URL` — an invalid shell identifier. Under
`set -euo pipefail` the subsequent `export` aborted the whole file,
silently taking the factory down on the N+1 run after a dashed agent
was hired via `disinto hire-an-agent`.

Normalize via `.upper().replace('-', '_')` to match the
`tr 'a-z-' 'A-Z_'` convention already used by hire-agent.sh (#834)
and generators.sh (#852). Also harden hire-agent.sh to reject invalid
agent names at hire time (before any Forgejo side effects), so
unparseable TOML sections never land on disk.

- `lib/load-project.sh` — dash-to-underscore in emitted shell var names
- `lib/hire-agent.sh` — validate agent name against
  `^[a-z]([a-z0-9]|-[a-z0-9])*$` up front
- `tests/lib-load-project.bats` — regression guard covering the parse
  path and the hire-time reject path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 11:55:59 +00:00
Claude
802a548783 fix: disinto up silently destroys profile-gated services (#845)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
TOML-driven agent services (emitted by `_generate_local_model_services`
for every `[agents.X]` entry) carried `profiles: ["agents-<name>"]`.
With `docker compose up -d --remove-orphans` and no `COMPOSE_PROFILES`
set, compose treated the hired agent container as an orphan and removed
it on every subsequent `disinto up` — silently killing dev-qwen and any
other TOML-declared local-model agent.

The profile gate was vestigial: the `[agents.X]` TOML entry is already
the activation gate — its presence is what drives emission of the
service block in the first place (#846). Drop the profile from emitted
services so they land in the default profile and survive `disinto up`.

Also update the "To start the agent, run" hint in `hire-an-agent` from
`docker compose --profile … up -d …` to `disinto up`, matching the new
activation model.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 11:22:29 +00:00
Claude
9d5cbb4fa2 fix: bug: hire-an-agent does not add the new agent as collaborator on the project repo (#856)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
hire-an-agent now adds the new Forgejo user as a `write` collaborator on
`$FORGE_REPO` right after the token step, mirroring the collaborator setup
lib/forge-setup.sh applies to the canonical bot users. Without this, a
freshly hired agent's PATCH to assign itself an issue returned 403 Forbidden
and the dev-agent polled forever logging "claim lost to <none>".

issue_claim() now captures the PATCH HTTP status via `-w '%{http_code}'`
instead of swallowing failures with `curl -sf ... || return 1`. A 403 (or
any non-2xx) now surfaces a distinct log line naming the code — the missing
collaborator root cause would have been diagnosable in seconds instead of
minutes.

Also updates the lib-issue-claim bats mock to handle the new `-w` flag and
adds a regression test covering the HTTP-error log surfacing path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:47:53 +00:00
Agent
6734887a0a fix: [nomad-step-1] S1.2 — add lib/init/nomad/deploy.sh (dependency-ordered nomad job run + wait) (#841)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
2026-04-16 10:36:38 +00:00
Claude
43dc86d84c fix: fix: disinto hire-an-agent + compose generator defects blocking multi-llama-dev parallel operation (#834)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Hiring a second llama-backed dev agent (e.g. `dev-qwen2`) alongside
`dev-qwen` tripped four defects that prevented safe parallel operation.

Gap 1 — hire-agent keyed per-agent token as FORGE_<ROLE>_TOKEN, so two
dev-role agents overwrote each other's token in .env. Re-key by agent
name via `tr 'a-z-' 'A-Z_'`: FORGE_TOKEN_<AGENT_UPPER>.

Gap 2 — hire-agent generated a random FORGE_PASS but never wrote it to
.env. The container's git credential helper needs both token and pass
to push over HTTPS (#361). Persist FORGE_PASS_<AGENT_UPPER> with the
same update-in-place idempotency as the token.

Gap 3 — _generate_local_model_services hardcoded FORGE_TOKEN_LLAMA for
every local-model service, forcing all hired llama agents to share one
Forgejo identity. Derive USER_UPPER from the TOML's `forge_user` field
and emit \${FORGE_TOKEN_<USER_UPPER>:-} per service.

Gap 4 — every local-model service mounted the shared `project-repos`
volume, so concurrent llama devs collided on /_factory worktree and
state/.dev-active. Switch to per-agent `project-repos-<service_name>`
and emit the matching top-level volume. Also escape embedded newlines
in `$all_vols` before the sed insertion so multi-agent volume lists
don't unterminate the substitute command.

.env.example documents the new FORGE_TOKEN_<AGENT> / FORGE_PASS_<AGENT>
naming convention (and preserves the legacy FORGE_TOKEN_LLAMA path used
by the ENABLE_LLAMA_AGENT=1 singleton build).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:55:48 +00:00
311e1926bb Merge pull request 'chore: gardener housekeeping' (#837) from chore/gardener-20260416-0838 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 08:52:37 +00:00
Claude
6533f322e3 fix: add last-reviewed watermark SHA to secret-scan safe patterns
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-16 08:46:00 +00:00
Claude
e9c144a511 chore: gardener housekeeping 2026-04-16
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline failed
2026-04-16 08:38:31 +00:00
Claude
620515634a fix: issue_claim race — verify assignee after PATCH to prevent duplicate work (#830)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Forgejo's assignees PATCH is last-write-wins, so two dev agents polling
concurrently could both observe .assignee == null at the pre-check, both
PATCH, and the loser would silently "succeed" and proceed to implement
the same issue — colliding at the PR/branch stage.

Re-read the assignee after the PATCH and bail out if it isn't self.
Label writes are moved AFTER this verification so a losing claim leaves
no stray in-progress label to roll back.

Adds tests/lib-issue-claim.bats covering the three paths:
  - happy path (single agent, re-read confirms self)
  - lost race (re-read shows another agent — returns 1, no labels added)
  - pre-check skip (initial GET already shows another agent)

Prerequisite for the LLAMA_BOTS parametric refactor that will run N
dev containers against the same project.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:35:18 +00:00
Claude
481175e043 fix: dedupe cluster-up.sh polling via poll_until_healthy helper (#824)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
CI duplicate-detection flagged the in-line vault + nomad polling loops
in cluster-up.sh as matching a 5-line window in vault-init.sh (the
`ready=1 / break / fi / sleep 1 / done` boilerplate).

Extracts the repeated pattern into three helpers at the top of the
file:

  - nomad_has_ready_node       wrapper so poll_until_healthy can take a
                               bare command name.
  - _die_with_service_status   shared "log + dump systemctl status +
                               die" path (factored out of the two
                               callsites + the timeout branch).
  - poll_until_healthy         ticks once per second up to TIMEOUT,
                               fail-fasts on systemd "failed" state,
                               and returns 0 on first successful check.

Step 7 (vault unseal) and Step 8 (nomad ready node) each collapse from
~15 lines of explicit for-loop bookkeeping to a one-line call. No
behavioural change: same tick cadence, same fail-fast, same status
dump on timeout. Local detect-duplicates.py run against main confirms
no new duplicates introduced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:26:54 +00:00
Claude
d2c6b33271 fix: [nomad-step-0] S0.4 — disinto init --backend=nomad --empty orchestrator (cluster-up) (#824)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline failed
Wires S0.1–S0.3 into a single idempotent bring-up script and replaces
the S0.1 stub in _disinto_init_nomad so `disinto init --backend=nomad
--empty` produces a running empty single-node cluster on a fresh box.

lib/init/nomad/cluster-up.sh (new):
  1. install.sh                (nomad + vault binaries)
  2. systemd-nomad.sh          (unit + enable, not started)
  3. systemd-vault.sh          (unit + vault.hcl + enable)
  4. host-volume dirs under /srv/disinto/* (matching nomad/client.hcl)
  5. /etc/nomad.d/{server,client}.hcl (content-compare before write)
  6. vault-init.sh             (first-run init + unseal + persist keys)
  7. systemctl start vault     (poll until unsealed; fail-fast on
                                is-failed)
  8. systemctl start nomad     (poll until ≥1 node ready)
  9. /etc/profile.d/disinto-nomad.sh (VAULT_ADDR + NOMAD_ADDR for
                                      interactive shells)
  Re-running on a healthy box is a no-op — each sub-step is itself
  idempotent and steps 7/8 fast-path when already active + healthy.
  `--dry-run` prints the full step list and exits 0.

bin/disinto:
  - _disinto_init_nomad: replaces the S0.1 stub. Invokes cluster-up.sh
    directly (as root) or via `sudo -n` otherwise. Both `--empty` and
    the default (no flag) call cluster-up.sh today; Step 1 will branch
    on $empty to gate job deployment. --dry-run forwards through.
  - disinto_init: adds `--empty` flag parsing; rejects `--empty`
    combined with `--backend=docker` explicitly instead of silently
    ignoring it.
  - usage: documents `--empty` and drops the "stub, S0.1" annotation
    from --backend.

Closes #824.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:22:15 +00:00
Claude
24cb8f83a2 fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Adds the Vault half of the factory-dev-box bringup, landed but not started
(per the install-but-don't-start pattern used for nomad in #822):

- lib/init/nomad/install.sh — now also installs vault from the shared
  HashiCorp apt repo. VAULT_VERSION pinned (1.18.5). Fast-path skips apt
  entirely when both binaries are at their pins; partial upgrades only
  touch the package that drifted.

- nomad/vault.hcl — single-node config: file storage backend at
  /var/lib/vault/data, localhost listener on :8200, ui on, mlock kept on.
  No TLS / HA / audit yet; those land in later steps.

- lib/init/nomad/systemd-vault.sh — writes /etc/systemd/system/vault.service
  (Type=notify, ExecStartPost auto-unseals from /etc/vault.d/unseal.key,
  CAP_IPC_LOCK granted for mlock), deploys nomad/vault.hcl to
  /etc/vault.d/, creates /var/lib/vault/data (0700 root), enables the
  unit without starting it. Idempotent via content-compare.

- lib/init/nomad/vault-init.sh — first-run init: spawns a temporary
  `vault server` if not already reachable, runs operator-init with
  key-shares=1/threshold=1, persists unseal.key + root.token (0400 root),
  unseals once in-process, shuts down the temp server. Re-run detects
  initialized + unseal.key present → no-op. Initialized but key missing
  is a hard failure (can't recover).

lib/hvault.sh already defaults VAULT_TOKEN to /etc/vault.d/root.token
when the env var is absent, so no change needed there.

Seal model: the single unseal key lives on disk; seal-key theft equals
vault theft. Factory-dev-box-acceptable tradeoff — avoids running a
second Vault to auto-unseal the first.

Blocks S0.4 (#824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 06:53:27 +00:00
Claude
06ead3a19d fix: [nomad-step-0] S0.2 — install nomad + systemd unit + nomad/server.hcl/client.hcl (#822)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Lands the Nomad install + baseline HCL config for the single-node factory
dev box. Nothing is wired into `disinto init` yet — S0.4 does that.

- lib/init/nomad/install.sh: idempotent apt install pinned to
  NOMAD_VERSION (default 1.9.5). Adds HashiCorp apt keyring and sources
  list only if absent; fast-paths when the pinned version is already
  installed.
- lib/init/nomad/systemd-nomad.sh: writes /etc/systemd/system/nomad.service
  (rewrites only when content differs), creates /etc/nomad.d and
  /var/lib/nomad, runs `systemctl enable nomad` WITHOUT starting.
- nomad/server.hcl: single-node combined server+client role. bootstrap_expect=1,
  localhost bind, default ports pinned explicitly, UI enabled. No TLS/ACL —
  factory dev box baseline.
- nomad/client.hcl: Docker task driver (allow_privileged=false, volumes
  enabled) and host_volume pre-wiring for forgejo-data, woodpecker-data,
  agent-data, project-repos, caddy-data, chat-history, ops-repo under
  /srv/disinto/*.

Verified: `nomad config validate nomad/*.hcl` reports "Configuration is
valid!" (with expected TLS/bootstrap warnings for a dev box). Shellcheck
clean across the repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 06:04:02 +00:00
Claude
c236350e00 chore: gardener housekeeping 2026-04-16
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
- Bump AGENTS.md watermarks to HEAD (c363ee0) across all 9 per-directory files
- supervisor/AGENTS.md: document dual-container trigger (agents + edge) and SUPERVISOR_INTERVAL env var added by P1/#801
- lib/AGENTS.md: document agents-llama-all compose service (all 7 roles) added to generators.sh by P1/#801
- pending-actions.json: comment #623 (all deps now closed, ready for planner decomposition), comment #758 (needs human Forgejo admin action to unblock ops repo writes)
2026-04-16 02:15:38 +00:00
Claude
8e885bed02 fix: [nomad-prep] P1 — run all 7 bot roles on llama backend (gates migration) (#801)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
- Add supervisor role to entrypoint.sh polling loop (SUPERVISOR_INTERVAL,
  default 20 min) and include it in default AGENT_ROLES
- Add agents-llama-all compose service (profile: agents-llama-all) with
  all 7 roles: review, dev, gardener, architect, planner, predictor, supervisor
- Add agents-llama-all to lib/generators.sh for disinto init generation
- Update docs/agents-llama.md with profile table and usage instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:52:04 +00:00
Claude
9d8f322005 fix: [nomad-prep] P7 — make disinto init idempotent + add --dry-run (#800)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Make `disinto init` safe to re-run on the same box:

- Store admin token as FORGE_ADMIN_TOKEN in .env; preserve on re-run
  (previously deleted and recreated every run, churning DB state)
- Fix human token creation: use admin_pass for basic-auth since
  human_user == admin_user (previously used a random password that
  never matched the actual user password, so HUMAN_TOKEN was never
  created successfully)
- Preserve HUMAN_TOKEN in .env on re-run (same pattern as bot tokens)
- Bot tokens were already idempotent (preserved unless --rotate-tokens)

Add --dry-run flag that reports every intended action (file writes,
API calls, docker commands) based on current state, then exits 0
without touching state. Useful for CI gating and cutover confidence.

Update smoke test:
- Add dry-run test (verifies exit 0 and no .env modification)
- Add idempotency state diff (verifies .env is unchanged on re-run)
- Verify FORGE_ADMIN_TOKEN and HUMAN_TOKEN are stored in .env

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 22:37:22 +00:00
Claude
14458f1f17 fix: address review — jq-safe JSON construction in hvault.sh
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- _hvault_err: use jq instead of printf to produce valid JSON on all inputs
- hvault_kv_get: use jq --arg for key lookup to prevent filter injection
- hvault_kv_put: build payload entirely via jq to properly escape keys

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:27:34 +00:00
Claude
fbb246c626 fix: [nomad-prep] P4 — scaffold lib/hvault.sh (HashiCorp Vault helper module) (#799)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:15:44 +00:00
Claude
f8c3ada077 fix: [nomad-prep] P10 — audit lib/ + compose for docker-backend-isms (#797)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Sites touched:
- lib/generators.sh: WOODPECKER_BACKEND_DOCKER_NETWORK now reads from
  ${WOODPECKER_CI_NETWORK:-disinto_disinto-net} so nomad jobspecs can
  override the compose-generated network name.
- lib/forge-setup.sh: bare-mode _forgejo_exec() and setup_forge() use
  ${FORGEJO_CONTAINER_NAME:-disinto-forgejo} instead of hardcoding the
  container name. Compose mode is unaffected (uses service name).

Documented exceptions (container_name directives in generators.sh
compose template output): these define names inside docker-compose.yml,
which is compose-specific output. Under nomad the generator is not used.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:39:47 +00:00
Claude
b6f2d83a28 fix: use FORGE_API_BASE for /repos/migrate endpoint, build payload with jq
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
- FORGE_API is repo-scoped; /repos/migrate needs the global FORGE_API_BASE
- Use jq -n --arg for safe JSON construction (no shell interpolation)
- Update docs to reference FORGE_API_BASE

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:29:27 +00:00
Claude
2465841b84 fix: [nomad-prep] P8 — spot-check lib/mirrors.sh against empty Forgejo target (#796)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:22:11 +00:00
Claude
19f10e33e6 fix: [nomad-prep] P6 — externalize host paths in docker-compose via env vars (#795)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Replace hardcoded host-side bind-mount paths with env vars so Nomad
jobspecs can reuse the same variables at cutover:

- CLAUDE_BIN_DIR: path to claude CLI binary (resolved at init time)
- CLAUDE_CONFIG_FILE: path to .claude.json (default ${HOME}/.claude.json)
- CLAUDE_DIR: path to .claude directory (default ${HOME}/.claude)
- AGENT_SSH_DIR: path to SSH keys (default ${HOME}/.ssh)
- SOPS_AGE_DIR: path to SOPS age keys (default ${HOME}/.config/sops/age)

generators.sh now writes CLAUDE_BIN_DIR to .env instead of sed-replacing
CLAUDE_BIN_PLACEHOLDER in docker-compose.yml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:01:47 +00:00
Claude
8799a8c676 fix: [nomad-prep] P5 — add healthchecks to agents, edge, staging, woodpecker-agent (#794)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Add Docker healthcheck blocks so Nomad check stanzas map 1:1 at migration:

- agents / agents-llama: pgrep -f entrypoint.sh (60s interval)
- woodpecker-agent: wget healthz on :3333 (30s interval)
- edge: curl Caddy admin API on :2019 (30s interval)
- staging: wget Caddy admin API on :2019 (30s interval)
- chat: add /health endpoint to server.py (no-auth 200 OK), fix
  Dockerfile HEALTHCHECK to use it, add compose-level healthcheck

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:39:35 +00:00
Claude
9dbc43ab23 fix: [nomad-prep] P3 — add load_secret() abstraction to lib/env.sh (#793)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline failed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:15:50 +00:00
Claude
88676e65ae fix: feat: consolidate secret stores — single granular secrets/*.enc, deprecate .env.vault.enc (#777)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:35:03 +00:00
a87dcdf40b Merge pull request 'chore: gardener housekeeping' (#805) from chore/gardener-20260415-1816 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 18:23:21 +00:00
Claude
0937707fe5 chore: gardener housekeeping 2026-04-15
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-15 18:16:44 +00:00
Claude
e9a018db5c fix: [nomad-prep] P0 — rename lib/vault.sh + vault/ to action-vault namespace (#792)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:16:32 +00:00
Claude
ec7bc8ff2c fix: infra: deprecate tracked docker/Caddyfilegenerate_caddyfile is the single source of truth (#771)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- Add docker/Caddyfile to .gitignore (generated artifact, not tracked)
- Document generate_caddyfile as canonical source in lib/generators.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:29:56 +00:00
c644660bda Merge pull request 'fix: infra: CI broken on main — missing WOODPECKER_PLUGINS_PRIVILEGED server env + misplaced .woodpecker/ops-filer.yml in project repo (#779)' (#782) from fix/issue-779 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 11:07:27 +00:00
91f36b2692 Merge pull request 'chore: gardener housekeeping' (#781) from chore/gardener-20260415-1007 into main
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/ops-filer Pipeline failed
2026-04-15 11:02:55 +00:00
Claude
a8d393f3bd fix: infra: CI broken on main — missing WOODPECKER_PLUGINS_PRIVILEGED server env + misplaced .woodpecker/ops-filer.yml in project repo (#779)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Part 1: Add WOODPECKER_PLUGINS_PRIVILEGED to woodpecker service environment
in lib/generators.sh, defaulting to plugins/docker, overridable via .env.
Document the new key in .env.example.

Part 2: Delete .woodpecker/ops-filer.yml from project repo — it belongs in
the ops repo and references secrets that don't exist here. Full ops-side
filer setup deferred until sprint PRs need it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:56:39 +00:00
d0c0ef724a Merge pull request 'fix: infra: agents-llama (local-Qwen dev agent) is hand-added to docker-compose.yml — move into lib/generators.sh as a flagged service (#769)' (#780) from fix/issue-769 into main
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/ops-filer Pipeline failed
2026-04-15 10:09:43 +00:00
Claude
539862679d chore: gardener housekeeping 2026-04-15
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-15 10:07:41 +00:00
Claude
0104ac06a8 fix: infra: agents-llama (local-Qwen dev agent) is hand-added to docker-compose.yml — move into lib/generators.sh as a flagged service (#769)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:58:44 +00:00
Claude
92f19cb2b3 feat: publish versioned agent images — compose should use image: not build: (#429)
- Generated compose now uses `image: ghcr.io/disinto/{agents,edge}` instead
  of `build:` directives; `disinto init --build` restores local-build mode
- Add VOLUME declarations to agents, reproduce, and edge Dockerfiles
- Add CI pipeline (.woodpecker/publish-images.yml) to build and push images
  to ghcr.io/disinto on tag events
- Mount projects/, .env, and state/ into agents container for runtime config
- Skip pre-build binary download when compose uses registry images

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:24:05 +00:00
Claude
0baac1a7d8 fix: infra: edge service missing restart: unless-stopped in lib/generators.sh (#768)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:03:26 +00:00
0db4c84818 Merge pull request 'chore: gardener housekeeping' (#767) from chore/gardener-20260415-0806 into main 2026-04-15 08:57:11 +00:00
Claude
fd9ba028bc chore: gardener housekeeping 2026-04-15
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-15 08:06:14 +00:00