refactor: generate one agents-llama compose service per LLAMA_BOTS entry #832


Closed

opened 2026-04-16 07:53:27 +00:00 by dev-bot · 1 comment

dev-bot commented

2026-04-16 07:53:27 +00:00

Collaborator

Problem

docker-compose.yml hardcodes a single agents-llama service backed by FORGE_TOKEN_LLAMA. Adding dev-qwen2 requires either a second hardcoded stanza or a compose regenerator that iterates the bot list.

lib/generators.sh already owns compose regeneration (per #783 generate_caddyfile / disinto up regen pattern). Extending it to loop over LLAMA_BOTS is the natural fit.

Dependencies

- Depends on #831 (parametric LLAMA_BOTS in forge-setup); must land first so env vars exist.
- Depends on #830 (issue_claim race fix); parallel dev workers require it.

Proposed solution

In lib/generators.sh, for each $bot in $LLAMA_BOTS, emit:

- Service agents-llama-${bot} with:
  - container_name: disinto-agents-${bot}
  - FORGE_TOKEN=${FORGE_TOKEN_<SUFFIX>}, FORGE_PASS=${FORGE_PASS_<SUFFIX>}
  - AGENT_ROLES=dev
  - Per-bot volumes: project-repos-${bot}:/home/agent/repos, agent-data-${bot}:/home/agent/data
  - Shared read-only mounts (CLAUDE_SHARED_DIR, AGENT_SSH_DIR, SOPS_AGE_DIR, woodpecker-data) unchanged
- Top-level named volumes: project-repos-${bot}, agent-data-${bot}

Use a YAML anchor (&agents_llama_base) for the shared stanza so per-bot emission is minimal.
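The per-bot emission described above could be sketched roughly as follows. This is only an illustration, not the actual code in lib/generators.sh: the function names (`bot_suffix`, `emit_llama_services`, `emit_llama_volumes`) are hypothetical, and the mapping from bot name to `<SUFFIX>` (uppercase, dashes to underscores) is an assumption, since the issue does not define it.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the suffix convention are assumptions.

# Assumed convention: dev-qwen2 -> DEV_QWEN2
bot_suffix() {
  printf '%s' "$1" | tr 'a-z-' 'A-Z_'
}

# Print one compose service stanza per whitespace-separated LLAMA_BOTS entry.
emit_llama_services() {
  local bot suffix
  for bot in ${LLAMA_BOTS:-}; do
    suffix="$(bot_suffix "$bot")"
    cat <<EOF
  agents-llama-${bot}:
    <<: *agents_llama_base
    container_name: disinto-agents-${bot}
    environment:
      FORGE_TOKEN: \${FORGE_TOKEN_${suffix}}
      FORGE_PASS: \${FORGE_PASS_${suffix}}
      AGENT_ROLES: dev
    volumes:
      # shared read-only mounts would have to be repeated here (see note)
      - project-repos-${bot}:/home/agent/repos
      - agent-data-${bot}:/home/agent/data
EOF
  done
}

# Print the matching top-level named volume entries.
emit_llama_volumes() {
  local bot
  for bot in ${LLAMA_BOTS:-}; do
    printf '  project-repos-%s:\n  agent-data-%s:\n' "$bot" "$bot"
  done
}
```

One caveat for the anchor approach: a YAML merge key (`<<:`) merges mappings shallowly, so a per-bot `volumes:` or `environment:` key replaces the one from the anchor rather than extending it. The shared mounts therefore need to be emitted per service, or the per-bot keys need to be kept out of the anchored base.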

Why per-bot volumes (not shared)

Two dev agents writing to the same /home/agent/repos/_factory worktree corrupt git state. The state dir ${DISINTO_DIR}/state (containing .dev-active lock files used by check_active in lib/guard.sh) lives inside this volume — sharing causes two containers to serialize on the same lock file. Per-bot volumes isolate both worktree and locks, which is what unlocks parallel dev throughput.

Volume migration

Current agents-llama uses shared volumes project-repos / agent-data. After this refactor, dev-qwen moves to project-repos-dev-qwen / agent-data-dev-qwen. The worktree is re-cloned by entrypoint bootstrap (it starts from the baked copy, switches to live after first clone). CI-fix tracker + logs in agent-data are append-only operational data — losing them on migration is acceptable. No repo state lives only in these volumes that isn't also on Forgejo.

Acceptance criteria

- [ ] disinto up regenerates compose with one agents-llama-<bot> service per LLAMA_BOTS entry
- [ ] Each service has its own project-repos-<bot> and agent-data-<bot> named volume
- [ ] Shared read-only mounts (Claude config, SSH, SOPS, woodpecker-data) remain shared
- [ ] Setting LLAMA_BOTS="dev-qwen dev-qwen2" and running disinto up brings up disinto-agents-dev-qwen2 without disturbing existing running containers
- [ ] Existing dev-qwen container migrates cleanly: new container name disinto-agents-dev-qwen; worktree re-clones on first run; no orphan state
- [ ] No regression in single-bot default case
- [ ] Generator is idempotent: re-running disinto up on unchanged LLAMA_BOTS does not restart containers
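One way the idempotency criterion might be approached is to rewrite the generated compose file only when its content actually changes, so an unchanged LLAMA_BOTS leaves the file untouched. The helper below is a hypothetical sketch (`write_if_changed` is not an existing function in the repo); it reads the new content from stdin and reports whether the target was updated.

```shell
#!/usr/bin/env bash
# Sketch only: write_if_changed is a hypothetical helper name.

# Usage: generate_something | write_if_changed path/to/file
# Prints "updated" if the file was (re)written, "unchanged" otherwise.
write_if_changed() {
  local target="$1" tmp
  # Temp file lives next to the target so the final mv stays on one filesystem.
  tmp="$(mktemp "${target}.XXXXXX")"
  cat > "$tmp"
  if [ -f "$target" ] && cmp -s "$tmp" "$target"; then
    rm -f "$tmp"
    echo "unchanged"
  else
    mv "$tmp" "$target"
    echo "updated"
  fi
}
```

This only covers the generator side. Whether containers are left running also depends on the compose tooling not recreating services whose configuration is unchanged.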

Affected files

- lib/generators.sh: add per-bot loop in compose generation
- docker-compose.yml: remove hardcoded agents-llama stanza; generator is the source of truth (aligned with #785 deprecation pattern for Caddyfile)
- Any compose fragments/templates the generator reads

Non-goals

- Creating the dev-qwen2 bot user itself: this is an operational step after this + #830 + #831 land. No code change required; just .env edit + disinto up.
- Reviewer parallelization: the reviewer stays a single agent with its own, separate throughput ceiling (~110 PRs/day), which is not blocking.

Context

Third of three issues to enable operational scaling of llama dev agents. After merge, adding dev-qwen2 is: append to LLAMA_BOTS in .env, run disinto init (generates user + token), run disinto up (brings up container). Zero further code change for dev-qwen3/4/N.
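The ".env edit" step could look something like the sketch below. `add_llama_bot` is a hypothetical helper, not part of disinto, and it assumes LLAMA_BOTS is stored in .env as a single space-separated, optionally double-quoted value, as in the acceptance criteria.

```shell
#!/usr/bin/env bash
# Sketch only: add_llama_bot is a hypothetical helper name.

# Usage: add_llama_bot path/to/.env dev-qwen2
# Appends the bot to LLAMA_BOTS unless it is already listed.
add_llama_bot() {
  local env_file="$1" new_bot="$2" current tmp
  # Extract the current value, with or without surrounding double quotes.
  current="$(sed -n 's/^LLAMA_BOTS="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$env_file" | tail -n 1)"
  case " $current " in
    *" $new_bot "*) return 0 ;;  # already present, nothing to do
  esac
  tmp="$(mktemp "${env_file}.XXXXXX")"
  # Keep every other line, then write the extended list as the last line.
  grep -v '^LLAMA_BOTS=' "$env_file" > "$tmp" || true
  printf 'LLAMA_BOTS="%s"\n' "${current:+$current }$new_bot" >> "$tmp"
  mv "$tmp" "$env_file"
}
```

After such an edit, the remaining steps from the paragraph above would be `disinto init` followed by `disinto up`.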


dev-bot added the backlog and priority labels 2026-04-16 07:53:28 +00:00

dev-bot referenced this issue

2026-04-16 08:10:13 +00:00

fix: disinto hire-an-agent + compose generator defects blocking multi-llama-dev parallel operation #834

dev-bot commented

2026-04-16 08:10:22 +00:00

Author

Collaborator

Superseded by #834. The existing disinto hire-an-agent + project-TOML [agents.*] pattern already provides the parametric structure proposed here; #834 addresses the actual narrow defects (token var naming, FORGE_PASS persistence, per-agent compose token lookup, per-agent project-repos volume). Closing unpicked.


dev-bot closed this issue

2026-04-16 08:10:22 +00:00