fix: disinto hire-an-agent + compose generator defects blocking multi-llama-dev parallel operation #834

Closed

opened 2026-04-16 08:10:13 +00:00 by dev-bot · 0 comments

## Problem

`disinto hire-an-agent` already owns the provisioning pipeline for new agents (user + token + `.profile` repo + project-TOML `[agents.*]` section + compose regeneration). When it is used to hire a second llama-backed dev agent (e.g. `dev-qwen2`) alongside the existing `dev-qwen`, four defects prevent safe parallel operation.

## Supersedes

Replaces earlier speculative refactor issues #831 and #832, which proposed a new `LLAMA_BOTS` env list + generator loop. The existing `hire-an-agent` + project-TOML `[agents.*]` pattern already provides that structure; only the four defects below need fixing.

## Gap 1: token env-var name is keyed by role, not by agent

`lib/hire-agent.sh:~176` saves the per-agent token as `FORGE_${ROLE_UPPER}_TOKEN` (role-keyed). Hiring `dev-qwen2` with role `dev` therefore overwrites the `FORGE_DEV_TOKEN` that any other `dev`-role agent is already using.

Fix: key by agent name.

```bash
# dev-qwen2 -> DEV_QWEN2: lowercase to uppercase, '-' to '_'
agent_upper=$(echo "$agent_name" | tr 'a-z-' 'A-Z_')
token_var="FORGE_TOKEN_${agent_upper}"
pass_var="FORGE_PASS_${agent_upper}"
```
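
Gap 3 reuses this convention for `forge_user`, so a single shared helper would keep the two call sites from drifting apart. The following is a minimal sketch, assuming a hypothetical helper named `forge_var_suffix` (not something this issue says exists). Besides the `tr` mapping above, it rejects names that would not yield a valid shell variable name, such as names containing `.` or starting with a digit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): map an agent / forge_user name
# to its env-var suffix, e.g. dev-qwen2 -> DEV_QWEN2.
forge_var_suffix() {
  local name=$1 upper
  # Same convention as the fix above: a-z -> A-Z and '-' -> '_'.
  upper=$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')
  # Refuse names that would not form a valid shell variable (e.g. '.' or a
  # leading digit) instead of writing an unusable line to .env.
  if [[ ! $upper =~ ^[A-Z_][A-Z0-9_]*$ ]]; then
    printf 'forge_var_suffix: cannot derive env var from %q\n' "$name" >&2
    return 1
  fi
  printf '%s\n' "$upper"
}

# Two agents that share role "dev" now map to distinct variables.
for agent in dev-qwen dev-qwen2; do
  suffix=$(forge_var_suffix "$agent") || exit 1
  echo "FORGE_TOKEN_${suffix} FORGE_PASS_${suffix}"
done
```

Note that the mapping is not injective: `dev-qwen` and `dev_qwen` would both become `DEV_QWEN`. If both spellings can occur, a collision check belongs at hire time.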

## Gap 2: FORGE_PASS is never persisted

`user_pass` is generated locally in `lib/hire-agent.sh`, but only the token is written to `.env`. The container's git credential helper (configured in `docker/agents/entrypoint.sh`) needs both `FORGE_TOKEN_*` and `FORGE_PASS_*` to authenticate `git push` over HTTPS.

Fix: write `FORGE_PASS_${agent_upper}=${user_pass}` to `.env` alongside the token, with the same idempotency rules (update in place if already present).
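
The update-in-place rule can be sketched as follows, assuming a hypothetical helper `env_set`. The real `lib/hire-agent.sh` may already have an equivalent for the token. The sketch rewrites the file line by line instead of using `sed`, so a generated password containing `/`, `&` or other characters that are special in `sed` replacements passes through unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not the actual lib/hire-agent.sh code: set KEY=VALUE in
# an env file, replacing the existing KEY= line in place or appending if absent.
env_set() {
  local file=$1 key=$2 value=$3 tmp line found=0
  # Temp file in the same directory so the final mv is a rename; mktemp
  # creates it with mode 0600, which also suits a file holding secrets.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if [[ -f $file ]]; then
    while IFS= read -r line || [[ -n $line ]]; do
      # Match on "KEY=" so FORGE_TOKEN does not also match FORGE_TOKEN_DEV_QWEN.
      if [[ $line == "${key}="* ]]; then
        printf '%s=%s\n' "$key" "$value"
        found=1
      else
        printf '%s\n' "$line"
      fi
    done <"$file" >"$tmp"
  fi
  if (( ! found )); then
    printf '%s=%s\n' "$key" "$value" >>"$tmp"
  fi
  mv -- "$tmp" "$file"
}

env_file=$(mktemp)
printf 'FORGE_TOKEN=shared\nFORGE_TOKEN_DEV_QWEN=old\n' >"$env_file"
env_set "$env_file" FORGE_TOKEN_DEV_QWEN new-token   # replaced in place
env_set "$env_file" FORGE_PASS_DEV_QWEN 'p@ss/w&rd'  # appended
env_set "$env_file" FORGE_PASS_DEV_QWEN 'p@ss/w&rd'  # re-run: still one line
cat "$env_file"
rm -f "$env_file"
```

The helper always overwrites. Deciding whether a value should change at all (preserve versus rotate) stays with the caller, which should only pass a new password after it has actually changed it on Forgejo.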

## Gap 3: compose token lookup is a singleton

`lib/generators.sh:118-119` hardcodes `FORGE_TOKEN_LLAMA` as the only lookup for every local-model agent service:

```yaml
FORGE_TOKEN: ${FORGE_TOKEN_LLAMA:-${FORGE_TOKEN:-}}
FORGE_PASS:  ${FORGE_PASS_LLAMA:-${FORGE_PASS:-}}
```

This means every hired llama agent would share the same Forgejo identity. The parsed TOML section already exposes `forge_user`, so replace this with a per-agent lookup:

```yaml
FORGE_TOKEN: ${FORGE_TOKEN_<USER_UPPER>:-}
FORGE_PASS:  ${FORGE_PASS_<USER_UPPER>:-}
```

where `USER_UPPER` is derived from `forge_user` with the same `tr 'a-z-' 'A-Z_'` convention as Gap 1.
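
The sketch below shows how `_generate_local_model_services` might emit this lookup. It assumes that `forge_user` holds the value parsed from the agent's TOML section, and that this value is the same name `hire-an-agent` used to key the `.env` entries; if the two differ, the lookup silently resolves to empty. The function name `emit_forge_env` and the 6-space indentation are also assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the per-agent lookup; the surrounding YAML and the
# indentation are assumptions, not the real generator output.
emit_forge_env() {
  local forge_user=$1 user_upper
  user_upper=$(printf '%s' "$forge_user" | tr 'a-z-' 'A-Z_')
  # Single quotes keep ${...} literal: docker compose, not this shell,
  # interpolates it from .env when the stack comes up.
  printf '      FORGE_TOKEN: ${FORGE_TOKEN_%s:-}\n' "$user_upper"
  printf '      FORGE_PASS:  ${FORGE_PASS_%s:-}\n' "$user_upper"
}

emit_forge_env dev-qwen2
```

Dropping the `${FORGE_TOKEN:-}` fallback means an agent without its own credentials gets an empty token instead of silently running as the shared identity. If a hard failure is preferred, Compose's `${VAR:?message}` form aborts with the message when the variable is unset or empty.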

## Gap 4: project-repos volume is shared across agents

`lib/generators.sh:~110` mounts `project-repos:/home/agent/repos` for every local-model agent service. Two hired llama devs collide on:

- `/home/agent/repos/_factory`: git worktree corruption if both write concurrently
- `${DISINTO_DIR}/state/.dev-active`: the lock file read by `check_active` in `lib/guard.sh`, which serializes the two agents and defeats the point of adding a second worker

Fix: use a per-agent named volume.

```yaml
volumes:
  - project-repos-${service_name}:/home/agent/repos
```

and emit the named volume in the top-level `volumes:` block next to the existing `agents-${service_name}-data` pattern.

The existing `dev-qwen` single-agent deployment migrates in place on the next `disinto up`. Its mount switches from the shared `project-repos` volume to a new `project-repos-dev-qwen` volume, and the entrypoint bootstrap re-clones the worktree from the baked copy. The CI-fix tracker and logs in `agent-data` are already per-agent (the volume name already includes `${service_name}`), so they need no migration. No permanent state lives only in the old shared volume.
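
The following is a minimal sketch of the Gap 4 emission, assuming a hypothetical `emit_repo_volumes` that receives the service names taken from the parsed `[agents.*]` sections. The real generator writes many more keys per service; only the repo mount and its top-level declaration are shown.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (function name is illustrative): one project-repos
# volume per agent service, plus its top-level declaration.
emit_repo_volumes() {
  local svc
  echo "services:"
  for svc in "$@"; do
    printf '  agents-%s:\n' "$svc"
    printf '    volumes:\n'
    # Per-agent volume, so agents no longer share one repos tree.
    printf '      - project-repos-%s:/home/agent/repos\n' "$svc"
  done
  # An empty body declares a default Compose-managed named volume.
  echo "volumes:"
  for svc in "$@"; do
    printf '  project-repos-%s:\n' "$svc"
  done
}

emit_repo_volumes dev-qwen dev-qwen2
```

Compose does not delete a named volume when services stop referencing it. After the migration, the old shared `project-repos` volume therefore stays on disk under whatever name Compose gave it (usually prefixed with the project name) until it is removed by hand, for example with `docker volume rm`.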

## Dependencies

- #830 (`issue_claim` race fix): required for safe parallel dev-poll.

## Acceptance criteria

- [ ] `hire-an-agent` writes `FORGE_TOKEN_<AGENT_UPPER>` and `FORGE_PASS_<AGENT_UPPER>` to `.env`, never role-keyed names
- [ ] Re-running `hire-an-agent` for an existing agent preserves its token (idempotent per #800), or rotates it only when given an explicit flag
- [ ] `_generate_local_model_services` looks up per-agent `FORGE_TOKEN` / `FORGE_PASS` via `forge_user` → `<USER_UPPER>`
- [ ] Each `agents-<name>` service has a dedicated `project-repos-<name>` named volume
- [ ] Hiring a second llama dev succeeds end to end: `disinto hire-an-agent dev-qwen2 dev --local-model http://10.10.10.1:8081 --model unsloth/Qwen3.5-35B-A3B` followed by `disinto up` brings up `disinto-agents-dev-qwen2` without disturbing the running `disinto-agents-dev-qwen`
- [ ] The existing single-agent `dev-qwen` deployment migrates cleanly on the first `disinto up` after the change (worktree re-clones, no stuck state)
- [ ] `.env.example` documents the `FORGE_TOKEN_<AGENT>` / `FORGE_PASS_<AGENT>` naming convention
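
For the first criterion, a post-hire check could be as small as the sketch below. It assumes a hypothetical `check_agent_env` and the naming convention from Gap 1. It only confirms that both variables are present and non-empty; it does not check that the credentials work against Forgejo.

```bash
#!/usr/bin/env bash
# Hypothetical post-hire check (function name is illustrative): both per-agent
# credentials must exist in the env file with non-empty values.
check_agent_env() {
  local env_file=$1 agent=$2 upper var
  upper=$(printf '%s' "$agent" | tr 'a-z-' 'A-Z_')
  for var in "FORGE_TOKEN_${upper}" "FORGE_PASS_${upper}"; do
    # Anchoring on "NAME=" keeps FORGE_TOKEN_DEV_QWEN from matching
    # a FORGE_TOKEN_DEV_QWEN2=... line.
    if ! grep -Eq "^${var}=.+" "$env_file"; then
      printf 'missing or empty: %s\n' "$var" >&2
      return 1
    fi
  done
}
```

After hiring `dev-qwen2`, `check_agent_env .env dev-qwen2` should succeed, and it should report a missing `FORGE_PASS_*` for any agent hired before Gap 2 is fixed.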

## Affected files

- `lib/hire-agent.sh`: token/pass var naming (around line 176), persist `FORGE_PASS`
- `lib/generators.sh`: per-agent `FORGE_TOKEN` / `FORGE_PASS` lookup (lines 118-119), per-agent `project-repos` volume (line ~110 + top-level volume list)
- `.env.example`: document the new var naming
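
One possible shape for the `.env.example` entry is shown below. The wording is only a suggestion, and the agent names are the examples used in this issue.

```bash
# Per-agent Forgejo credentials, written by `disinto hire-an-agent`.
# Suffix = agent name uppercased, with '-' replaced by '_':
#   dev-qwen  -> FORGE_TOKEN_DEV_QWEN,  FORGE_PASS_DEV_QWEN
#   dev-qwen2 -> FORGE_TOKEN_DEV_QWEN2, FORGE_PASS_DEV_QWEN2
# Each local-model agent service reads only its own pair; there is no
# fallback to the shared FORGE_TOKEN / FORGE_PASS.
#FORGE_TOKEN_DEV_QWEN=
#FORGE_PASS_DEV_QWEN=
```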

## Non-goals

- New `LLAMA_BOTS` env list: not needed; project TOML `[agents.*]` sections already enumerate agents
- Reviewer parallelization: a separate throughput ceiling, not blocking
- Anthropic-backed `dev-bot`: unchanged; continues to use the shared `FORGE_TOKEN`

## Context

The `dev-qwen` llama dev agent merged 35 PRs in the last 24h. GPU utilization is below 1%, so the bottleneck is the serial dev-poll → PR → review → merge handoff, not compute. llama-server runs with `--parallel 4`, leaving 3 of its 4 slots idle. Scaling to 2 llama devs is expected to roughly double throughput.

Operational rollout once this issue and #830 have landed:

```bash
disinto hire-an-agent dev-qwen2 dev \
  --local-model http://10.10.10.1:8081 \
  --model unsloth/Qwen3.5-35B-A3B
disinto up
```

No further code changes are required at that point.
