bug: TOML-driven agent services lack FACTORY_REPO env and projects/env/state volume mounts — sidecar silently never polls #855

Closed
opened 2026-04-16 10:23:46 +00:00 by dev-bot (Collaborator) · 0 comments

## Problem

`lib/generators.sh` (TOML-driven `[agents.X]` → compose service emission) omits two things the main `agents` service has, both required for the agent to actually do any work:

  1. **No `FACTORY_REPO` env var.** Without it, `/entrypoint.sh` never executes the "Factory bootstrap: DISINTO_DIR switched to live checkout" branch, so the container stays pinned on the baked `/home/agent/disinto` directory.
  2. **No `./projects:/home/agent/disinto/projects:ro` volume mount (nor the `./.env` and `./state` mounts).** Without it, the baked `projects/` dir contains only `*.toml.example` files (no real `disinto.toml`).

Combined, the polling loop `for toml in ${DISINTO_DIR}/projects/*.toml` finds no real TOML files, so no project is ever processed, no dev-poll ever spawns, and the container sits in an apparently healthy sleep loop doing nothing. No error is logged; it just silently does nothing forever.

Confirmed today (2026-04-16) on `disinto-agents-llama`: the container was up, the entrypoint logged "Entering polling loop (interval: 60s, roles: dev)", and `sleep 60` was running, but no `Processing project TOML` / `Running dev-poll (iteration N)` lines ever appeared.

## Repro

  1. `disinto hire-an-agent dev-qwen dev --local-model <url> --model <name>`.
  2. Set `COMPOSE_PROFILES=agents-llama` in `.env` (to work around #845).
  3. `disinto up`.
  4. `docker exec disinto-agents-llama ls /home/agent/disinto/projects/` → only `*.toml.example` files.
  5. `docker exec disinto-agents-llama bash -c 'echo $FACTORY_REPO'` → empty.
  6. `docker exec disinto-agents-llama tail /home/agent/data/agent-entrypoint.log` → no `Processing project TOML` entries.
  7. Poll cadence is effectively dead: no work is picked up and GPU usage stays at 0.

## Why subtle

  - The container is "running" per `docker ps` (the healthcheck only `pgrep`s `entrypoint.sh`, which is alive).
  - The startup log shows all the expected "Entering polling loop" messages.
  - No error is raised: a bash `for` loop over a glob with zero matches fails silently. (Strictly, without `nullglob` bash runs the body once with the literal `*.toml` pattern, and any `-f` check in the body then skips it just as quietly.)
  - It looks identical to "idle because there are no ready issues" until you realise it never polled in the first place.
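
The zero-match behaviour is easy to reproduce outside the container. Below is a minimal sketch, assuming bash. `count_toml_iterations` is a hypothetical helper, not disinto code. It prints how many times the loop body runs, and how many of those iterations see a real file, with and without `nullglob`:

```bash
# Hypothetical helper (not disinto code): runs the same kind of loop as the
# entrypoint, `for toml in "$dir"/*.toml`, and prints
# "<iterations> <iterations that saw a real file>".
count_toml_iterations() {
  local dir=$1 use_nullglob=$2
  local iterations=0 real=0 toml restore
  # Remember the caller's nullglob setting so it can be restored afterwards.
  restore=$(shopt -p nullglob || true)
  if [[ $use_nullglob == yes ]]; then
    shopt -s nullglob
  else
    shopt -u nullglob
  fi
  for toml in "$dir"/*.toml; do
    iterations=$((iterations + 1))
    # Without nullglob an unmatched pattern arrives here as the literal string
    # ".../*.toml", which is not a regular file.
    if [[ -f $toml ]]; then
      real=$((real + 1))
    fi
  done
  eval "$restore"
  echo "$iterations $real"
}
```

Suppose the directory holds only `disinto.toml.example`, as in the baked image. The default setting then prints `1 0` and `nullglob` prints `0 0`. Either way no project is processed and nothing is logged.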

Dev-qwen ran for days on the legacy `ENABLE_LLAMA_AGENT=1` service block (which DID include the projects mount). Switching activation paths via `hire-an-agent` silently downgraded the sidecar to a zombie.

## Fix

In `_generate_local_model_services` (or whichever helper emits per-agent services), emit:

```yaml
volumes:
  - ./projects:/home/agent/disinto/projects:ro
  - ./.env:/home/agent/disinto/.env:ro
  - ./state:/home/agent/disinto/state
environment:
  FACTORY_REPO: ${FORGE_REPO:-disinto-admin/disinto}
```

This exactly matches the main `agents` service block (which works correctly).
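
Below is a minimal sketch of what the emitting helper could print, assuming a bash heredoc-based generator. `emit_agent_workspace_keys` is a hypothetical name. The 4-space indentation assumes the service sits directly under `services:`. The quoted `'EOF'` delimiter is the important detail: it keeps `${FORGE_REPO:-disinto-admin/disinto}` literal, so Compose resolves it when the compose file is loaded instead of bash expanding it at generate time.

```bash
# Hypothetical sketch (not the real lib/generators.sh API): print the
# per-agent keys the main `agents` service already has, indented for a
# service nested directly under `services:`.
emit_agent_workspace_keys() {
  # Quoted delimiter: no parameter expansion inside the heredoc, so the
  # Compose-style ${FORGE_REPO:-...} default reaches the file verbatim.
  cat <<'EOF'
    volumes:
      - ./projects:/home/agent/disinto/projects:ro
      - ./.env:/home/agent/disinto/.env:ro
      - ./state:/home/agent/disinto/state
    environment:
      FACTORY_REPO: ${FORGE_REPO:-disinto-admin/disinto}
EOF
}
```

If the real helper already emits its own `volumes:` or `environment:` keys for the service, these entries need to be merged into them. A second key with the same name is a duplicate mapping key, which is invalid YAML: a parser will either reject it or silently keep only one of the two.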

Secondary: add a generate-time (or entrypoint-time) assertion that `$DISINTO_DIR/projects/*.toml` matches at least one real file, and fail loudly otherwise. The silent-zombie mode is the worst kind of failure.
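
Below is a minimal entrypoint-time sketch, assuming bash. `require_project_tomls` is a hypothetical function name, and the message text is illustrative. Globbing with `nullglob` makes "no match" expand to nothing rather than the literal pattern, so the count reflects real files only:

```bash
# Hypothetical entrypoint guard (name and message are illustrative, not
# existing disinto code). Succeeds only if the projects dir holds at least one
# real *.toml file; *.toml.example files do not match the glob.
require_project_tomls() {
  local projects_dir="${1:-${DISINTO_DIR:-}/projects}"
  local -a tomls=()
  local f restore
  # With nullglob, "no match" expands to nothing instead of the literal
  # pattern; the caller's setting is restored afterwards.
  restore=$(shopt -p nullglob || true)
  shopt -s nullglob
  for f in "$projects_dir"/*.toml; do
    if [[ -f $f ]]; then
      tomls+=("$f")
    fi
  done
  eval "$restore"
  if (( ${#tomls[@]} == 0 )); then
    echo "FATAL: no project TOML files in ${projects_dir}; is ./projects mounted and FACTORY_REPO set?" >&2
    return 1
  fi
  printf 'Found %d project TOML file(s) in %s\n' "${#tomls[@]}" "$projects_dir"
}
```

Call this just before the polling loop as `require_project_tomls || exit 1`. That turns the silent zombie into a container that exits non-zero with a clear log line. The same check could also run at generate time against `./projects` on the host.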

## Acceptance

  - After `hire-an-agent X`, the generated service for `X` has the three volume mounts and the `FACTORY_REPO` env var.
  - `disinto up --profile agents-X` brings up a sidecar that logs `Processing project TOML` and `Running dev-poll` within the first poll interval.
  - The entrypoint fails loudly (non-zero exit, clear log message) if the projects glob matches zero real TOML files.
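
For the second acceptance item, a rough check could poll the entrypoint log. `wait_for_poll_log` is hypothetical. The 90 s default (one 60 s poll interval plus startup slack) is an assumption:

```bash
# Hypothetical acceptance check (not disinto code): wait up to TIMEOUT seconds
# for the entrypoint log to show that a poll really happened.
wait_for_poll_log() {
  local log=$1 timeout=${2:-90} waited=0
  while (( waited < timeout )); do
    # Both markers must be present; a missing log file counts as "not yet".
    if grep -q 'Processing project TOML' "$log" 2>/dev/null &&
       grep -q 'Running dev-poll' "$log" 2>/dev/null; then
      echo "poll observed after ${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "no poll activity in ${log} after ${timeout}s" >&2
  return 1
}
```

It can run inside the sidecar without installing anything, e.g. `docker exec disinto-agents-llama bash -c "$(declare -f wait_for_poll_log); wait_for_poll_log /home/agent/data/agent-entrypoint.log 90"`.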

Related: #845 (profile drift), #846 (activation paths), #847 (secrets), #852 (hyphen env-var), #853 (image ref).

- dev-bot added the backlog label 2026-04-16 10:23:46 +00:00
- dev-qwen self-assigned this 2026-04-16 13:56:21 +00:00
- dev-qwen added the in-progress label and removed the backlog label 2026-04-16 13:56:21 +00:00
- dev-qwen removed their assignment 2026-04-16 14:12:12 +00:00
- dev-qwen removed the in-progress label 2026-04-16 14:12:13 +00:00
Reference: disinto-admin/disinto#855