bug: code fixes to docker/agents/ don't take effect — agent image is never rebuilt #887

Closed
opened 2026-04-16 15:36:03 +00:00 by dev-bot · 0 comments
Collaborator

Problem

When a PR lands that modifies docker/agents/entrypoint.sh (or any baked-in file under docker/agents/), the running agent containers continue to execute the old code. The disinto/agents:latest image on the host was built at some past point and is never rebuilt when the repo is updated.

docker-compose.yml for TOML-driven agent services uses image: ghcr.io/disinto/agents:${DISINTO_IMAGE_TAG:-latest} — no build: directive (see also #853). No registry pipeline publishes fresh images. No local rebuild is triggered by disinto up or hire-an-agent. Result: landed fixes never reach agent containers unless an operator manually runs docker compose build or equivalent.

Repro

  1. PR #864 merged the #861 entrypoint-ordering fix (added an early-parse block that logs Parsed PROJECT_NAME=...).
  2. disinto hire-an-agent dev-qwen2 dev ... regenerates compose, which still references ghcr.io/disinto/agents:latest.
  3. docker compose --profile agents-dev-qwen2 up -d agents-dev-qwen2 — container starts from the pre-fix image.
  4. docker exec disinto-agents-dev-qwen2 grep 'Parsed PROJECT_NAME' /entrypoint.sh → not present.
  5. md5sum /home/johba/disinto/docker/agents/entrypoint.sh and docker exec ... md5sum /entrypoint.sh return DIFFERENT hashes.

Fix for #861 is thus a no-op in production until someone remembers to rebuild. Agent containers run indefinitely on stale code.
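
Steps 4 and 5 above can be folded into a small staleness check. This is a minimal sketch, not part of the repo: the helper names file_md5 and check_stale are hypothetical, and it assumes md5sum is available both on the host and inside the agent image (the repro suggests it is).

```shell
#!/usr/bin/env bash
# Sketch only: file_md5 and check_stale are hypothetical helpers, not existing disinto code.

# Print just the md5 digest of a local file.
file_md5() {
  local out
  out=$(md5sum "$1") || return 1
  printf '%s\n' "${out%% *}"
}

# check_stale HOST_FILE CONTAINER [CONTAINER_PATH]
# Returns 0 when the container copy matches the host copy, 1 when it differs,
# 2 when either checksum could not be computed.
check_stale() {
  local host_file=$1 container=$2 container_path=${3:-/entrypoint.sh}
  local host_hash container_out container_hash
  host_hash=$(file_md5 "$host_file") || return 2
  # Run md5sum inside the container, as in repro step 5.
  container_out=$(docker exec "$container" md5sum "$container_path") || return 2
  container_hash=${container_out%% *}
  if [ "$host_hash" = "$container_hash" ]; then
    echo "current: $container_path in $container matches $host_file"
  else
    echo "STALE: $container_path in $container differs from $host_file"
    return 1
  fi
}
```

Example call, using the names from the repro: check_stale docker/agents/entrypoint.sh disinto-agents-dev-qwen2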

Fix options

Option A: auto-build in generator

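lib/generators.sh should emit build: { context: ., dockerfile: docker/agents/Dockerfile } alongside image:. With both directives present, docker compose up builds the image locally when it is missing, and docker compose up --build rebuilds it from the current checkout. A plain up does not rebuild an image that already exists, so the rebuild path needs --build (or pull_policy: build on the service). Covers both:

  • Fresh hire: disinto hire-an-agent → docker compose --profile X up --build rebuilds if needed.
  • Existing agents: subsequent docker compose up --build on any change rebuilds.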
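
As a rough illustration of Option A, the generator could emit a service stanza along these lines. This is a sketch under assumptions: emit_agent_service is a hypothetical stand-in (the real logic lives in _generate_local_model_services, whose body is not shown here), the profiles line is inferred from the --profile agents-<name> usage in the repro, and a real stanza would carry further keys.

```shell
#!/usr/bin/env bash
# Sketch only: emit_agent_service is a hypothetical name, not the actual generator function.

# Print a compose service stanza that carries both image: and build:.
emit_agent_service() {
  local name=$1
  # \$ keeps ${DISINTO_IMAGE_TAG:-latest} literal so compose expands it, not this script.
  cat <<EOF
  agents-${name}:
    image: ghcr.io/disinto/agents:\${DISINTO_IMAGE_TAG:-latest}
    build:
      context: .
      dockerfile: docker/agents/Dockerfile
    profiles: ["agents-${name}"]
EOF
}

emit_agent_service dev-qwen2
```

With a stanza like this, docker compose --profile agents-dev-qwen2 up -d --build agents-dev-qwen2 builds from the local checkout and tags the result with the image: name, so no manual docker build or retag is needed.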

Option B: CI pipeline publishes to ghcr

A Woodpecker job on merge to main rebuilds ghcr.io/disinto/agents:latest and pushes it; disinto up then pulls the fresh image. Requires ghcr auth configured on disinto-dev-box. Heavier lift; touches #853 (the registry is currently inaccessible, causing pull failures; workaround: docker tag disinto/agents:latest ghcr.io/disinto/agents:latest).

Option C: disinto up runs rebuild

bin/disinto up performs docker compose build agents agents-<name> before up. Simple but slow.
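
Option C might look roughly like the following. This is a sketch, not the current bin/disinto code: up_with_rebuild and the DOCKER override are hypothetical, and the build step only does something if the service has a build: section, since docker compose build skips services without one.

```shell
#!/usr/bin/env bash
# Sketch only: up_with_rebuild and the DOCKER override are assumptions for illustration.

# Allow a dry run with DOCKER=echo; defaults to the real docker CLI.
DOCKER=${DOCKER:-docker}

# up_with_rebuild NAME: rebuild the agent image, then recreate the container from it.
up_with_rebuild() {
  local name=$1
  local service="agents-${name}"
  # Build first; stop if the build fails so the old container keeps running.
  "$DOCKER" compose --profile "$service" build "$service" || return 1
  "$DOCKER" compose --profile "$service" up -d --force-recreate "$service"
}
```

Running DOCKER=echo up_with_rebuild dev-qwen2 prints the two compose commands without executing them, which is a cheap way to check the sequence before wiring it into up.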

Recommend Option A as the minimum useful step; it sidesteps #853 and matches the pattern of the legacy hardcoded agents service (which does have build:).

Acceptance

  • A code change to docker/agents/entrypoint.sh is reflected in running agent containers after an operator runs docker compose --profile agents-<name> up -d --force-recreate agents-<name> (no manual docker build needed)
  • No change to hire-an-agent UX (still just hire → up)
  • Existing pre-built ghcr image still works as a fallback if available

Affected files

  • lib/generators.sh — emit build: directive alongside image: for _generate_local_model_services
  • docker-compose.yml template (if any) — ensure both directives present
  • bin/disinto — if Option C path chosen, add docker compose build step to up

Context

Caught today while chasing why #861 didn't unblock dev-qwen2 despite being merged. Host code had the fix; container didn't. Two hours of debugging a non-existent bug in code that was actually fixed but not deployed. This also explains why other landed fixes (e.g. #856 collaborator auto-add) may only work for NEW operations — any existing code paths baked into an older image are still broken until rebuild.

Related: #853 (ghcr image ref with no pull auth) — these two together mean that the only way a fix reaches agent containers today is an undocumented manual rebuild + retag sequence.

dev-bot added the backlog, priority labels 2026-04-16 15:36:04 +00:00
dev-bot self-assigned this 2026-04-16 16:00:05 +00:00
dev-bot added in-progress and removed backlog labels 2026-04-16 16:00:05 +00:00
dev-bot removed their assignment 2026-04-16 16:25:06 +00:00
Reference: disinto-admin/disinto#887