Sprint: versioned agent images

Vision issues

#429 — feat: publish versioned agent images — compose should use image: not build:

What this enables

After this sprint, disinto init produces a docker-compose.yml that pulls a pinned image from a registry instead of building from source. A new factory instance needs only a token and a config file — no clone, no build, no local Docker context. This closes the gap between "works on my machine" and "one-command bootstrap."

It also enables rollback: if agents misbehave after an upgrade, AGENTS_IMAGE=v0.1.1 disinto up restores the previous version without touching the codebase.

What exists today

The release pipeline is more complete than it looks:

formulas/release.toml — 7-step release formula. Steps 4-5 already build and tag the image locally (docker compose build --no-cache agents, docker tag disinto-agents disinto-agents:$RELEASE_VERSION). The gap: no push step, no registry target.
lib/release.sh — Creates vault TOML and ops repo PR for the release. No image version wired into compose generation.
lib/generators.sh _generate_compose_impl() — Generates compose with build: context: . dockerfile: docker/agents/Dockerfile for agents, runner, reproduce, edge. Version-unaware.
vault/vault-env.sh — DOCKER_HUB_TOKEN is in VAULT_ALLOWED_SECRETS. Not currently used.
docker/agents/Dockerfile — No VOLUME declarations; runtime state, repos, and config are mounted via compose but not declared. Claude binary injected by compose at init time.

Complexity

Files touched: 4

formulas/release.toml — add push-image step (after tag-image, before restart-agents)
lib/generators.sh — _generate_compose_impl() reads AGENTS_IMAGE env var; emits image: when set, falls back to build: when not set (dev mode)
docker/agents/Dockerfile — add explicit VOLUME declarations for /home/agent/data, /home/agent/repos, /home/agent/disinto/projects, /home/agent/disinto/state
bin/disinto disinto_up() — pass AGENTS_IMAGE through to compose if set in .env
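
The AGENTS_IMAGE fallback described for _generate_compose_impl() could look roughly like the sketch below. The helper name emit_agents_source and the AGENTS_IMAGE_REPO variable are assumptions for illustration and do not come from lib/generators.sh. AGENTS_IMAGE is treated as a tag here, matching the AGENTS_IMAGE=v0.1.1 example above.

```shell
# Sketch only: emit_agents_source and AGENTS_IMAGE_REPO are hypothetical names,
# not taken from lib/generators.sh.

# Print the image/build stanza for one compose service.
# AGENTS_IMAGE is treated as a tag (e.g. v0.1.1); unset or empty means dev mode.
emit_agents_source() {
  if [ -n "${AGENTS_IMAGE:-}" ]; then
    # Pinned deployment: reference a published, versioned image.
    printf '    image: %s:%s\n' "${AGENTS_IMAGE_REPO:-disinto-agents}" "$AGENTS_IMAGE"
  else
    # Dev mode: build from the local checkout, as the generator does today.
    printf '    build:\n'
    printf '      context: .\n'
    printf '      dockerfile: docker/agents/Dockerfile\n'
  fi
}
```

Calling the same helper for both the agents and runner services would keep the two on one version, since they share an image.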

Subsystems: release formula, compose generation, Dockerfile hygiene
Sub-issues: 3
Gluecode ratio: ~80% gluecode (release step, VOLUME declarations), ~20% new (AGENTS_IMAGE env var path)

Risks

Registry credentials: DOCKER_HUB_TOKEN is in vault allowlist but not wired up. The push step needs a registry login — either Docker Hub (DOCKER_HUB_TOKEN) or GHCR (GITHUB_TOKEN, already in vault). The sprint spec must pick one and add the credential to the release vault TOML.
Volume shadow: if VOLUME declarations don't match the compose volume mounts exactly, runtime files land in anonymous volumes instead of named ones. Must test before shipping.
Existing deployments: currently on build:. Migration: set AGENTS_IMAGE in .env, re-run disinto init (compose is regenerated), restart. No SSH, no worktree needed.
runner service: same image as agents, same version. Must update runner service in compose gen too.
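
The volume shadow risk can be checked mechanically before shipping. The sketch below lists VOLUME paths declared in a Dockerfile that have no matching mount target. Both helper names are hypothetical, and the mounts file (one container path per line) is an assumed input that would have to be derived from the generated compose file by some other means.

```shell
# Sketch only: dockerfile_volumes and unmounted_volumes are hypothetical helpers.

# List paths from shell-form VOLUME lines (VOLUME /a /b).
# The JSON-array form (VOLUME ["/a"]) is not handled.
dockerfile_volumes() {
  awk 'toupper($1) == "VOLUME" { for (i = 2; i <= NF; i++) print $i }' "$1" | sort -u
}

# Print declared volumes that have no exact matching mount target.
# $1: Dockerfile path; $2: file with one container mount path per line.
# Empty output means every declared volume is covered by a mount.
unmounted_volumes() {
  dockerfile_volumes "$1" | grep -vxF -f "$2" || true
}
```

Any path this prints is one Docker would back with an anonymous volume at runtime, which is the failure mode described above.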

Cost — new infra to maintain

Registry account + token rotation: one vault secret (DOCKER_HUB_TOKEN) needs a rotation policy. GHCR (via GITHUB_TOKEN) needs no additional account but ties releases to GitHub.
Release formula grows from 7 to 8 steps. Small maintenance surface.
AGENTS_IMAGE becomes a documented env var in .env for pinned deployments. Needs docs.

Recommendation

Worth it. The release formula already builds and tags the image locally, so one push step closes the gap. The compose generation change is purely additive (AGENTS_IMAGE env var, fallback to build: for dev). Volume declarations are hygiene that should exist regardless of versioning.

Pick GHCR over Docker Hub: GITHUB_TOKEN is already in the vault allowlist and ops repo. No new account needed.
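
If GHCR is chosen, the new push-image step might reduce to something like the following sketch. GHCR_OWNER, ghcr_image_ref, and push_agents_image are hypothetical names; this document does not specify the registry namespace, and the token is assumed to have permission to write packages.

```shell
# Sketch only: GHCR_OWNER, ghcr_image_ref and push_agents_image are hypothetical;
# the registry namespace is not named in this document.

# Build the full image reference for a release version.
# $1: registry owner (GHCR expects lowercase); $2: version tag.
ghcr_image_ref() {
  printf 'ghcr.io/%s/disinto-agents:%s\n' "$1" "$2"
}

# Log in, retag the locally built image, and push it.
# Expects GITHUB_TOKEN, GHCR_OWNER and RELEASE_VERSION in the environment.
push_agents_image() {
  ref="$(ghcr_image_ref "$GHCR_OWNER" "$RELEASE_VERSION")" || return 1
  # Pass the token on stdin so it does not appear in the process list.
  printf '%s' "$GITHUB_TOKEN" | docker login ghcr.io -u "$GHCR_OWNER" --password-stdin || return 1
  docker tag "disinto-agents:$RELEASE_VERSION" "$ref" || return 1
  docker push "$ref"
}
```

The local tag disinto-agents:$RELEASE_VERSION is the one the existing tag step already produces, so the push step only adds a registry-qualified name on top of it.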
