disinto-ops/sprints/versioned-agent-images.md

2026-04-09 08:31:46 +00:00
# Sprint: versioned agent images
## Vision issues
- #429 — feat: publish versioned agent images — compose should use image: not build:
## What this enables
After this sprint, `disinto init` produces a `docker-compose.yml` that pulls a pinned image
from a registry instead of building from source. A new factory instance needs only a token
and a config file — no clone, no build, no local Docker context. This closes the gap between
"works on my machine" and "one-command bootstrap."
It also enables rollback: if agents misbehave after an upgrade, `AGENTS_IMAGE=v0.1.1 disinto up`
restores the previous version without touching the codebase.
## What exists today
The release pipeline is more complete than it looks:
- `formulas/release.toml` — 7-step release formula. Steps 4-5 already build and tag the image
locally (`docker compose build --no-cache agents`, `docker tag disinto-agents disinto-agents:$RELEASE_VERSION`).
The gap: no push step, no registry target.
- `lib/release.sh` — Creates vault TOML and ops repo PR for the release. No image version wired
into compose generation.
- `lib/generators.sh` `_generate_compose_impl()` — Generates compose with a `build:` block
(`context: .`, `dockerfile: docker/agents/Dockerfile`) for agents, runner, reproduce, edge. Version-unaware.
- `vault/vault-env.sh` — `DOCKER_HUB_TOKEN` is in `VAULT_ALLOWED_SECRETS`. Not currently used.
- `docker/agents/Dockerfile` — No VOLUME declarations; runtime state, repos, and config are
mounted via compose but not declared. Claude binary injected by compose at init time.
## Complexity
Files touched: 4
- `formulas/release.toml` — add `push-image` step (after tag-image, before restart-agents)
- `lib/generators.sh` — `_generate_compose_impl()` reads `AGENTS_IMAGE` env var; emits
`image:` when set, falls back to `build:` when not set (dev mode)
- `docker/agents/Dockerfile` — add explicit VOLUME declarations for /home/agent/data,
/home/agent/repos, /home/agent/disinto/projects, /home/agent/disinto/state
- `bin/disinto` `disinto_up()` — pass `AGENTS_IMAGE` through to compose if set in `.env`
Subsystems: release formula, compose generation, Dockerfile hygiene
Sub-issues: 3
Gluecode ratio: ~80% gluecode (release step, VOLUME declarations), ~20% new (AGENTS_IMAGE env var path)
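
The `image:` versus `build:` fallback described for `_generate_compose_impl()` could look roughly like the sketch below. It is not taken from `lib/generators.sh`: the helper name `emit_image_or_build` and the `AGENTS_IMAGE_REPO` variable (with its placeholder default) are assumptions made for illustration, and `AGENTS_IMAGE` is treated as a tag, as in the `AGENTS_IMAGE=v0.1.1` rollback example.

```shell
# Sketch only: emit_image_or_build and AGENTS_IMAGE_REPO are hypothetical names,
# not taken from lib/generators.sh. The default repository is a placeholder.
emit_image_or_build() {
  local repo="${AGENTS_IMAGE_REPO:-ghcr.io/example/disinto-agents}"
  if [ -n "${AGENTS_IMAGE:-}" ]; then
    # Pinned deployment: reference the published image by tag.
    printf '    image: %s:%s\n' "$repo" "$AGENTS_IMAGE"
  else
    # Dev mode: keep building from the local checkout, as compose does today.
    printf '    build:\n'
    printf '      context: .\n'
    printf '      dockerfile: docker/agents/Dockerfile\n'
  fi
}

# agents and runner share one image, so both go through the same helper.
for service in agents runner; do
  printf '  %s:\n' "$service"
  emit_image_or_build
done
```

Routing every service that shares the agents image through one helper keeps `runner` from drifting to a different version than `agents`.
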
## Risks
- Registry credentials: `DOCKER_HUB_TOKEN` is in vault allowlist but not wired up. The push step
needs a registry login — either Docker Hub (DOCKER_HUB_TOKEN) or GHCR (GITHUB_TOKEN, already
in vault). The sprint spec must pick one and add the credential to the release vault TOML.
- Volume shadow: if VOLUME declarations don't match the compose volume mounts exactly, runtime
files land in anonymous volumes instead of named ones. Must test before shipping.
- Existing deployments: currently on `build:`. Migration: set AGENTS_IMAGE in .env, re-run
`disinto init` (compose is regenerated), restart. No SSH, no worktree needed.
- `runner` service: same image as agents, same version. Must update runner service in compose gen too.
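
The volume shadow risk can be checked mechanically before shipping. The sketch below compares the declared VOLUME paths against a list of compose mount targets and prints any declared path with no matching mount. The function name `unmatched_volumes` and the `mounted` list are hypothetical; in practice the mount targets would be read from the generated compose file, for example from the output of `docker compose config`.

```shell
# Print every declared VOLUME path that has no exactly matching mount target.
# $1: newline-separated VOLUME paths, $2: newline-separated compose mount targets.
unmatched_volumes() {
  printf '%s\n' "$1" | while IFS= read -r path; do
    [ -n "$path" ] || continue
    # -F: fixed string, -x: whole-line match, -q: no output.
    printf '%s\n' "$2" | grep -Fxq -- "$path" || printf '%s\n' "$path"
  done
}

# The four paths proposed for the Dockerfile VOLUME declarations.
declared='/home/agent/data
/home/agent/repos
/home/agent/disinto/projects
/home/agent/disinto/state'

# Hypothetical mount targets with one path missing, to show a failing case.
mounted='/home/agent/data
/home/agent/repos
/home/agent/disinto/projects'

unmatched_volumes "$declared" "$mounted"
```

With the sample lists above the check prints `/home/agent/disinto/state`, the path that would end up in an anonymous volume.
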
## Cost — new infra to maintain
- Registry account + token rotation: one vault secret (DOCKER_HUB_TOKEN) needs rotation policy.
GHCR (via GITHUB_TOKEN) has no additional account but ties release to GitHub.
- Release formula grows from 7 to 8 steps. Small maintenance surface.
- `AGENTS_IMAGE` becomes a documented env var in .env for pinned deployments. Needs docs.
## Recommendation
Worth it. The release formula already builds and tags the image; one push step closes the gap. The compose
generation change is purely additive (AGENTS_IMAGE env var, fallback to build: for dev).
Volume declarations are hygiene that should exist regardless of versioning.
Pick GHCR over Docker Hub: GITHUB_TOKEN is already in the vault allowlist and ops repo.
No new account needed.
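
A minimal sketch of what the `push-image` step might run if GHCR is chosen. How `formulas/release.toml` invokes shell is not shown here, so this is written as a standalone function. `GHCR_OWNER`, `GITHUB_USER`, and the `DRY_RUN` switch are assumptions introduced for illustration, and the token would need permission to write packages.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch for this sketch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical push step: GHCR_OWNER and GITHUB_USER are assumed variables.
push_image() {
  local version="$1"
  local target="ghcr.io/${GHCR_OWNER}/disinto-agents:${version}"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    # Pass the token on stdin so it does not appear in the process list.
    printf '%s' "$GITHUB_TOKEN" | docker login ghcr.io -u "$GITHUB_USER" --password-stdin
  fi
  # Retag the locally built image for the registry, then push it.
  run docker tag "disinto-agents:${version}" "$target"
  run docker push "$target"
}

# Dry run: prints the tag and push commands without contacting the registry.
DRY_RUN=1 GHCR_OWNER=example push_image v0.1.1
```

The dry-run path makes it possible to review the exact image reference before a release pushes anything.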