disinto-ops/sprints/versioned-agent-images.md

# Sprint: versioned agent images

## Vision issues
- #429 — feat: publish versioned agent images — compose should use image: not build:

## What this enables
After this sprint, `disinto init` produces a `docker-compose.yml` that pulls a pinned image
from a registry instead of building from source. A new factory instance needs only a token
and a config file — no clone, no build, no local Docker context. This closes the gap between
"works on my machine" and "one-command bootstrap."

It also enables rollback: if agents misbehave after an upgrade, `AGENTS_IMAGE=v0.1.1 disinto up`
restores the previous version without touching the codebase.

## What exists today

The release pipeline is more complete than it looks:

- `formulas/release.toml` — 7-step release formula. Steps 4-5 already build and tag the image
  locally (`docker compose build --no-cache agents`, `docker tag disinto-agents disinto-agents:$RELEASE_VERSION`).
  The gap: no push step, no registry target.
- `lib/release.sh` — Creates vault TOML and ops repo PR for the release. No image version wired
  into compose generation.
- `lib/generators.sh` `_generate_compose_impl()` — Generates compose with `build: context: .
  dockerfile: docker/agents/Dockerfile` for agents, runner, reproduce, edge. Version-unaware.
- `vault/vault-env.sh` — `DOCKER_HUB_TOKEN` is in `VAULT_ALLOWED_SECRETS`. Not currently used.
- `docker/agents/Dockerfile` — No VOLUME declarations; runtime state, repos, and config are
  mounted via compose but not declared. Claude binary injected by compose at init time.

## Complexity

Files touched: 4
- `formulas/release.toml` — add `push-image` step (after tag-image, before restart-agents)
- `lib/generators.sh` — `_generate_compose_impl()` reads `AGENTS_IMAGE` env var; emits
  `image:` when set, falls back to `build:` when not set (dev mode)
- `docker/agents/Dockerfile` — add explicit VOLUME declarations for /home/agent/data,
  /home/agent/repos, /home/agent/disinto/projects, /home/agent/disinto/state
- `bin/disinto` `disinto_up()` — pass `AGENTS_IMAGE` through to compose if set in `.env`
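
The `image:` / `build:` switch described for `lib/generators.sh` could look roughly like the sketch below. `emit_agents_source` is a hypothetical helper name, not a function from the codebase, and the real `_generate_compose_impl()` may be structured differently. The sketch assumes the value of `AGENTS_IMAGE` is written verbatim after `image:`; whether that value is a bare tag or a full registry reference is left to the sprint spec.

```shell
# Hypothetical helper: print the source stanza for one compose service.
# Pinned mode when AGENTS_IMAGE is set, dev mode (local build) otherwise.
emit_agents_source() {
  local image="${AGENTS_IMAGE:-}"
  if [ -n "$image" ]; then
    # Pinned mode: compose pulls the published image.
    printf '    image: %s\n' "$image"
  else
    # Dev mode: keep the existing build stanza.
    printf '    build:\n'
    printf '      context: .\n'
    printf '      dockerfile: docker/agents/Dockerfile\n'
  fi
}
```

Since the runner service uses the same image, the same helper could be called for both service blocks so the two cannot drift apart.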

Subsystems: release formula, compose generation, Dockerfile hygiene
Sub-issues: 3
Gluecode ratio: ~80% gluecode (release step, VOLUME declarations), ~20% new (AGENTS_IMAGE env var path)

## Risks

- Registry credentials: `DOCKER_HUB_TOKEN` is in vault allowlist but not wired up. The push step
  needs a registry login — either Docker Hub (DOCKER_HUB_TOKEN) or GHCR (GITHUB_TOKEN, already
  in vault). The sprint spec must pick one and add the credential to the release vault TOML.
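- Volume shadow: Docker creates an anonymous volume for every declared VOLUME path that has
  no mount at container start. If the declarations don't match the compose mount targets
  exactly, runtime files land in anonymous volumes instead of named ones. Must test before shipping.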
- Existing deployments: currently on `build:`. Migration: set AGENTS_IMAGE in .env, re-run
  `disinto init` (compose is regenerated), restart. No SSH, no worktree needed.
- `runner` service: same image as agents, same version. Must update runner service in compose gen too.
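
The volume-shadow risk can be checked mechanically before shipping. The following is a minimal sketch, not part of the existing tooling: all three function names are hypothetical, and it assumes short-form compose mounts (`source:/target[:mode]`) written one per line. It lists every path declared by `VOLUME` in the Dockerfile that no compose mount targets.

```shell
# Print each path declared by VOLUME instructions, one per line.
# Handles the shell form (VOLUME /a /b) and the JSON form (VOLUME ["/a"]).
dockerfile_volumes() {
  awk '
    toupper($1) == "VOLUME" {
      line = $0
      sub(/^[ \t]*[Vv][Oo][Ll][Uu][Mm][Ee][ \t]+/, "", line)
      gsub(/[][",]/, " ", line)
      n = split(line, parts, /[ \t]+/)
      for (i = 1; i <= n; i++) if (parts[i] != "") print parts[i]
    }
  ' "$1"
}

# Print the container-side target of each short-form mount in a compose file.
# List items whose second field is not an absolute path (ports, etc.) are skipped.
compose_mount_targets() {
  awk '
    /^[ \t]*-[ \t]/ {
      item = $0
      sub(/^[ \t]*-[ \t]+/, "", item)
      gsub(/"/, "", item)
      n = split(item, f, ":")
      if (n >= 2 && f[2] ~ /^\//) print f[2]
    }
  ' "$1" | sort -u
}

# Print declared volumes that have no matching compose mount.
# Usage: unmounted_volumes <Dockerfile> <compose file>
unmounted_volumes() {
  local targets vol
  targets="$(compose_mount_targets "$2")"
  dockerfile_volumes "$1" | while IFS= read -r vol; do
    printf '%s\n' "$targets" | grep -Fxq -- "$vol" || printf '%s\n' "$vol"
  done
}
```

Running `unmounted_volumes docker/agents/Dockerfile docker-compose.yml` should print nothing when every declared path is covered. Any path it prints would end up in an anonymous volume.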

## Cost — new infra to maintain

- Registry account + token rotation: with Docker Hub, one vault secret (DOCKER_HUB_TOKEN) needs
  a rotation policy. GHCR (via GITHUB_TOKEN) needs no additional account but ties releases to GitHub.
- Release formula grows from 7 to 8 steps. Small maintenance surface.
- `AGENTS_IMAGE` becomes a documented env var in .env for pinned deployments. Needs docs.
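
One way `disinto_up()` might resolve the pin is sketched below. `resolve_agents_image` is a hypothetical name, and the precedence rule (a value already in the environment wins over `.env`) is an assumption chosen so that a one-off `AGENTS_IMAGE=v0.1.1 disinto up` rollback works without editing `.env`. The sketch only handles plain, unquoted `KEY=value` lines.

```shell
# Hypothetical helper: print the AGENTS_IMAGE value to use, if any.
# An existing environment value wins; otherwise the last AGENTS_IMAGE=
# line in the given .env file is used. Prints nothing when neither exists.
resolve_agents_image() {
  local env_file="${1:-.env}"
  if [ -n "${AGENTS_IMAGE:-}" ]; then
    printf '%s\n' "$AGENTS_IMAGE"
  elif [ -f "$env_file" ]; then
    sed -n 's/^AGENTS_IMAGE=//p' "$env_file" | tail -n 1
  fi
}
```

An empty result would mean dev mode, so the caller can fall back to `build:` without extra flags.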

## Recommendation

Worth it. The release formula is 90% done — one push step closes the gap. The compose
generation change is purely additive (AGENTS_IMAGE env var, fallback to build: for dev).
Volume declarations are hygiene that should exist regardless of versioning.

Pick GHCR over Docker Hub: GITHUB_TOKEN is already in the vault allowlist and ops repo.
No new account needed.
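
If GHCR is chosen, the new `push-image` step could be as small as the sketch below. This is an illustration under assumptions, not the formula's actual content: the function names, the `DRY_RUN` switch, and the `ghcr.io/<owner>/disinto-agents` naming are all hypothetical, and the token is assumed to carry permission to write packages. It relies on the local `disinto-agents:$RELEASE_VERSION` tag that the existing tag step already creates.

```shell
# Build the registry reference for a release (owner is a placeholder).
ghcr_image_ref() {
  printf 'ghcr.io/%s/disinto-agents:%s\n' "$1" "$2"
}

# Log in to GHCR, retag the locally built image, and push it.
# With DRY_RUN=1 the docker commands are printed instead of executed
# and the login is skipped.
push_image() {
  local owner="$1" version="$2" ref run
  ref="$(ghcr_image_ref "$owner" "$version")"
  run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  else
    # Pass the token on stdin so it never appears in the process list.
    printf '%s' "$GITHUB_TOKEN" \
      | docker login ghcr.io -u "$owner" --password-stdin || return 1
  fi
  $run docker tag "disinto-agents:$version" "$ref" || return 1
  $run docker push "$ref"
}
```

A dry run such as `DRY_RUN=1 push_image example-org "$RELEASE_VERSION"` prints the tag and push commands, which makes the step easy to review before it is wired into the formula.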

sprint: add versioned-agent-images.md 2026-04-09 08:31:46 +00:00
