4.2 KiB
Nomad Cutover Runbook
End-to-end procedure to cut over the disinto factory from docker-compose on disinto-dev-box to Nomad on disinto-nomad-box.
Target: disinto-nomad-box (10.10.10.216) becomes production; disinto-dev-box stays warm for rollback.
Downtime budget: <5 min blue-green flip.
Data scope: Forgejo issues + disinto-ops git bundle only. Everything else is regenerated or discarded. OAuth secrets are regenerated on fresh init (all sessions invalidated).
1. Pre-cutover readiness checklist
- Nomad + Vault stack healthy on a fresh wipe+init (step 5 verified)
- Codeberg mirror current —
git logparity between dev-box Forgejo and Codeberg - SSH key pair generated for nomad-box, registered on DO edge (see §4.6)
- Companion tools landed:
disinto backup create(#1057)disinto backup import(#1058)
- Backup tarball produced and tested against a scratch LXC (see §3)
2. Pre-cutover artifact: backup
On disinto-dev-box:
./bin/disinto backup create /tmp/disinto-backup-$(date +%Y%m%d).tar.gz
Copy the tarball to nomad-box (and optionally to a local workstation for safekeeping):
scp /tmp/disinto-backup-*.tar.gz nomad-box:/tmp/
3. Pre-cutover dry-run
On a throwaway LXC:
lxc launch ubuntu:24.04 cutover-dryrun
# inside the container:
disinto init --backend=nomad --import-env .env --with edge
./bin/disinto backup import /tmp/disinto-backup-*.tar.gz
Verify:
- Issue count matches source Forgejo
- disinto-ops repo refs match source bundle
Destroy the LXC once satisfied:
lxc delete cutover-dryrun --force
4. Cutover T-0 (operator executes; <5 min target)
4.1 Stop dev-box services
# On disinto-dev-box — stop, do NOT remove volumes (rollback needs them)
docker-compose stop
4.2 Provision nomad-box (if not already done)
# On disinto-nomad-box
disinto init --backend=nomad --import-env .env --with edge
4.3 Import backup
# On disinto-nomad-box
./bin/disinto backup import /tmp/disinto-backup-*.tar.gz
4.4 Configure Codeberg pull mirror
Manual, one-time step in the new Forgejo UI:
- Create a mirror repository pointing at the Codeberg upstream
- Confirm initial sync completes
4.5 Claude login
# On disinto-nomad-box
claude login
Set up Anthropic OAuth so agents can authenticate.
4.6 Autossh tunnel swap
Operator step — cross-host, no dev-agent involvement. Do NOT automate.
-
Stop the tunnel on dev-box:
# On disinto-dev-box systemctl stop reverse-tunnel -
Copy or regenerate the tunnel unit on nomad-box:
# Copy from dev-box, or let init regenerate it scp dev-box:/etc/systemd/system/reverse-tunnel.service \ nomad-box:/etc/systemd/system/ -
Register nomad-box's public key on DO edge:
# On DO edge box — same restricted-command as the dev-box key echo "<nomad-box-pubkey>" >> /home/johba/.ssh/authorized_keys -
Start the tunnel on nomad-box:
# On disinto-nomad-box systemctl enable --now reverse-tunnel -
Verify end-to-end:
curl https://self.disinto.ai/api/v1/version # Should return the new box's Forgejo version
5. Post-cutover smoke
curl https://self.disinto.ai→ Forgejo welcome page- Create a test PR → Woodpecker pipeline runs → agents assign and work
- Claude chat login via Forgejo OAuth succeeds
6. Rollback (if any step 4 gate fails)
-
Stop the tunnel on nomad-box:
systemctl stop reverse-tunnel # on nomad-box -
Restore the tunnel on dev-box:
systemctl start reverse-tunnel # on dev-box -
Bring dev-box services back up:
docker-compose up -d # on dev-box -
DO Caddy config is unchanged — traffic restores in <5 min.
-
File a post-mortem issue. Keep nomad-box state intact for debugging.
7. Post-stable cleanup (T+1 week)
docker-compose down -von dev-box- Archive
/var/lib/docker/volumes/disinto_*to cold storage - Delete disinto-dev-box LXC or keep as permanent rollback reserve (operator decision)