Picks up from abandoned PR #859 (branch fix/issue-842 @ 6408023). Two
bugs in the prior art:
1. The `--empty is only valid with --backend=nomad` guard was removed
when the `--with`/mutually-exclusive guards were added. This regressed
test #6 in tests/disinto-init-nomad.bats:102 — `disinto init
--backend=docker --empty --dry-run` was exiting 0 instead of failing.
Restored alongside the new guards.
2. `_disinto_init_nomad` unconditionally appended `--dry-run` to the
real-run deploy_cmd, so even `disinto init --backend=nomad --with
forgejo` (no --dry-run) would only echo the deploy plan instead of
actually running `nomad job run`. That violates the issue's acceptance
criteria ("Forgejo job deploys", "curl http://localhost:3000/api/v1/version
returns 200"). Removed the unconditional flag; `--dry-run` is now only
forwarded when the caller passes it.
All 17 tests in tests/disinto-init-nomad.bats now pass; shellcheck clean.
Why: disinto_init() consumed $1 as repo_url before the argparse loop ran,
so `disinto init --backend=nomad --empty` had --backend=nomad swallowed
into repo_url. backend stayed at its "docker" default, and the --empty
validation then rejected the command with the misleading "--empty is only
valid with --backend=nomad" error. This was flagged during S0.1 end-to-end
verification on a fresh LXC. The nomad backend takes no positional
argument anyway; the operator has already cloned the repo on the LXC.
Change: only consume $1 as repo_url if it doesn't start with "--", then
defer the "repo URL required" check to after argparse (so the docker
path still errors with a helpful message on a missing positional, not
"Unknown option: --backend=docker").
Verified acceptance criteria:
1. init --backend=nomad --empty → dispatches to nomad
2. init --backend=nomad --empty --dry-run → 9-step plan, exit 0
3. init <repo-url> → docker path unchanged
4. init → "repo URL required"
5. init --backend=docker → "repo URL required"
(not "Unknown option")
6. shellcheck clean
Tests: 4 new regression cases in tests/disinto-init-nomad.bats covering
flag-first nomad invocation (both --flag=value and --flag value forms),
no-args docker default, and --backend=docker missing-positional error
path. Full suite: 10/10 pass.
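The flag-first parsing fix described above can be illustrated with a minimal sketch. It is not the real disinto_init(): the function name `init_sketch`, the `--backend requires a value` message, and the final summary line are assumptions made for illustration. Only the "repo URL required", "Unknown option" and "--empty is only valid with --backend=nomad" messages come from the commit text.

```shell
# Hypothetical, simplified stand-in for disinto_init(); names and messages
# beyond those quoted in the commit message are assumptions.
init_sketch() {
  local repo_url="" backend="docker" empty=false

  # Treat $1 as the repo URL only when it does not look like a flag.
  case "${1:-}" in
    "" | --*) ;;
    *) repo_url="$1"; shift ;;
  esac

  while [ $# -gt 0 ]; do
    case "$1" in
      --backend)
        [ $# -ge 2 ] || { echo "--backend requires a value" >&2; return 1; }
        backend="$2"; shift 2 ;;
      --backend=*) backend="${1#--backend=}"; shift ;;
      --empty) empty=true; shift ;;
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
  done

  if [ "$empty" = true ] && [ "$backend" != "nomad" ]; then
    echo "--empty is only valid with --backend=nomad" >&2
    return 1
  fi

  # Deferred check: runs after parsing, so a flag is never mistaken for a URL.
  if [ "$backend" = "docker" ] && [ -z "$repo_url" ]; then
    echo "repo URL required" >&2
    return 1
  fi

  echo "backend=$backend empty=$empty repo_url=$repo_url"
}
```

Because the positional check happens after the loop, `init_sketch --backend=docker` reports the missing repo URL rather than complaining about an unknown option.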
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Locks in static validation for every Nomad+Vault artifact before it can
merge. Four fail-closed steps in .woodpecker/nomad-validate.yml, gated
to PRs touching nomad/, lib/init/nomad/, or bin/disinto:
1. nomad config validate nomad/server.hcl nomad/client.hcl
2. vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener
3. shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto
4. bats tests/disinto-init-nomad.bats — dispatcher smoke tests
bin/disinto picks up pre-existing SC2120 warnings on three passthrough
wrappers (generate_agent_docker, generate_caddyfile, generate_staging_index);
annotated with shellcheck disable=SC2120 so the new pipeline is clean
without narrowing the warning for future code.
Pinned image versions (hashicorp/nomad:1.9.5, hashicorp/vault:1.18.5)
match lib/init/nomad/install.sh — bump both or neither.
nomad/AGENTS.md documents the stack layout, how to add a jobspec in
Step 1, how CI validates it, and the two-place version pinning rule.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires S0.1–S0.3 into a single idempotent bring-up script and replaces
the S0.1 stub in _disinto_init_nomad so `disinto init --backend=nomad
--empty` produces a running empty single-node cluster on a fresh box.
lib/init/nomad/cluster-up.sh (new):
1. install.sh (nomad + vault binaries)
2. systemd-nomad.sh (unit + enable, not started)
3. systemd-vault.sh (unit + vault.hcl + enable)
4. host-volume dirs under /srv/disinto/* (matching nomad/client.hcl)
5. /etc/nomad.d/{server,client}.hcl (content-compare before write)
6. vault-init.sh (first-run init + unseal + persist keys)
7. systemctl start vault (poll until unsealed; fail-fast on
is-failed)
8. systemctl start nomad (poll until ≥1 node ready)
9. /etc/profile.d/disinto-nomad.sh (VAULT_ADDR + NOMAD_ADDR for
interactive shells)
Re-running on a healthy box is a no-op — each sub-step is itself
idempotent and steps 7/8 fast-path when already active + healthy.
`--dry-run` prints the full step list and exits 0.
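As an illustration of the "content-compare before write" idea in step 5, here is a minimal sketch. The helper name `install_if_changed` and its output messages are hypothetical and are not taken from cluster-up.sh; the point is only that an identical destination file is left untouched, which is what makes a re-run a no-op.

```shell
# Hypothetical helper; the name and messages are illustrative only.
install_if_changed() {
  local src="$1" dest="$2"

  # cmp -s is silent and exits 0 only when both files are byte-identical.
  if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
    echo "unchanged: $dest"
    return 0
  fi

  mkdir -p "$(dirname "$dest")"
  cp "$src" "$dest"
  echo "wrote: $dest"
}
```

Skipping the write when nothing changed also avoids bumping the file's mtime, which matters if anything downstream reacts to config file changes.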
bin/disinto:
- _disinto_init_nomad: replaces the S0.1 stub. Invokes cluster-up.sh
directly (as root) or via `sudo -n` otherwise. Both `--empty` and
the default (no flag) call cluster-up.sh today; Step 1 will branch
on $empty to gate job deployment. --dry-run forwards through.
- disinto_init: adds `--empty` flag parsing; rejects `--empty`
combined with `--backend=docker` explicitly instead of silently
ignoring it.
- usage: documents `--empty` and drops the "stub, S0.1" annotation
from --backend.
Closes #824.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lands the dispatch entry point for the Nomad+Vault migration. The docker
path remains the default and is byte-for-byte unchanged. The new
`--backend=nomad` value routes to a `_disinto_init_nomad` stub that fails
loudly (exit 99) so no silent misrouting can happen while S0.2–S0.5 fill in
the real implementation. With `--dry-run --backend=nomad` the stub reports
status and exits 0 so dry-run callers (P7) don't see a hard failure.
- New `--backend <value>` flag (accepts `docker` | `nomad`); supports
both `--backend nomad` and `--backend=nomad` forms.
- Invalid backend values are rejected with a clear error.
- `_disinto_init_nomad` lives next to `disinto_init` so future S0.x
issues only need to fill in this function — flag parsing and dispatch
stay frozen.
- `--help` lists the flag and both values.
- `shellcheck bin/disinto` introduces no new findings beyond the
pre-existing baseline.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make `disinto init` safe to re-run on the same box:
- Store admin token as FORGE_ADMIN_TOKEN in .env; preserve on re-run
(previously deleted and recreated every run, churning DB state)
- Fix human token creation: use admin_pass for basic-auth since
human_user == admin_user (previously used a random password that
never matched the actual user password, so HUMAN_TOKEN was never
created successfully)
- Preserve HUMAN_TOKEN in .env on re-run (same pattern as bot tokens)
- Bot tokens were already idempotent (preserved unless --rotate-tokens)
Add --dry-run flag that reports every intended action (file writes,
API calls, docker commands) based on current state, then exits 0
without touching state. Useful for CI gating and cutover confidence.
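One common way to get this kind of dry-run behavior is to route every state-changing command through a small wrapper. The sketch below assumes a hypothetical `maybe_run` helper and a `DRY_RUN` variable; neither name is taken from bin/disinto, and the real implementation may report actions differently.

```shell
# Hypothetical wrapper; DRY_RUN and maybe_run are illustrative names.
DRY_RUN="${DRY_RUN:-false}"

maybe_run() {
  if [ "$DRY_RUN" = true ]; then
    # Report the intended action without executing it.
    echo "[dry-run] $*"
  else
    "$@"
  fi
}
```

With `DRY_RUN=true`, a call such as `maybe_run docker compose up -d` only prints the command, so the run can exit 0 without touching state.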
Update smoke test:
- Add dry-run test (verifies exit 0 and no .env modification)
- Add idempotency state diff (verifies .env is unchanged on re-run)
- Verify FORGE_ADMIN_TOKEN and HUMAN_TOKEN are stored in .env
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add `_regen_file` helper that idempotently regenerates a file: moves
existing file aside, runs the generator, compares output byte-for-byte,
and either restores the original (preserving mtime) or keeps the new
version with a `.prev` backup.
- `disinto_up` now calls `generate_compose` and `generate_caddyfile`
before bringing the stack up, ensuring generator changes are applied.
- Pass `--build --remove-orphans` to `docker compose up -d` so image
rebuilds and orphan container cleanup happen automatically.
- Add `--no-regen` escape hatch that skips regeneration and prints a
warning for operators debugging generators or testing hand-edits.
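The regenerate-and-compare flow of `_regen_file` could look roughly like the sketch below. This is an approximation under stated assumptions, not the actual helper: the name `regen_file_sketch`, the temporary stash suffix, the calling convention (target path followed by the generator command) and the messages are all hypothetical.

```shell
# Hypothetical sketch; the real _regen_file may differ in signature and output.
# Usage: regen_file_sketch <target> <generator command...>
# The generator is expected to (re)write <target>.
regen_file_sketch() {
  local target="$1" stash=""
  shift

  if [ -f "$target" ]; then
    stash="${target}.regen-stash.$$"
    mv "$target" "$stash"        # mv keeps the original mtime on the stash
  fi

  if ! "$@"; then
    # Generator failed: put the original back and report failure.
    if [ -n "$stash" ]; then mv "$stash" "$target"; fi
    return 1
  fi

  if [ -z "$stash" ]; then
    echo "created: $target"
  elif cmp -s "$stash" "$target"; then
    mv "$stash" "$target"        # identical output: restore original, mtime intact
    echo "unchanged: $target"
  else
    mv "$stash" "${target}.prev" # changed: keep new file, back up the old one
    echo "regenerated: $target (previous saved as ${target}.prev)"
  fi
}
```

Restoring the stashed original when the output is identical means unchanged files keep their timestamps, so tools that key off mtime see no change.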
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Generated compose now uses `image: ghcr.io/disinto/{agents,edge}` instead
of `build:` directives; `disinto init --build` restores local-build mode
- Add VOLUME declarations to agents, reproduce, and edge Dockerfiles
- Add CI pipeline (.woodpecker/publish-images.yml) to build and push images
to ghcr.io/disinto on tag events
- Mount projects/, .env, and state/ into agents container for runtime config
- Skip pre-build binary download when compose uses registry images
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gate /chat/* behind Forgejo OAuth2 authorization-code flow.
- Extract generic _create_forgejo_oauth_app() helper in lib/ci-setup.sh;
Woodpecker OAuth becomes a thin wrapper, chat gets its own app.
- bin/disinto init now creates TWO OAuth apps (woodpecker-ci + disinto-chat)
and writes CHAT_OAUTH_CLIENT_ID / CHAT_OAUTH_CLIENT_SECRET to .env.
- docker/chat/server.py: new routes /chat/login (→ Forgejo authorize),
/chat/oauth/callback (code→token exchange, user allowlist check, session
cookie). All other /chat/* routes require a valid session or redirect to
/chat/login. Session store is in-memory with 24h TTL.
- lib/generators.sh: pass FORGE_URL, CHAT_OAUTH_CLIENT_ID,
CHAT_OAUTH_CLIENT_SECRET, EDGE_TUNNEL_FQDN, DISINTO_CHAT_ALLOWED_USERS
to the chat container environment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix .env write in edge register to use single grep -Ev + mv pattern (not three-pass append)
- Fix register.sh to source authorized_keys.sh and call rebuild_authorized_keys directly
- Fix caddy.sh remove_route to use jq to find route index by host match
- Fix authorized_keys.sh operator precedence: { [ -z ] || [ -z ]; } && continue
- Fix install.sh Caddyfile to use { admin localhost:2019 } global options
- Fix deregister and status SSH to use StrictHostKeyChecking=accept-new
- Change SSH StrictHostKeyChecking from 'no' to 'accept-new' for better security
- Fix .env write logic to deduplicate keys before appending
- Fix deregister .env cleanup to use a single grep pattern
- Add --domain-suffix option to install.sh
- Remove no-op DOMAIN_SUFFIX sed from install.sh
- Change cp -n to cp so re-runs overwrite installed scripts with the current version
- Fix authorized_keys.sh SCRIPT_DIR to point to lib/
- Fix Caddy route management to use POST /routes instead of /load
- Fix Caddy remove_route to find route by host match, not hardcoded index
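The "single grep -Ev + mv" .env write mentioned in the list above can be sketched as follows. The helper name `set_env_vars` and the `EDGE_TUNNEL_PORT` key used in the usage note are hypothetical, and the sketch assumes keys are plain identifiers with no regex metacharacters.

```shell
# Hypothetical helper: replace-or-append KEY=VALUE pairs in one pass.
# Usage: set_env_vars <env-file> KEY=VALUE [KEY=VALUE...]
set_env_vars() {
  local env_file="$1" pattern="" kv tmp
  shift

  # Build an alternation of the keys being written, e.g. "FOO|BAR".
  for kv in "$@"; do
    pattern="${pattern:+$pattern|}${kv%%=*}"
  done

  tmp="${env_file}.tmp.$$"
  # Drop any existing lines for those keys. grep exits non-zero when no
  # lines survive or the file is missing; neither is an error here.
  grep -Ev "^(${pattern})=" "$env_file" > "$tmp" 2>/dev/null || true
  printf '%s\n' "$@" >> "$tmp"

  # Swap the new file into place in a single rename.
  mv "$tmp" "$env_file"
}
```

For example, `set_env_vars .env EDGE_TUNNEL_FQDN=demo.example.com EDGE_TUNNEL_PORT=2222` leaves exactly one line per key no matter how many times it is run.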
Extract branch-wait retry logic into _bp_wait_for_branch helper with
exponential backoff (10 attempts, 2s base, capped at 10s per wait,
~70s worst-case). Replaces the 3-attempt/2s-fixed loops in all three
setup functions. Upgrade caller warnings in bin/disinto to ERROR.
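The retry shape described above (10 attempts, 2s base, doubling, capped at 10s) can be sketched like this. It is not the real `_bp_wait_for_branch`: the function name `retry_with_backoff` and the `SLEEP_CMD` override are assumptions, the latter added only so the loop can be exercised without actually waiting.

```shell
# Hypothetical sketch of the retry loop; not the real _bp_wait_for_branch.
# Usage: retry_with_backoff <command...>
# SLEEP_CMD is an illustrative override for the sleep command.
retry_with_backoff() {
  local max_attempts=10 delay=2 cap=10 attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the final attempt.
    if [ "$attempt" -eq "$max_attempts" ]; then
      break
    fi
    ${SLEEP_CMD:-sleep} "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

In this sketch the waits between the ten attempts are 2, 4, 8 and then six waits of 10 seconds, 74 seconds in total, which is in line with the roughly 70s worst case quoted above.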
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
--poll-interval was incorrectly written as compact_pct in the project TOML,
misconfiguring CLAUDE_AUTOCOMPACT_PCT_OVERRIDE instead of polling behavior.
Now compact_pct is hardcoded to 60 (the correct default) and poll_interval
is a separate TOML field emitted as POLL_INTERVAL in the compose service.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Apply delete-then-recreate pattern for human token (matching admin token in PR #274).
Forge/Forgejo only returns sha1 at creation time; listing returns no sha1, causing
HUMAN_TOKEN to be silently empty on re-runs when token name already exists.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Delete any existing token with the same name before creating a fresh one,
so that sha1 is always returned by the create response. The list API does
not return sha1 (Forgejo redacts it for security), making the old fallback
unreliable.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Export actual_ops_slug from setup_ops_repo via _ACTUAL_OPS_SLUG global,
then update ops_repo in the TOML in-place using Python re.sub after TOML
creation or detection. Falls back to inserting after the repo line if the
key is missing. This prevents duplicate TOML keys on repeated init runs.
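The commit itself does the in-place update with Python re.sub. As a rough shell analogue of the same replace-or-insert logic, the sketch below uses sed and awk instead; the helper name `set_ops_repo` is hypothetical, and the sketch assumes `repo` and `ops_repo` each sit at the start of a line and that the slug contains no `|`, `&` or backslash characters.

```shell
# Hypothetical shell analogue of the re.sub approach; not the actual code.
# Usage: set_ops_repo <toml-file> <ops-slug>
set_ops_repo() {
  local toml="$1" slug="$2" tmp="${1}.tmp.$$"

  if grep -q '^ops_repo[[:space:]]*=' "$toml"; then
    # Key exists: rewrite its value in place, so no duplicate key appears.
    sed "s|^ops_repo[[:space:]]*=.*|ops_repo = \"${slug}\"|" "$toml" > "$tmp"
  else
    # Key missing: insert it right after the first "repo = ..." line.
    awk -v line="ops_repo = \"${slug}\"" '
      { print }
      /^repo[[:space:]]*=/ && !seen { print line; seen = 1 }
    ' "$toml" > "$tmp"
  fi

  mv "$tmp" "$toml"
}
```

Running it repeatedly with different slugs always leaves a single `ops_repo` line, which is the property the commit is after.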
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>