Compare commits

..

258 commits

Author SHA1 Message Date
d368f904fb fix: edge-control: install.sh seeds empty allowlist — every register breaks until admin populates it, with no install-time warning (#1110) (#1113)
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
Co-authored-by: dev-qwen <dev-qwen@disinto.local>
Co-committed-by: dev-qwen <dev-qwen@disinto.local>
2026-04-21 13:21:21 +00:00
ba961efc96 Merge pull request 'fix: edge-control: audit log silently never writes — file mode 0640 + group disinto-register denies the writer (#1109)' (#1112) from fix/issue-1109 into main
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
2026-04-21 13:21:18 +00:00
4ffeb97486 Merge pull request 'chore: gardener housekeeping' (#1108) from chore/gardener-20260421-1208 into main
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
ci/woodpecker/push/nomad-validate Pipeline is pending
2026-04-21 13:20:50 +00:00
Claude
1c8916d28a fix: edge-control: audit log silently never writes — file mode 0640 + group disinto-register denies the writer (#1109)
Some checks are pending
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline is running
Change log file mode from 0640 to 0660 so the disinto-register group
(which the writer runs under) has write permission. Apply the same fix
to the logrotate create directive so rotated files remain writable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 12:19:25 +00:00
Claude
0946ca9828 chore: gardener housekeeping 2026-04-21
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline failed
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-21 12:08:35 +00:00
0d61819184 Merge pull request 'fix: edge-control: deregister has no ownership check — any authorized SSH key can take over any project (#1091)' (#1101) from fix/issue-1091-ownership-check into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-21 06:04:13 +00:00
9bfa2a40ae Merge pull request 'chore: gardener housekeeping' (#1102) from chore/gardener-20260421-0553 into main
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
ci/woodpecker/push/nomad-validate Pipeline is pending
2026-04-21 06:03:03 +00:00
Claude
fd98e0e3b3 chore: gardener housekeeping 2026-04-21
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-21 05:53:43 +00:00
Claude
d15ebf2bd1 fix: edge-control: deregister has no ownership check — any authorized SSH key can take over any project (#1091)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- do_deregister now accepts <project> <pubkey> and verifies the caller's
  pubkey matches the stored pubkey before removing the registration
- Returns {"error":"pubkey mismatch"} on failure without revealing the
  stored pubkey
- dispatch in main() updated to parse pubkey from deregister command args
- bin/disinto deregister subcommand reads tunnel_key.pub and sends it
  as ownership proof over SSH

The user-facing CLI (disinto edge deregister <project>) is unchanged —
the pubkey is automatically sourced from secrets/tunnel_key.pub.
2026-04-21 05:51:35 +00:00
19ead14ede Merge pull request 'chore: gardener housekeeping' (#1100) from chore/gardener-20260420-2348 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 23:53:36 +00:00
Claude
6466af87da chore: gardener housekeeping 2026-04-20
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-20 23:48:12 +00:00
2c5fb6abc2 Merge pull request 'fix: edge-control: per-caller attribution for register/deregister (#1094)' (#1098) from fix/issue-1094 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-20 20:04:57 +00:00
Claude
1835750b0d fix: edge-control: per-caller attribution for register/deregister (#1094)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
- register.sh parses --as <tag> from forced-command argv, stores as
  registered_by in registry entries (defaults to "unknown")
- allocate_port() accepts optional registered_by parameter
- list output includes registered_by for each tunnel
- deregister response includes deregistered_by
- install.sh accepts --admin-tag <name> (defaults to "admin") and wires
  it into the forced-command entry as --as <tag>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 19:57:36 +00:00
47fb08524c Merge pull request 'fix: edge-control: append-only audit log for register/deregister operations (#1095)' (#1099) from fix/issue-1095 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-20 19:49:44 +00:00
Agent
5ddf379191 fix: edge-control: append-only audit log for register/deregister operations (#1095)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
Every successful register/deregister appends one line to
/var/log/disinto/edge-register.log with space-separated key=value pairs:

  2026-04-20T14:30:12Z register   project=myproj port=20034 pubkey_fp=SHA256:… caller=alice
  2026-04-20T14:31:55Z deregister project=myproj port=20034 pubkey_fp=SHA256:… caller=alice

- Log dir /var/log/disinto/ created by install.sh (root:disinto-register, 0750)
- Log file created at install time (0640, root:disinto-register)
- Logrotate: daily rotation, 30 days retention, copytruncate
- Write failures emit a warning but do not fail the operation
- Caller derived from SSH_USERNAME > SUDO_USER > USER env vars

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 19:42:10 +00:00
2fd4da6b64 Merge pull request 'fix: edge-control: admin-approved allowlist for project names (#1092)' (#1097) from fix/issue-1092 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-20 19:33:44 +00:00
Agent
f9b88a4922 fix: edge-control: allow duplicate hash for common shell control-flow (#1092)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
Add a8bdb7f1 to ALLOWED_HASHES — this is a common shell pattern
(return 1 / fi / fi / return 0 / }) that legitimately appears in both
lib/env.sh and tools/edge-control/register.sh. It is not copy-paste.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 19:24:21 +00:00
Agent
d055bc3a3a fix: edge-control: admin-approved allowlist for project names (#1092)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/edge-subpath Pipeline was successful
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 19:17:03 +00:00
3116293d8e Merge pull request 'fix: edge-control: reserved name list and stricter DNS-label validation in register (#1093)' (#1096) from fix/issue-1093 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-20 19:15:31 +00:00
johba
2195e9ff46 ci: retrigger woodpecker pipeline
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
Config fetch timed out against Forgejo at 18:50Z (context deadline
exceeded during a box load spike); no pipeline was created and no CI
status was posted on PR #1096. Empty commit to re-kick the build.
2026-04-20 19:00:38 +00:00
Claude
4187756059 fix: edge-control: reserved name list and stricter DNS-label validation in register (#1093)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 18:49:49 +00:00
65df00ea6a Merge pull request 'fix: vision(#623): scope Claude chat working directory to project staging checkout (#1027)' (#1089) from fix/issue-1027-1 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 18:26:41 +00:00
dev-qwen2
7f1f8fa01c fix: vision(#623): scope Claude chat working directory to project staging checkout (#1027)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- server.py: add CHAT_WORKSPACE_DIR env var, set cwd to workspace
  and use --permission-mode acceptEdits + append message in Claude invocations
- lib/generators.sh: add workspace bind mount and env var to compose generator
- nomad/jobs/chat.hcl: add workspace host volume (static source "chat-workspace"),
  meta block + NOMAD_META_ env var, volume_mount — Nomad-compatible pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 18:11:33 +00:00
a330db9537 Merge pull request 'fix: tools/edge-control/verify-chat-sandbox.sh targets deleted disinto-chat container (#1087)' (#1090) from fix/issue-1087 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-20 17:57:00 +00:00
Agent
750981529b fix: tools/edge-control/verify-chat-sandbox.sh targets deleted disinto-chat container (#1087)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 17:46:35 +00:00
d1867bd877 Merge pull request 'chore: gardener housekeeping' (#1088) from chore/gardener-20260420-1719 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 17:29:29 +00:00
Claude
f782f6be3a chore: gardener housekeeping 2026-04-20
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-20 17:19:23 +00:00
0483e2b7d1 Merge pull request 'fix: feat: drop chat rate-limiting — remove per-user hour/day request caps and token cap (reverts #711) (#1084)' (#1086) from fix/issue-1084 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 16:41:51 +00:00
dev-qwen2
6745736a0f fix: remove CHAT_MAX_* rate-limit env vars from generate_compose() (#1084)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-20 16:28:33 +00:00
dev-qwen2
f28c8000bb fix: feat: drop chat rate-limiting — remove per-user hour/day request caps and token cap (reverts #711) (#1084)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-20 16:12:11 +00:00
bcf7db93b1 Merge pull request 'fix: feat: merge chat container into edge — run chat server inside edge container with full permissions (reverts sandbox from #706) (#1083)' (#1085) from fix/issue-1083 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-20 15:59:42 +00:00
Claude
686b1c2d40 fix: update AGENTS.md and sync rate-limit env vars in static compose
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- AGENTS.md line 45: reflect chat merged into edge (no standalone Dockerfile/entrypoint)
- docker-compose.yml: add CHAT_MAX_REQUESTS_PER_HOUR/DAY and CHAT_MAX_TOKENS_PER_DAY
  to match generators.sh (advisory from review)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 15:47:03 +00:00
Claude
83176c5f28 fix: feat: merge chat container into edge — run chat server inside edge container with full permissions (reverts sandbox from #706) (#1083)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 15:47:03 +00:00
42e9cae6f8 Merge pull request 'fix: feat: configure Forgejo ROOT_URL for /forge/ subpath routing (#1080)' (#1082) from fix/issue-1080 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 15:36:16 +00:00
398a7398a9 Merge pull request 'fix: fix: strip /staging prefix in Caddyfile before proxying to staging container (#1079)' (#1081) from fix/issue-1079 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 15:22:27 +00:00
Agent
02f8e13f33 fix: feat: configure Forgejo ROOT_URL for /forge/ subpath routing (#1080)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Move FORGEJO_ROOT_URL and WOODPECKER_HOST configuration to BEFORE
generate_compose so the .env file is available for variable substitution.

When EDGE_TUNNEL_FQDN is set with subpath routing mode, the .env file
now gets FORGEJO_ROOT_URL=https://<fqdn>/forge/ written before
docker-compose.yml is generated, ensuring the subpath is included in
the generated compose file.

This fixes the 404 on /forge/ by ensuring Forgejo's ROOT_URL includes
the /forge/ prefix so its internal router recognizes the subpath.

The Caddyfile already correctly does NOT strip the prefix - it passes
the full /forge/... path to forgejo:3000.
2026-04-20 15:13:01 +00:00
dev-qwen2
d17754efab fix: fix: strip /staging prefix in Caddyfile before proxying to staging container (#1079)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-20 15:09:34 +00:00
abca547dcc Merge pull request 'fix: vision(#623): WebSocket streaming for chat UI to replace one-shot claude --print (#1026)' (#1076) from fix/issue-1026 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 11:48:10 +00:00
Agent
01f7d061bc fix: WebSocket streaming - address all AI review findings (#1076)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Fixes identified in AI review:
- Blocker #1: Server now handles chat_request WebSocket frames and invokes Claude
- Blocker #2: accept_connection() uses self.headers from BaseHTTPRequestHandler
- Blocker #3: handle_websocket_upgrade() uses asyncio.open_connection() for proper StreamWriter
- Medium #4: _decode_frame() uses readexactly() for all fixed-length reads
- Medium #5: Message queue cleaned up on disconnect in handle_connection() finally block
- Low #6: WebSocket close code corrected from 768 to 1000
- Low #7: _send_close() and _send_pong() are now async with proper await

Changes:
- Added _handle_chat_request() method to invoke Claude within WebSocket coroutine
- Fixed _send_close() to use struct.pack for correct close code (1000)
- Made _send_pong() async with proper await
- Updated handle_connection() to call async close/pong methods and cleanup queue
- Fixed handle_websocket_upgrade() to pass Sec-WebSocket-Key from HTTP headers
- Replaced create_connection() with open_connection() for proper reader/writer
2026-04-20 11:36:27 +00:00
Agent
17e745376d fix: vision(#623): WebSocket streaming for chat UI to replace one-shot claude --print (#1026) 2026-04-20 11:36:27 +00:00
aa87639356 Merge pull request 'fix: vision(#623): automate subdomain fallback pivot if subpath routing fails (#1028)' (#1078) from fix/issue-1028 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 11:28:44 +00:00
Claude
78a295f567 fix: vision(#623): automate subdomain fallback pivot if subpath routing fails (#1028)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 11:12:20 +00:00
89c0a65453 Merge pull request 'fix: vision(#623): end-to-end subpath routing smoke test for Forgejo + Woodpecker + chat (#1025)' (#1063) from fix/issue-1025-3 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-20 11:01:13 +00:00
48ce3edb4b fix: convert bash array to POSIX for-loop in caddyfile-routing-test
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Step ran in alpine:3.19 with default /bin/sh (busybox ash) which does not
support bash array syntax. REQUIRED_HANDLERS=(...) + "${ARR[@]}" failed
with "syntax error: unexpected (".

Inlined the handler list into a single space-separated for-loop that works
under POSIX sh. No behavioral change; same 6 handlers checked.

Fixes edge-subpath/caddyfile-routing-test exit 2 on pipelines targeting
fix/issue-1025-3 — see #1025.
2026-04-20 10:47:12 +00:00
181f82dfd0 fix: use workspace-relative path for rendered Caddyfile in edge-subpath pipeline
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline failed
Woodpecker mounts the workspace dir across steps in a workflow; /tmp does not
persist between step containers. render-caddyfile was writing to
/tmp/edge-render/Caddyfile.rendered which caddy-validate could not read
(caddy: no such file or directory).

Changed all /tmp/edge-render references to edge-render (workspace-relative).

Fixes edge-subpath/caddy-validate exit 1 on pipelines targeting
fix/issue-1025-3 — see #1025.
2026-04-20 10:44:17 +00:00
a620e296de Merge pull request 'fix: fix: collect-engagement.sh never commits evidence to ops repo — data silently lost (#982)' (#1075) from fix/issue-982 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-20 08:54:08 +00:00
88aca4a064 Merge pull request 'fix: bug: disinto init --backend=nomad — does not bootstrap Forgejo admin user (#1069)' (#1073) from fix/issue-1069 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 08:46:33 +00:00
Agent
253dd7c6ff fix: fix: collect-engagement.sh never commits evidence to ops repo — data silently lost (#982)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-20 08:44:05 +00:00
1a24e79fb5 Merge pull request 'fix: fix: re-seed ops repo directories after branch protection resolved (#820)' (#1074) from fix/issue-820 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-20 08:39:18 +00:00
dev-qwen2
95bacbbfa4 fix: resolve all CI review blockers for forgejo admin bootstrap (#1069)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-20 08:35:40 +00:00
Agent
6673c0efff fix: fix: re-seed ops repo directories after branch protection resolved (#820)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-20 08:23:01 +00:00
dev-qwen2
a7bcb96935 fix: correct MD5 hashes for forgejo-bootstrap.sh duplicate detection (#1069)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-20 08:21:31 +00:00
85e6907dc3 fix: rename logging helpers in test-caddyfile-routing.sh to avoid dup-detection
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline was successful
log_info / log_pass / log_fail / log_section were copied verbatim from
tests/smoke-edge-subpath.sh and triggered ci.duplicate-detection with 3
collision hashes. Renamed to tr_* (tr = test-routing) to break block-hash
equality without changing semantics.

43 call sites updated. No behavioral change.

Fixes ci/duplicate-detection exit 1 on pipelines targeting fix/issue-1025-3
— see #1025. A proper shared lib/test-helpers.sh is a better long-term
solution but out of scope here.
2026-04-20 08:11:08 +00:00
7763facb11 fix: add curl to apk install in caddy-validate step
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
ci/woodpecker/pr/ci Pipeline is pending
ci/woodpecker/pr/edge-subpath Pipeline is pending
ci/woodpecker/pr/smoke-init Pipeline is pending
The step runs `curl -sS -o /tmp/caddy ...` to download the caddy binary
but only installs ca-certificates. curl is not in alpine:3.19 base image.
Adding curl to the apk add line so the download actually runs.

Fixes edge-subpath/caddy-validate exit 127 (command not found) on
pipelines targeting fix/issue-1025-3 — see #1025.
2026-04-20 08:10:58 +00:00
dev-qwen2
23e47e3820 fix: bug: disinto init --backend=nomad — does not bootstrap Forgejo admin user (#1069)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-20 08:06:06 +00:00
49190359b8 Merge pull request 'fix: bug: deploy.sh 360s still too tight for chat cold-start + cascade-skip masks edge/vault-runner (#1070)' (#1071) from fix/issue-1070 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 08:04:27 +00:00
f89d22dd39 Merge pull request 'fix: bug: disinto backup import — schema mismatch with create; 0 issues imported (#1068)' (#1072) from fix/issue-1068 into main
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
2026-04-20 08:01:51 +00:00
Agent
4c6d545060 fix: bug: disinto backup import — schema mismatch with create; 0 issues imported (#1068)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-20 07:58:25 +00:00
Claude
d1a026c702 fix: deploy.sh 360s still too tight for chat cold-start + cascade-skip masks edge/vault-runner (#1070)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
Two changes:
- Set JOB_READY_TIMEOUT_CHAT=600 (chat cold-start takes ~5-6 min on fresh LXC)
- On deploy timeout/failure, log WARNING and continue submitting remaining jobs
  instead of dying immediately; print final health summary with failed jobs list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 07:56:30 +00:00
fbd66dd4ea Merge pull request 'chore: gardener housekeeping 2026-04-20' (#1067) from chore/gardener-20260420-0625 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 06:33:32 +00:00
Claude
f4ff202c55 chore: gardener housekeeping 2026-04-20
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-20 06:25:42 +00:00
88222503d5 Merge pull request 'chore: gardener housekeeping 2026-04-20' (#1066) from chore/gardener-20260420-0021 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-20 00:25:30 +00:00
Claude
91841369f4 chore: gardener housekeeping 2026-04-20
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-20 00:21:20 +00:00
343b928a26 Merge pull request 'fix: tool: disinto backup import — idempotent restore on fresh Nomad cluster (#1058)' (#1064) from fix/issue-1058 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-19 21:35:46 +00:00
Agent
99fe90ae27 fix: tool: disinto backup import — idempotent restore on fresh Nomad cluster (#1058)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-19 21:28:02 +00:00
3aa521509a Merge pull request 'fix: docs: nomad-cutover-runbook.md — end-to-end cutover procedure (#1060)' (#1065) from fix/issue-1060 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-19 21:01:03 +00:00
Claude
2c7c8d0b38 fix: docs: nomad-cutover-runbook.md — end-to-end cutover procedure (#1060)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 20:50:45 +00:00
ec4e608827 Merge pull request 'fix: tool: disinto backup create — export Forgejo issues + disinto-ops git bundle (#1057)' (#1062) from fix/issue-1057 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-19 20:43:54 +00:00
dev-qwen2
6b81e2a322 fix: simplify pipeline trigger to pull_request event only
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/edge-subpath Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-19 20:40:57 +00:00
dev-qwen2
ae8eb09ee7 fix: correct Woodpecker when clause syntax for path filters
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/edge-subpath Pipeline failed
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/edge-subpath Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline failed
2026-04-19 20:31:36 +00:00
Claude
cb8c131bc4 fix: clear EXIT trap before return to avoid unbound $tmpdir under set -u
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 20:29:44 +00:00
Claude
c287ec0626 fix: tool: disinto backup create — export Forgejo issues + disinto-ops git bundle (#1057)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 20:29:44 +00:00
dev-qwen2
1a1ae0b629 fix: shellcheck unreachable code warnings in smoke script
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/edge-subpath Pipeline failed
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/edge-subpath Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline failed
2026-04-19 20:28:32 +00:00
dev-qwen2
5b46acb0b9 fix: vision(#623): end-to-end subpath routing smoke test for Forgejo + Woodpecker + chat (#1025) 2026-04-19 20:28:20 +00:00
449611e6df Merge pull request 'fix: bug: disinto-woodpecker-agent unhealthy; step logs truncated on short-duration failures (#1044)' (#1061) from fix/issue-1044 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-19 20:19:27 +00:00
9f365e40c0 Merge pull request 'fix: bug: claude_run_with_watchdog leaks orphan bash children — review-pr.sh lock stuck for 47 min when Claude Bash-tool command hangs (#1055)' (#1056) from fix/issue-1055 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-19 20:12:19 +00:00
Agent
e90ff4eb7b fix: bug: disinto-woodpecker-agent unhealthy; step logs truncated on short-duration failures (#1044)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Add gRPC keepalive settings to maintain stable connections between
woodpecker-agent and woodpecker-server:

- WOODPECKER_GRPC_KEEPALIVE_TIME=10s: Send ping every 10s to detect
  stale connections before they timeout
- WOODPECKER_GRPC_KEEPALIVE_TIMEOUT=20s: Allow 20s for ping response
  before marking connection dead
- WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS=true: Keep connection
  alive even during idle periods between workflows

Also reduce Nomad healthcheck interval from 15s to 10s for faster
detection of agent failures.

These settings address the "queue: task canceled" and "wait(): code:
Unknown" gRPC errors that were causing step logs to be truncated when
the agent-server connection dropped mid-stream.
2026-04-19 20:09:04 +00:00
441e2a366d Merge pull request 'fix: Compose generator should detect duplicate service names at generate-time (#850)' (#1053) from fix/issue-850-4 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-19 20:02:27 +00:00
dev-qwen2
f878427866 fix: bug: claude_run_with_watchdog leaks orphan bash children — review-pr.sh lock stuck for 47 min when Claude Bash-tool command hangs (#1055)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Fixes orphan process issue by:

1. lib/agent-sdk.sh: Use setsid to run claude in a new process group
   - All children of claude inherit this process group
   - Changed all kill calls to target the process group with -PID syntax
   - Affected lines: setsid invocation, SIGTERM kill, SIGKILL kill, watchdog cleanup

2. review/review-pr.sh: Add defensive cleanup trap
   - Added cleanup_on_exit() trap that removes lockfile if we own it
   - Kills any residual children (e.g., bash -c from Claude's Bash tool)
   - Added explicit lockfile removal on all early-exit paths
   - Added lockfile removal on successful completion

3. tests/test-watchdog-process-group.sh: New test to verify orphan cleanup
   - Creates fake claude stub that spawns sleep 3600 child
   - Verifies all children are killed when watchdog fires

Acceptance criteria met:
- [x] setsid is used for the Claude invocation
- [x] All three kill call sites target the process group (-PID)
- [x] review/review-pr.sh has EXIT/INT/TERM trap for lockfile removal
- [x] shellcheck clean on all modified files
2026-04-19 19:54:07 +00:00
Agent
0f91efc478 fix: reset duplicate detection state between compose generation runs
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Reset _seen_services and _service_sources arrays at the start of
_generate_compose_impl to prevent state bleeding between multiple
invocations. This fixes the test-duplicate-service-detection.sh test
which fails when run due to global associative array state persisting
between test cases.

Fixes: #850
2026-04-19 19:53:29 +00:00
Agent
1170ecb2f0 fix: Compose generator should detect duplicate service names at generate-time (#850)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-19 19:12:40 +00:00
e9aed747b5 fix: feat: per-workflow/per-step CI diagnostics in agent fix prompts (implements #1050) (#1051) (#1052)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
Closes #1051. Implements the fix sketched in #1050.
2026-04-19 19:08:16 +00:00
Claude
d1c7f4573a ci: retrigger after flaky failure
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-19 18:49:43 +00:00
Claude
42807903ef ci: retrigger after flaky failure
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
2026-04-19 18:37:03 +00:00
Claude
1e1acd50ab fix: feat: per-workflow/per-step CI diagnostics in agent fix prompts (implements #1050) (#1051)
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 18:33:44 +00:00
9cc12f2303 Merge pull request 'chore: gardener housekeeping 2026-04-19' (#1048) from chore/gardener-20260419-1702 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-19 17:14:22 +00:00
072d352c1c Merge pull request 'fix: bug: dev-poll skips CI-fix on re-claimed issues — blocked label not cleared on re-claim, starves new PRs at 0 attempts (#1047)' (#1049) from fix/issue-1047 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-19 17:11:07 +00:00
dev-qwen2
78f4966d0c fix: bug: dev-poll skips CI-fix on re-claimed issues — blocked label not cleared on re-claim, starves new PRs at 0 attempts (#1047)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-19 17:05:10 +00:00
Claude
ca8079ae70 chore: gardener housekeeping 2026-04-19
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-19 17:03:00 +00:00
5ba18c8f80 Merge pull request 'fix: bug: disinto-edge hard-fails on missing age key / secrets even when collect-engagement feature is not configured (#1038)' (#1045) from fix/issue-1038 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-19 15:43:12 +00:00
dev-qwen2
1c0ec3c7ec fix: bug: disinto-edge hard-fails on missing age key / secrets even when collect-engagement feature is not configured (#1038)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-19 15:39:57 +00:00
eb19aa6c84 Merge pull request 'chore: gardener housekeeping 2026-04-19' (#1042) from chore/gardener-20260419-1056 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-19 11:01:56 +00:00
Claude
86793c4c00 chore: gardener housekeeping 2026-04-19
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-19 10:56:38 +00:00
0bb04545d4 Merge pull request 'fix: [nomad-step-5] edge dispatcher task: Missing vault.read(kv/data/disinto/bots/vault) on fresh init (#1035)' (#1041) from fix/issue-1035 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-19 10:08:47 +00:00
1de3b0d560 Merge pull request 'fix: [nomad-step-5] edge caddy task fails to clone Forgejo from 127.0.0.1:3000 under bridge network (#1034)' (#1039) from fix/issue-1034 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-19 09:56:50 +00:00
Agent
d1e535696a detect-duplicates: add allowed hashes for vault-seed-ops-repo duplicate patterns
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
The new vault-seed-ops-repo.sh script intentionally follows the same
pattern as vault-seed-forgejo.sh. Add 13 allowed hashes to prevent
false positives in duplicate detection CI.
2026-04-19 09:56:11 +00:00
dev-qwen2
ada27759de fix: [nomad-step-5] edge caddy task fails to clone Forgejo from 127.0.0.1:3000 under bridge network (#1034)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-19 09:45:06 +00:00
dev-qwen2
2648c401f4 fix: [nomad-step-5] edge caddy task fails to clone Forgejo from 127.0.0.1:3000 under bridge network (#1034) 2026-04-19 09:45:06 +00:00
b09463b162 Merge pull request 'fix: [nomad-step-5] deploy.sh 240s healthy_deadline too tight for chat cold-start (#1036)' (#1040) from fix/issue-1036 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-19 09:41:12 +00:00
Agent
72f981528d test: add test cases for edge service ops-repo seed (#1035)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-19 09:40:19 +00:00
Agent
cd778c4775 fix: [nomad-step-5] edge dispatcher task: Missing vault.read(kv/data/disinto/bots/vault) on fresh init (#1035)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
2026-04-19 09:35:27 +00:00
Claude
bf3d16e8b3 fix: [nomad-step-5] deploy.sh 240s healthy_deadline too tight for chat cold-start (#1036)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 09:32:46 +00:00
7c543c9a16 Merge pull request 'fix: fix: edge.hcl uses Docker hostname routing — forgejo/woodpecker/chat upstreams unreachable in Nomad (#1031)' (#1032) from fix/issue-1031 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-19 04:50:57 +00:00
b5fe756d7a Merge pull request 'chore: gardener housekeeping' (#1030) from chore/gardener-20260419-0434 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-19 04:45:09 +00:00
Claude
47046ead2e fix: add network_mode=host to dispatcher task — FORGE_URL unreachable from bridge namespace
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
The dispatcher task's FORGE_URL was changed to 127.0.0.1:3000 but the
task was still in bridge networking mode, making the host's loopback
unreachable. Add network_mode = "host" to match the caddy task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 04:44:10 +00:00
Claude
7fd8a0cbba fix: edge.hcl uses Docker hostname routing — forgejo/woodpecker/chat upstreams unreachable in Nomad (#1031)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Add network_mode = "host" to the caddy task docker config (matching
woodpecker-agent.hcl pattern) and replace all bare Docker hostnames
with 127.0.0.1:<port>:
- forgejo:3000  → 127.0.0.1:3000
- woodpecker:8000 → 127.0.0.1:8000
- chat:8080 → 127.0.0.1:8080
- FORGE_URL env in both caddy and dispatcher tasks

Staging route already uses nomadService discovery (S5-fix-7, #1018).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 04:36:32 +00:00
Claude
cf8a4b51ed chore: gardener housekeeping 2026-04-19
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-19 04:34:16 +00:00
a467d613a4 Merge pull request 'chore: gardener housekeeping' (#1029) from chore/gardener-20260418-2226 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 22:29:55 +00:00
Claude
2fd5bf2192 chore: gardener housekeeping 2026-04-18
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-18 22:26:40 +00:00
3fb2de4a8a Merge pull request 'fix: tech-debt: no-op sed in generate_compose --build mode (lib/generators.sh) (#915)' (#1024) from fix/issue-915 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-18 16:39:30 +00:00
Agent
c24d204b0f fix: tech-debt: no-op sed in generate_compose --build mode (lib/generators.sh) (#915)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-18 16:29:59 +00:00
58a4ce4e0c Merge pull request 'chore: gardener housekeeping' (#1020) from chore/gardener-20260418-1620 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 16:28:06 +00:00
Claude
b475f99873 chore: gardener housekeeping 2026-04-18
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-18 16:20:53 +00:00
b05a31197c Merge pull request 'fix: [nomad-step-5] S5-fix-7 — staging port 80 collides with edge; staging should use dynamic port (#1018)' (#1019) from fix/issue-1018 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 13:48:25 +00:00
Claude
e6dcad143d fix: [nomad-step-5] S5-fix-7 — staging port 80 collides with edge; staging should use dynamic port (#1018)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 13:39:30 +00:00
a35d6e7848 Merge pull request 'fix: [nomad-step-5] S5-fix-6 — chat Dockerfile must bake Claude CLI (same as agents #984) (#1016)' (#1017) from fix/issue-1016 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-18 13:10:10 +00:00
Agent
38b55e1855 fix: [nomad-step-5] S5-fix-6 — chat Dockerfile must bake Claude CLI (same as agents #984) (#1016)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-18 13:08:01 +00:00
Agent
4f5e546c42 fix: [nomad-step-5] S5-fix-6 — chat Dockerfile must bake Claude CLI (same as agents #984) (#1016)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-18 13:01:12 +00:00
85969ad42d Merge pull request 'fix: [nomad-step-5] S5-fix-5 — chat.hcl tmpfs syntax: use mount block not tmpfs argument (#1012)' (#1015) from fix/issue-1012-2 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 12:47:29 +00:00
Claude
31e2f63f1b fix: [nomad-step-5] S5-fix-5 — chat.hcl tmpfs syntax: use mount block not tmpfs argument (#1012)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 12:43:08 +00:00
f98338cec7 Merge pull request 'fix: [nomad-step-5] S5-fix-4 — staging health check 404: host volume empty, needs content seeded (#1010)' (#1011) from fix/issue-1010 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 11:35:45 +00:00
Agent
fa7fb60415 fix: [nomad-step-5] S5-fix-4 — staging health check 404: host volume empty, needs content seeded (#1010)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-18 11:22:39 +00:00
6f21582ffa Merge pull request 'fix: [nomad-step-5] S5-fix-2 — staging.hcl command should be caddy file-server not file-server (#1007)' (#1008) from fix/issue-1007 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 10:43:13 +00:00
cfe526b481 Merge pull request 'fix: nomad template whitespace trimming strips newlines between env var blocks (#996)' (#1006) from fix/issue-996 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 10:37:09 +00:00
Claude
ec8791787d fix: [nomad-step-5] S5-fix-2 — staging.hcl command should be caddy file-server not file-server (#1007)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 10:35:59 +00:00
dev-qwen2
d8f2be1c4f fix: nomad template whitespace trimming strips newlines between env var blocks (#996)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-18 10:29:20 +00:00
dev-qwen2
78a19a8add fix: nomad template whitespace trimming strips newlines between env var blocks (#996) 2026-04-18 10:29:20 +00:00
1eac6d63e2 Merge pull request 'fix: [nomad-step-5] S5-fix-1 — chat/edge image build context should be docker/<svc>/ not repo root (#1004)' (#1005) from fix/issue-1004 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 10:13:56 +00:00
Agent
f2bafbc190 fix: [nomad-step-5] S5-fix-1 — chat/edge image build context should be docker/<svc>/ not repo root (#1004)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-18 10:02:23 +00:00
dfb1a45295 Merge pull request 'chore: gardener housekeeping' (#1003) from chore/gardener-20260418-0955 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 10:02:15 +00:00
Claude
832d6bb851 chore: gardener housekeeping 2026-04-18
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-18 09:55:21 +00:00
8fc3ba5b59 Merge pull request 'fix: [nomad-step-5] S5.5 — wire --with edge,staging,chat + vault-runner + full deploy ordering (#992)' (#1002) from fix/issue-992-2 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 09:38:28 +00:00
Claude
3b82f8e3a1 fix: handle _hvault_seed_key rc=2 API error explicitly in vault-seed-chat.sh (#992)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 09:26:20 +00:00
Claude
8381f88491 fix: deduplicate vault-seed-chat.sh preconditions + help text for CI (#992)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 09:09:16 +00:00
Claude
0c85339285 fix: update bats test to include edge in known services list (#992)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 09:05:10 +00:00
Claude
acd6240ec4 fix: [nomad-step-5] S5.5 — wire --with edge,staging,chat + vault-runner + full deploy ordering (#992)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 09:01:54 +00:00
16474a1800 Merge pull request 'fix: [nomad-step-5] S5.2 — nomad/jobs/staging.hcl + chat.hcl (#989)' (#999) from fix/issue-989-2 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 08:28:40 +00:00
Claude
8b1857e83f fix: add site-content to HOST_VOLUME_DIRS + update AGENTS.md jobspec table (#989)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Add /srv/disinto/docker to HOST_VOLUME_DIRS in cluster-up.sh so the
staging host volume directory exists before Nomad starts (prevents
client fingerprinting failure on fresh-box init).

Also add staging.hcl and chat.hcl entries to the nomad/AGENTS.md
jobspec documentation table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 08:20:10 +00:00
Claude
da93748fee fix: [nomad-step-5] S5.2 — nomad/jobs/staging.hcl + chat.hcl (#989)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Add lightweight Nomad service jobs for the staging file server and
Claude chat UI. Key changes:

- nomad/jobs/staging.hcl: caddy:alpine file-server mounting docker/
  as /srv/site (read-only), no Vault integration needed
- nomad/jobs/chat.hcl: custom disinto/chat:local image with sandbox
  hardening (cap_drop ALL, tmpfs, pids_limit 128, security_opt),
  Vault-templated OAuth secrets from kv/disinto/shared/chat
- nomad/client.hcl: add site-content host volume for staging
- vault/policies/service-chat.hcl + vault/roles.yaml: read-only
  access to chat secrets via workload identity
- bin/disinto: wire staging+chat into build, deploy order, seed
  mapping, summary, and service validation
- tests/disinto-init-nomad.bats: update known-services assertion

Fixes prior art issue where security_opt and pids_limit were placed
at task level instead of inside docker driver config block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 08:01:48 +00:00
30bc21c650 Merge pull request 'fix: [nomad-step-5] S5.4 — dispatcher.sh DISPATCHER_BACKEND=nomad branch (nomad job dispatch) (#991)' (#997) from fix/issue-991 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-18 07:43:29 +00:00
Agent
9806ed40df fix: [nomad-step-5] S5.4 — dispatcher.sh nomad exit code extraction (dead != failure) (#991)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-18 07:41:05 +00:00
Agent
9f94b818a3 fix: [nomad-step-5] S5.4 — dispatcher.sh DISPATCHER_BACKEND=nomad branch (nomad job dispatch) (#991)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-18 07:28:54 +00:00
Agent
9f9abdee82 fix: [nomad-step-5] S5.4 — dispatcher.sh DISPATCHER_BACKEND=nomad branch (nomad job dispatch) (#991)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-18 07:20:16 +00:00
90831d3347 Merge pull request 'fix: [nomad-step-5] S5.1 — nomad/jobs/edge.hcl (Caddy + dispatcher sidecar) (#988)' (#994) from fix/issue-988 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 07:16:45 +00:00
dev-qwen2
72aecff8d8 fix: [nomad-step-5] S5.1 — nomad/jobs/edge.hcl (Caddy + dispatcher sidecar) (#988)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-18 07:08:20 +00:00
84d63d49b5 Merge pull request 'fix: [nomad-step-5] S5.3 — nomad/jobs/vault-runner.hcl (parameterized batch dispatch) (#990)' (#993) from fix/issue-990 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 06:58:33 +00:00
Claude
e17e9604c1 fix: [nomad-step-5] S5.3 — nomad/jobs/vault-runner.hcl (parameterized batch dispatch) (#990)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 06:45:40 +00:00
daaaf70d34 Merge pull request 'fix: [nomad-step-4] S4-fix-7 — agents.hcl must use :local tag not :latest (Nomad always pulls :latest) (#986)' (#987) from fix/issue-986 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 06:23:16 +00:00
dev-qwen2
4a07049383 fix: [nomad-step-4] S4-fix-7 — agents.hcl must use :local tag not :latest (Nomad always pulls :latest) (#986)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-18 06:11:33 +00:00
8c7b26f916 Merge pull request 'fix: [nomad-step-4] S4-fix-6 — bake Claude CLI into agents Docker image (remove host bind-mount) (#984)' (#985) from fix/issue-984 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-18 05:56:41 +00:00
dev-qwen2
deda192d60 fix: [nomad-step-4] S4-fix-6 — bake Claude CLI into agents Docker image (remove host bind-mount) (#984)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-18 05:44:35 +00:00
dev-qwen2
4a3c8e16db fix: [nomad-step-4] S4-fix-6 — bake Claude CLI into agents Docker image (remove host bind-mount) (#984)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-18 05:34:46 +00:00
450e2a09c8 Merge pull request 'chore: gardener housekeeping' (#983) from chore/gardener-20260418-0313 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-18 03:19:50 +00:00
Claude
f2b175e49b chore: gardener housekeeping 2026-04-18
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-18 03:13:46 +00:00
c872f28242 Merge pull request 'chore: gardener housekeeping' (#980) from chore/gardener-20260417-2106 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 21:13:47 +00:00
Claude
386f9a1bc0 chore: gardener housekeeping 2026-04-17
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 21:06:33 +00:00
71e770b8ae Merge pull request 'fix: [nomad-step-4] S4-fix-5 — agents.hcl needs force_pull=false for locally-built image (#978)' (#979) from fix/issue-978 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 17:02:18 +00:00
Agent
ffd1f41b33 fix: [nomad-step-4] S4-fix-5 — agents.hcl needs force_pull=false for locally-built image (#978)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 16:57:19 +00:00
05e57478ad Merge pull request 'fix: [nomad-step-4] S4-fix-4 — Dockerfile COPY tea fails on fresh clone (download instead) (#976)' (#977) from fix/issue-976 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-17 16:30:53 +00:00
dev-qwen2
5185cc720a fix: [nomad-step-4] S4-fix-4 — Dockerfile COPY tea fails on fresh clone (download instead) (#976)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-17 16:28:43 +00:00
93c26ef037 Merge pull request 'fix: [nomad-step-4] S4-fix-3 — Dockerfile COPY sops fails on fresh clone (download instead) (#974)' (#975) from fix/issue-974 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-17 16:14:54 +00:00
dev-qwen2
98bb5a3fee fix: [nomad-step-4] S4-fix-3 — Dockerfile COPY sops fails on fresh clone (download instead) (#974)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-17 16:08:41 +00:00
3cb76d571b Merge pull request 'fix: [nomad-step-4] S4-fix-2 — build disinto/agents:latest locally before deploy (no registry) (#972)' (#973) from fix/issue-972 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 16:03:16 +00:00
dev-qwen2
0c767d9fee fix: [nomad-step-4] S4-fix-2 — build disinto/agents:latest locally before deploy (no registry) (#972)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-17 15:47:52 +00:00
243b598374 Merge pull request 'fix: tech-debt: init --dry-run shows batch seed→deploy but real run is interleaved (#950)' (#970) from fix/issue-950 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 15:31:29 +00:00
dev-qwen2
b9588073ad fix: tech-debt: init --dry-run shows batch seed→deploy but real run is interleaved (#950)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-17 15:21:47 +00:00
9bb9be450a Merge pull request 'chore: gardener housekeeping' (#969) from chore/gardener-20260417-1445 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 15:07:58 +00:00
3b5498bc30 Merge pull request 'fix: [nomad-step-3] S3-fix-6 — woodpecker-agent can't reach server gRPC at localhost:9000 (port bound to LXC IP) (#964)' (#966) from fix/issue-964 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 15:01:59 +00:00
Claude
7f5234bd71 fix: woodpecker jobspecs deployed via deploy.sh, not Nomad API directly
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 14:59:14 +00:00
Claude
8bbd7e8ac8 chore: gardener housekeeping 2026-04-17 2026-04-17 14:59:14 +00:00
Agent
ab0a6be41f fix: use Nomad interpolation syntax for WOODPECKER_SERVER
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 14:58:13 +00:00
Agent
3d62b52e36 fix: [nomad-step-3] S3-fix-6 — woodpecker-agent can't reach server gRPC at localhost:9000 (port bound to LXC IP) (#964) 2026-04-17 14:58:13 +00:00
82a712bac3 Merge pull request 'fix: [nomad-step-4] S4-fix-1 — vault-seed-agents.sh must seed kv/disinto/bots/dev (missing from .env import) (#963)' (#965) from fix/issue-963 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-17 14:46:52 +00:00
dev-qwen2
1a637fdc27 fix: [nomad-step-4] S4-fix-1 — vault-seed-agents.sh must seed kv/disinto/bots/dev (missing from .env import) (#963)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 14:43:06 +00:00
edf7a28bd3 Merge pull request 'fix: [nomad-step-3] S3-fix-5 — nomad/client.hcl must allow_privileged for woodpecker-agent (#961)' (#962) from fix/issue-961 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 12:53:42 +00:00
dev-qwen2
fbcc6c5e43 fix: [nomad-step-3] S3-fix-5 — nomad/client.hcl must allow_privileged for woodpecker-agent (#961)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 12:48:08 +00:00
9c4c5f1ac8 Merge pull request 'fix: [nomad-step-4] S4.2 — wire --with agents + deploy ordering (#956)' (#960) from fix/issue-956 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 11:06:39 +00:00
dev-qwen2
155ec85a3e fix: [nomad-step-4] S4.2 — wire --with agents + deploy ordering (#956)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-17 10:55:13 +00:00
a51f543005 Merge pull request 'fix: [nomad-step-4] S4.1 — nomad/jobs/agents.hcl (7 roles, llama, vault-templated bot tokens) (#955)' (#959) from fix/issue-955 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 10:49:36 +00:00
2ef77f4aa3 Merge pull request 'fix: [nomad-step-3] S3-fix-3 — host-volume dirs need 0777 for non-root containers (#953)' (#957) from fix/issue-953 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 10:40:32 +00:00
6ff08a3b74 Merge pull request 'fix: [nomad-step-3] S3-fix-4 — KV key-name mismatch: wp_forgejo_client vs forgejo_client (#954)' (#958) from fix/issue-954 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-17 10:37:50 +00:00
Claude
eadefcd30a fix: replace script check with checkless service registration
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Nomad native service provider only supports tcp/http checks, not
script checks. Since agents expose no HTTP endpoint, register the
service without a check — Nomad tracks health via task lifecycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 10:09:56 +00:00
Claude
c17548a216 fix: move service block to group level for nomad provider
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline failed
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline failed
ci/woodpecker/pr/secret-scan Pipeline was successful
The Nomad native service provider requires the service block at the
group level, not inside the task. Script checks use task = "agents"
to specify the execution context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 10:07:36 +00:00
Claude
aa7db2a5fc fix: whitelist vault-seed preamble + precondition dup hashes
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline failed
ci/woodpecker/pr/secret-scan Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 10:03:32 +00:00
dev-qwen2
ec3b51724f fix: [nomad-step-3] S3-fix-3 — host-volume dirs need 0777 for non-root containers (#953)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
2026-04-17 10:00:16 +00:00
Claude
93a2a7bd3d fix: [nomad-step-4] S4.1 — nomad/jobs/agents.hcl (7 roles, llama, vault-templated bot tokens) (#955)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline failed
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/nomad-validate Pipeline failed
ci/woodpecker/pr/secret-scan Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 09:57:12 +00:00
Agent
612b3e616c fix: [nomad-step-3] S3-fix-4 — KV key-name mismatch: wp_forgejo_client vs forgejo_client (#954)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-17 09:53:23 +00:00
c20b0a8bd2 Merge pull request 'fix: [nomad-step-2] S2-fix-G — strip trailing /* from all vault policy paths (systemic 403) (#951)' (#952) from fix/issue-951 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 09:17:08 +00:00
Agent
8f5652864d fix: [nomad-step-2] S2-fix-G — strip trailing /* from all vault policy paths (systemic 403) (#951)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 09:11:22 +00:00
c47c6e71bd Merge pull request 'fix: [nomad-step-3] S3-fix-2 — wp-oauth REPO_ROOT still wrong + seed/deploy must interleave (#948)' (#949) from fix/issue-948 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 08:38:56 +00:00
dev-qwen2
8fb173763c fix: [nomad-step-3] S3-fix-2 — wp-oauth REPO_ROOT still wrong + seed/deploy must interleave (#948)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-17 08:24:00 +00:00
c829d7781b Merge pull request 'fix: [nomad-step-3] S3-fix — deploy.sh crashes on hyphenated job name + wp-oauth double lib/ path (#944)' (#945) from fix/issue-944 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 07:57:08 +00:00
dev-qwen2
7fd9a457c3 fix: [nomad-step-3] S3-fix — deploy.sh crashes on hyphenated job name + wp-oauth double lib/ path (#944)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
2026-04-17 07:49:40 +00:00
83f02cbb85 Merge pull request 'chore: gardener housekeeping' (#946) from chore/gardener-20260417-0738 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 07:42:25 +00:00
Claude
c604efd368 chore: gardener housekeeping 2026-04-17
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 07:38:11 +00:00
a7a046b81a Merge pull request 'fix: [nomad-step-3] S3.4 — wire --with woodpecker + deploy ordering + OAuth seed (#937)' (#943) from fix/issue-937-2 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 07:05:34 +00:00
Claude
64cadf8a7d fix: [nomad-step-3] S3.4 — wire --with woodpecker + deploy ordering + OAuth seed (#937)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 06:53:40 +00:00
3409c1b43c Merge pull request 'fix: [nomad-step-3] S3.3 — wp-oauth-register.sh (Forgejo OAuth app + Vault KV) (#936)' (#940) from fix/issue-936 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 06:08:09 +00:00
dev-qwen2
13088f4eb2 fix: propagate DRY_RUN env var to wp-oauth-register.sh
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 06:03:41 +00:00
dev-qwen2
442d24b76d fix: resolve CI blockers for wp-oauth-register.sh
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 05:54:30 +00:00
dev-qwen2
11566c2757 fix: add allowed hashes for vault-seed duplicate patterns
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 05:43:46 +00:00
dev-qwen2
10e469c970 fix: [nomad-step-3] S3.3 — wp-oauth-register.sh (Forgejo OAuth app + Vault KV) (#936) 2026-04-17 05:43:46 +00:00
71671d868d Merge pull request 'fix: [nomad-step-3] S3.2 — nomad/jobs/woodpecker-agent.hcl (host-net, docker.sock) (#935)' (#939) from fix/issue-935 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 05:42:19 +00:00
Agent
5d76cc96fb fix: [nomad-step-3] S3.2 — nomad/jobs/woodpecker-agent.hcl (host-net, docker.sock) (#935)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 05:35:02 +00:00
b501077352 Merge pull request 'fix: [nomad-step-3] S3.1 — nomad/jobs/woodpecker-server.hcl + vault-seed-woodpecker.sh (#934)' (#938) from fix/issue-934 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 05:29:10 +00:00
Claude
28ed3dd751 fix: extract KV mount check into hvault_ensure_kv_v2 to deduplicate seed scripts
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
The duplicate-detection CI step flagged the shared KV-mount-checking
boilerplate between vault-seed-forgejo.sh and vault-seed-woodpecker.sh.
Extract into lib/hvault.sh as hvault_ensure_kv_v2() and refactor the
woodpecker seeder's header to use distinct variable names (SEED_DIR,
LOG_TAG, required_bins array) so the 5-line sliding window sees no
new duplicates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 05:21:47 +00:00
Claude
32c88471a7 fix: [nomad-step-3] S3.1 — nomad/jobs/woodpecker-server.hcl + vault-seed-woodpecker.sh (#934)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 05:15:58 +00:00
40ffffed73 Merge pull request 'fix: incident: WP gRPC flake burned dev-qwen CI retry budget on #842 (2026-04-16) (#867)' (#933) from fix/issue-867 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-17 01:40:38 +00:00
7a45cc31f9 Merge pull request 'fix: tech-debt: edge service missing pull_policy: build in --build mode generator (#914)' (#931) from fix/issue-914 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-17 01:35:02 +00:00
Agent
c0697ab27b fix: incident: WP gRPC flake burned dev-qwen CI retry budget on #842 (2026-04-16) (#867)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-17 01:34:41 +00:00
Agent
04ead1fbdc fix: incident: WP gRPC flake burned dev-qwen CI retry budget on #842 (2026-04-16) (#867) 2026-04-17 01:34:41 +00:00
c3e58e88ed Merge pull request 'fix: tech-debt: tools/vault-import.sh uses hardcoded secret/ KV mount (#910)' (#932) from fix/issue-910 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-17 01:31:10 +00:00
Claude
f53c3690b8 fix: tech-debt: edge service missing pull_policy: build in --build mode generator (#914)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 01:18:13 +00:00
dev-qwen2
99d3cb4c8f fix: tech-debt: tools/vault-import.sh uses hardcoded secret/ KV mount (#910)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-17 01:18:03 +00:00
f93600a1cf Merge pull request 'chore: gardener housekeeping 2026-04-17' (#930) from chore/gardener-20260417-0107 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-17 01:11:55 +00:00
Claude
caf937f295 chore: gardener housekeeping 2026-04-17
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
- Promote #910, #914, #867 to backlog with acceptance criteria + affected files
- Promote #820 to backlog (already well-structured, dep on #758 gates pickup)
- Stage #915 as dust (no-op sed, single-line removal)
- Update all AGENTS.md watermarks to HEAD
- Root AGENTS.md: document vault-seed-<svc>.sh convention + complete test file list
- Track gardener/dust.jsonl in git (remove from .gitignore)
2026-04-17 01:07:31 +00:00
8ad5aca6bb Merge pull request 'fix: [nomad-step-2] S2-fix-F — wire tools/vault-seed-<svc>.sh into bin/disinto --with <svc> (#928)' (#929) from fix/issue-928 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 22:23:55 +00:00
Claude
f214080280 fix: [review-r1] seed loop sudo invocation bypasses sudoers env_reset (#929)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
`sudo -n "VAULT_ADDR=$vault_addr" -- "$seed_script"` passed
VAULT_ADDR as a sudoers env-assignment argument. With the default
`env_reset=on` policy (almost all distros), sudo silently discards
env assignments unless the variable is in `env_keep` — and
VAULT_ADDR is not. The seeder then hit its own precondition check
at vault-seed-forgejo.sh:109 and died with "VAULT_ADDR unset",
breaking the fresh-LXC non-root acceptance path the PR was written
to close.

Fix: run `env` as the command under sudo — `sudo -n -- env
"VAULT_ADDR=$vault_addr" "$seed_script"` — so VAULT_ADDR is set in
the child process directly, unaffected by sudoers env handling.
The root (non-sudo) branch already used shell-level env assignment
and was correct.

Adds a grep-level regression guard that pins the `env VAR=val`
invocation and negative-asserts the unsafe bare-argument form.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 22:14:05 +00:00
Claude
5e83ecc2ef fix: [nomad-step-2] S2-fix-F — wire tools/vault-seed-<svc>.sh into bin/disinto --with <svc> (#928)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
`tools/vault-seed-forgejo.sh` existed and worked, but `bin/disinto init
--backend=nomad --with forgejo` never invoked it, so a fresh LXC with an
empty Vault hit `Template Missing: vault.read(kv/data/disinto/shared/
forgejo)` and the forgejo alloc timed out inside deploy.sh's 240s
healthy_deadline — operator had to run the seeder + `nomad alloc
restart` by hand to recover.

In `_disinto_init_nomad`, after `vault-import.sh` (or its skip branch)
and before `deploy.sh`, iterate `--with <svc>` and auto-invoke
`tools/vault-seed-<svc>.sh` when the file exists + is executable.
Services without a seeder are silently skipped — Step 3+ services
(woodpecker, chat, etc.) can ship their own seeder without touching
`bin/disinto`. VAULT_ADDR is passed explicitly because cluster-up.sh
writes the profile.d export during this same init run (current shell
hasn't sourced it yet) and `vault-seed-forgejo.sh` — unlike its
sibling vault-* scripts — requires the caller to set VAULT_ADDR
instead of defaulting it via `_hvault_default_env`. Mirror the loop in
the --dry-run plan so the operator-visible plan matches the real run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 22:00:13 +00:00
bc3f10aff5 Merge pull request 'fix: [nomad-step-2] S2-fix-E — vault-import.sh still writes to secret/data/ not kv/data/ (#926)' (#927) from fix/issue-926 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 21:38:45 +00:00
Claude
f8afdfcf18 fix: [nomad-step-2] S2-fix-E — vault-import.sh still writes to secret/data/ not kv/data/ (#926)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
The S2 Nomad+Vault migration switched the KV v2 mount from `secret/` to
`kv/` in policies, roles, templates, and lib/hvault.sh. tools/vault-import.sh
was missed — its curl URL and 4 error messages still hardcoded `secret/data/`,
so `disinto init --backend=nomad --with forgejo` hit 404 from vault on the
first write (issue body reproduces it with the gardener bot path).

Five call sites in _kv_put_secret flipped to `kv/data/`: the POST URL (L154)
and the curl-error / 404 / 403 / non-2xx branches (L156, L167, L171, L175).
The read helper is hvault_kv_get from lib/hvault.sh, which already resolves
through VAULT_KV_MOUNT (default `kv`), so no change needed there.

tests/vault-import.bats also updated: dev-mode vault only auto-mounts kv-v2
at secret/, so the test harness now enables a parallel kv-v2 mount at path=kv
during setup_file to mirror the production cluster layout. Test-side URLs
that assert round-trip reads all follow the same secret/ → kv/ rename.

shellcheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 21:29:35 +00:00
cfe1ef9512 Merge pull request 'fix: [nomad-step-2] S2-fix — 4 bugs block Step 2 verification: kv/ mount missing, VAULT_ADDR, --sops required, template fallback (#912)' (#923) from fix/issue-912-2 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 21:21:39 +00:00
Claude
0b994d5d6f fix: [nomad-step-2] S2-fix — 4 bugs block Step 2 verification: kv/ mount missing, VAULT_ADDR, --sops required, template fallback (#912)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Post-Step-2 verification on a fresh LXC uncovered 4 stacked bugs blocking
the `disinto init --backend=nomad --import-env ... --with forgejo` hero
command. Root cause is #1; #2-#4 surface as the operator walks past each.

1. kv/ secret engine never enabled — every policy, role, import write,
   and template read references kv/disinto/* and 403s without the mount.
   Adds lib/init/nomad/vault-engines.sh (idempotent POST sys/mounts/kv)
   wired into `_disinto_init_nomad` before vault-apply-policies.sh.

2. VAULT_ADDR/VAULT_TOKEN not exported in the init process. Extracts the
   5-line default-and-resolve block into `_hvault_default_env` in
   lib/hvault.sh and sources it from vault-engines.sh, vault-nomad-auth.sh,
   vault-apply-policies.sh, vault-apply-roles.sh, and vault-import.sh. One
   definition, zero copies — avoids the 5-line sliding-window duplicate
   gate that failed PRs #917/#918.

3. vault-import.sh required --sops; spec (#880) says --env alone must
   succeed. Flag validation now: --sops requires --age-key, --age-key
   requires --sops, --env alone imports only the plaintext half.

4. forgejo.hcl template blocks forever when kv/disinto/shared/forgejo is
   absent or missing a key. Adds `error_on_missing_key = false` so the
   existing `with ... else ...` fallback emits placeholders instead of
   hanging on template-pending.

vault-engines.sh parser uses a while/shift shape distinct from
vault-apply-policies.sh (flat case) and vault-apply-roles.sh (if/elif
ladder) so the three sibling flag parsers hash differently under the
repo-wide duplicate detector.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 21:10:59 +00:00
3e29a9a61d Merge pull request 'fix: vault/policies/service-forgejo.hcl: path glob misses exact secret path (#900)' (#916) from fix/issue-900 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 20:22:38 +00:00
29df502038 Merge pull request 'fix: vault-import.sh: pipe-separator in ops_data/paths_to_write silently truncates secret values containing | (#898)' (#913) from fix/issue-898 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 20:17:47 +00:00
Agent
98a4f8e362 fix: vault/policies/service-forgejo.hcl: path glob misses exact secret path (#900)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-16 20:09:34 +00:00
6dcc36cc8d Merge pull request 'fix: fix: --build mode agents: service missing pull_policy: build (same root as #887) (#893)' (#911) from fix/issue-893 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 20:07:26 +00:00
Claude
27baf496db fix: vault-import.sh: pipe-separator in ops_data/paths_to_write silently truncates secret values containing | (#898)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Replace the `|`-delimited string accumulators with bash associative and
indexed arrays so any byte may appear in a secret value.

Two sites used `|` as a delimiter over data that includes user secrets:

1. ops_data["path:key"]="value|status" — extraction via `${data%%|*}`
   truncated values at the first `|` (silently corrupting writes).
2. paths_to_write["path"]="k1=v1|k2=v2|..." — split back via
   `IFS='|' read -ra` at write time, so a value containing `|` was
   shattered across kv pairs (silently misrouting writes).

Fix:

- Split ops_data into two assoc arrays (`ops_value`, `ops_status`) keyed
  on "vault_path:vault_key" — value and status are stored independently
  with no in-band delimiter. (`:` is safe because both vault_path and
  vault_key are identifier-safe.)
- Track distinct paths in `path_seen` and, for each path, collect its
  kv pairs into a fresh indexed `pairs_array` by filtering ops_value.
  `_kv_put_secret` already splits each entry on the first `=` only, so
  `=` and `|` inside values are both preserved.

Added a bats regression that imports values like `abc|xyz`, `p1|p2|p3`,
and `admin|with|pipes` and asserts they round-trip through Vault
unmodified. Values are single-quoted in the .env so they survive
`source` — the accumulator is what this test exercises.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 20:04:54 +00:00
dev-qwen2
9f67f79ecd fix: fix: --build mode agents: service missing pull_policy: build (same root as #887) (#893)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-16 19:53:57 +00:00
391aaa99a5 Merge pull request 'fix: lib/hvault.sh uses secret/ mount prefix but migration policies use kv/ — agents will get 403 (#890)' (#909) from fix/issue-890 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 19:49:21 +00:00
164851fc9b Merge pull request 'fix: [nomad-step-2] S2.5 — bin/disinto init --import-env / --import-sops / --age-key wire-up (#883)' (#907) from fix/issue-883-2 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 19:44:44 +00:00
dev-qwen2
5fd36e94bb fix: lib/hvault.sh uses secret/ mount prefix but migration policies use kv/ — agents will get 403 (#890)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Changes:
- Add VAULT_KV_MOUNT env var (default: kv) to make KV mount configurable
- Update hvault_kv_get to use ${VAULT_KV_MOUNT}/data/${path}
- Update hvault_kv_put to use ${VAULT_KV_MOUNT}/data/${path}
- Update hvault_kv_list to use ${VAULT_KV_MOUNT}/metadata/${path}
- Update tests to use kv/ paths instead of secret/

This ensures agents can read/write secrets using the same mount point
that the Nomad+Vault migration policies grant ACL for.
2026-04-16 19:32:36 +00:00
Claude
ece5d9b6cc fix: [nomad-step-2] S2.5 review — gate policies/auth/import on --empty; reject --empty + --import-* (#883)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Addresses review #907 blocker: docs/nomad-migration.md claimed
--empty "skips policies/auth/import/deploy" but _disinto_init_nomad
had no $empty gate around those blocks — operators reaching the
"cluster-only escape hatch" would still invoke vault-apply-policies.sh
and vault-nomad-auth.sh, contradicting the runbook.

Changes:
- _disinto_init_nomad: exit 0 immediately after cluster-up when
  --empty is set, in both dry-run and real-run branches. Only the
  cluster-up plan appears; no policies, no auth, no import, no
  deploy. Matches the docs.
- disinto_init: reject --empty combined with any --import-* flag.
  --empty discards the import step, so the combination silently
  does nothing (worse failure mode than a clear error up front).
  Symmetric to the existing --empty vs --with check.
- Pre-flight existence check for policies/auth scripts now runs
  unconditionally on the non-empty path (previously gated on
  --import-*), matching the unconditional invocation. Import-script
  check stays gated on --import-*.

Non-blocking observation also addressed: the pre-flight guard
comment + actual predicate were inconsistent ("unconditionally
invoke policies+auth" but only checked on import). Now the
predicate matches: [ "$empty" != "true" ] gates policies/auth,
and an inner --import-* guard gates the import script.

Tests (+3):
- --empty --dry-run shows no S2.x sections (negative assertions)
- --empty --import-env rejected
- --empty --import-sops --age-key rejected

30/30 nomad tests pass; shellcheck clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 19:25:32 +00:00
Claude
aa3782748d fix: [nomad-step-2] S2.5 — bin/disinto init --import-env / --import-sops / --age-key wire-up (#883)
Wire the Step-2 building blocks (import, auth, policies) into
`disinto init --backend=nomad` so a single command on a fresh LXC
provisions cluster + policies + auth + imports secrets + deploys
services.

Adds three flags to `disinto init --backend=nomad`:
  --import-env PATH   plaintext .env from old stack
  --import-sops PATH  sops-encrypted .env.vault.enc (requires --age-key)
  --age-key PATH      age keyfile to decrypt --import-sops

Flow: cluster-up.sh → vault-apply-policies.sh → vault-nomad-auth.sh →
(optional) vault-import.sh → deploy.sh. Policies + auth run on every
nomad real-run path (idempotent); import runs only when --import-* is
set; all layers safe to re-run.

Flag validation:
  --import-sops without --age-key → error
  --age-key without --import-sops → error
  --import-env alone (no sops)    → OK
  --backend=docker + any --import-* → error

Dry-run prints a five-section plan (cluster-up + policies + auth +
import + deploy) with every argv that would be executed; touches
nothing, logs no secret values.

Dry-run output prints one line per --import-* flag that is actually
set — not in an if/elif chain — so all three paths appear when all
three flags are passed. Prior attempts regressed this invariant.

Tests:
  tests/disinto-init-nomad.bats +10 cases covering flag validation,
  dry-run plan shape (each flag prints its own path), policies+auth
  always-on (without --import-*), and --flag=value form.

Docs: docs/nomad-migration.md new file — cutover-day runbook with
invocation shape, flag summary, idempotency contract, dry-run, and
secret-hygiene notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 19:25:32 +00:00
520f8f1be8 Merge pull request 'fix: Two parallel activation paths for llama agents (ENABLE_LLAMA_AGENT vs [agents.X] TOML) (#846)' (#906) from fix/issue-846 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 19:22:43 +00:00
d0062ec859 Merge pull request 'fix: fix: vault_request RETURN trap fires prematurely when vault-env.sh is sourced (#773)' (#904) from fix/issue-773 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 19:11:43 +00:00
dev-qwen2
e003829eaa fix: Remove agents-llama service references from docs and formulas (#846)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- AGENTS.md: Replace agents-llama and agents-llama-all rows with generic
  'Local-model agents' entry pointing to docs/agents-llama.md
- formulas/release.sh: Remove agents-llama from docker compose stop/up
  commands (line 181-182)
- formulas/release.toml: Remove agents-llama references from restart-agents
  step description (lines 192, 195, 206)

These changes complete the removal of the legacy ENABLE_LLAMA_AGENT activation
path. The release formula now only references the 'agents' service, which is
the only service that exists after disinto init regenerates docker-compose.yml
based on [agents.X] TOML sections.
2026-04-16 19:05:46 +00:00
dev-qwen2
28eb182487 fix: Two parallel activation paths for llama agents (ENABLE_LLAMA_AGENT vs [agents.X] TOML) (#846) 2026-04-16 19:05:46 +00:00
Agent
96870d9f30 fix: fix: vault_request RETURN trap fires prematurely when vault-env.sh is sourced (#773)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-16 19:02:47 +00:00
c77fb1dc53 Merge pull request 'fix: entrypoint: validate_projects_dir silently exits instead of logging FATAL under set -eo pipefail (#877)' (#905) from fix/issue-877 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 18:48:07 +00:00
Claude
bbaccd678d fix: entrypoint: validate_projects_dir silently exits instead of logging FATAL under set -eo pipefail (#877)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
`compgen -G ... | wc -l` under `set -eo pipefail` aborts the script on
the non-zero pipeline exit (compgen returns 1 on no match) before the
FATAL diagnostic branch can run. The container still fast-fails, but
operators saw no explanation.

Switch to the conditional `if ! compgen -G ... >/dev/null 2>&1; then`
pattern already used at the two other compgen call sites in this file
(bootstrap_factory_repo and the PROJECT_NAME parser). The count for the
success-path log is computed after we've confirmed at least one match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:36:42 +00:00
dd61d0d29e Merge pull request 'fix: [nomad-step-2] S2.6 — CI: vault policy fmt + validate + roles.yaml check (#884)' (#903) from fix/issue-884-1 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 18:27:34 +00:00
701872af61 Merge pull request 'chore: gardener housekeeping 2026-04-16' (#901) from chore/gardener-20260416-1810 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 18:17:32 +00:00
Claude
6e73c6dd1f fix: [nomad-step-2] S2.6 — CI: vault policy fmt + validate + roles.yaml check (#884)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Extend .woodpecker/nomad-validate.yml with three new fail-closed steps
that guard every artifact under vault/policies/ and vault/roles.yaml
before it can land:

  4. vault-policy-fmt      — cp+fmt+diff idempotence check (vault 1.18.5
                             has no `policy fmt -check` flag, so we
                             build the non-destructive check out of
                             `vault policy fmt` on a /tmp copy + diff
                             against the original)
  5. vault-policy-validate — HCL syntax + capability validation via
                             `vault policy write` against an inline
                             dev-mode Vault server (no offline
                             `policy validate` subcommand exists;
                             dev-mode writes are ephemeral so this is
                             a validator, not a deploy)
  6. vault-roles-validate  — yamllint + PyYAML-based role→policy
                             reference check (every role's `policy:`
                             field must match a vault/policies/*.hcl
                             basename; also checks the four required
                             fields name/policy/namespace/job_id)

Secret-scan coverage for vault/policies/*.hcl is already provided by
the P11 gate (.woodpecker/secret-scan.yml) via its `vault/**/*` trigger
path — this pipeline intentionally does NOT duplicate that gate to
avoid the inline-heredoc / YAML-parse failure mode that sank the prior
attempt at this issue (PR #896).

Trigger paths extended: `vault/policies/**` and `vault/roles.yaml`.
`lib/init/nomad/vault-*.sh` is already covered by the existing
`lib/init/nomad/**` glob.

Docs: nomad/AGENTS.md and vault/policies/AGENTS.md updated with the
policy lifecycle, the CI enforcement table, and the common failure
modes authors will see.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:15:03 +00:00
Claude
6d7e539c28 chore: gardener housekeeping 2026-04-16
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-16 18:10:18 +00:00
6bdbeb5bd2 Merge pull request 'fix: [nomad-step-2] S2.4 — forgejo.hcl reads admin creds from Vault via template stanza (#882)' (#897) from fix/issue-882 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 17:50:36 +00:00
8b287ebf9a Merge pull request 'fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880)' (#889) from fix/issue-880 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 17:39:05 +00:00
Claude
0bc6f9c3cd fix: shorten empty-Vault placeholders to dodge secret-scan TOKEN= pattern
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
The lib/secret-scan.sh `(SECRET|TOKEN|...)=<16+ non-space chars>`
rule flagged the long `INTERNAL_TOKEN=VAULT-EMPTY-run-tools-vault-
seed-forgejo-sh` placeholder as a plaintext secret, failing CI's
secret-scan workflow on every PR that touched nomad/jobs/forgejo.hcl.
Shorten both placeholders to `seed-me` (<16 chars) — still visible in
a `grep FORGEJO__security__` audit, still obviously broken. The
operator-facing fix pointer moves to the `# WARNING` comment line in
the rendered env and to a new block comment above the template stanza.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 17:33:15 +00:00
Claude
89e454d0c7 fix: [nomad-step-2] S2.4 — forgejo.hcl reads admin creds from Vault via template stanza (#882)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline failed
Upgrade nomad/jobs/forgejo.hcl to read SECRET_KEY + INTERNAL_TOKEN from
Vault via a template stanza using the service-forgejo role (S2.3).
Non-secret config (DB, ports, ROOT_URL, registration lockdown) stays
inline. An empty-Vault fallback (`with ... else ...`) renders visible
placeholder env vars so a fresh LXC still brings forgejo up — the
operator sees the warning instead of forgejo silently regenerating
SECRET_KEY on every restart.

Add tools/vault-seed-forgejo.sh — idempotent seeder that ensures the
kv/ mount is KV v2 and populates kv/data/disinto/shared/forgejo with
random secret_key (32B hex) + internal_token (64B hex) on a clean
install. Existing non-empty values are left untouched; partial paths
are filled in atomically. Parser shape is positional-arity case
dispatch to stay structurally distinct from the two sibling vault-*.sh
tools and avoid the 5-line sliding-window dup detector.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 17:25:44 +00:00
dev-qwen2
428fa223d8 fix: [nomad-step-2] S2.2 — Fix KV v2 overwrite for incremental updates and secure jq interpolation (#880)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-16 17:22:05 +00:00
dev-qwen2
197716ed5c fix: [nomad-step-2] S2.2 — Fix KV v2 overwrite by grouping key-value pairs per path (#880) 2026-04-16 17:22:05 +00:00
dev-qwen2
b4c290bfda fix: [nomad-step-2] S2.2 — Fix bot/runner operation parsing and sops value extraction (#880) 2026-04-16 17:22:05 +00:00
dev-qwen2
78f92d0cd0 fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880) 2026-04-16 17:22:05 +00:00
dev-qwen2
7a1f0b2c26 fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880) 2026-04-16 17:22:05 +00:00
dev-qwen2
1dc50e5784 fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880) 2026-04-16 17:22:05 +00:00
a2a7c4a12c Merge pull request 'fix: [nomad-step-2] S2.3 — vault-nomad-auth.sh (enable JWT auth + roles + nomad workload identity) (#881)' (#895) from fix/issue-881 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 17:10:18 +00:00
Claude
b2c86c3037 fix: [nomad-step-2] S2.3 review round 1 — document new helper + script, drop unused vault CLI precondition (#881)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Review feedback from PR #895 round 1:

- lib/AGENTS.md (hvault.sh row): add hvault_get_or_empty(PATH) to the
  public-function list; replace the "not sourced at runtime yet" note
  with the three actual callers (vault-apply-policies.sh,
  vault-apply-roles.sh, vault-nomad-auth.sh).
- lib/AGENTS.md (lib/init/nomad/ row): add a one-line description of
  vault-nomad-auth.sh (Step 2, this PR); relabel the row header from
  "Step 0 installer scripts" to "installer scripts" since it now spans
  Step 0 + Step 2.
- lib/init/nomad/vault-nomad-auth.sh: drop the `vault` CLI from the
  binary precondition check — hvault.sh's helpers are all curl-based,
  so the CLI is never invoked. The precondition would spuriously die on
  a Nomad-client-only node that has Vault server reachable but no
  `vault` binary installed. Inline comment preserves the rationale.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:58:27 +00:00
Claude
8efef9f1bb fix: [nomad-step-2] S2.3 — vault-nomad-auth.sh (enable JWT auth + roles + nomad workload identity) (#881)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Wires Nomad → Vault via workload identity so jobs can exchange their
short-lived JWT for a Vault token carrying the policies in
vault/policies/ — no shared VAULT_TOKEN in job env.

- `lib/init/nomad/vault-nomad-auth.sh` — idempotent script: enable jwt
  auth at path `jwt-nomad`, config JWKS/algs, apply roles, install
  server.hcl + SIGHUP nomad on change.
- `tools/vault-apply-roles.sh` — companion sync script (S2.1 sibling);
  reads vault/roles.yaml and upserts each Vault role under
  auth/jwt-nomad/role/<name> with created/updated/unchanged semantics.
- `vault/roles.yaml` — declarative role→policy→bound_claims map; one
  entry per vault/policies/*.hcl. Keeps S2.1 policies and S2.3 role
  bindings visible side-by-side at review time.
- `nomad/server.hcl` — adds vault stanza (enabled, address,
  default_identity.aud=["vault.io"], ttl=1h).
- `lib/hvault.sh` — new `hvault_get_or_empty` helper shared between
  vault-apply-policies.sh, vault-apply-roles.sh, and vault-nomad-auth.sh;
  reads a Vault endpoint and distinguishes 200 / 404 / other.
- `vault/policies/AGENTS.md` — extends S2.1 docs with JWT-auth role
  naming convention, token shape, and the "add new service" flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:44:59 +00:00
112 changed files with 10224 additions and 1604 deletions

View file

@ -32,13 +32,10 @@ FORGE_URL=http://localhost:3000 # [CONFIG] local Forgejo instance
# - FORGE_PASS_DEV_QWEN2 # - FORGE_PASS_DEV_QWEN2
# Name conversion: tr 'a-z-' 'A-Z_' (lowercase→UPPER, hyphens→underscores). # Name conversion: tr 'a-z-' 'A-Z_' (lowercase→UPPER, hyphens→underscores).
# The compose generator looks these up via the agent's `forge_user` field in # The compose generator looks these up via the agent's `forge_user` field in
# the project TOML. The pre-existing `dev-qwen` llama agent uses # the project TOML. Configure local-model agents via [agents.X] sections in
# FORGE_TOKEN_LLAMA / FORGE_PASS_LLAMA (kept for backwards-compat with the # projects/*.toml — this is the canonical activation path.
# legacy `ENABLE_LLAMA_AGENT=1` single-agent path).
FORGE_TOKEN= # [SECRET] dev-bot API token (default for all agents) FORGE_TOKEN= # [SECRET] dev-bot API token (default for all agents)
FORGE_PASS= # [SECRET] dev-bot password for git HTTP push (#361) FORGE_PASS= # [SECRET] dev-bot password for git HTTP push (#361)
FORGE_TOKEN_LLAMA= # [SECRET] dev-qwen API token (for agents-llama)
FORGE_PASS_LLAMA= # [SECRET] dev-qwen password for git HTTP push
FORGE_REVIEW_TOKEN= # [SECRET] review-bot API token FORGE_REVIEW_TOKEN= # [SECRET] review-bot API token
FORGE_REVIEW_PASS= # [SECRET] review-bot password for git HTTP push FORGE_REVIEW_PASS= # [SECRET] review-bot password for git HTTP push
FORGE_PLANNER_TOKEN= # [SECRET] planner-bot API token FORGE_PLANNER_TOKEN= # [SECRET] planner-bot API token
@ -107,13 +104,6 @@ FORWARD_AUTH_SECRET= # [SECRET] Shared secret for Caddy ↔
# Store all project secrets here so formulas reference env vars, never hardcode. # Store all project secrets here so formulas reference env vars, never hardcode.
BASE_RPC_URL= # [SECRET] on-chain RPC endpoint BASE_RPC_URL= # [SECRET] on-chain RPC endpoint
# ── Local Qwen dev agent (optional) ──────────────────────────────────────
# Set ENABLE_LLAMA_AGENT=1 to emit agents-llama in docker-compose.yml.
# Requires a running llama-server reachable at ANTHROPIC_BASE_URL.
# See docs/agents-llama.md for details.
ENABLE_LLAMA_AGENT=0 # [CONFIG] 1 = enable agents-llama service
ANTHROPIC_BASE_URL= # [CONFIG] e.g. http://host.docker.internal:8081
# ── Tuning ──────────────────────────────────────────────────────────────── # ── Tuning ────────────────────────────────────────────────────────────────
CLAUDE_TIMEOUT=7200 # [CONFIG] max seconds per Claude invocation CLAUDE_TIMEOUT=7200 # [CONFIG] max seconds per Claude invocation

1
.gitignore vendored
View file

@ -20,7 +20,6 @@ metrics/supervisor-metrics.jsonl
# OS # OS
.DS_Store .DS_Store
dev/ci-fixes-*.json dev/ci-fixes-*.json
gardener/dust.jsonl
# Individual encrypted secrets (managed by disinto secrets add) # Individual encrypted secrets (managed by disinto secrets add)
secrets/ secrets/

View file

@ -294,6 +294,45 @@ def main() -> int:
"9f6ae8e7811575b964279d8820494eb0": "Verification helper: for loop done pattern", "9f6ae8e7811575b964279d8820494eb0": "Verification helper: for loop done pattern",
# Standard lib source block shared across formula-driven agent run scripts # Standard lib source block shared across formula-driven agent run scripts
"330e5809a00b95ade1a5fce2d749b94b": "Standard lib source block (env.sh, formula-session.sh, worktree.sh, guard.sh, agent-sdk.sh)", "330e5809a00b95ade1a5fce2d749b94b": "Standard lib source block (env.sh, formula-session.sh, worktree.sh, guard.sh, agent-sdk.sh)",
# Test data for duplicate service detection tests (#850)
# Intentionally duplicated TOML blocks in smoke-init.sh and test-duplicate-service-detection.sh
"334967b8b4f1a8d3b0b9b8e0912f3bfb": "Test TOML: [agents.llama] block header (smoke-init.sh + test-duplicate-service-detection.sh)",
"d82f30077e5bb23b5fc01db003033d5d": "Test TOML: [agents.llama] block body (smoke-init.sh + test-duplicate-service-detection.sh)",
# Common vault-seed script patterns: logging helpers + flag parsing
# Used in tools/vault-seed-woodpecker.sh + lib/init/nomad/wp-oauth-register.sh
"843a1cbf987952697d4e05e96ed2b2d5": "Logging helpers + DRY_RUN init (vault-seed-woodpecker + wp-oauth-register)",
"ee51df9642f2ef37af73b0c15f4d8406": "Logging helpers + DRY_RUN loop start (vault-seed-woodpecker + wp-oauth-register)",
"9a57368f3c1dfd29ec328596b86962a0": "Flag parsing loop + case start (vault-seed-woodpecker + wp-oauth-register)",
"9d72d40ff303cbed0b7e628fc15381c3": "Case loop + dry-run handler (vault-seed-woodpecker + wp-oauth-register)",
"5b52ddbbf47948e3cbc1b383f0909588": "Help + invalid arg handler end (vault-seed-woodpecker + wp-oauth-register)",
# forgejo-bootstrap.sh follows wp-oauth-register.sh pattern (issue #1069)
"2b80185e4ae2b54e2e01f33e5555c688": "Standard header (set -euo pipefail, SCRIPT_DIR, REPO_ROOT) (forgejo-bootstrap + wp-oauth-register)",
"38a1f20a60d69f0d6bfb06a0532b3bd7": "Logging helpers + DRY_RUN init (forgejo-bootstrap + wp-oauth-register)",
"4dd3c526fa29bdaa88b274c3d7d01032": "Flag parsing loop + case start (forgejo-bootstrap + wp-oauth-register)",
# Common vault-seed script preamble + precondition patterns
# Shared across tools/vault-seed-{forgejo,agents,woodpecker}.sh
"dff3675c151fcdbd2fef798826ae919b": "Vault-seed preamble: set -euo + path setup + source hvault.sh + KV_MOUNT",
"1cd9f0d083e24e6e6b2071db9b6dae09": "Vault-seed preconditions: binary check loop + VAULT_ADDR guard",
"63bfa88d71764c95c65a9a248f3e40ab": "Vault-seed preconditions: binary check end + VAULT_ADDR die",
"34873ad3570b211ce1d90468ab6ac94c": "Vault-seed preconditions: VAULT_ADDR die + hvault_token_lookup",
"71a52270f249e843cda48ad896d9f781": "Vault-seed preconditions: VAULT_ADDR + hvault_token_lookup + die",
# Common vault-seed script flag parsing patterns
# Shared across tools/vault-seed-{forgejo,ops-repo}.sh
"6906b7787796c2ccb8dd622e2ad4e7bf": "vault-seed DRY_RUN init + case pattern (forgejo + ops-repo)",
"a0df5283b616b964f8bc32fd99ec1b5a": "vault-seed case pattern start (forgejo + ops-repo)",
"e15e3272fdd9f0f46ce9e726aea9f853": "vault-seed case pattern dry-run handler (forgejo + ops-repo)",
"c9f22385cc49a3dac1d336bc14c6315b": "vault-seed DRY_RUN assignment (forgejo + ops-repo)",
"106f4071e88f841b3208b01144cd1c39": "vault-seed case pattern dry-run end (forgejo + ops-repo)",
"c15506dcb6bb340b25d1c39d442dd2e6": "vault-seed help text + invalid arg handler (forgejo + ops-repo)",
"1feecd3b3caf00045fae938ddf2811de": "vault-seed invalid arg handler (forgejo + ops-repo)",
"919780d5e7182715344f5aa02b191294": "vault-seed invalid arg + esac pattern (forgejo + ops-repo)",
"8dce1d292bce8e60ef4c0665b62945b0": "vault-seed esac + binary check loop (forgejo + ops-repo)",
"ca043687143a5b47bd54e65a99ce8ee8": "vault-seed binary check loop start (forgejo + ops-repo)",
"aefd9f655411a955395e6e5995ddbe6f": "vault-seed binary check pattern (forgejo + ops-repo)",
"60f0c46deb5491599457efb4048918e5": "vault-seed VAULT_ADDR + hvault_token_lookup check (forgejo + ops-repo)",
"f6838f581ef6b4d82b55268389032769": "vault-seed VAULT_ADDR + hvault_token_lookup die (forgejo + ops-repo)",
# Common shell control-flow: if → return 1 → fi → fi (env.sh + register.sh)
"a8bdb7f1a5d8cbd0a5921b17b6cf6f4d": "Common shell control-flow (return 1 / fi / fi / return 0 / }) (env.sh + register.sh)",
} }
if not sh_files: if not sh_files:

View file

@ -0,0 +1,317 @@
# =============================================================================
# .woodpecker/edge-subpath.yml — Edge subpath routing static checks
#
# Static validation for edge subpath routing configuration. This pipeline does
# NOT run live service curls — it validates the configuration that would be
# used by a deployed edge proxy.
#
# Checks:
# 1. shellcheck — syntax check on tests/smoke-edge-subpath.sh
# 2. caddy validate — validate the Caddyfile template syntax
# 3. caddyfile-routing-test — verify Caddyfile routing block shape
# 4. test-caddyfile-routing — run standalone unit test for Caddyfile structure
#
# Triggers:
# - Pull requests that modify edge-related files
#
# Environment variables (inherited from WOODPECKER_ENVIRONMENT):
# EDGE_BASE_URL — Edge proxy URL for reference (default: http://localhost)
# EDGE_TIMEOUT — Request timeout in seconds (default: 30)
# EDGE_MAX_RETRIES — Max retries per request (default: 3)
# =============================================================================
when:
event: pull_request
steps:
# ── 1. ShellCheck on smoke script ────────────────────────────────────────
# `shellcheck` validates bash syntax, style, and common pitfalls.
# Exit codes:
# 0 — all checks passed
# 1 — one or more issues found
- name: shellcheck-smoke
image: koalaman/shellcheck-alpine:stable
commands:
- shellcheck --severity=warning tests/smoke-edge-subpath.sh tests/test-caddyfile-routing.sh
# ── 2. Caddyfile template rendering ───────────────────────────────────────
# Render a mock Caddyfile for validation. The template uses Nomad's
# templating syntax ({{ range ... }}) which must be processed before Caddy
# can validate it. We render a mock version with Nomad templates expanded
# to static values for validation purposes.
- name: render-caddyfile
image: alpine:3.19
commands:
- apk add --no-cache coreutils
- |
set -e
mkdir -p edge-render
# Render mock Caddyfile with Nomad templates expanded
{
echo '# Caddyfile — edge proxy configuration (Nomad-rendered)'
echo '# Staging upstream discovered via Nomad service registration.'
echo ''
echo ':80 {'
echo ' # Redirect root to Forgejo'
echo ' handle / {'
echo ' redir /forge/ 302'
echo ' }'
echo ''
echo ' # Reverse proxy to Forgejo'
echo ' handle /forge/* {'
echo ' reverse_proxy 127.0.0.1:3000'
echo ' }'
echo ''
echo ' # Reverse proxy to Woodpecker CI'
echo ' handle /ci/* {'
echo ' reverse_proxy 127.0.0.1:8000'
echo ' }'
echo ''
echo ' # Reverse proxy to staging — dynamic port via Nomad service discovery'
echo ' handle /staging/* {'
echo ' reverse_proxy 127.0.0.1:8081'
echo ' }'
echo ''
echo ' # Chat service — reverse proxy to disinto-chat backend (#705)'
echo ' # OAuth routes bypass forward_auth — unauthenticated users need these (#709)'
echo ' handle /chat/login {'
echo ' reverse_proxy 127.0.0.1:8080'
echo ' }'
echo ' handle /chat/oauth/callback {'
echo ' reverse_proxy 127.0.0.1:8080'
echo ' }'
echo ' # Defense-in-depth: forward_auth stamps X-Forwarded-User from session (#709)'
echo ' handle /chat/* {'
echo ' forward_auth 127.0.0.1:8080 {'
echo ' uri /chat/auth/verify'
echo ' copy_headers X-Forwarded-User'
echo ' header_up X-Forward-Auth-Secret {$FORWARD_AUTH_SECRET}'
echo ' }'
echo ' reverse_proxy 127.0.0.1:8080'
echo ' }'
echo '}'
} > edge-render/Caddyfile
cp edge-render/Caddyfile edge-render/Caddyfile.rendered
echo "Caddyfile rendered successfully"
# ── 3. Caddy config validation ───────────────────────────────────────────
# `caddy validate` checks Caddyfile syntax and configuration.
# This validates the rendered Caddyfile against Caddy's parser.
# Exit codes:
# 0 — configuration is valid
# 1 — configuration has errors
- name: caddy-validate
image: alpine:3.19
commands:
- apk add --no-cache ca-certificates curl
- curl -sS -o /tmp/caddy "https://caddyserver.com/api/download?os=linux&arch=amd64"
- chmod +x /tmp/caddy
- /tmp/caddy version
- /tmp/caddy validate --config edge-render/Caddyfile.rendered --adapter caddyfile
# ── 4. Caddyfile routing block shape test ─────────────────────────────────
# Verify that the Caddyfile contains all required routing blocks:
# - /forge/ — Forgejo subpath
# - /ci/ — Woodpecker subpath
# - /staging/ — Staging subpath
# - /chat/ — Chat subpath with forward_auth
#
# This is a unit test that validates the expected structure without
# requiring a running Caddy instance.
- name: caddyfile-routing-test
image: alpine:3.19
commands:
- apk add --no-cache grep coreutils
- |
set -e
CADDYFILE="edge-render/Caddyfile.rendered"
echo "=== Validating Caddyfile routing blocks ==="
# Check that all required subpath handlers exist
# POSIX-safe loop (alpine /bin/sh has no arrays)
FAILED=0
for handler in "handle /forge/\*" "handle /ci/\*" "handle /staging/\*" "handle /chat/login" "handle /chat/oauth/callback" "handle /chat/\*"; do
if grep -q "$handler" "$CADDYFILE"; then
echo "[PASS] Found handler: $handler"
else
echo "[FAIL] Missing handler: $handler"
FAILED=1
fi
done
# Check forward_auth block exists for /chat/*
if grep -A5 "handle /chat/\*" "$CADDYFILE" | grep -q "forward_auth"; then
echo "[PASS] forward_auth block found for /chat/*"
else
echo "[FAIL] forward_auth block missing for /chat/*"
FAILED=1
fi
# Check reverse_proxy to Forgejo (port 3000)
if grep -q "reverse_proxy 127.0.0.1:3000" "$CADDYFILE"; then
echo "[PASS] Forgejo reverse_proxy configured (port 3000)"
else
echo "[FAIL] Forgejo reverse_proxy not configured"
FAILED=1
fi
# Check reverse_proxy to Woodpecker (port 8000)
if grep -q "reverse_proxy 127.0.0.1:8000" "$CADDYFILE"; then
echo "[PASS] Woodpecker reverse_proxy configured (port 8000)"
else
echo "[FAIL] Woodpecker reverse_proxy not configured"
FAILED=1
fi
# Check reverse_proxy to Chat (port 8080)
if grep -q "reverse_proxy 127.0.0.1:8080" "$CADDYFILE"; then
echo "[PASS] Chat reverse_proxy configured (port 8080)"
else
echo "[FAIL] Chat reverse_proxy not configured"
FAILED=1
fi
# Check root redirect to /forge/
if grep -q "redir /forge/ 302" "$CADDYFILE"; then
echo "[PASS] Root redirect to /forge/ configured"
else
echo "[FAIL] Root redirect to /forge/ not configured"
FAILED=1
fi
echo ""
if [ $FAILED -eq 0 ]; then
echo "=== All routing blocks validated ==="
exit 0
else
echo "=== Routing block validation failed ===" >&2
exit 1
fi
# ── 5. Standalone Caddyfile routing test ─────────────────────────────────
# Run the standalone unit test for Caddyfile routing block validation.
# This test extracts the Caddyfile template from edge.hcl and validates
# its structure without requiring a running Caddy instance.
- name: test-caddyfile-routing
image: alpine:3.19
commands:
- apk add --no-cache grep coreutils
- |
set -e
EDGE_TEMPLATE="nomad/jobs/edge.hcl"
echo "=== Extracting Caddyfile template from $EDGE_TEMPLATE ==="
# Extract the Caddyfile template (content between <<EOT and EOT markers)
CADDYFILE=$(sed -n '/data[[:space:]]*=[[:space:]]*<<[Ee][Oo][Tt]/,/^EOT$/p' "$EDGE_TEMPLATE" | sed '1s/.*/# Caddyfile extracted from Nomad template/; $d')
if [ -z "$CADDYFILE" ]; then
echo "ERROR: Could not extract Caddyfile template from $EDGE_TEMPLATE" >&2
exit 1
fi
echo "Caddyfile template extracted successfully"
echo ""
FAILED=0
# Check Forgejo subpath
if echo "$CADDYFILE" | grep -q "handle /forge/\*"; then
echo "[PASS] Forgejo handle block"
else
echo "[FAIL] Forgejo handle block"
FAILED=1
fi
if echo "$CADDYFILE" | grep -q "reverse_proxy 127.0.0.1:3000"; then
echo "[PASS] Forgejo reverse_proxy (port 3000)"
else
echo "[FAIL] Forgejo reverse_proxy (port 3000)"
FAILED=1
fi
# Check Woodpecker subpath
if echo "$CADDYFILE" | grep -q "handle /ci/\*"; then
echo "[PASS] Woodpecker handle block"
else
echo "[FAIL] Woodpecker handle block"
FAILED=1
fi
if echo "$CADDYFILE" | grep -q "reverse_proxy 127.0.0.1:8000"; then
echo "[PASS] Woodpecker reverse_proxy (port 8000)"
else
echo "[FAIL] Woodpecker reverse_proxy (port 8000)"
FAILED=1
fi
# Check Staging subpath
if echo "$CADDYFILE" | grep -q "handle /staging/\*"; then
echo "[PASS] Staging handle block"
else
echo "[FAIL] Staging handle block"
FAILED=1
fi
if echo "$CADDYFILE" | grep -q "nomadService"; then
echo "[PASS] Staging Nomad service discovery"
else
echo "[FAIL] Staging Nomad service discovery"
FAILED=1
fi
# Check Chat subpath
if echo "$CADDYFILE" | grep -q "handle /chat/login"; then
echo "[PASS] Chat login handle block"
else
echo "[FAIL] Chat login handle block"
FAILED=1
fi
if echo "$CADDYFILE" | grep -q "handle /chat/oauth/callback"; then
echo "[PASS] Chat OAuth callback handle block"
else
echo "[FAIL] Chat OAuth callback handle block"
FAILED=1
fi
if echo "$CADDYFILE" | grep -q "handle /chat/\*"; then
echo "[PASS] Chat catch-all handle block"
else
echo "[FAIL] Chat catch-all handle block"
FAILED=1
fi
if echo "$CADDYFILE" | grep -q "reverse_proxy 127.0.0.1:8080"; then
echo "[PASS] Chat reverse_proxy (port 8080)"
else
echo "[FAIL] Chat reverse_proxy (port 8080)"
FAILED=1
fi
# Check forward_auth for chat
if echo "$CADDYFILE" | grep -A10 "handle /chat/\*" | grep -q "forward_auth"; then
echo "[PASS] forward_auth block for /chat/*"
else
echo "[FAIL] forward_auth block for /chat/*"
FAILED=1
fi
# Check root redirect
if echo "$CADDYFILE" | grep -q "redir /forge/ 302"; then
echo "[PASS] Root redirect to /forge/"
else
echo "[FAIL] Root redirect to /forge/"
FAILED=1
fi
echo ""
if [ $FAILED -eq 0 ]; then
echo "=== All routing blocks validated ==="
exit 0
else
echo "=== Routing block validation failed ===" >&2
exit 1
fi

View file

@ -1,25 +1,21 @@
# ============================================================================= # =============================================================================
# .woodpecker/nomad-validate.yml — Static validation for Nomad+Vault artifacts # .woodpecker/nomad-validate.yml — Static validation for Nomad+Vault artifacts
# #
# Part of the Nomad+Vault migration (S0.5, issue #825). Locks in the # Part of the Nomad+Vault migration (S0.5, issue #825; extended in S2.6,
# "no-ad-hoc-steps" principle: every HCL/shell artifact under nomad/ or # issue #884). Locks in the "no-ad-hoc-steps" principle: every HCL/shell
# lib/init/nomad/, plus the `disinto init` dispatcher, gets checked # artifact under nomad/, lib/init/nomad/, vault/policies/, plus the
# before it can land. # `disinto init` dispatcher and vault/roles.yaml, gets checked before it
# # can land.
# Also includes Vault policy validation (S2.6, issue #884):
# - vault policy fmt -check on vault/policies/*.hcl
# - vault policy validate on each policy file
# - roles.yaml validator (yamllint + policy reference check)
# - secret-scan gate on policy files
# #
# Triggers on PRs (and pushes) that touch any of: # Triggers on PRs (and pushes) that touch any of:
# nomad/** — HCL configs (server, client, vault) # nomad/** — HCL configs (server, client, vault)
# lib/init/nomad/** — cluster-up / install / systemd / vault-init # lib/init/nomad/** — cluster-up / install / systemd / vault-init /
# vault-nomad-auth (S2.6 trigger: vault-*.sh
# is a subset of this glob)
# bin/disinto — `disinto init --backend=nomad` dispatcher # bin/disinto — `disinto init --backend=nomad` dispatcher
# tests/disinto-init-nomad.bats — the bats suite itself # tests/disinto-init-nomad.bats — the bats suite itself
# vault/policies/*.hcl — Vault ACL policies (S2.6) # vault/policies/** — Vault ACL policy HCL files (S2.1, S2.6)
# vault/roles.yaml — JWT auth role definitions (S2.6) # vault/roles.yaml — JWT-auth role bindings (S2.3, S2.6)
# lib/init/nomad/vault-*.sh — Vault init scripts (S2.6)
# .woodpecker/nomad-validate.yml — the pipeline definition # .woodpecker/nomad-validate.yml — the pipeline definition
# #
# Steps (all fail-closed — any error blocks merge): # Steps (all fail-closed — any error blocks merge):
@ -28,12 +24,22 @@
# nomad/jobs/*.hcl (new jobspecs get # nomad/jobs/*.hcl (new jobspecs get
# CI coverage automatically) # CI coverage automatically)
# 3. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl # 3. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl
# 4. vault-policy-fmt — `vault policy fmt -check` on vault/policies/*.hcl # 4. vault-policy-fmt — `vault policy fmt` idempotence check on
# 5. vault-policy-validate — `vault policy validate` on each policy file # every vault/policies/*.hcl (format drift =
# 6. vault-roles-validate — yamllint + policy reference check # CI fail; non-destructive via cp+diff)
# 7. vault-secret-scan — scan policy files for embedded secrets # 5. vault-policy-validate — HCL syntax + capability validation for every
# 8. shellcheck-nomad — shellcheck the cluster-up + install scripts + disinto # vault/policies/*.hcl via `vault policy write`
# 9. bats-init-nomad — `disinto init --backend=nomad --dry-run` smoke tests # against an inline dev-mode Vault server
# 6. vault-roles-validate — yamllint + role→policy reference check on
# vault/roles.yaml (every referenced policy
# must exist as vault/policies/<name>.hcl)
# 7. shellcheck-nomad — shellcheck the cluster-up + install scripts + disinto
# 8. bats-init-nomad — `disinto init --backend=nomad --dry-run` smoke tests
#
# Secret-scan coverage: vault/policies/*.hcl is already scanned by the
# P11 gate (.woodpecker/secret-scan.yml, issue #798) — its trigger path
# `vault/**/*` covers everything under this directory. We intentionally
# do NOT duplicate that gate here; one scanner, one source of truth.
# #
# Pinned image versions match lib/init/nomad/install.sh (nomad 1.9.5 / # Pinned image versions match lib/init/nomad/install.sh (nomad 1.9.5 /
# vault 1.18.5). Bump there AND here together — drift = CI passing on # vault 1.18.5). Bump there AND here together — drift = CI passing on
@ -47,9 +53,8 @@ when:
- "lib/init/nomad/**" - "lib/init/nomad/**"
- "bin/disinto" - "bin/disinto"
- "tests/disinto-init-nomad.bats" - "tests/disinto-init-nomad.bats"
- "vault/policies/*.hcl" - "vault/policies/**"
- "vault/roles.yaml" - "vault/roles.yaml"
- "lib/init/nomad/vault-*.sh"
- ".woodpecker/nomad-validate.yml" - ".woodpecker/nomad-validate.yml"
# Authenticated clone — same pattern as .woodpecker/ci.yml. Forgejo is # Authenticated clone — same pattern as .woodpecker/ci.yml. Forgejo is
@ -139,14 +144,21 @@ steps:
*) echo "vault config: hard failure (rc=$rc)" >&2; exit "$rc" ;; *) echo "vault config: hard failure (rc=$rc)" >&2; exit "$rc" ;;
esac esac
# ── 4. Vault policy fmt -check ─────────────────────────────────────────── # ── 4. Vault policy fmt idempotence check ────────────────────────────────
# `vault policy fmt -check` is non-destructive; it reads each policy file # `vault policy fmt <file>` formats a local HCL policy file in place.
# and compares against the formatted version. Any difference means the file # There's no `-check`/dry-run flag (vault 1.18.5), so we implement a
# needs formatting. This enforces consistent indentation (2-space), no # non-destructive check as cp → fmt-on-copy → diff against original.
# trailing whitespace, and proper HCL formatting conventions. # Any diff means the committed file would be rewritten by `vault policy
# fmt` — failure steers the author to run `vault policy fmt <file>`
# locally before pushing.
# #
# CI runs this BEFORE vault policy validate because unformatted HCL can # Scope: vault/policies/*.hcl only. The `[ -f "$f" ]` guard handles the
# sometimes cause confusing validation errors. # no-match case (POSIX sh does not nullglob) so an empty policies/
# directory does not fail this step.
#
# Note: `vault policy fmt` is purely local (HCL text transform) and does
# not require a running Vault server, which is why this step can run
# without starting one.
- name: vault-policy-fmt - name: vault-policy-fmt
image: hashicorp/vault:1.18.5 image: hashicorp/vault:1.18.5
commands: commands:
@ -155,215 +167,153 @@ steps:
failed=0 failed=0
for f in vault/policies/*.hcl; do for f in vault/policies/*.hcl; do
[ -f "$f" ] || continue [ -f "$f" ] || continue
echo "fmt-check: $f" tmp="/tmp/$(basename "$f").fmt"
if ! vault policy fmt -check "$f" > /dev/null 2>&1; then cp "$f" "$tmp"
echo " ERROR: $f is not formatted correctly" >&2 vault policy fmt "$tmp" >/dev/null 2>&1
vault policy fmt -check "$f" >&2 || true if ! diff -u "$f" "$tmp"; then
echo "ERROR: $f is not formatted — run 'vault policy fmt $f' locally" >&2
failed=1 failed=1
fi fi
done done
if [ "$failed" -gt 0 ]; then if [ "$failed" -gt 0 ]; then
echo "vault-policy-fmt: formatting errors found" >&2 echo "vault-policy-fmt: formatting drift detected" >&2
exit 1 exit 1
fi fi
echo "vault-policy-fmt: all policies formatted correctly" echo "vault-policy-fmt: all policies formatted correctly"
# ── 5. Vault policy validate ───────────────────────────────────────────── # ── 5. Vault policy HCL syntax + capability validation ───────────────────
# `vault policy validate` performs syntax + semantic validation: # Vault has no offline `vault policy validate` subcommand — the closest
# - Checks for unknown stanzas/blocks # in-CLI validator is `vault policy write`, which sends the HCL to a
# - Validates path patterns are valid # running server which parses it, checks capability names against the
# - Validates capabilities are known (read, list, create, update, delete, sudo) # known set (read, list, create, update, delete, patch, sudo, deny),
# - Checks for missing required fields # and rejects unknown stanzas / malformed path blocks. We start an
# inline dev-mode Vault (in-memory, no persistence, root token = "root")
# for the duration of this step and loop `vault policy write` over every
# vault/policies/*.hcl; the policies never leave the ephemeral dev
# server, so this is strictly a validator — not a deploy.
# #
# Requires a running Vault instance (dev mode is sufficient for CI). # Exit-code handling:
# Uses the default dev server at http://127.0.0.1:8200 with "root" token. # - `vault policy write` exits 0 on success, non-zero on any parse /
# semantic error. We aggregate failures across all files so a single
# CI run surfaces every broken policy (not just the first).
# - The dev server is killed on any step exit via EXIT trap so the
# step tears down cleanly even on failure.
# #
# Exit codes: # Why dev-mode is sufficient: we're not persisting secrets, only asking
# 0 — policy is valid # Vault to parse policy text. The factory's production Vault is NOT
# 1 — policy has errors (syntax or semantic) # contacted.
#
# CI starts a Vault dev server inline for validation.
- name: vault-policy-validate - name: vault-policy-validate
image: hashicorp/vault:1.18.5 image: hashicorp/vault:1.18.5
commands: commands:
- | - |
set -e set -e
# Start Vault dev server in background vault server -dev -dev-root-token-id=root -dev-listen-address=127.0.0.1:8200 >/tmp/vault-dev.log 2>&1 &
vault server -dev -dev-root-token-id=root -dev-listen-address=0.0.0.0:8200 &
VAULT_PID=$! VAULT_PID=$!
trap "kill $VAULT_PID 2>/dev/null || true" EXIT trap 'kill "$VAULT_PID" 2>/dev/null || true' EXIT INT TERM
export VAULT_ADDR=http://127.0.0.1:8200
# Wait for Vault to be ready export VAULT_TOKEN=root
for i in $(seq 1 30); do ready=0
if vault status > /dev/null 2>&1; then i=0
echo "vault-policy-validate: Vault is ready" while [ "$i" -lt 30 ]; do
if vault status >/dev/null 2>&1; then
ready=1
break break
fi fi
i=$((i + 1))
sleep 0.5 sleep 0.5
done done
if [ "$ready" -ne 1 ]; then
# Validate each policy echo "vault-policy-validate: dev server failed to start after 15s" >&2
cat /tmp/vault-dev.log >&2 || true
exit 1
fi
failed=0 failed=0
for f in vault/policies/*.hcl; do for f in vault/policies/*.hcl; do
[ -f "$f" ] || continue [ -f "$f" ] || continue
name=$(basename "$f" .hcl)
echo "validate: $f" echo "validate: $f"
if ! VAULT_ADDR=http://127.0.0.1:8200 VAULT_TOKEN=root vault policy validate "$f" > /dev/null 2>&1; then if ! vault policy write "$name" "$f"; then
echo " ERROR: $f validation failed" >&2 echo " ERROR: $f failed validation" >&2
VAULT_ADDR=http://127.0.0.1:8200 VAULT_TOKEN=root vault policy validate "$f" >&2 || true
failed=1 failed=1
fi fi
done done
if [ "$failed" -gt 0 ]; then if [ "$failed" -gt 0 ]; then
echo "vault-policy-validate: validation errors found" >&2 echo "vault-policy-validate: validation errors found" >&2
exit 1 exit 1
fi fi
echo "vault-policy-validate: all policies valid" echo "vault-policy-validate: all policies valid"
# ── 6. Vault roles.yaml validate ───────────────────────────────────────── # ── 6. vault/roles.yaml validator ────────────────────────────────────────
# Validates vault/roles.yaml: # Validates the JWT-auth role bindings file (S2.3). Two checks:
# 1. yamllint check — ensures YAML syntax is valid
# 2. Policy reference check — each role references a policy that exists
# 3. Required fields check — each role has name, policies, and auth fields
# #
# If roles.yaml doesn't exist yet, this step is skipped (it will be added # a. `yamllint` — catches YAML syntax errors and indentation drift.
# in S2.3 alongside JWT auth configuration). # Uses a relaxed config (line length bumped to 200) because
# roles.yaml's comments are wide by design.
# b. role → policy reference check — every role's `policy:` field
# must match a basename in vault/policies/*.hcl. A role pointing
# at a non-existent policy = runtime "permission denied" at job
# placement; catching the drift here turns it into a CI failure.
# Also verifies each role entry has the four required fields
# (name, policy, namespace, job_id) per the file's documented
# format.
#
# Parsing is done with PyYAML (the roles.yaml format is a strict
# subset that awk-level parsing in tools/vault-apply-roles.sh handles
# too, but PyYAML in CI gives us structural validation for free). If
# roles.yaml is ever absent (e.g. reverted), the step skips rather
# than fails — presence is enforced by S2.3's own tooling, not here.
- name: vault-roles-validate - name: vault-roles-validate
image: python:3.12-alpine image: python:3.12-alpine
commands: commands:
- pip install --quiet --disable-pip-version-check pyyaml yamllint
- | - |
set -e set -e
apk add --no-cache yamllint jq
if [ ! -f vault/roles.yaml ]; then if [ ! -f vault/roles.yaml ]; then
echo "vault-roles-validate: roles.yaml not found, skipping" echo "vault-roles-validate: vault/roles.yaml not present, skipping"
exit 0 exit 0
fi fi
yamllint -d '{extends: relaxed, rules: {line-length: {max: 200}}}' vault/roles.yaml
echo "vault-roles-validate: yamllint OK"
python3 - <<'PY'
import os
import sys
import yaml
echo "yamllint: vault/roles.yaml" with open('vault/roles.yaml') as f:
yamllint -q vault/roles.yaml || { data = yaml.safe_load(f) or {}
echo " ERROR: yamllint found issues" >&2 roles = data.get('roles') or []
exit 1 if not roles:
print("vault-roles-validate: no roles defined in vault/roles.yaml", file=sys.stderr)
sys.exit(1)
existing = {
os.path.splitext(e)[0]
for e in os.listdir('vault/policies')
if e.endswith('.hcl')
} }
echo " OK" required = ('name', 'policy', 'namespace', 'job_id')
failed = 0
for r in roles:
if not isinstance(r, dict):
print(f"ERROR: role entry is not a mapping: {r!r}", file=sys.stderr)
failed = 1
continue
for field in required:
if r.get(field) in (None, ''):
print(f"ERROR: role entry missing required field '{field}': {r}", file=sys.stderr)
failed = 1
policy = r.get('policy')
if policy and policy not in existing:
print(
f"ERROR: role '{r.get('name')}' references policy '{policy}' "
f"but vault/policies/{policy}.hcl does not exist",
file=sys.stderr,
)
failed = 1
sys.exit(failed)
PY
echo "vault-roles-validate: all role→policy references valid"
# Extract and validate policy references # ── 7. Shellcheck ────────────────────────────────────────────────────────
echo "policy-reference-check: validating policy references"
policy_dir="vault/policies"
failed=0
# Get referenced policies from roles.yaml
referenced=$(jq -r '.roles[].policies[]?' vault/roles.yaml 2>/dev/null | sort -u || true)
if [ -z "$referenced" ]; then
echo "vault-roles-validate: no policies referenced in roles.yaml" >&2
exit 1
fi
# Get existing policy names
existing=$(find "$policy_dir" -maxdepth 1 -name '*.hcl' -type f -exec basename {} .hcl \; | sort)
for policy in $referenced; do
if ! echo "$existing" | grep -q "^${policy}$"; then
echo "vault-roles-validate: ERROR: policy '$policy' referenced but not found" >&2
failed=1
fi
done
if [ "$failed" -gt 0 ]; then
echo "vault-roles-validate: policy reference errors found" >&2
exit 1
fi
echo "vault-roles-validate: all policy references valid"
# ── 7. Vault secret-scan ─────────────────────────────────────────────────
# Scans policy HCL files for embedded secrets (rare but dangerous copy-paste
# mistake). Uses the same patterns as lib/secret-scan.sh:
# - Long hex strings (32+ chars)
# - API key patterns
# - URLs with embedded credentials
# - Bearer tokens
#
# Environment variables like $TOKEN or ${TOKEN} are excluded as safe.
- name: vault-secret-scan
image: alpine:3.19
commands:
- |
set -e
apk add --no-cache bash
# Copy the secret-scan.sh script into the container
cat > /tmp/secret-scan.sh << 'EOF'
#!/usr/bin/env bash
# Inline version of lib/secret-scan.sh for CI secret detection
_SECRET_PATTERNS=(
'[0-9a-fA-F]{32,}'
'Bearer [A-Za-z0-9_/+=-]{20,}'
'0x[0-9a-fA-F]{64}'
'https?://[^[:space:]]*[0-9a-fA-F]{20,}'
'AKIA[0-9A-Z]{16}'
'(API_KEY|SECRET|TOKEN|PRIVATE_KEY|PASSWORD|INFURA|ALCHEMY)=[^[:space:]"]{16,}'
)
_SAFE_PATTERNS=(
'\$\{?[A-Z_]+\}?'
'commit [0-9a-f]{40}'
'Merge [0-9a-f]{40}'
'last-reviewed: [0-9a-f]{40}'
'codeberg\.org/[^[:space:]]+'
'localhost:3000/[^[:space:]]+'
'SC[0-9]{4}'
)
scan_for_secrets() {
local text="${1:-$(cat)}"
local found=0
local cleaned="$text"
for safe in "${_SAFE_PATTERNS[@]}"; do
cleaned=$(printf '%s' "$cleaned" | sed -E "s/${safe}/__SAFE__/g" 2>/dev/null || printf '%s' "$cleaned")
done
for pattern in "${_SECRET_PATTERNS[@]}"; do
local matches
matches=$(printf '%s' "$cleaned" | grep -oE "$pattern" 2>/dev/null || true)
if [ -n "$matches" ]; then
while IFS= read -r match; do
[ "$match" = "__SAFE__" ] && continue
[ -z "$match" ] && continue
printf 'secret-scan: detected potential secret matching pattern [%s]: %s\n' \
"$pattern" "${match:0:8}...${match: -4}" >&2
found=1
done <<< "$matches"
fi
done
return $found
}
EOF
chmod +x /tmp/secret-scan.sh
# Scan policy files
echo "secret-scan: vault/policies/*.hcl"
failed=0
for f in vault/policies/*.hcl; do
[ -f "$f" ] || continue
echo " scanning: $f"
if ! /tmp/secret-scan.sh < "$f"; then
echo " ERROR: potential secrets detected in $f" >&2
failed=1
fi
done
if [ "$failed" -gt 0 ]; then
echo "vault-secret-scan: secrets detected" >&2
exit 1
fi
echo "vault-secret-scan: no secrets detected"
# ── 8. Shellcheck ────────────────────────────────────────────────────────
# Covers the new lib/init/nomad/*.sh scripts plus bin/disinto (which owns # Covers the new lib/init/nomad/*.sh scripts plus bin/disinto (which owns
# the backend dispatcher). bin/disinto has no .sh extension so the # the backend dispatcher). bin/disinto has no .sh extension so the
# repo-wide shellcheck in .woodpecker/ci.yml skips it — this step is the # repo-wide shellcheck in .woodpecker/ci.yml skips it — this step is the
@ -373,7 +323,7 @@ EOF
commands: commands:
- shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto - shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto
# ── 9. bats: `disinto init --backend=nomad --dry-run` ──────────────────── # ── 8. bats: `disinto init --backend=nomad --dry-run` ────────────────────
# Smoke-tests the CLI dispatcher: both --backend=nomad variants exit 0 # Smoke-tests the CLI dispatcher: both --backend=nomad variants exit 0
# with the expected step list, and --backend=docker stays on the docker # with the expected step list, and --backend=docker stays on the docker
# path (regression guard). Pure dry-run — no sudo, no network. # path (regression guard). Pure dry-run — no sudo, no network.

View file

@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 --> <!-- last-reviewed: 0d6181918452c1407a3f6bc62917724261acff26 -->
# Disinto — Agent Instructions # Disinto — Agent Instructions
## What this repo is ## What this repo is
@ -37,17 +37,20 @@ disinto/ (code repo)
│ examples/ — example vault action TOMLs (promote, publish, release, webhook-call) │ examples/ — example vault action TOMLs (promote, publish, release, webhook-call)
├── lib/ env.sh, agent-sdk.sh, ci-helpers.sh, ci-debug.sh, load-project.sh, parse-deps.sh, guard.sh, mirrors.sh, pr-lifecycle.sh, issue-lifecycle.sh, worktree.sh, formula-session.sh, stack-lock.sh, forge-setup.sh, forge-push.sh, ops-setup.sh, ci-setup.sh, generators.sh, hire-agent.sh, release.sh, build-graph.py, branch-protection.sh, secret-scan.sh, tea-helpers.sh, action-vault.sh, ci-log-reader.py, git-creds.sh, sprint-filer.sh, hvault.sh ├── lib/ env.sh, agent-sdk.sh, ci-helpers.sh, ci-debug.sh, load-project.sh, parse-deps.sh, guard.sh, mirrors.sh, pr-lifecycle.sh, issue-lifecycle.sh, worktree.sh, formula-session.sh, stack-lock.sh, forge-setup.sh, forge-push.sh, ops-setup.sh, ci-setup.sh, generators.sh, hire-agent.sh, release.sh, build-graph.py, branch-protection.sh, secret-scan.sh, tea-helpers.sh, action-vault.sh, ci-log-reader.py, git-creds.sh, sprint-filer.sh, hvault.sh
│ hooks/ — Claude Code session hooks (on-compact-reinject, on-idle-stop, on-phase-change, on-pretooluse-guard, on-session-end, on-stop-failure) │ hooks/ — Claude Code session hooks (on-compact-reinject, on-idle-stop, on-phase-change, on-pretooluse-guard, on-session-end, on-stop-failure)
│ init/nomad/ — cluster-up.sh, install.sh, vault-init.sh, lib-systemd.sh (Nomad+Vault Step 0 installers, #821-#825) │ init/nomad/ — cluster-up.sh, install.sh, vault-init.sh, lib-systemd.sh (Nomad+Vault Step 0 installers, #821-#825); wp-oauth-register.sh (Forgejo OAuth2 app + Vault KV seeder for Woodpecker, S3.3); deploy.sh (dependency-ordered Nomad job deploy + health-wait, S4)
├── nomad/ server.hcl, client.hcl, vault.hcl — HCL configs deployed to /etc/nomad.d/ and /etc/vault.d/ by lib/init/nomad/cluster-up.sh ├── nomad/ server.hcl, client.hcl (allow_privileged for woodpecker-agent, S3-fix-5), vault.hcl — HCL configs deployed to /etc/nomad.d/ and /etc/vault.d/ by lib/init/nomad/cluster-up.sh
│ jobs/ — Nomad jobspecs: forgejo.hcl (Vault secrets via template, S2.4); woodpecker-server.hcl + woodpecker-agent.hcl (host-net, docker.sock, Vault KV, S3.1-S3.2); agents.hcl (7 roles, llama, Vault-templated bot tokens, S4.1); vault-runner.hcl (parameterized batch dispatch, S5.3); staging.hcl (Caddy file-server, dynamic port — edge discovers via service registration, S5.2); chat.hcl (Claude chat UI, tmpfs via mount block, Vault OAuth secrets, S5.2); edge.hcl (Caddy proxy + dispatcher sidecar, S5.1)
├── projects/ *.toml.example — templates; *.toml — local per-box config (gitignored) ├── projects/ *.toml.example — templates; *.toml — local per-box config (gitignored)
├── formulas/ Issue templates (TOML specs for multi-step agent tasks) ├── formulas/ Issue templates (TOML specs for multi-step agent tasks)
├── docker/ Dockerfiles and entrypoints: reproduce, triage, edge dispatcher, chat (server.py, entrypoint-chat.sh, Dockerfile, ui/) ├── docker/ Dockerfiles and entrypoints: reproduce, triage, edge (Caddy + chat server subprocess + dispatcher), chat (server.py, ui/ — copied into edge image at build time)
├── tools/ Operational tools: edge-control/ (register.sh, install.sh, verify-chat-sandbox.sh) ├── tools/ Operational tools: edge-control/ (register.sh, install.sh, verify-chat-sandbox.sh; register.sh enforces: reserved-name blocklist, admin-approved allowlist via /var/lib/disinto/allowlist.json, per-caller attribution via --as <tag> forced-command arg stored as registered_by, append-only audit log at /var/log/disinto/edge-register.log, ownership check on deregister requiring pubkey match)
│ vault-apply-policies.sh, vault-apply-roles.sh, vault-import.sh — Vault provisioning (S2.1/S2.2)
│ vault-seed-<svc>.sh — per-service Vault secret seeders; auto-invoked by `bin/disinto --with <svc>` (add a new file to support a new service)
├── docs/ Protocol docs (PHASE-PROTOCOL.md, EVIDENCE-ARCHITECTURE.md) ├── docs/ Protocol docs (PHASE-PROTOCOL.md, EVIDENCE-ARCHITECTURE.md)
├── site/ disinto.ai website content ├── site/ disinto.ai website content
├── tests/ Test files (mock-forgejo.py, smoke-init.sh, lib-hvault.bats, disinto-init-nomad.bats) ├── tests/ Test files (mock-forgejo.py, smoke-init.sh, lib-hvault.bats, lib-generators.bats, vault-import.bats, disinto-init-nomad.bats)
├── templates/ Issue templates ├── templates/ Issue templates
├── bin/ The `disinto` CLI script ├── bin/ The `disinto` CLI script (`--with <svc>` deploys services + runs their Vault seeders)
├── disinto-factory/ Setup documentation and skill ├── disinto-factory/ Setup documentation and skill
├── state/ Runtime state ├── state/ Runtime state
├── .woodpecker/ Woodpecker CI pipeline configs ├── .woodpecker/ Woodpecker CI pipeline configs
@ -120,8 +123,7 @@ bash dev/phase-test.sh
| Reproduce | `docker/reproduce/` | Bug reproduction using Playwright MCP | `formulas/reproduce.toml` | | Reproduce | `docker/reproduce/` | Bug reproduction using Playwright MCP | `formulas/reproduce.toml` |
| Triage | `docker/reproduce/` | Deep root cause analysis | `formulas/triage.toml` | | Triage | `docker/reproduce/` | Deep root cause analysis | `formulas/triage.toml` |
| Edge dispatcher | `docker/edge/` | Polls ops repo for vault actions, executes via Claude sessions | `docker/edge/dispatcher.sh` | | Edge dispatcher | `docker/edge/` | Polls ops repo for vault actions, executes via Claude sessions | `docker/edge/dispatcher.sh` |
| agents-llama | `docker/agents/` (same image) | Local-Qwen dev agent (`AGENT_ROLES=dev`), gated on `ENABLE_LLAMA_AGENT=1` | [docs/agents-llama.md](docs/agents-llama.md) | | Local-model agents | `docker/agents/` (same image) | Local llama-server agents configured via `[agents.X]` sections in project TOML | [docs/agents-llama.md](docs/agents-llama.md) |
| agents-llama-all | `docker/agents/` (same image) | Local-Qwen all-roles agent (all 7 roles), profile `agents-llama-all` | [docs/agents-llama.md](docs/agents-llama.md) |
> **Vault:** Being redesigned as a PR-based approval workflow (issues #73-#77). > **Vault:** Being redesigned as a PR-based approval workflow (issues #73-#77).
> See [docs/VAULT.md](docs/VAULT.md) for the vault PR workflow details. > See [docs/VAULT.md](docs/VAULT.md) for the vault PR workflow details.
@ -192,9 +194,7 @@ Humans write these. Agents read and enforce them.
## Phase-Signaling Protocol ## Phase-Signaling Protocol
When running as a persistent tmux session, Claude must signal the orchestrator When running as a persistent tmux session, Claude must signal the orchestrator at each phase boundary by writing to a phase file (e.g. `/tmp/dev-session-{project}-{issue}.phase`).
at each phase boundary by writing to a phase file (e.g.
`/tmp/dev-session-{project}-{issue}.phase`).
Key phases: `PHASE:awaiting_ci``PHASE:awaiting_review``PHASE:done`. Also: `PHASE:escalate` (needs human input), `PHASE:failed`. Key phases: `PHASE:awaiting_ci``PHASE:awaiting_review``PHASE:done`. Also: `PHASE:escalate` (needs human input), `PHASE:failed`.
See [docs/PHASE-PROTOCOL.md](docs/PHASE-PROTOCOL.md) for the complete spec, orchestrator reaction matrix, sequence diagram, and crash recovery. See [docs/PHASE-PROTOCOL.md](docs/PHASE-PROTOCOL.md) for the complete spec, orchestrator reaction matrix, sequence diagram, and crash recovery.

View file

@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 --> <!-- last-reviewed: 0d6181918452c1407a3f6bc62917724261acff26 -->
# Architect — Agent Instructions # Architect — Agent Instructions
## What this agent is ## What this agent is

View file

@ -12,6 +12,7 @@
# disinto secrets <subcommand> Manage encrypted secrets # disinto secrets <subcommand> Manage encrypted secrets
# disinto run <action-id> Run action in ephemeral runner container # disinto run <action-id> Run action in ephemeral runner container
# disinto ci-logs <pipeline> [--step <name>] Read CI logs from Woodpecker SQLite # disinto ci-logs <pipeline> [--step <name>] Read CI logs from Woodpecker SQLite
# disinto backup create <outfile> Export factory state for migration
# #
# Usage: # Usage:
# disinto init https://github.com/user/repo # disinto init https://github.com/user/repo
@ -39,7 +40,9 @@ source "${FACTORY_ROOT}/lib/generators.sh"
source "${FACTORY_ROOT}/lib/forge-push.sh" source "${FACTORY_ROOT}/lib/forge-push.sh"
source "${FACTORY_ROOT}/lib/ci-setup.sh" source "${FACTORY_ROOT}/lib/ci-setup.sh"
source "${FACTORY_ROOT}/lib/release.sh" source "${FACTORY_ROOT}/lib/release.sh"
source "${FACTORY_ROOT}/lib/backup.sh"
source "${FACTORY_ROOT}/lib/claude-config.sh" source "${FACTORY_ROOT}/lib/claude-config.sh"
source "${FACTORY_ROOT}/lib/disinto/backup.sh" # backup create/import
# ── Helpers ────────────────────────────────────────────────────────────────── # ── Helpers ──────────────────────────────────────────────────────────────────
@ -62,7 +65,9 @@ Usage:
disinto hire-an-agent <agent-name> <role> [--formula <path>] [--local-model <url>] [--model <name>] disinto hire-an-agent <agent-name> <role> [--formula <path>] [--local-model <url>] [--model <name>]
Hire a new agent (create user + .profile repo; re-run to rotate credentials) Hire a new agent (create user + .profile repo; re-run to rotate credentials)
disinto agent <subcommand> Manage agent state (enable/disable) disinto agent <subcommand> Manage agent state (enable/disable)
disinto backup create <outfile> Export factory state (issues + ops bundle)
disinto edge <verb> [options] Manage edge tunnel registrations disinto edge <verb> [options] Manage edge tunnel registrations
disinto backup <subcommand> Backup and restore factory state
Edge subcommands: Edge subcommands:
register [project] Register a new tunnel (generates keypair if needed) register [project] Register a new tunnel (generates keypair if needed)
@ -82,13 +87,16 @@ Init options:
--ci-id <n> Woodpecker CI repo ID (default: 0 = no CI) --ci-id <n> Woodpecker CI repo ID (default: 0 = no CI)
--forge-url <url> Forge base URL (default: http://localhost:3000) --forge-url <url> Forge base URL (default: http://localhost:3000)
--backend <value> Orchestration backend: docker (default) | nomad --backend <value> Orchestration backend: docker (default) | nomad
--with <services> (nomad) Deploy services: forgejo[,...] (S1.3) --with <services> (nomad) Deploy services: forgejo,woodpecker,agents,staging,chat,edge[,...] (S1.3, S3.4, S4.2, S5.2, S5.5)
--empty (nomad) Bring up cluster only, no jobs (S0.4) --empty (nomad) Bring up cluster only, no jobs (S0.4)
--bare Skip compose generation (bare-metal setup) --bare Skip compose generation (bare-metal setup)
--build Use local docker build instead of registry images (dev mode) --build Use local docker build instead of registry images (dev mode)
--yes Skip confirmation prompts --yes Skip confirmation prompts
--rotate-tokens Force regeneration of all bot tokens/passwords (idempotent by default) --rotate-tokens Force regeneration of all bot tokens/passwords (idempotent by default)
--dry-run Print every intended action without executing --dry-run Print every intended action without executing
--import-env <path> (nomad) Path to .env file for import into Vault KV (S2.5)
--import-sops <path> (nomad) Path to sops-encrypted .env.vault.enc for import (S2.5)
--age-key <path> (nomad) Path to age keyfile (required with --import-sops) (S2.5)
Hire an agent options: Hire an agent options:
--formula <path> Path to role formula TOML (default: formulas/<role>.toml) --formula <path> Path to role formula TOML (default: formulas/<role>.toml)
@ -98,6 +106,18 @@ Hire an agent options:
CI logs options: CI logs options:
--step <name> Filter logs to a specific step (e.g., smoke-init) --step <name> Filter logs to a specific step (e.g., smoke-init)
Backup subcommands:
create <file> Create backup of factory state to tarball
import <file> Restore factory state from backup tarball
Import behavior:
- Unpacks tarball to temp directory
- Creates disinto repo via Forgejo API (mirror config is manual)
- Creates disinto-ops repo and pushes refs from bundle
- Imports issues from issues/*.json (idempotent - skips existing)
- Logs issue number mapping (Forgejo auto-assigns numbers)
- Prints summary: created X repos, pushed Y refs, imported Z issues, skipped W
EOF EOF
exit 1 exit 1
} }
@ -664,8 +684,13 @@ prompt_admin_password() {
# `sudo disinto init ...` directly. # `sudo disinto init ...` directly.
_disinto_init_nomad() { _disinto_init_nomad() {
local dry_run="${1:-false}" empty="${2:-false}" with_services="${3:-}" local dry_run="${1:-false}" empty="${2:-false}" with_services="${3:-}"
local import_env="${4:-}" import_sops="${5:-}" age_key="${6:-}"
local cluster_up="${FACTORY_ROOT}/lib/init/nomad/cluster-up.sh" local cluster_up="${FACTORY_ROOT}/lib/init/nomad/cluster-up.sh"
local deploy_sh="${FACTORY_ROOT}/lib/init/nomad/deploy.sh" local deploy_sh="${FACTORY_ROOT}/lib/init/nomad/deploy.sh"
local vault_engines_sh="${FACTORY_ROOT}/lib/init/nomad/vault-engines.sh"
local vault_policies_sh="${FACTORY_ROOT}/tools/vault-apply-policies.sh"
local vault_auth_sh="${FACTORY_ROOT}/lib/init/nomad/vault-nomad-auth.sh"
local vault_import_sh="${FACTORY_ROOT}/tools/vault-import.sh"
if [ ! -x "$cluster_up" ]; then if [ ! -x "$cluster_up" ]; then
echo "Error: ${cluster_up} not found or not executable" >&2 echo "Error: ${cluster_up} not found or not executable" >&2
@ -677,6 +702,42 @@ _disinto_init_nomad() {
exit 1 exit 1
fi fi
# --empty short-circuits after cluster-up: no policies, no auth, no
# import, no deploy. It's the "cluster-only escape hatch" for debugging
# (docs/nomad-migration.md). Caller-side validation already rejects
# --empty combined with --with or any --import-* flag, so reaching
# this branch with those set is a bug in the caller.
#
# On the default (non-empty) path, vault-engines.sh (enables the kv/
# mount), vault-apply-policies.sh, and vault-nomad-auth.sh are invoked
# unconditionally — they are idempotent and cheap to re-run, and
# subsequent --with deployments depend on them. vault-import.sh is
# invoked only when an --import-* flag is set. vault-engines.sh runs
# first because every policy and role below references kv/disinto/*
# paths, which 403 if the engine is not yet mounted (issue #912).
local import_any=false
if [ -n "$import_env" ] || [ -n "$import_sops" ]; then
import_any=true
fi
if [ "$empty" != "true" ]; then
if [ ! -x "$vault_engines_sh" ]; then
echo "Error: ${vault_engines_sh} not found or not executable" >&2
exit 1
fi
if [ ! -x "$vault_policies_sh" ]; then
echo "Error: ${vault_policies_sh} not found or not executable" >&2
exit 1
fi
if [ ! -x "$vault_auth_sh" ]; then
echo "Error: ${vault_auth_sh} not found or not executable" >&2
exit 1
fi
if [ "$import_any" = true ] && [ ! -x "$vault_import_sh" ]; then
echo "Error: ${vault_import_sh} not found or not executable" >&2
exit 1
fi
fi
# --empty and default both invoke cluster-up today. Log the requested # --empty and default both invoke cluster-up today. Log the requested
# mode so the dispatch is visible in factory bootstrap logs — Step 1 # mode so the dispatch is visible in factory bootstrap logs — Step 1
# will branch on $empty to gate the job-deployment path. # will branch on $empty to gate the job-deployment path.
@ -686,7 +747,7 @@ _disinto_init_nomad() {
echo "nomad backend: default (cluster-up; jobs deferred to Step 1)" echo "nomad backend: default (cluster-up; jobs deferred to Step 1)"
fi fi
# Dry-run: print cluster-up plan + deploy.sh plan # Dry-run: print cluster-up plan + policies/auth/import plan + deploy.sh plan
if [ "$dry_run" = "true" ]; then if [ "$dry_run" = "true" ]; then
echo "" echo ""
echo "── Cluster-up dry-run ─────────────────────────────────" echo "── Cluster-up dry-run ─────────────────────────────────"
@ -694,20 +755,82 @@ _disinto_init_nomad() {
"${cmd[@]}" || true "${cmd[@]}" || true
echo "" echo ""
# --empty skips policies/auth/import/deploy — cluster-up only, no
# workloads. The operator-visible dry-run plan must match the real
# run, so short-circuit here too.
if [ "$empty" = "true" ]; then
exit 0
fi
# Vault engines + policies + auth are invoked on every nomad real-run
# path regardless of --import-* flags (they're idempotent; S2.1 + S2.3).
# Engines runs first because policies/roles/templates all reference the
# kv/ mount it enables (issue #912). Mirror that ordering in the
# dry-run plan so the operator sees the full sequence Step 2 will
# execute.
echo "── Vault engines dry-run ──────────────────────────────"
echo "[engines] [dry-run] ${vault_engines_sh} --dry-run"
echo ""
echo "── Vault policies dry-run ─────────────────────────────"
echo "[policies] [dry-run] ${vault_policies_sh} --dry-run"
echo ""
echo "── Vault auth dry-run ─────────────────────────────────"
echo "[auth] [dry-run] ${vault_auth_sh}"
echo ""
# Import plan: one line per --import-* flag that is actually set.
# Printing independently (not in an if/elif chain) means that all
# three flags appearing together each echo their own path — the
# regression that bit prior implementations of this issue (#883).
if [ "$import_any" = true ]; then
echo "── Vault import dry-run ───────────────────────────────"
[ -n "$import_env" ] && echo "[import] --import-env env file: ${import_env}"
[ -n "$import_sops" ] && echo "[import] --import-sops sops file: ${import_sops}"
[ -n "$age_key" ] && echo "[import] --age-key age key: ${age_key}"
local -a import_dry_cmd=("$vault_import_sh")
[ -n "$import_env" ] && import_dry_cmd+=("--env" "$import_env")
[ -n "$import_sops" ] && import_dry_cmd+=("--sops" "$import_sops")
[ -n "$age_key" ] && import_dry_cmd+=("--age-key" "$age_key")
import_dry_cmd+=("--dry-run")
echo "[import] [dry-run] ${import_dry_cmd[*]}"
echo ""
else
echo "[import] no --import-env/--import-sops — skipping; set them or seed kv/disinto/* manually before deploying secret-dependent services"
echo ""
fi
if [ -n "$with_services" ]; then if [ -n "$with_services" ]; then
# Interleaved seed/deploy per service (S2.6, #928, #948): match the
# real-run path so dry-run output accurately represents execution order.
# Build ordered deploy list: only include services present in with_services
local DEPLOY_ORDER=""
for ordered_svc in forgejo woodpecker-server woodpecker-agent agents staging chat edge; do
if echo ",$with_services," | grep -q ",$ordered_svc,"; then
DEPLOY_ORDER="${DEPLOY_ORDER:+${DEPLOY_ORDER} }${ordered_svc}"
fi
done
local IFS=' '
echo "[deploy] deployment order: ${DEPLOY_ORDER}"
for svc in $DEPLOY_ORDER; do
# Seed this service (if seed script exists)
local seed_name="$svc"
case "$svc" in
woodpecker-server|woodpecker-agent) seed_name="woodpecker" ;;
agents) seed_name="agents" ;;
chat) seed_name="chat" ;;
edge) seed_name="ops-repo" ;;
esac
local seed_script="${FACTORY_ROOT}/tools/vault-seed-${seed_name}.sh"
if [ -x "$seed_script" ]; then
echo "── Vault seed dry-run ─────────────────────────────────"
echo "[seed] [dry-run] ${seed_script} --dry-run"
echo ""
fi
# Deploy this service
echo "── Deploy services dry-run ────────────────────────────" echo "── Deploy services dry-run ────────────────────────────"
echo "[deploy] services to deploy: ${with_services}" echo "[deploy] services to deploy: ${with_services}"
local IFS=','
for svc in $with_services; do
svc=$(echo "$svc" | xargs) # trim whitespace
# Validate known services first
case "$svc" in
forgejo) ;;
*)
echo "Error: unknown service '${svc}' — known: forgejo" >&2
exit 1
;;
esac
local jobspec_path="${FACTORY_ROOT}/nomad/jobs/${svc}.hcl" local jobspec_path="${FACTORY_ROOT}/nomad/jobs/${svc}.hcl"
if [ ! -f "$jobspec_path" ]; then if [ ! -f "$jobspec_path" ]; then
echo "Error: jobspec not found: ${jobspec_path}" >&2 echo "Error: jobspec not found: ${jobspec_path}" >&2
@ -715,13 +838,44 @@ _disinto_init_nomad() {
fi fi
echo "[deploy] [dry-run] nomad job validate ${jobspec_path}" echo "[deploy] [dry-run] nomad job validate ${jobspec_path}"
echo "[deploy] [dry-run] nomad job run -detach ${jobspec_path}" echo "[deploy] [dry-run] nomad job run -detach ${jobspec_path}"
# Post-deploy: forgejo-bootstrap
if [ "$svc" = "forgejo" ]; then
local bootstrap_script="${FACTORY_ROOT}/lib/init/nomad/forgejo-bootstrap.sh"
echo "[deploy] [dry-run] [post-deploy] would run ${bootstrap_script}"
fi
done done
echo "[deploy] dry-run complete" echo "[deploy] dry-run complete"
fi fi
# Dry-run vault-runner (unconditionally, not gated by --with)
echo ""
echo "── Vault-runner dry-run ───────────────────────────────────"
local vault_runner_path="${FACTORY_ROOT}/nomad/jobs/vault-runner.hcl"
if [ -f "$vault_runner_path" ]; then
echo "[deploy] vault-runner: [dry-run] nomad job validate ${vault_runner_path}"
echo "[deploy] vault-runner: [dry-run] nomad job run -detach ${vault_runner_path}"
else
echo "[deploy] vault-runner: jobspec not found, skipping"
fi
# Build custom images dry-run (if agents, chat, or edge services are included)
if echo ",$with_services," | grep -qE ",(agents|chat|edge),"; then
echo ""
echo "── Build images dry-run ──────────────────────────────"
if echo ",$with_services," | grep -q ",agents,"; then
echo "[build] [dry-run] docker build -t disinto/agents:local -f ${FACTORY_ROOT}/docker/agents/Dockerfile ${FACTORY_ROOT}"
fi
if echo ",$with_services," | grep -q ",chat,"; then
echo "[build] [dry-run] docker build -t disinto/chat:local -f ${FACTORY_ROOT}/docker/chat/Dockerfile ${FACTORY_ROOT}/docker/chat"
fi
if echo ",$with_services," | grep -q ",edge,"; then
echo "[build] [dry-run] docker build -t disinto/edge:local -f ${FACTORY_ROOT}/docker/edge/Dockerfile ${FACTORY_ROOT}/docker/edge"
fi
fi
exit 0 exit 0
fi fi
# Real run: cluster-up + deploy services # Real run: cluster-up + policies + auth + (optional) import + deploy
local -a cluster_cmd=("$cluster_up") local -a cluster_cmd=("$cluster_up")
if [ "$(id -u)" -eq 0 ]; then if [ "$(id -u)" -eq 0 ]; then
"${cluster_cmd[@]}" || exit $? "${cluster_cmd[@]}" || exit $?
@ -733,36 +887,169 @@ _disinto_init_nomad() {
sudo -n -- "${cluster_cmd[@]}" || exit $? sudo -n -- "${cluster_cmd[@]}" || exit $?
fi fi
# Deploy services if requested # --empty short-circuits here: cluster-up only, no policies/auth/import
if [ -n "$with_services" ]; then # and no deploy. Matches the dry-run plan above and the docs/runbook.
if [ "$empty" = "true" ]; then
exit 0
fi
# Enable Vault secret engines (S2.1 / issue #912) — must precede
# policies/auth/import because every policy and every import target
# addresses paths under kv/. Idempotent, safe to re-run.
echo "" echo ""
echo "── Deploying services ─────────────────────────────────" echo "── Enabling Vault secret engines ──────────────────────"
local -a deploy_cmd=("$deploy_sh") local -a engines_cmd=("$vault_engines_sh")
# Split comma-separated service list into positional args if [ "$(id -u)" -eq 0 ]; then
local IFS=',' "${engines_cmd[@]}" || exit $?
for svc in $with_services; do else
svc=$(echo "$svc" | xargs) # trim whitespace if ! command -v sudo >/dev/null 2>&1; then
if ! echo "$svc" | grep -qE '^[a-zA-Z0-9_-]+$'; then echo "Error: vault-engines.sh must run as root and sudo is not installed" >&2
echo "Error: invalid service name '${svc}' — must match ^[a-zA-Z0-9_-]+$" >&2
exit 1 exit 1
fi fi
# Validate known services FIRST (before jobspec check) sudo -n -- "${engines_cmd[@]}" || exit $?
case "$svc" in fi
forgejo) ;;
*) # Apply Vault policies (S2.1) — idempotent, safe to re-run.
echo "Error: unknown service '${svc}' — known: forgejo" >&2 echo ""
echo "── Applying Vault policies ────────────────────────────"
local -a policies_cmd=("$vault_policies_sh")
if [ "$(id -u)" -eq 0 ]; then
"${policies_cmd[@]}" || exit $?
else
if ! command -v sudo >/dev/null 2>&1; then
echo "Error: vault-apply-policies.sh must run as root and sudo is not installed" >&2
exit 1 exit 1
fi
sudo -n -- "${policies_cmd[@]}" || exit $?
fi
# Configure Vault JWT auth + Nomad workload identity (S2.3) — idempotent.
echo ""
echo "── Configuring Vault JWT auth ─────────────────────────"
local -a auth_cmd=("$vault_auth_sh")
if [ "$(id -u)" -eq 0 ]; then
"${auth_cmd[@]}" || exit $?
else
if ! command -v sudo >/dev/null 2>&1; then
echo "Error: vault-nomad-auth.sh must run as root and sudo is not installed" >&2
exit 1
fi
sudo -n -- "${auth_cmd[@]}" || exit $?
fi
# Import secrets if any --import-* flag is set (S2.2).
if [ "$import_any" = true ]; then
echo ""
echo "── Importing secrets into Vault ───────────────────────"
local -a import_cmd=("$vault_import_sh")
[ -n "$import_env" ] && import_cmd+=("--env" "$import_env")
[ -n "$import_sops" ] && import_cmd+=("--sops" "$import_sops")
[ -n "$age_key" ] && import_cmd+=("--age-key" "$age_key")
if [ "$(id -u)" -eq 0 ]; then
"${import_cmd[@]}" || exit $?
else
if ! command -v sudo >/dev/null 2>&1; then
echo "Error: vault-import.sh must run as root and sudo is not installed" >&2
exit 1
fi
sudo -n -- "${import_cmd[@]}" || exit $?
fi
else
echo ""
echo "[import] no --import-env/--import-sops — skipping; set them or seed kv/disinto/* manually before deploying secret-dependent services"
fi
# Build custom images required by Nomad jobs (S4.2, S5.2, S5.5) — before deploy.
# Single-node factory dev box: no multi-node pull needed, no registry auth.
# Can upgrade to approach B (registry push/pull) later if multi-node.
if echo ",$with_services," | grep -qE ",(agents|chat|edge),"; then
echo ""
echo "── Building custom images ─────────────────────────────"
if echo ",$with_services," | grep -q ",agents,"; then
local tag="disinto/agents:local"
echo "── Building $tag ─────────────────────────────"
docker build -t "$tag" -f "${FACTORY_ROOT}/docker/agents/Dockerfile" "${FACTORY_ROOT}" 2>&1 | tail -5
fi
if echo ",$with_services," | grep -q ",chat,"; then
local tag="disinto/chat:local"
echo "── Building $tag ─────────────────────────────"
docker build -t "$tag" -f "${FACTORY_ROOT}/docker/chat/Dockerfile" "${FACTORY_ROOT}/docker/chat" 2>&1 | tail -5
fi
if echo ",$with_services," | grep -q ",edge,"; then
local tag="disinto/edge:local"
echo "── Building $tag ─────────────────────────────"
docker build -t "$tag" -f "${FACTORY_ROOT}/docker/edge/Dockerfile" "${FACTORY_ROOT}/docker/edge" 2>&1 | tail -5
fi
fi
# Interleaved seed/deploy per service (S2.6, #928, #948).
# We interleave seed + deploy per service (not batch all seeds then all deploys)
# so that OAuth-dependent services can reach their dependencies during seeding.
# E.g., seed-forgejo → deploy-forgejo → seed-woodpecker (OAuth can now reach
# running forgejo) → deploy-woodpecker.
if [ -n "$with_services" ]; then
local vault_addr="${VAULT_ADDR:-http://127.0.0.1:8200}"
# Build ordered deploy list (S3.4, S4.2, S5.2, S5.5): forgejo → woodpecker-server → woodpecker-agent → agents → staging → chat → edge
local DEPLOY_ORDER=""
for ordered_svc in forgejo woodpecker-server woodpecker-agent agents staging chat edge; do
if echo ",$with_services," | grep -q ",$ordered_svc,"; then
DEPLOY_ORDER="${DEPLOY_ORDER:+${DEPLOY_ORDER} }${ordered_svc}"
fi
done
local IFS=' '
for svc in $DEPLOY_ORDER; do
# Seed this service (if seed script exists)
local seed_name="$svc"
case "$svc" in
woodpecker-server|woodpecker-agent) seed_name="woodpecker" ;;
agents) seed_name="agents" ;;
chat) seed_name="chat" ;;
edge) seed_name="ops-repo" ;;
esac
local seed_script="${FACTORY_ROOT}/tools/vault-seed-${seed_name}.sh"
if [ -x "$seed_script" ]; then
echo ""
echo "── Seeding Vault for ${seed_name} ───────────────────────────"
if [ "$(id -u)" -eq 0 ]; then
VAULT_ADDR="$vault_addr" "$seed_script" || exit $?
else
if ! command -v sudo >/dev/null 2>&1; then
echo "Error: vault-seed-${seed_name}.sh must run as root and sudo is not installed" >&2
exit 1
fi
sudo -n -- env "VAULT_ADDR=$vault_addr" "$seed_script" || exit $?
fi
fi
# Deploy this service
echo ""
echo "── Deploying ${svc} ───────────────────────────────────────"
# Seed host volumes before deployment (if needed)
case "$svc" in
staging)
# Seed site-content host volume (/srv/disinto/docker) with static content
# The staging jobspec mounts this volume read-only to /srv/site
local site_content_src="${FACTORY_ROOT}/docker/index.html"
local site_content_dst="/srv/disinto/docker"
if [ -f "$site_content_src" ] && [ -d "$site_content_dst" ]; then
if ! cmp -s "$site_content_src" "${site_content_dst}/index.html" 2>/dev/null; then
echo "[staging] seeding site-content volume..."
cp "$site_content_src" "${site_content_dst}/index.html"
fi
fi
;; ;;
esac esac
# Check jobspec exists
local jobspec_path="${FACTORY_ROOT}/nomad/jobs/${svc}.hcl" local jobspec_path="${FACTORY_ROOT}/nomad/jobs/${svc}.hcl"
if [ ! -f "$jobspec_path" ]; then if [ ! -f "$jobspec_path" ]; then
echo "Error: jobspec not found: ${jobspec_path}" >&2 echo "Error: jobspec not found: ${jobspec_path}" >&2
exit 1 exit 1
fi fi
deploy_cmd+=("$svc")
done
local -a deploy_cmd=("$deploy_sh" "$svc")
if [ "$(id -u)" -eq 0 ]; then if [ "$(id -u)" -eq 0 ]; then
"${deploy_cmd[@]}" || exit $? "${deploy_cmd[@]}" || exit $?
else else
@ -770,17 +1057,84 @@ _disinto_init_nomad() {
echo "Error: deploy.sh must run as root and sudo is not installed" >&2 echo "Error: deploy.sh must run as root and sudo is not installed" >&2
exit 1 exit 1
fi fi
sudo -n -- "${deploy_cmd[@]}" || exit $? sudo -n --preserve-env=FORGE_ADMIN_PASS,FORGE_TOKEN,FORGE_URL -- "${deploy_cmd[@]}" || exit $?
fi
# Post-deploy: bootstrap Forgejo admin user after forgejo deployment
if [ "$svc" = "forgejo" ]; then
echo ""
echo "── Bootstrapping Forgejo admin user ───────────────────────"
local bootstrap_script="${FACTORY_ROOT}/lib/init/nomad/forgejo-bootstrap.sh"
if [ -x "$bootstrap_script" ]; then
if [ "$(id -u)" -eq 0 ]; then
"$bootstrap_script" || exit $?
else
if ! command -v sudo >/dev/null 2>&1; then
echo "Error: forgejo-bootstrap.sh must run as root and sudo is not installed" >&2
exit 1
fi
sudo -n --preserve-env=FORGE_ADMIN_PASS,FORGE_TOKEN,FORGE_URL -- "$bootstrap_script" || exit $?
fi
else
echo "warning: forgejo-bootstrap.sh not found or not executable" >&2
fi
fi
done
# Run vault-runner (unconditionally, not gated by --with) — infrastructure job
# vault-runner is always present since it's needed for vault action dispatch
echo ""
echo "── Running vault-runner ────────────────────────────────────"
local vault_runner_path="${FACTORY_ROOT}/nomad/jobs/vault-runner.hcl"
if [ -f "$vault_runner_path" ]; then
echo "[deploy] vault-runner: running Nomad job (infrastructure)"
local -a vault_runner_cmd=("$deploy_sh" "vault-runner")
if [ "$(id -u)" -eq 0 ]; then
"${vault_runner_cmd[@]}" || exit $?
else
if ! command -v sudo >/dev/null 2>&1; then
echo "Error: deploy.sh must run as root and sudo is not installed" >&2
exit 1
fi
sudo -n -- "${vault_runner_cmd[@]}" || exit $?
fi
else
echo "[deploy] vault-runner: jobspec not found, skipping"
fi fi
# Print final summary # Print final summary
echo "" echo ""
echo "── Summary ────────────────────────────────────────────" echo "── Summary ────────────────────────────────────────────"
echo "Cluster: Nomad+Vault cluster is up" echo "Cluster: Nomad+Vault cluster is up"
echo "Policies: applied (Vault ACL)"
echo "Auth: Vault JWT auth + Nomad workload identity configured"
if [ "$import_any" = true ]; then
local import_desc=""
[ -n "$import_env" ] && import_desc+="${import_env} "
[ -n "$import_sops" ] && import_desc+="${import_sops} "
echo "Imported: ${import_desc% }"
else
echo "Imported: (none — seed kv/disinto/* manually before deploying secret-dependent services)"
fi
echo "Deployed: ${with_services}" echo "Deployed: ${with_services}"
if echo "$with_services" | grep -q "forgejo"; then if echo ",$with_services," | grep -q ",forgejo,"; then
echo "Ports: forgejo: 3000" echo "Ports: forgejo: 3000"
fi fi
if echo ",$with_services," | grep -q ",woodpecker-server,"; then
echo " woodpecker-server: 8000"
fi
if echo ",$with_services," | grep -q ",woodpecker-agent,"; then
echo " woodpecker-agent: (agent connected)"
fi
if echo ",$with_services," | grep -q ",agents,"; then
echo " agents: (polling loop running)"
fi
if echo ",$with_services," | grep -q ",staging,"; then
echo " staging: (internal, no external port)"
fi
if echo ",$with_services," | grep -q ",chat,"; then
echo " chat: 8080"
fi
echo "────────────────────────────────────────────────────────" echo "────────────────────────────────────────────────────────"
fi fi
@ -803,6 +1157,7 @@ disinto_init() {
# Parse flags # Parse flags
local branch="" repo_root="" ci_id="0" auto_yes=false forge_url_flag="" bare=false rotate_tokens=false use_build=false dry_run=false backend="docker" empty=false with_services="" local branch="" repo_root="" ci_id="0" auto_yes=false forge_url_flag="" bare=false rotate_tokens=false use_build=false dry_run=false backend="docker" empty=false with_services=""
local import_env="" import_sops="" age_key=""
while [ $# -gt 0 ]; do while [ $# -gt 0 ]; do
case "$1" in case "$1" in
--branch) branch="$2"; shift 2 ;; --branch) branch="$2"; shift 2 ;;
@ -819,6 +1174,12 @@ disinto_init() {
--yes) auto_yes=true; shift ;; --yes) auto_yes=true; shift ;;
--rotate-tokens) rotate_tokens=true; shift ;; --rotate-tokens) rotate_tokens=true; shift ;;
--dry-run) dry_run=true; shift ;; --dry-run) dry_run=true; shift ;;
--import-env) import_env="$2"; shift 2 ;;
--import-env=*) import_env="${1#--import-env=}"; shift ;;
--import-sops) import_sops="$2"; shift 2 ;;
--import-sops=*) import_sops="${1#--import-sops=}"; shift ;;
--age-key) age_key="$2"; shift 2 ;;
--age-key=*) age_key="${1#--age-key=}"; shift ;;
*) echo "Unknown option: $1" >&2; exit 1 ;; *) echo "Unknown option: $1" >&2; exit 1 ;;
esac esac
done done
@ -859,11 +1220,104 @@ disinto_init() {
exit 1 exit 1
fi fi
# Normalize --with services (S3.4): expand 'woodpecker' shorthand to
# 'woodpecker-server,woodpecker-agent', auto-include forgejo when
# woodpecker is requested (OAuth dependency), and validate all names.
if [ -n "$with_services" ]; then
# Expand 'woodpecker' (bare) → 'woodpecker-server,woodpecker-agent'.
# Must not match already-expanded 'woodpecker-server'/'woodpecker-agent'.
local expanded=""
local IFS=','
for _svc in $with_services; do
_svc=$(echo "$_svc" | xargs)
case "$_svc" in
woodpecker) _svc="woodpecker-server,woodpecker-agent" ;;
agents) _svc="agents" ;;
esac
expanded="${expanded:+${expanded},}${_svc}"
done
with_services="$expanded"
unset IFS
# Auto-include forgejo when woodpecker is requested
if echo ",$with_services," | grep -q ",woodpecker-server,\|,woodpecker-agent," \
&& ! echo ",$with_services," | grep -q ",forgejo,"; then
echo "Note: --with woodpecker implies --with forgejo (OAuth dependency)"
with_services="forgejo,${with_services}"
fi
# Auto-include forgejo and woodpecker when agents is requested
if echo ",$with_services," | grep -q ",agents,"; then
if ! echo ",$with_services," | grep -q ",forgejo,"; then
echo "Note: --with agents implies --with forgejo (agents need forge)"
with_services="forgejo,${with_services}"
fi
if ! echo ",$with_services," | grep -q ",woodpecker-server,\|,woodpecker-agent,"; then
echo "Note: --with agents implies --with woodpecker (agents need CI)"
with_services="${with_services},woodpecker-server,woodpecker-agent"
fi
fi
# Auto-include all dependencies when edge is requested (S5.5)
if echo ",$with_services," | grep -q ",edge,"; then
# Edge depends on all backend services
for dep in forgejo woodpecker-server woodpecker-agent agents staging chat; do
if ! echo ",$with_services," | grep -q ",${dep},"; then
echo "Note: --with edge implies --with ${dep} (edge depends on all backend services)"
with_services="${with_services},${dep}"
fi
done
fi
# Validate all service names are known
local IFS=','
for _svc in $with_services; do
_svc=$(echo "$_svc" | xargs)
case "$_svc" in
forgejo|woodpecker-server|woodpecker-agent|agents|staging|chat|edge) ;;
*)
echo "Error: unknown service '${_svc}' — known: forgejo, woodpecker-server, woodpecker-agent, agents, staging, chat, edge" >&2
exit 1
;;
esac
done
unset IFS
fi
# --import-* flag validation (S2.5). These three flags form an import
# triple and must be consistent before dispatch: sops encryption is
# useless without the age key to decrypt it, so either both --import-sops
# and --age-key are present or neither is. --import-env alone is fine
# (it just imports the plaintext dotenv). All three flags are nomad-only.
if [ -n "$import_sops" ] && [ -z "$age_key" ]; then
echo "Error: --import-sops requires --age-key" >&2
exit 1
fi
if [ -n "$age_key" ] && [ -z "$import_sops" ]; then
echo "Error: --age-key requires --import-sops" >&2
exit 1
fi
if { [ -n "$import_env" ] || [ -n "$import_sops" ] || [ -n "$age_key" ]; } \
&& [ "$backend" != "nomad" ]; then
echo "Error: --import-env, --import-sops, and --age-key require --backend=nomad" >&2
exit 1
fi
# --empty is the cluster-only escape hatch — it skips policies, auth,
# import, and deploy. Pairing it with --import-* silently does nothing,
# which is a worse failure mode than a clear error. Reject explicitly.
if [ "$empty" = true ] \
&& { [ -n "$import_env" ] || [ -n "$import_sops" ] || [ -n "$age_key" ]; }; then
echo "Error: --empty and --import-env/--import-sops/--age-key are mutually exclusive" >&2
exit 1
fi
# Dispatch on backend — the nomad path runs lib/init/nomad/cluster-up.sh # Dispatch on backend — the nomad path runs lib/init/nomad/cluster-up.sh
# (S0.4). The default and --empty variants are identical today; Step 1 # (S0.4). The default and --empty variants are identical today; Step 1
# will branch on $empty to add job deployment to the default path. # will branch on $empty to add job deployment to the default path.
if [ "$backend" = "nomad" ]; then if [ "$backend" = "nomad" ]; then
_disinto_init_nomad "$dry_run" "$empty" "$with_services" _disinto_init_nomad "$dry_run" "$empty" "$with_services" \
"$import_env" "$import_sops" "$age_key"
# shellcheck disable=SC2317 # _disinto_init_nomad always exits today; # shellcheck disable=SC2317 # _disinto_init_nomad always exits today;
# `return` is defensive against future refactors. # `return` is defensive against future refactors.
return return
@ -977,7 +1431,6 @@ p.write_text(text)
echo "" echo ""
echo "[ensure] Forgejo admin user 'disinto-admin'" echo "[ensure] Forgejo admin user 'disinto-admin'"
echo "[ensure] 8 bot users: dev-bot, review-bot, planner-bot, gardener-bot, vault-bot, supervisor-bot, predictor-bot, architect-bot" echo "[ensure] 8 bot users: dev-bot, review-bot, planner-bot, gardener-bot, vault-bot, supervisor-bot, predictor-bot, architect-bot"
echo "[ensure] 2 llama bot users: dev-qwen, dev-qwen-nightly"
echo "[ensure] .profile repos for all bots" echo "[ensure] .profile repos for all bots"
echo "[ensure] repo ${forge_repo} on Forgejo with collaborators" echo "[ensure] repo ${forge_repo} on Forgejo with collaborators"
echo "[run] preflight checks" echo "[run] preflight checks"
@ -1021,6 +1474,36 @@ p.write_text(text)
exit 0 exit 0
fi fi
# Configure Forgejo and Woodpecker URLs when EDGE_TUNNEL_FQDN is set.
# In subdomain mode, uses per-service FQDNs at root path instead of subpath URLs.
# Must run BEFORE generate_compose so the .env file is available for variable substitution.
if [ -n "${EDGE_TUNNEL_FQDN:-}" ]; then
local routing_mode="${EDGE_ROUTING_MODE:-subpath}"
# Create .env file if it doesn't exist yet (needed before compose generation)
if [ "$bare" = false ] && [ ! -f "${FACTORY_ROOT}/.env" ]; then
touch "${FACTORY_ROOT}/.env"
fi
if [ "$routing_mode" = "subdomain" ]; then
# Subdomain mode: Forgejo at forge.<project>.disinto.ai (root path)
if ! grep -q '^FORGEJO_ROOT_URL=' "${FACTORY_ROOT}/.env" 2>/dev/null; then
echo "FORGEJO_ROOT_URL=https://${EDGE_TUNNEL_FQDN_FORGE:-forge.${EDGE_TUNNEL_FQDN}}/" >> "${FACTORY_ROOT}/.env"
fi
# Subdomain mode: Woodpecker at ci.<project>.disinto.ai (root path)
if ! grep -q '^WOODPECKER_HOST=' "${FACTORY_ROOT}/.env" 2>/dev/null; then
echo "WOODPECKER_HOST=https://${EDGE_TUNNEL_FQDN_CI:-ci.${EDGE_TUNNEL_FQDN}}" >> "${FACTORY_ROOT}/.env"
fi
else
# Subpath mode: Forgejo ROOT_URL with /forge/ subpath (trailing slash required)
if ! grep -q '^FORGEJO_ROOT_URL=' "${FACTORY_ROOT}/.env" 2>/dev/null; then
echo "FORGEJO_ROOT_URL=https://${EDGE_TUNNEL_FQDN}/forge/" >> "${FACTORY_ROOT}/.env"
fi
# Subpath mode: Woodpecker WOODPECKER_HOST with /ci subpath (no trailing slash for v3)
if ! grep -q '^WOODPECKER_HOST=' "${FACTORY_ROOT}/.env" 2>/dev/null; then
echo "WOODPECKER_HOST=https://${EDGE_TUNNEL_FQDN}/ci" >> "${FACTORY_ROOT}/.env"
fi
fi
fi
# Generate compose files (unless --bare) # Generate compose files (unless --bare)
if [ "$bare" = false ]; then if [ "$bare" = false ]; then
local forge_port local forge_port
@ -1035,18 +1518,6 @@ p.write_text(text)
touch "${FACTORY_ROOT}/.env" touch "${FACTORY_ROOT}/.env"
fi fi
# Configure Forgejo and Woodpecker subpath URLs when EDGE_TUNNEL_FQDN is set
if [ -n "${EDGE_TUNNEL_FQDN:-}" ]; then
# Forgejo ROOT_URL with /forge/ subpath (note trailing slash - Forgejo needs it)
if ! grep -q '^FORGEJO_ROOT_URL=' "${FACTORY_ROOT}/.env" 2>/dev/null; then
echo "FORGEJO_ROOT_URL=https://${EDGE_TUNNEL_FQDN}/forge/" >> "${FACTORY_ROOT}/.env"
fi
# Woodpecker WOODPECKER_HOST with /ci subpath (no trailing slash for v3)
if ! grep -q '^WOODPECKER_HOST=' "${FACTORY_ROOT}/.env" 2>/dev/null; then
echo "WOODPECKER_HOST=https://${EDGE_TUNNEL_FQDN}/ci" >> "${FACTORY_ROOT}/.env"
fi
fi
# Prompt for FORGE_ADMIN_PASS before setup_forge # Prompt for FORGE_ADMIN_PASS before setup_forge
# This ensures the password is set before Forgejo user creation # This ensures the password is set before Forgejo user creation
prompt_admin_password "${FACTORY_ROOT}/.env" prompt_admin_password "${FACTORY_ROOT}/.env"
@ -1150,9 +1621,15 @@ p.write_text(text)
create_woodpecker_oauth "$forge_url" "$forge_repo" create_woodpecker_oauth "$forge_url" "$forge_repo"
# Create OAuth2 app on Forgejo for disinto-chat (#708) # Create OAuth2 app on Forgejo for disinto-chat (#708)
# In subdomain mode, callback is at chat.<project> root instead of /chat/ subpath.
local chat_redirect_uri local chat_redirect_uri
if [ -n "${EDGE_TUNNEL_FQDN:-}" ]; then if [ -n "${EDGE_TUNNEL_FQDN:-}" ]; then
local chat_routing_mode="${EDGE_ROUTING_MODE:-subpath}"
if [ "$chat_routing_mode" = "subdomain" ]; then
chat_redirect_uri="https://${EDGE_TUNNEL_FQDN_CHAT:-chat.${EDGE_TUNNEL_FQDN}}/oauth/callback"
else
chat_redirect_uri="https://${EDGE_TUNNEL_FQDN}/chat/oauth/callback" chat_redirect_uri="https://${EDGE_TUNNEL_FQDN}/chat/oauth/callback"
fi
else else
chat_redirect_uri="http://localhost/chat/oauth/callback" chat_redirect_uri="http://localhost/chat/oauth/callback"
fi fi
@ -1173,19 +1650,6 @@ p.write_text(text)
echo "Config: CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 saved to .env" echo "Config: CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 saved to .env"
fi fi
# Write local-Qwen dev agent env keys with safe defaults (#769)
if ! grep -q '^ENABLE_LLAMA_AGENT=' "$env_file" 2>/dev/null; then
cat >> "$env_file" <<'LLAMAENVEOF'
# Local Qwen dev agent (optional) — set to 1 to enable
ENABLE_LLAMA_AGENT=0
FORGE_TOKEN_LLAMA=
FORGE_PASS_LLAMA=
ANTHROPIC_BASE_URL=
LLAMAENVEOF
echo "Config: ENABLE_LLAMA_AGENT keys written to .env (disabled by default)"
fi
# Create labels on remote # Create labels on remote
create_labels "$forge_repo" "$forge_url" create_labels "$forge_repo" "$forge_url"
@ -2365,15 +2829,29 @@ disinto_edge() {
# Write to .env (replace existing entries to avoid duplicates) # Write to .env (replace existing entries to avoid duplicates)
local tmp_env local tmp_env
tmp_env=$(mktemp) tmp_env=$(mktemp)
grep -Ev "^EDGE_TUNNEL_(HOST|PORT|FQDN)=" "$env_file" > "$tmp_env" 2>/dev/null || true grep -Ev "^EDGE_TUNNEL_(HOST|PORT|FQDN|FQDN_FORGE|FQDN_CI|FQDN_CHAT)=" "$env_file" > "$tmp_env" 2>/dev/null || true
mv "$tmp_env" "$env_file" mv "$tmp_env" "$env_file"
echo "EDGE_TUNNEL_HOST=${edge_host}" >> "$env_file" echo "EDGE_TUNNEL_HOST=${edge_host}" >> "$env_file"
echo "EDGE_TUNNEL_PORT=${port}" >> "$env_file" echo "EDGE_TUNNEL_PORT=${port}" >> "$env_file"
echo "EDGE_TUNNEL_FQDN=${fqdn}" >> "$env_file" echo "EDGE_TUNNEL_FQDN=${fqdn}" >> "$env_file"
# Subdomain mode: write per-service FQDNs (#1028)
local reg_routing_mode="${EDGE_ROUTING_MODE:-subpath}"
if [ "$reg_routing_mode" = "subdomain" ]; then
echo "EDGE_TUNNEL_FQDN_FORGE=forge.${fqdn}" >> "$env_file"
echo "EDGE_TUNNEL_FQDN_CI=ci.${fqdn}" >> "$env_file"
echo "EDGE_TUNNEL_FQDN_CHAT=chat.${fqdn}" >> "$env_file"
fi
echo "Registered: ${project}" echo "Registered: ${project}"
echo " Port: ${port}" echo " Port: ${port}"
echo " FQDN: ${fqdn}" echo " FQDN: ${fqdn}"
if [ "$reg_routing_mode" = "subdomain" ]; then
echo " Mode: subdomain"
echo " Forge: forge.${fqdn}"
echo " CI: ci.${fqdn}"
echo " Chat: chat.${fqdn}"
fi
echo " Saved to: ${env_file}" echo " Saved to: ${env_file}"
;; ;;
@ -2407,12 +2885,23 @@ disinto_edge() {
edge_host="${EDGE_HOST:-edge.disinto.ai}" edge_host="${EDGE_HOST:-edge.disinto.ai}"
fi fi
# Read tunnel pubkey for ownership proof
local secrets_dir="${FACTORY_ROOT}/secrets"
local tunnel_pubkey="${secrets_dir}/tunnel_key.pub"
if [ ! -f "$tunnel_pubkey" ]; then
echo "Error: tunnel keypair not found at ${tunnel_pubkey}" >&2
echo "Cannot prove ownership without the tunnel public key." >&2
exit 1
fi
local pubkey
pubkey=$(tr -d '\n' < "$tunnel_pubkey")
# SSH to edge host and deregister # SSH to edge host and deregister
echo "Deregistering tunnel for ${project} on ${edge_host}..." echo "Deregistering tunnel for ${project} on ${edge_host}..."
local response local response
response=$(ssh -o StrictHostKeyChecking=accept-new -o BatchMode=yes \ response=$(ssh -o StrictHostKeyChecking=accept-new -o BatchMode=yes \
"disinto-register@${edge_host}" \ "disinto-register@${edge_host}" \
"deregister ${project}" 2>&1) || { "deregister ${project} ${pubkey}" 2>&1) || {
echo "Error: failed to deregister tunnel" >&2 echo "Error: failed to deregister tunnel" >&2
echo "Response: ${response}" >&2 echo "Response: ${response}" >&2
exit 1 exit 1
@ -2495,6 +2984,33 @@ EOF
esac esac
} }
# ── backup command ────────────────────────────────────────────────────────────
# Usage: disinto backup <subcommand> [args]
# Subcommands:
# create <outfile.tar.gz> Create backup of factory state
# import <infile.tar.gz> Restore factory state from backup
disinto_backup() {
local subcmd="${1:-}"
shift || true
case "$subcmd" in
create)
backup_create "$@"
;;
import)
backup_import "$@"
;;
*)
echo "Usage: disinto backup <subcommand> [args]" >&2
echo "" >&2
echo "Subcommands:" >&2
echo " create <outfile.tar.gz> Create backup of factory state" >&2
echo " import <infile.tar.gz> Restore factory state from backup" >&2
exit 1
;;
esac
}
# ── Main dispatch ──────────────────────────────────────────────────────────── # ── Main dispatch ────────────────────────────────────────────────────────────
case "${1:-}" in case "${1:-}" in
@ -2511,6 +3027,7 @@ case "${1:-}" in
hire-an-agent) shift; disinto_hire_an_agent "$@" ;; hire-an-agent) shift; disinto_hire_an_agent "$@" ;;
agent) shift; disinto_agent "$@" ;; agent) shift; disinto_agent "$@" ;;
edge) shift; disinto_edge "$@" ;; edge) shift; disinto_edge "$@" ;;
backup) shift; disinto_backup "$@" ;;
-h|--help) usage ;; -h|--help) usage ;;
*) usage ;; *) usage ;;
esac esac

View file

@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 --> <!-- last-reviewed: 0d6181918452c1407a3f6bc62917724261acff26 -->
# Dev Agent # Dev Agent
**Role**: Implement issues autonomously — write code, push branches, address **Role**: Implement issues autonomously — write code, push branches, address

View file

@ -15,7 +15,6 @@ services:
- project-repos:/home/agent/repos - project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared} - ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro - ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro - ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro - ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro - woodpecker-data:/woodpecker-data:ro
@ -78,7 +77,6 @@ services:
- project-repos:/home/agent/repos - project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared} - ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro - ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro - ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro - ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro - woodpecker-data:/woodpecker-data:ro
@ -139,7 +137,6 @@ services:
- project-repos:/home/agent/repos - project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared} - ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro - ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro - ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro - ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro - woodpecker-data:/woodpecker-data:ro
@ -211,8 +208,8 @@ services:
edge: edge:
build: build:
context: docker/edge context: .
dockerfile: Dockerfile dockerfile: docker/edge/Dockerfile
image: disinto/edge:latest image: disinto/edge:latest
container_name: disinto-edge container_name: disinto-edge
security_opt: security_opt:
@ -223,6 +220,8 @@ services:
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/root/.claude.json:ro - ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/root/.claude.json:ro
- ${CLAUDE_DIR:-${HOME}/.claude}:/root/.claude:ro - ${CLAUDE_DIR:-${HOME}/.claude}:/root/.claude:ro
- disinto-logs:/opt/disinto-logs - disinto-logs:/opt/disinto-logs
# Chat history persistence (merged from chat container, #1083)
- ${CHAT_HISTORY_DIR:-./state/chat-history}:/var/lib/chat/history
environment: environment:
- FORGE_SUPERVISOR_TOKEN=${FORGE_SUPERVISOR_TOKEN:-} - FORGE_SUPERVISOR_TOKEN=${FORGE_SUPERVISOR_TOKEN:-}
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-} - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
@ -234,6 +233,17 @@ services:
- PRIMARY_BRANCH=main - PRIMARY_BRANCH=main
- DISINTO_CONTAINER=1 - DISINTO_CONTAINER=1
- FORGE_ADMIN_USERS=disinto-admin,vault-bot,admin - FORGE_ADMIN_USERS=disinto-admin,vault-bot,admin
# Chat env vars (merged from chat container into edge, #1083)
- CHAT_HOST=127.0.0.1
- CHAT_PORT=8080
- CHAT_OAUTH_CLIENT_ID=${CHAT_OAUTH_CLIENT_ID:-}
- CHAT_OAUTH_CLIENT_SECRET=${CHAT_OAUTH_CLIENT_SECRET:-}
- DISINTO_CHAT_ALLOWED_USERS=${DISINTO_CHAT_ALLOWED_USERS:-}
- FORWARD_AUTH_SECRET=${FORWARD_AUTH_SECRET:-}
- EDGE_TUNNEL_FQDN=${EDGE_TUNNEL_FQDN:-}
- EDGE_TUNNEL_FQDN_CHAT=${EDGE_TUNNEL_FQDN_CHAT:-}
- EDGE_ROUTING_MODE=${EDGE_ROUTING_MODE:-subpath}
# Rate limiting removed (#1084)
ports: ports:
- "80:80" - "80:80"
- "443:443" - "443:443"

View file

@ -1,21 +1,26 @@
FROM debian:bookworm-slim FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \ RUN apt-get update && apt-get install -y --no-install-recommends \
bash curl git jq tmux python3 python3-pip openssh-client ca-certificates age shellcheck procps gosu \ bash curl git jq tmux nodejs npm python3 python3-pip openssh-client ca-certificates age shellcheck procps gosu \
&& pip3 install --break-system-packages networkx tomlkit \ && pip3 install --break-system-packages networkx tomlkit \
&& rm -rf /var/lib/apt/lists/* && rm -rf /var/lib/apt/lists/*
# Pre-built binaries (copied from docker/agents/bin/) # Pre-built binaries (copied from docker/agents/bin/)
# SOPS — encrypted data decryption tool # SOPS — encrypted data decryption tool
COPY docker/agents/bin/sops /usr/local/bin/sops # Download sops binary (replaces manual COPY of vendored binary)
RUN chmod +x /usr/local/bin/sops ARG SOPS_VERSION=3.9.4
RUN curl -fsSL "https://github.com/getsops/sops/releases/download/v${SOPS_VERSION}/sops-v${SOPS_VERSION}.linux.amd64" \
-o /usr/local/bin/sops && chmod +x /usr/local/bin/sops
# tea CLI — official Gitea/Forgejo CLI for issue/label/comment operations # tea CLI — official Gitea/Forgejo CLI for issue/label/comment operations
COPY docker/agents/bin/tea /usr/local/bin/tea # Download tea binary (replaces manual COPY of vendored binary)
RUN chmod +x /usr/local/bin/tea ARG TEA_VERSION=0.9.2
RUN curl -fsSL "https://dl.gitea.com/tea/${TEA_VERSION}/tea-${TEA_VERSION}-linux-amd64" \
-o /usr/local/bin/tea && chmod +x /usr/local/bin/tea
# Claude CLI is mounted from the host via docker-compose volume. # Install Claude Code CLI — agent runtime for all LLM backends (llama, Claude API).
# No internet access to cli.anthropic.com required at build time. # The CLI is the execution environment; ANTHROPIC_BASE_URL selects the model provider.
RUN npm install -g @anthropic-ai/claude-code@2.1.84
# Non-root user # Non-root user
RUN useradd -m -u 1000 -s /bin/bash agent RUN useradd -m -u 1000 -s /bin/bash agent

View file

@ -17,6 +17,38 @@ set -euo pipefail
# - predictor: every 24 hours (288 iterations * 5 min) # - predictor: every 24 hours (288 iterations * 5 min)
# - supervisor: every SUPERVISOR_INTERVAL seconds (default: 1200 = 20 min) # - supervisor: every SUPERVISOR_INTERVAL seconds (default: 1200 = 20 min)
# ── Migration check: reject ENABLE_LLAMA_AGENT ───────────────────────────────
# #846: The legacy ENABLE_LLAMA_AGENT env flag is no longer supported.
# Activation is now done exclusively via [agents.X] sections in project TOML.
# If this legacy flag is detected, fail immediately with a migration message.
if [ "${ENABLE_LLAMA_AGENT:-}" = "1" ]; then
cat <<'MIGRATION_ERR'
FATAL: ENABLE_LLAMA_AGENT is no longer supported.
The legacy ENABLE_LLAMA_AGENT=1 flag has been removed (#846).
Activation is now done exclusively via [agents.X] sections in projects/*.toml.
To migrate:
1. Remove ENABLE_LLAMA_AGENT from your .env or .env.enc file
2. Add an [agents.<name>] section to your project TOML:
[agents.dev-qwen]
base_url = "http://your-llama-server:8081"
model = "unsloth/Qwen3.5-35B-A3B"
api_key = "sk-no-key-required"
roles = ["dev"]
forge_user = "dev-qwen"
compact_pct = 60
poll_interval = 60
3. Run: disinto init
4. Start the agent: docker compose up -d agents-dev-qwen
See docs/agents-llama.md for full details.
MIGRATION_ERR
exit 1
fi
DISINTO_BAKED="/home/agent/disinto" DISINTO_BAKED="/home/agent/disinto"
DISINTO_LIVE="/home/agent/repos/_factory" DISINTO_LIVE="/home/agent/repos/_factory"
DISINTO_DIR="$DISINTO_BAKED" # start with baked copy; switched to live checkout after bootstrap DISINTO_DIR="$DISINTO_BAKED" # start with baked copy; switched to live checkout after bootstrap
@ -346,15 +378,19 @@ bootstrap_factory_repo
# This prevents the silent-zombie mode where the polling loop matches zero files # This prevents the silent-zombie mode where the polling loop matches zero files
# and does nothing forever. # and does nothing forever.
validate_projects_dir() { validate_projects_dir() {
local toml_count # NOTE: compgen -G exits non-zero when no matches exist, so piping it through
toml_count=$(compgen -G "${DISINTO_DIR}/projects/*.toml" 2>/dev/null | wc -l) # `wc -l` under `set -eo pipefail` aborts the script before the FATAL branch
if [ "$toml_count" -eq 0 ]; then # can log a diagnostic (#877). Use the conditional form already adopted at
# lines above (see bootstrap_factory_repo, PROJECT_NAME parsing).
if ! compgen -G "${DISINTO_DIR}/projects/*.toml" >/dev/null 2>&1; then
log "FATAL: No real .toml files found in ${DISINTO_DIR}/projects/" log "FATAL: No real .toml files found in ${DISINTO_DIR}/projects/"
log "Expected at least one project config file (e.g., disinto.toml)" log "Expected at least one project config file (e.g., disinto.toml)"
log "The directory only contains *.toml.example template files." log "The directory only contains *.toml.example template files."
log "Mount the host ./projects volume or copy real .toml files into the container." log "Mount the host ./projects volume or copy real .toml files into the container."
exit 1 exit 1
fi fi
local toml_count
toml_count=$(compgen -G "${DISINTO_DIR}/projects/*.toml" | wc -l)
log "Projects directory validated: ${toml_count} real .toml file(s) found" log "Projects directory validated: ${toml_count} real .toml file(s) found"
} }

View file

@ -1,35 +0,0 @@
# disinto-chat — minimal HTTP backend for Claude chat UI
#
# Small Debian slim base with Python runtime.
# Chosen for simplicity and small image size (~100MB).
#
# Image size: ~100MB (well under the 200MB ceiling)
#
# The claude binary is mounted from the host at runtime via docker-compose,
# not baked into the image — same pattern as the agents container.
FROM debian:bookworm-slim
# Install Python (no build-time network access needed)
RUN apt-get update && apt-get install -y --no-install-recommends \
python3 \
&& rm -rf /var/lib/apt/lists/*
# Non-root user — fixed UID 10001 for sandbox hardening (#706)
RUN useradd -m -u 10001 -s /bin/bash chat
# Copy application files
COPY server.py /usr/local/bin/server.py
COPY entrypoint-chat.sh /entrypoint-chat.sh
COPY ui/ /var/chat/ui/
RUN chmod +x /entrypoint-chat.sh /usr/local/bin/server.py
USER chat
WORKDIR /var/chat
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')" || exit 1
ENTRYPOINT ["/entrypoint-chat.sh"]

View file

@ -1,37 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
# entrypoint-chat.sh — Start the disinto-chat backend server
#
# Exec-replace pattern: this script is the container entrypoint and runs
# the server directly (no wrapper needed). Logs to stdout for docker logs.
LOGFILE="/tmp/chat.log"
log() {
printf '[%s] %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*" | tee -a "$LOGFILE"
}
# Sandbox sanity checks (#706) — fail fast if isolation is broken
if [ -e /var/run/docker.sock ]; then
log "FATAL: /var/run/docker.sock is accessible — sandbox violation"
exit 1
fi
if [ "$(id -u)" = "0" ]; then
log "FATAL: running as root (uid 0) — sandbox violation"
exit 1
fi
# Verify Claude CLI is available (expected via volume mount from host).
if ! command -v claude &>/dev/null; then
log "FATAL: claude CLI not found in PATH"
log "Mount the host binary into the container, e.g.:"
log " volumes:"
log " - /usr/local/bin/claude:/usr/local/bin/claude:ro"
exit 1
fi
log "Claude CLI: $(claude --version 2>&1 || true)"
# Start the Python server (exec-replace so signals propagate correctly)
log "Starting disinto-chat server on port 8080..."
exec python3 /usr/local/bin/server.py

View file

@ -20,9 +20,15 @@ OAuth flow:
6. Redirects to /chat/ 6. Redirects to /chat/
The claude binary is expected to be mounted from the host at /usr/local/bin/claude. The claude binary is expected to be mounted from the host at /usr/local/bin/claude.
Workspace access:
- CHAT_WORKSPACE_DIR environment variable: bind-mounted project working tree
- Claude invocation uses --permission-mode acceptEdits for code modification
- CWD is set to workspace directory when configured, enabling Claude to
inspect, explain, or modify code scoped to that tree only
""" """
import datetime import asyncio
import json import json
import os import os
import re import re
@ -30,21 +36,33 @@ import secrets
import subprocess import subprocess
import sys import sys
import time import time
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ThreadingMixIn
from urllib.parse import urlparse, parse_qs, urlencode from urllib.parse import urlparse, parse_qs, urlencode
import socket
import struct
import base64
import hashlib
# Configuration # Configuration
HOST = os.environ.get("CHAT_HOST", "0.0.0.0") HOST = os.environ.get("CHAT_HOST", "127.0.0.1")
PORT = int(os.environ.get("CHAT_PORT", 8080)) PORT = int(os.environ.get("CHAT_PORT", 8080))
UI_DIR = "/var/chat/ui" UI_DIR = "/var/chat/ui"
STATIC_DIR = os.path.join(UI_DIR, "static") STATIC_DIR = os.path.join(UI_DIR, "static")
CLAUDE_BIN = "/usr/local/bin/claude" CLAUDE_BIN = "/usr/local/bin/claude"
# Workspace directory: bind-mounted project working tree for Claude access
# Defaults to empty; when set, Claude can read/write to this directory
WORKSPACE_DIR = os.environ.get("CHAT_WORKSPACE_DIR", "")
# OAuth configuration # OAuth configuration
FORGE_URL = os.environ.get("FORGE_URL", "http://localhost:3000") FORGE_URL = os.environ.get("FORGE_URL", "http://localhost:3000")
CHAT_OAUTH_CLIENT_ID = os.environ.get("CHAT_OAUTH_CLIENT_ID", "") CHAT_OAUTH_CLIENT_ID = os.environ.get("CHAT_OAUTH_CLIENT_ID", "")
CHAT_OAUTH_CLIENT_SECRET = os.environ.get("CHAT_OAUTH_CLIENT_SECRET", "") CHAT_OAUTH_CLIENT_SECRET = os.environ.get("CHAT_OAUTH_CLIENT_SECRET", "")
EDGE_TUNNEL_FQDN = os.environ.get("EDGE_TUNNEL_FQDN", "") EDGE_TUNNEL_FQDN = os.environ.get("EDGE_TUNNEL_FQDN", "")
EDGE_TUNNEL_FQDN_CHAT = os.environ.get("EDGE_TUNNEL_FQDN_CHAT", "")
EDGE_ROUTING_MODE = os.environ.get("EDGE_ROUTING_MODE", "subpath")
# Shared secret for Caddy forward_auth verify endpoint (#709). # Shared secret for Caddy forward_auth verify endpoint (#709).
# When set, only requests carrying this value in X-Forward-Auth-Secret are # When set, only requests carrying this value in X-Forward-Auth-Secret are
@ -52,10 +70,6 @@ EDGE_TUNNEL_FQDN = os.environ.get("EDGE_TUNNEL_FQDN", "")
# (acceptable during local dev; production MUST set this). # (acceptable during local dev; production MUST set this).
FORWARD_AUTH_SECRET = os.environ.get("FORWARD_AUTH_SECRET", "") FORWARD_AUTH_SECRET = os.environ.get("FORWARD_AUTH_SECRET", "")
# Rate limiting / cost caps (#711)
CHAT_MAX_REQUESTS_PER_HOUR = int(os.environ.get("CHAT_MAX_REQUESTS_PER_HOUR", 60))
CHAT_MAX_REQUESTS_PER_DAY = int(os.environ.get("CHAT_MAX_REQUESTS_PER_DAY", 500))
CHAT_MAX_TOKENS_PER_DAY = int(os.environ.get("CHAT_MAX_TOKENS_PER_DAY", 1000000))
# Allowed users - disinto-admin always allowed; CSV allowlist extends it # Allowed users - disinto-admin always allowed; CSV allowlist extends it
_allowed_csv = os.environ.get("DISINTO_CHAT_ALLOWED_USERS", "") _allowed_csv = os.environ.get("DISINTO_CHAT_ALLOWED_USERS", "")
@ -81,11 +95,10 @@ _sessions = {}
# Pending OAuth state tokens: state -> expires (float) # Pending OAuth state tokens: state -> expires (float)
_oauth_states = {} _oauth_states = {}
# Per-user rate limiting state (#711)
# user -> list of request timestamps (for sliding-window hourly/daily caps) # WebSocket message queues per user
_request_log = {} # user -> asyncio.Queue (for streaming messages to connected clients)
# user -> {"tokens": int, "date": "YYYY-MM-DD"} _websocket_queues = {}
_daily_tokens = {}
# MIME types for static files # MIME types for static files
MIME_TYPES = { MIME_TYPES = {
@ -99,9 +112,22 @@ MIME_TYPES = {
".ico": "image/x-icon", ".ico": "image/x-icon",
} }
# WebSocket subprotocol for chat streaming
WEBSOCKET_SUBPROTOCOL = "chat-stream-v1"
# WebSocket opcodes
OPCODE_CONTINUATION = 0x0
OPCODE_TEXT = 0x1
OPCODE_BINARY = 0x2
OPCODE_CLOSE = 0x8
OPCODE_PING = 0x9
OPCODE_PONG = 0xA
def _build_callback_uri(): def _build_callback_uri():
"""Build the OAuth callback URI based on tunnel configuration.""" """Build the OAuth callback URI based on tunnel configuration."""
if EDGE_ROUTING_MODE == "subdomain" and EDGE_TUNNEL_FQDN_CHAT:
return f"https://{EDGE_TUNNEL_FQDN_CHAT}/oauth/callback"
if EDGE_TUNNEL_FQDN: if EDGE_TUNNEL_FQDN:
return f"https://{EDGE_TUNNEL_FQDN}/chat/oauth/callback" return f"https://{EDGE_TUNNEL_FQDN}/chat/oauth/callback"
return "http://localhost/chat/oauth/callback" return "http://localhost/chat/oauth/callback"
@ -187,69 +213,9 @@ def _fetch_user(access_token):
return None return None
# =============================================================================
# Rate Limiting Functions (#711)
# =============================================================================
def _check_rate_limit(user):
"""Check per-user rate limits. Returns (allowed, retry_after, reason) (#711).
Checks hourly request cap, daily request cap, and daily token cap.
"""
now = time.time()
one_hour_ago = now - 3600
today = datetime.date.today().isoformat()
# Prune old entries from request log
timestamps = _request_log.get(user, [])
timestamps = [t for t in timestamps if t > now - 86400]
_request_log[user] = timestamps
# Hourly request cap
hourly = [t for t in timestamps if t > one_hour_ago]
if len(hourly) >= CHAT_MAX_REQUESTS_PER_HOUR:
oldest_in_window = min(hourly)
retry_after = int(oldest_in_window + 3600 - now) + 1
return False, max(retry_after, 1), "hourly request limit"
# Daily request cap
start_of_day = time.mktime(datetime.date.today().timetuple())
daily = [t for t in timestamps if t >= start_of_day]
if len(daily) >= CHAT_MAX_REQUESTS_PER_DAY:
next_day = start_of_day + 86400
retry_after = int(next_day - now) + 1
return False, max(retry_after, 1), "daily request limit"
# Daily token cap
token_info = _daily_tokens.get(user, {"tokens": 0, "date": today})
if token_info["date"] != today:
token_info = {"tokens": 0, "date": today}
_daily_tokens[user] = token_info
if token_info["tokens"] >= CHAT_MAX_TOKENS_PER_DAY:
next_day = start_of_day + 86400
retry_after = int(next_day - now) + 1
return False, max(retry_after, 1), "daily token limit"
return True, 0, ""
def _record_request(user):
"""Record a request timestamp for the user (#711)."""
_request_log.setdefault(user, []).append(time.time())
def _record_tokens(user, tokens):
"""Record token usage for the user (#711)."""
today = datetime.date.today().isoformat()
token_info = _daily_tokens.get(user, {"tokens": 0, "date": today})
if token_info["date"] != today:
token_info = {"tokens": 0, "date": today}
token_info["tokens"] += tokens
_daily_tokens[user] = token_info
def _parse_stream_json(output): def _parse_stream_json(output):
"""Parse stream-json output from claude --print (#711). """Parse stream-json output from claude --print.
Returns (text_content, total_tokens). Falls back gracefully if the Returns (text_content, total_tokens). Falls back gracefully if the
usage event is absent or malformed. usage event is absent or malformed.
@ -295,6 +261,313 @@ def _parse_stream_json(output):
return "".join(text_parts), total_tokens return "".join(text_parts), total_tokens
# =============================================================================
# WebSocket Handler Class
# =============================================================================
class _WebSocketHandler:
"""Handle WebSocket connections for chat streaming."""
def __init__(self, reader, writer, user, message_queue):
self.reader = reader
self.writer = writer
self.user = user
self.message_queue = message_queue
self.closed = False
async def accept_connection(self, sec_websocket_key, sec_websocket_protocol=None):
"""Accept the WebSocket handshake.
The HTTP request has already been parsed by BaseHTTPRequestHandler,
so we use the provided key and protocol instead of re-reading from socket.
"""
# Validate subprotocol
if sec_websocket_protocol and sec_websocket_protocol != WEBSOCKET_SUBPROTOCOL:
self._send_http_error(
400,
"Bad Request",
f"Unsupported subprotocol. Expected: {WEBSOCKET_SUBPROTOCOL}",
)
self._close_connection()
return False
# Generate accept key
accept_key = self._generate_accept_key(sec_websocket_key)
# Send handshake response
response = (
"HTTP/1.1 101 Switching Protocols\r\n"
"Upgrade: websocket\r\n"
"Connection: Upgrade\r\n"
f"Sec-WebSocket-Accept: {accept_key}\r\n"
)
if sec_websocket_protocol:
response += f"Sec-WebSocket-Protocol: {sec_websocket_protocol}\r\n"
response += "\r\n"
self.writer.write(response.encode("utf-8"))
await self.writer.drain()
return True
def _generate_accept_key(self, sec_key):
"""Generate the Sec-WebSocket-Accept key."""
GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
combined = sec_key + GUID
sha1 = hashlib.sha1(combined.encode("utf-8"))
return base64.b64encode(sha1.digest()).decode("utf-8")
async def _read_line(self):
"""Read a line from the socket."""
data = await self.reader.read(1)
line = ""
while data:
if data == b"\r":
data = await self.reader.read(1)
continue
if data == b"\n":
return line
line += data.decode("utf-8", errors="replace")
data = await self.reader.read(1)
return line
def _send_http_error(self, code, title, message):
"""Send an HTTP error response."""
response = (
f"HTTP/1.1 {code} {title}\r\n"
"Content-Type: text/plain; charset=utf-8\r\n"
"Content-Length: " + str(len(message)) + "\r\n"
"\r\n"
+ message
)
try:
self.writer.write(response.encode("utf-8"))
self.writer.drain()
except Exception:
pass
def _close_connection(self):
"""Close the connection."""
try:
self.writer.close()
except Exception:
pass
async def send_text(self, data):
"""Send a text frame."""
if self.closed:
return
try:
frame = self._encode_frame(OPCODE_TEXT, data.encode("utf-8"))
self.writer.write(frame)
await self.writer.drain()
except Exception as e:
print(f"WebSocket send error: {e}", file=sys.stderr)
async def send_binary(self, data):
"""Send a binary frame."""
if self.closed:
return
try:
if isinstance(data, str):
data = data.encode("utf-8")
frame = self._encode_frame(OPCODE_BINARY, data)
self.writer.write(frame)
await self.writer.drain()
except Exception as e:
print(f"WebSocket send error: {e}", file=sys.stderr)
def _encode_frame(self, opcode, payload):
"""Encode a WebSocket frame."""
frame = bytearray()
frame.append(0x80 | opcode) # FIN + opcode
length = len(payload)
if length < 126:
frame.append(length)
elif length < 65536:
frame.append(126)
frame.extend(struct.pack(">H", length))
else:
frame.append(127)
frame.extend(struct.pack(">Q", length))
frame.extend(payload)
return bytes(frame)
async def _decode_frame(self):
"""Decode a WebSocket frame. Returns (opcode, payload)."""
try:
# Read first two bytes (use readexactly for guaranteed length)
header = await self.reader.readexactly(2)
fin = (header[0] >> 7) & 1
opcode = header[0] & 0x0F
masked = (header[1] >> 7) & 1
length = header[1] & 0x7F
# Extended payload length
if length == 126:
ext = await self.reader.readexactly(2)
length = struct.unpack(">H", ext)[0]
elif length == 127:
ext = await self.reader.readexactly(8)
length = struct.unpack(">Q", ext)[0]
# Masking key
if masked:
mask_key = await self.reader.readexactly(4)
# Payload
payload = await self.reader.readexactly(length)
# Unmask if needed
if masked:
payload = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
return opcode, payload
except Exception as e:
print(f"WebSocket decode error: {e}", file=sys.stderr)
return None, None
async def handle_connection(self):
"""Handle the WebSocket connection loop."""
try:
while not self.closed:
opcode, payload = await self._decode_frame()
if opcode is None:
break
if opcode == OPCODE_CLOSE:
await self._send_close()
break
elif opcode == OPCODE_PING:
await self._send_pong(payload)
elif opcode == OPCODE_PONG:
pass # Ignore pong
elif opcode in (OPCODE_TEXT, OPCODE_BINARY):
# Handle text messages from client (e.g., chat_request)
try:
msg = payload.decode("utf-8")
data = json.loads(msg)
if data.get("type") == "chat_request":
# Invoke Claude with the message
await self._handle_chat_request(data.get("message", ""))
except (json.JSONDecodeError, UnicodeDecodeError):
pass
# Check if we should stop waiting for messages
if self.closed:
break
except Exception as e:
print(f"WebSocket connection error: {e}", file=sys.stderr)
finally:
self._close_connection()
# Clean up the message queue on disconnect
if self.user in _websocket_queues:
del _websocket_queues[self.user]
async def _send_close(self):
"""Send a close frame."""
try:
# Close code 1000 = normal closure
frame = self._encode_frame(OPCODE_CLOSE, struct.pack(">H", 1000))
self.writer.write(frame)
await self.writer.drain()
except Exception:
pass
async def _send_pong(self, payload):
"""Send a pong frame."""
try:
frame = self._encode_frame(OPCODE_PONG, payload)
self.writer.write(frame)
await self.writer.drain()
except Exception:
pass
async def _handle_chat_request(self, message):
"""Handle a chat_request WebSocket frame by invoking Claude."""
if not message:
return
# Validate Claude binary exists
if not os.path.exists(CLAUDE_BIN):
await self.send_text(json.dumps({
"type": "error",
"message": "Claude CLI not found",
}))
return
try:
# Build claude command with permission mode (acceptEdits allows file edits)
claude_args = [CLAUDE_BIN, "--print", "--output-format", "stream-json", "--permission-mode", "acceptEdits", message]
# Spawn claude --print with stream-json for streaming output
# Set cwd to workspace directory if configured, allowing Claude to access project code
cwd = WORKSPACE_DIR if WORKSPACE_DIR else None
proc = subprocess.Popen(
claude_args,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
cwd=cwd,
bufsize=1,
)
# Stream output line by line
for line in iter(proc.stdout.readline, ""):
line = line.strip()
if not line:
continue
try:
event = json.loads(line)
etype = event.get("type", "")
# Extract text content from content_block_delta events
if etype == "content_block_delta":
delta = event.get("delta", {})
if delta.get("type") == "text_delta":
text = delta.get("text", "")
if text:
# Send tokens to client
await self.send_text(text)
# Check for usage event to know when complete
if etype == "result":
pass # Will send complete after loop
except json.JSONDecodeError:
pass
# Wait for process to complete
proc.wait()
if proc.returncode != 0:
await self.send_text(json.dumps({
"type": "error",
"message": f"Claude CLI failed with exit code {proc.returncode}",
}))
return
# Send complete signal
await self.send_text(json.dumps({
"type": "complete",
}))
except FileNotFoundError:
await self.send_text(json.dumps({
"type": "error",
"message": "Claude CLI not found",
}))
except Exception as e:
await self.send_text(json.dumps({
"type": "error",
"message": str(e),
}))
# ============================================================================= # =============================================================================
# Conversation History Functions (#710) # Conversation History Functions (#710)
# ============================================================================= # =============================================================================
@ -544,9 +817,9 @@ class ChatHandler(BaseHTTPRequestHandler):
self.serve_static(path) self.serve_static(path)
return return
# Reserved WebSocket endpoint (future use) # WebSocket upgrade endpoint
if path == "/ws" or path.startswith("/ws"): if path == "/chat/ws" or path == "/ws" or path.startswith("/ws"):
self.send_error_page(501, "WebSocket upgrade not yet implemented") self.handle_websocket_upgrade()
return return
# 404 for unknown paths # 404 for unknown paths
@ -736,33 +1009,13 @@ class ChatHandler(BaseHTTPRequestHandler):
except IOError as e: except IOError as e:
self.send_error_page(500, f"Error reading file: {e}") self.send_error_page(500, f"Error reading file: {e}")
def _send_rate_limit_response(self, retry_after, reason):
"""Send a 429 response with Retry-After header and HTMX fragment (#711)."""
body = (
f'<div class="rate-limit-error">'
f"Rate limit exceeded: {reason}. "
f"Please try again in {retry_after} seconds."
f"</div>"
)
self.send_response(429)
self.send_header("Retry-After", str(retry_after))
self.send_header("Content-Type", "text/html; charset=utf-8")
self.send_header("Content-Length", str(len(body.encode("utf-8"))))
self.end_headers()
self.wfile.write(body.encode("utf-8"))
def handle_chat(self, user): def handle_chat(self, user):
""" """
Handle chat requests by spawning `claude --print` with the user message. Handle chat requests by spawning `claude --print` with the user message.
Enforces per-user rate limits and tracks token usage (#711). Streams tokens over WebSocket if connected.
""" """
# Check rate limits before processing (#711)
allowed, retry_after, reason = _check_rate_limit(user)
if not allowed:
self._send_rate_limit_response(retry_after, reason)
return
# Read request body # Read request body
content_length = int(self.headers.get("Content-Length", 0)) content_length = int(self.headers.get("Content-Length", 0))
if content_length == 0: if content_length == 0:
@ -799,23 +1052,63 @@ class ChatHandler(BaseHTTPRequestHandler):
if not conv_id or not _validate_conversation_id(conv_id): if not conv_id or not _validate_conversation_id(conv_id):
conv_id = _generate_conversation_id() conv_id = _generate_conversation_id()
# Record request for rate limiting (#711)
_record_request(user)
try: try:
# Save user message to history # Save user message to history
_write_message(user, conv_id, "user", message) _write_message(user, conv_id, "user", message)
# Build claude command with permission mode (acceptEdits allows file edits)
claude_args = [CLAUDE_BIN, "--print", "--output-format", "stream-json", "--permission-mode", "acceptEdits", message]
# Spawn claude --print with stream-json for token tracking (#711) # Spawn claude --print with stream-json for token tracking (#711)
# Set cwd to workspace directory if configured, allowing Claude to access project code
cwd = WORKSPACE_DIR if WORKSPACE_DIR else None
proc = subprocess.Popen( proc = subprocess.Popen(
[CLAUDE_BIN, "--print", "--output-format", "stream-json", message], claude_args,
stdout=subprocess.PIPE, stdout=subprocess.PIPE,
stderr=subprocess.PIPE, stderr=subprocess.PIPE,
text=True, text=True,
cwd=cwd,
bufsize=1, # Line buffered
) )
raw_output = proc.stdout.read() # Stream output line by line
response_parts = []
total_tokens = 0
for line in iter(proc.stdout.readline, ""):
line = line.strip()
if not line:
continue
try:
event = json.loads(line)
etype = event.get("type", "")
# Extract text content from content_block_delta events
if etype == "content_block_delta":
delta = event.get("delta", {})
if delta.get("type") == "text_delta":
text = delta.get("text", "")
if text:
response_parts.append(text)
# Stream to WebSocket if connected
if user in _websocket_queues:
try:
_websocket_queues[user].put_nowait(text)
except Exception:
pass # Client disconnected
# Parse usage from result event
if etype == "result":
usage = event.get("usage", {})
total_tokens = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
elif "usage" in event:
usage = event["usage"]
if isinstance(usage, dict):
total_tokens = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
except json.JSONDecodeError:
pass
# Wait for process to complete
error_output = proc.stderr.read() error_output = proc.stderr.read()
if error_output: if error_output:
print(f"Claude stderr: {error_output}", file=sys.stderr) print(f"Claude stderr: {error_output}", file=sys.stderr)
@ -826,20 +1119,12 @@ class ChatHandler(BaseHTTPRequestHandler):
self.send_error_page(500, f"Claude CLI failed with exit code {proc.returncode}") self.send_error_page(500, f"Claude CLI failed with exit code {proc.returncode}")
return return
# Parse stream-json for text and token usage (#711) # Combine response parts
response, total_tokens = _parse_stream_json(raw_output) response = "".join(response_parts)
# Track token usage - does not block *this* request (#711)
if total_tokens > 0:
_record_tokens(user, total_tokens)
print(
f"Token usage: user={user} tokens={total_tokens}",
file=sys.stderr,
)
# Fall back to raw output if stream-json parsing yielded no text # Fall back to raw output if stream-json parsing yielded no text
if not response: if not response:
response = raw_output response = proc.stdout.getvalue() if hasattr(proc.stdout, 'getvalue') else ""
# Save assistant response to history # Save assistant response to history
_write_message(user, conv_id, "assistant", response) _write_message(user, conv_id, "assistant", response)
@ -909,6 +1194,106 @@ class ChatHandler(BaseHTTPRequestHandler):
self.end_headers() self.end_headers()
self.wfile.write(json.dumps({"conversation_id": conv_id}, ensure_ascii=False).encode("utf-8")) self.wfile.write(json.dumps({"conversation_id": conv_id}, ensure_ascii=False).encode("utf-8"))
@staticmethod
def push_to_websocket(user, message):
"""Push a message to a WebSocket connection for a user.
This is called from the chat handler to stream tokens to connected clients.
The message is added to the user's WebSocket message queue.
"""
# Get the message queue from the WebSocket handler's queue
# We store the queue in a global dict keyed by user
if user in _websocket_queues:
_websocket_queues[user].put_nowait(message)
def handle_websocket_upgrade(self):
"""Handle WebSocket upgrade request for chat streaming."""
# Check session cookie
user = _validate_session(self.headers.get("Cookie"))
if not user:
self.send_error_page(401, "Unauthorized: no valid session")
return
# Create message queue for this user
_websocket_queues[user] = asyncio.Queue()
# Get WebSocket upgrade headers from the HTTP request
sec_websocket_key = self.headers.get("Sec-WebSocket-Key", "")
sec_websocket_protocol = self.headers.get("Sec-WebSocket-Protocol", "")
# Validate Sec-WebSocket-Key
if not sec_websocket_key:
self.send_error_page(400, "Bad Request", "Missing Sec-WebSocket-Key")
return
# Get the socket from the connection
sock = self.connection
sock.setblocking(False)
# Create async server to handle the connection
async def handle_ws():
try:
# Wrap the socket in asyncio streams using open_connection
reader, writer = await asyncio.open_connection(sock=sock)
# Create WebSocket handler
ws_handler = _WebSocketHandler(reader, writer, user, _websocket_queues[user])
# Accept the connection (pass headers from HTTP request)
if not await ws_handler.accept_connection(sec_websocket_key, sec_websocket_protocol):
return
# Start a task to read from the queue and send to client
async def send_stream():
while not ws_handler.closed:
try:
data = await asyncio.wait_for(ws_handler.message_queue.get(), timeout=1.0)
await ws_handler.send_text(data)
except asyncio.TimeoutError:
# Send ping to keep connection alive
try:
frame = ws_handler._encode_frame(OPCODE_PING, b"")
writer.write(frame)
await writer.drain()
except Exception:
break
except Exception as e:
print(f"Send stream error: {e}", file=sys.stderr)
break
# Start sending task
send_task = asyncio.create_task(send_stream())
# Handle incoming WebSocket frames
await ws_handler.handle_connection()
# Cancel send task
send_task.cancel()
try:
await send_task
except asyncio.CancelledError:
pass
except Exception as e:
print(f"WebSocket handler error: {e}", file=sys.stderr)
finally:
try:
writer.close()
await writer.wait_closed()
except Exception:
pass
# Run the async handler in a thread
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
loop.run_until_complete(handle_ws())
except Exception as e:
print(f"WebSocket error: {e}", file=sys.stderr)
finally:
loop.close()
sock.close()
def do_DELETE(self): def do_DELETE(self):
"""Handle DELETE requests.""" """Handle DELETE requests."""
parsed = urlparse(self.path) parsed = urlparse(self.path)
@ -944,12 +1329,6 @@ def main():
print("forward_auth secret configured (#709)", file=sys.stderr) print("forward_auth secret configured (#709)", file=sys.stderr)
else: else:
print("WARNING: FORWARD_AUTH_SECRET not set - verify endpoint unrestricted", file=sys.stderr) print("WARNING: FORWARD_AUTH_SECRET not set - verify endpoint unrestricted", file=sys.stderr)
print(
f"Rate limits (#711): {CHAT_MAX_REQUESTS_PER_HOUR}/hr, "
f"{CHAT_MAX_REQUESTS_PER_DAY}/day, "
f"{CHAT_MAX_TOKENS_PER_DAY} tokens/day",
file=sys.stderr,
)
httpd.serve_forever() httpd.serve_forever()

View file

@ -430,6 +430,10 @@
return div.innerHTML.replace(/\n/g, '<br>'); return div.innerHTML.replace(/\n/g, '<br>');
} }
// WebSocket connection for streaming
let ws = null;
let wsMessageId = null;
// Send message handler // Send message handler
async function sendMessage() { async function sendMessage() {
const message = textarea.value.trim(); const message = textarea.value.trim();
@ -449,6 +453,14 @@
await createNewConversation(); await createNewConversation();
} }
// Try WebSocket streaming first, fall back to fetch
if (window.location.protocol === 'https:' || window.location.hostname === 'localhost') {
if (tryWebSocketSend(message)) {
return;
}
}
// Fallback to fetch
try { try {
// Use fetch with URLSearchParams for application/x-www-form-urlencoded // Use fetch with URLSearchParams for application/x-www-form-urlencoded
const params = new URLSearchParams(); const params = new URLSearchParams();
@ -485,6 +497,111 @@
} }
} }
// Try to send message via WebSocket streaming
function tryWebSocketSend(message) {
try {
// Generate a unique message ID for this request
wsMessageId = Date.now().toString(36) + Math.random().toString(36).substr(2);
// Connect to WebSocket
const wsUrl = window.location.protocol === 'https:'
? `wss://${window.location.host}/chat/ws`
: `ws://${window.location.host}/chat/ws`;
ws = new WebSocket(wsUrl);
ws.onopen = function() {
// Send the message as JSON with message ID
const data = {
type: 'chat_request',
message_id: wsMessageId,
message: message,
conversation_id: currentConversationId
};
ws.send(JSON.stringify(data));
};
ws.onmessage = function(event) {
try {
const data = JSON.parse(event.data);
if (data.type === 'token') {
// Stream a token to the UI
addTokenToLastMessage(data.token);
} else if (data.type === 'complete') {
// Streaming complete
closeWebSocket();
textarea.disabled = false;
sendBtn.disabled = false;
sendBtn.textContent = 'Send';
textarea.focus();
messagesDiv.scrollTop = messagesDiv.scrollHeight;
loadConversations();
} else if (data.type === 'error') {
addSystemMessage(`Error: ${data.message}`);
closeWebSocket();
textarea.disabled = false;
sendBtn.disabled = false;
sendBtn.textContent = 'Send';
textarea.focus();
}
} catch (e) {
console.error('Failed to parse WebSocket message:', e);
}
};
ws.onerror = function(error) {
console.error('WebSocket error:', error);
addSystemMessage('WebSocket connection error. Falling back to regular chat.');
closeWebSocket();
sendMessage(); // Retry with fetch
};
ws.onclose = function() {
wsMessageId = null;
};
return true; // WebSocket attempt started
} catch (error) {
console.error('Failed to create WebSocket:', error);
return false; // Fall back to fetch
}
}
// Add a token to the last assistant message (for streaming)
function addTokenToLastMessage(token) {
const messages = messagesDiv.querySelectorAll('.message.assistant');
if (messages.length === 0) {
// No assistant message yet, create one
const msgDiv = document.createElement('div');
msgDiv.className = 'message assistant';
msgDiv.innerHTML = `
<div class="role">assistant</div>
<div class="content streaming"></div>
`;
messagesDiv.appendChild(msgDiv);
}
const lastMsg = messagesDiv.querySelector('.message.assistant .content.streaming');
if (lastMsg) {
lastMsg.textContent += token;
messagesDiv.scrollTop = messagesDiv.scrollHeight;
}
}
// Close WebSocket connection
function closeWebSocket() {
if (ws) {
ws.onopen = null;
ws.onmessage = null;
ws.onerror = null;
ws.onclose = null;
ws.close();
ws = null;
}
}
// Event listeners // Event listeners
sendBtn.addEventListener('click', sendMessage); sendBtn.addEventListener('click', sendMessage);

View file

@ -1,6 +1,12 @@
FROM caddy:latest FROM caddy:latest
RUN apk add --no-cache bash jq curl git docker-cli python3 openssh-client autossh RUN apk add --no-cache bash jq curl git docker-cli python3 openssh-client autossh \
COPY entrypoint-edge.sh /usr/local/bin/entrypoint-edge.sh nodejs npm
# Claude Code CLI — chat backend runtime (merged from docker/chat, #1083)
RUN npm install -g @anthropic-ai/claude-code@2.1.84
COPY docker/edge/entrypoint-edge.sh /usr/local/bin/entrypoint-edge.sh
# Chat server and UI (merged from docker/chat into edge, #1083)
COPY docker/chat/server.py /usr/local/bin/chat-server.py
COPY docker/chat/ui/ /var/chat/ui/
VOLUME /data VOLUME /data

View file

@ -560,10 +560,168 @@ _launch_runner_docker() {
# _launch_runner_nomad ACTION_ID SECRETS_CSV MOUNTS_CSV # _launch_runner_nomad ACTION_ID SECRETS_CSV MOUNTS_CSV
# #
# Nomad backend stub — will be implemented in migration Step 5. # Dispatches a vault-runner batch job via `nomad job dispatch`.
# Polls `nomad job status` until terminal state (completed/failed).
# Reads exit code from allocation and writes <action-id>.result.json.
#
# Usage: _launch_runner_nomad <action_id> <secrets_csv> <mounts_csv>
# Returns: exit code of the nomad job (0=success, non-zero=failure)
_launch_runner_nomad() { _launch_runner_nomad() {
echo "nomad backend not yet implemented" >&2 local action_id="$1"
local secrets_csv="$2"
local mounts_csv="$3"
log "Dispatching vault-runner batch job via Nomad for action: ${action_id}"
# Dispatch the parameterized batch job
# The vault-runner job expects meta: action_id, secrets_csv
# Note: mounts_csv is not passed as meta (not declared in vault-runner.hcl)
local dispatch_output
dispatch_output=$(nomad job dispatch \
-detach \
-meta action_id="$action_id" \
-meta secrets_csv="$secrets_csv" \
vault-runner 2>&1) || {
log "ERROR: Failed to dispatch vault-runner job for ${action_id}"
log "Dispatch output: ${dispatch_output}"
write_result "$action_id" 1 "Nomad dispatch failed: ${dispatch_output}"
return 1 return 1
}
# Extract dispatched job ID from output (format: "vault-runner/dispatch-<timestamp>-<uuid>")
local dispatched_job_id
dispatched_job_id=$(echo "$dispatch_output" | grep -oP '(?<=Dispatched Job ID = ).+' || true)
if [ -z "$dispatched_job_id" ]; then
log "ERROR: Could not extract dispatched job ID from nomad output"
log "Dispatch output: ${dispatch_output}"
write_result "$action_id" 1 "Could not extract dispatched job ID from nomad output"
return 1
fi
log "Dispatched vault-runner with job ID: ${dispatched_job_id}"
# Poll job status until terminal state
# Batch jobs transition: running -> completed/failed
local max_wait=300 # 5 minutes max wait
local elapsed=0
local poll_interval=5
local alloc_id=""
log "Polling nomad job status for ${dispatched_job_id}..."
while [ "$elapsed" -lt "$max_wait" ]; do
# Get job status with JSON output for the dispatched child job
local job_status_json
job_status_json=$(nomad job status -json "$dispatched_job_id" 2>/dev/null) || {
log "ERROR: Failed to get job status for ${dispatched_job_id}"
write_result "$action_id" 1 "Failed to get job status for ${dispatched_job_id}"
return 1
}
# Check job status field (transitions to "dead" on completion)
local job_state
job_state=$(echo "$job_status_json" | jq -r '.Status // empty' 2>/dev/null) || job_state=""
# Check allocation state directly
alloc_id=$(echo "$job_status_json" | jq -r '.Allocations[0]?.ID // empty' 2>/dev/null) || alloc_id=""
if [ -n "$alloc_id" ]; then
local alloc_state
alloc_state=$(nomad alloc status -short "$alloc_id" 2>/dev/null || true)
case "$alloc_state" in
*completed*|*success*|*dead*)
log "Allocation ${alloc_id} reached terminal state: ${alloc_state}"
break
;;
*running*|*pending*|*starting*)
log "Allocation ${alloc_id} still running (state: ${alloc_state})..."
;;
*failed*|*crashed*)
log "Allocation ${alloc_id} failed (state: ${alloc_state})"
break
;;
esac
fi
# Also check job-level state
case "$job_state" in
dead)
log "Job ${dispatched_job_id} reached terminal state: ${job_state}"
break
;;
failed)
log "Job ${dispatched_job_id} failed"
break
;;
esac
sleep "$poll_interval"
elapsed=$((elapsed + poll_interval))
done
if [ "$elapsed" -ge "$max_wait" ]; then
log "ERROR: Timeout waiting for vault-runner job to complete"
write_result "$action_id" 1 "Timeout waiting for nomad job to complete"
return 1
fi
# Get final job status and exit code
local final_status_json
final_status_json=$(nomad job status -json "$dispatched_job_id" 2>/dev/null) || {
log "ERROR: Failed to get final job status"
write_result "$action_id" 1 "Failed to get final job status"
return 1
}
# Get allocation exit code
local exit_code=0
local logs=""
if [ -n "$alloc_id" ]; then
# Get allocation logs
logs=$(nomad alloc logs -short "$alloc_id" 2>/dev/null || true)
# Try to get exit code from alloc status JSON
# Nomad alloc status -json has .TaskStates["<task_name>"].Events[].ExitCode
local alloc_exit_code
alloc_exit_code=$(nomad alloc status -json "$alloc_id" 2>/dev/null | jq -r '.TaskStates["runner"].Events[-1].ExitCode // empty' 2>/dev/null) || alloc_exit_code=""
if [ -n "$alloc_exit_code" ] && [ "$alloc_exit_code" != "null" ]; then
exit_code="$alloc_exit_code"
fi
fi
# If we couldn't get exit code from alloc, check job state as fallback
# Note: "dead" = terminal state for batch jobs (includes successful completion)
# Only "failed" indicates actual failure
if [ "$exit_code" -eq 0 ]; then
local final_state
final_state=$(echo "$final_status_json" | jq -r '.Status // empty' 2>/dev/null) || final_state=""
case "$final_state" in
failed)
exit_code=1
;;
esac
fi
# Truncate logs if too long
if [ ${#logs} -gt 1000 ]; then
logs="${logs: -1000}"
fi
# Write result file
write_result "$action_id" "$exit_code" "$logs"
if [ "$exit_code" -eq 0 ]; then
log "Vault-runner job completed successfully for action: ${action_id}"
else
log "Vault-runner job failed for action: ${action_id} (exit code: ${exit_code})"
fi
return "$exit_code"
} }
# Launch runner for the given action (backend-agnostic orchestrator) # Launch runner for the given action (backend-agnostic orchestrator)
@ -1051,11 +1209,8 @@ main() {
# Validate backend selection at startup # Validate backend selection at startup
case "$DISPATCHER_BACKEND" in case "$DISPATCHER_BACKEND" in
docker) ;; docker|nomad)
nomad) log "Using ${DISPATCHER_BACKEND} backend for vault-runner dispatch"
log "ERROR: nomad backend not yet implemented"
echo "nomad backend not yet implemented" >&2
exit 1
;; ;;
*) *)
log "ERROR: unknown DISPATCHER_BACKEND=${DISPATCHER_BACKEND}" log "ERROR: unknown DISPATCHER_BACKEND=${DISPATCHER_BACKEND}"

View file

@ -173,11 +173,15 @@ PROJECT_TOML="${PROJECT_TOML:-projects/disinto.toml}"
sleep 1200 # 20 minutes sleep 1200 # 20 minutes
done) & done) &
# ── Load required secrets from secrets/*.enc (#777) ──────────────────── # ── Load optional secrets from secrets/*.enc (#777) ────────────────────
# Edge container declares its required secrets; missing ones cause a hard fail. # Engagement collection (collect-engagement.sh) requires CADDY_ secrets to
# SCP access logs from a remote edge host. When age key or secrets dir is
# missing, or any secret fails to decrypt, log a warning and skip the cron.
# Caddy itself does not depend on these secrets.
_AGE_KEY_FILE="${HOME}/.config/sops/age/keys.txt" _AGE_KEY_FILE="${HOME}/.config/sops/age/keys.txt"
_SECRETS_DIR="/opt/disinto/secrets" _SECRETS_DIR="/opt/disinto/secrets"
EDGE_REQUIRED_SECRETS="CADDY_SSH_KEY CADDY_SSH_HOST CADDY_SSH_USER CADDY_ACCESS_LOG" EDGE_REQUIRED_SECRETS="CADDY_SSH_KEY CADDY_SSH_HOST CADDY_SSH_USER CADDY_ACCESS_LOG"
EDGE_ENGAGEMENT_READY=0 # Assume not ready until proven otherwise
_edge_decrypt_secret() { _edge_decrypt_secret() {
local enc_path="${_SECRETS_DIR}/${1}.enc" local enc_path="${_SECRETS_DIR}/${1}.enc"
@ -192,22 +196,25 @@ if [ -f "$_AGE_KEY_FILE" ] && [ -d "$_SECRETS_DIR" ]; then
export "$_secret_name=$_val" export "$_secret_name=$_val"
done done
if [ -n "$_missing" ]; then if [ -n "$_missing" ]; then
echo "FATAL: required secrets missing from secrets/*.enc:${_missing}" >&2 echo "WARN: required engagement secrets missing from secrets/*.enc:${_missing}" >&2
echo " Run 'disinto secrets add <NAME>' for each missing secret." >&2 echo " collect-engagement cron will be skipped. Run 'disinto secrets add <NAME>' to enable." >&2
echo " If migrating from .env.vault.enc, run 'disinto secrets migrate-from-vault' first." >&2 EDGE_ENGAGEMENT_READY=0
exit 1 else
echo "edge: loaded required engagement secrets: ${EDGE_REQUIRED_SECRETS}" >&2
EDGE_ENGAGEMENT_READY=1
fi fi
echo "edge: loaded required secrets: ${EDGE_REQUIRED_SECRETS}" >&2
else else
echo "FATAL: age key (${_AGE_KEY_FILE}) or secrets dir (${_SECRETS_DIR}) not found — cannot load required secrets" >&2 echo "WARN: age key (${_AGE_KEY_FILE}) or secrets dir (${_SECRETS_DIR}) not found — engagement secrets unavailable" >&2
echo " Ensure age is installed and secrets/*.enc files are present." >&2 echo " collect-engagement cron will be skipped. Run 'disinto secrets add <NAME>' to enable." >&2
exit 1 EDGE_ENGAGEMENT_READY=0
fi fi
# Start daily engagement collection cron loop in background (#745) # Start daily engagement collection cron loop in background (#745)
# Runs collect-engagement.sh daily at ~23:50 UTC via a sleep loop that # Runs collect-engagement.sh daily at ~23:50 UTC via a sleep loop that
# calculates seconds until the next 23:50 window. SSH key from secrets/*.enc (#777). # calculates seconds until the next 23:50 window. SSH key from secrets/*.enc (#777).
(while true; do # Guarded: only start if EDGE_ENGAGEMENT_READY=1.
if [ "$EDGE_ENGAGEMENT_READY" -eq 1 ]; then
(while true; do
# Calculate seconds until next 23:50 UTC # Calculate seconds until next 23:50 UTC
_now=$(date -u +%s) _now=$(date -u +%s)
_target=$(date -u -d "today 23:50" +%s 2>/dev/null || date -u -d "23:50" +%s 2>/dev/null || echo 0) _target=$(date -u -d "today 23:50" +%s 2>/dev/null || date -u -d "23:50" +%s 2>/dev/null || echo 0)
@ -232,7 +239,20 @@ fi
echo "edge: collect-engagement: fetched log is empty, skipping parse" >&2 echo "edge: collect-engagement: fetched log is empty, skipping parse" >&2
fi fi
rm -f "$_fetch_log" rm -f "$_fetch_log"
done) & done) &
else
echo "edge: collect-engagement cron skipped (EDGE_ENGAGEMENT_READY=0)" >&2
fi
# Start chat server in background (#1083 — merged from docker/chat into edge)
(python3 /usr/local/bin/chat-server.py 2>&1 | tee -a /opt/disinto-logs/chat.log) &
# Nomad template renders Caddyfile to /local/Caddyfile via service discovery;
# copy it into the expected location if present (compose uses the mounted path).
if [ -f /local/Caddyfile ]; then
cp /local/Caddyfile /etc/caddy/Caddyfile
echo "edge: using Nomad-rendered Caddyfile from /local/Caddyfile" >&2
fi
# Caddy as main process — run in foreground via wait so background jobs survive # Caddy as main process — run in foreground via wait so background jobs survive
# (exec replaces the shell, which can orphan backgrounded subshells) # (exec replaces the shell, which can orphan backgrounded subshells)

View file

@ -2,9 +2,12 @@
Local-model agents run the same agent code as the Claude-backed agents, but Local-model agents run the same agent code as the Claude-backed agents, but
connect to a local llama-server (or compatible OpenAI-API endpoint) instead of connect to a local llama-server (or compatible OpenAI-API endpoint) instead of
the Anthropic API. This document describes the current activation flow using the Anthropic API. This document describes the canonical activation flow using
`disinto hire-an-agent` and `[agents.X]` TOML configuration. `disinto hire-an-agent` and `[agents.X]` TOML configuration.
> **Note:** The legacy `ENABLE_LLAMA_AGENT=1` env flag has been removed (#846).
> Activation is now done exclusively via `[agents.X]` sections in project TOML.
## Overview ## Overview
Local-model agents are configured via `[agents.<name>]` sections in Local-model agents are configured via `[agents.<name>]` sections in

View file

@ -0,0 +1,183 @@
# Nomad Cutover Runbook
End-to-end procedure to cut over the disinto factory from docker-compose on
disinto-dev-box to Nomad on disinto-nomad-box.
**Target**: disinto-nomad-box (10.10.10.216) becomes production; disinto-dev-box
stays warm for rollback.
**Downtime budget**: <5 min blue-green flip.
**Data scope**: Forgejo issues + disinto-ops git bundle only. Everything else is
regenerated or discarded. OAuth secrets are regenerated on fresh init (all
sessions invalidated).
---
## 1. Pre-cutover readiness checklist
- [ ] Nomad + Vault stack healthy on a fresh wipe+init (step 5 verified)
- [ ] Codeberg mirror current — `git log` parity between dev-box Forgejo and
Codeberg
- [ ] SSH key pair generated for nomad-box, registered on DO edge (see §4.6)
- [ ] Companion tools landed:
- `disinto backup create` (#1057)
- `disinto backup import` (#1058)
- [ ] Backup tarball produced and tested against a scratch LXC (see §3)
---
## 2. Pre-cutover artifact: backup
On disinto-dev-box:
```bash
./bin/disinto backup create /tmp/disinto-backup-$(date +%Y%m%d).tar.gz
```
Copy the tarball to nomad-box (and optionally to a local workstation for
safekeeping):
```bash
scp /tmp/disinto-backup-*.tar.gz nomad-box:/tmp/
```
---
## 3. Pre-cutover dry-run
On a throwaway LXC:
```bash
lxc launch ubuntu:24.04 cutover-dryrun
# inside the container:
disinto init --backend=nomad --import-env .env --with edge
./bin/disinto backup import /tmp/disinto-backup-*.tar.gz
```
Verify:
- Issue count matches source Forgejo
- disinto-ops repo refs match source bundle
Destroy the LXC once satisfied:
```bash
lxc delete cutover-dryrun --force
```
---
## 4. Cutover T-0 (operator executes; <5 min target)
### 4.1 Stop dev-box services
```bash
# On disinto-dev-box — stop, do NOT remove volumes (rollback needs them)
docker-compose stop
```
### 4.2 Provision nomad-box (if not already done)
```bash
# On disinto-nomad-box
disinto init --backend=nomad --import-env .env --with edge
```
### 4.3 Import backup
```bash
# On disinto-nomad-box
./bin/disinto backup import /tmp/disinto-backup-*.tar.gz
```
### 4.4 Configure Codeberg pull mirror
Manual, one-time step in the new Forgejo UI:
1. Create a mirror repository pointing at the Codeberg upstream
2. Confirm initial sync completes
### 4.5 Claude login
```bash
# On disinto-nomad-box
claude login
```
Set up Anthropic OAuth so agents can authenticate.
### 4.6 Autossh tunnel swap
> **Operator step** — cross-host, no dev-agent involvement. Do NOT automate.
1. Stop the tunnel on dev-box:
```bash
# On disinto-dev-box
systemctl stop reverse-tunnel
```
2. Copy or regenerate the tunnel unit on nomad-box:
```bash
# Copy from dev-box, or let init regenerate it
scp dev-box:/etc/systemd/system/reverse-tunnel.service \
nomad-box:/etc/systemd/system/
```
3. Register nomad-box's public key on DO edge:
```bash
# On DO edge box — same restricted-command as the dev-box key
echo "<nomad-box-pubkey>" >> /home/johba/.ssh/authorized_keys
```
4. Start the tunnel on nomad-box:
```bash
# On disinto-nomad-box
systemctl enable --now reverse-tunnel
```
5. Verify end-to-end:
```bash
curl https://self.disinto.ai/api/v1/version
# Should return the new box's Forgejo version
```
---
## 5. Post-cutover smoke
- [ ] `curl https://self.disinto.ai` → Forgejo welcome page
- [ ] Create a test PR → Woodpecker pipeline runs → agents assign and work
- [ ] Claude chat login via Forgejo OAuth succeeds
---
## 6. Rollback (if any step 4 gate fails)
1. Stop the tunnel on nomad-box:
```bash
systemctl stop reverse-tunnel # on nomad-box
```
2. Restore the tunnel on dev-box:
```bash
systemctl start reverse-tunnel # on dev-box
```
3. Bring dev-box services back up:
```bash
docker-compose up -d # on dev-box
```
4. DO Caddy config is unchanged — traffic restores in <5 min.
5. File a post-mortem issue. Keep nomad-box state intact for debugging.
---
## 7. Post-stable cleanup (T+1 week)
- `docker-compose down -v` on dev-box
- Archive `/var/lib/docker/volumes/disinto_*` to cold storage
- Delete disinto-dev-box LXC or keep as permanent rollback reserve (operator
decision)

124
docs/nomad-migration.md Normal file
View file

@ -0,0 +1,124 @@
<!-- last-reviewed: (new file, S2.5 #883) -->
# Nomad+Vault migration — cutover-day runbook
`disinto init --backend=nomad` is the single entry-point that turns a fresh
LXC (with the disinto repo cloned) into a running Nomad+Vault cluster with
policies applied, JWT workload-identity auth configured, secrets imported
from the old docker stack, and services deployed.
## Cutover-day invocation
On the new LXC, as root (or an operator with NOPASSWD sudo):
```bash
# Copy the plaintext .env + sops-encrypted .env.vault.enc + age keyfile
# from the old box first (out of band — SSH, USB, whatever your ops
# procedure allows). Then:
sudo ./bin/disinto init \
--backend=nomad \
--import-env /tmp/.env \
--import-sops /tmp/.env.vault.enc \
--age-key /tmp/keys.txt \
--with forgejo
```
This runs, in order:
1. **`lib/init/nomad/cluster-up.sh`** (S0) — installs Nomad + Vault
binaries, writes `/etc/nomad.d/*`, initializes Vault, starts both
services, waits for the Nomad node to become ready.
2. **`tools/vault-apply-policies.sh`** (S2.1) — syncs every
`vault/policies/*.hcl` into Vault as an ACL policy. Idempotent.
3. **`lib/init/nomad/vault-nomad-auth.sh`** (S2.3) — enables Vault's
JWT auth method at `jwt-nomad`, points it at Nomad's JWKS, writes
one role per policy, reloads Nomad so jobs can exchange
workload-identity tokens for Vault tokens. Idempotent.
4. **`tools/vault-import.sh`** (S2.2) — reads `/tmp/.env` and the
sops-decrypted `/tmp/.env.vault.enc`, writes them to the KV paths
matching the S2.1 policy layout (`kv/disinto/bots/*`, `kv/disinto/shared/*`,
`kv/disinto/runner/*`). Idempotent (overwrites KV v2 data in place).
5. **`lib/init/nomad/deploy.sh forgejo`** (S1) — validates + runs the
`nomad/jobs/forgejo.hcl` jobspec. Forgejo reads its admin creds from
Vault via the `template` stanza (S2.4).
## Flag summary
| Flag | Meaning |
|---|---|
| `--backend=nomad` | Switch the init dispatcher to the Nomad+Vault path (instead of docker compose). |
| `--empty` | Bring the cluster up, skip policies/auth/import/deploy. Escape hatch for debugging. |
| `--with forgejo[,…]` | Deploy these services after the cluster is up. |
| `--import-env PATH` | Plaintext `.env` from the old stack. Optional. |
| `--import-sops PATH` | Sops-encrypted `.env.vault.enc` from the old stack. Requires `--age-key`. |
| `--age-key PATH` | Age keyfile used to decrypt `--import-sops`. Requires `--import-sops`. |
| `--dry-run` | Print the full plan (cluster-up + policies + auth + import + deploy) and exit. Touches nothing. |
### Flag validation
- `--import-sops` without `--age-key` → error.
- `--age-key` without `--import-sops` → error.
- `--import-env` alone (no sops) → OK (imports just the plaintext `.env`).
- `--backend=docker` with any `--import-*` flag → error.
- `--empty` with any `--import-*` flag → error (mutually exclusive: `--empty`
skips the import step, so pairing them silently discards the import
intent).
## Idempotency
Every layer is idempotent by design. Re-running the same command on an
already-provisioned box is a no-op at every step:
- **Cluster-up:** second run detects running `nomad`/`vault` systemd
units and state files, skips re-init.
- **Policies:** byte-for-byte compare against on-server policy text;
"unchanged" for every untouched file.
- **Auth:** skips auth-method create if `jwt-nomad/` already enabled,
skips config write if the JWKS + algs match, skips server.hcl write if
the file on disk is identical to the repo copy.
- **Import:** KV v2 writes overwrite in place (same path, same keys,
same values → no new version).
- **Deploy:** `nomad job run` is declarative; same jobspec → no new
allocation.
## Dry-run
```bash
./bin/disinto init --backend=nomad \
--import-env /tmp/.env \
--import-sops /tmp/.env.vault.enc \
--age-key /tmp/keys.txt \
--with forgejo \
--dry-run
```
Prints the five-section plan — cluster-up, policies, auth, import,
deploy — with every path and every argv that would be executed. No
network, no sudo, no state mutation. See
`tests/disinto-init-nomad.bats` for the exact output shape.
## No-import path
If you already have `kv/disinto/*` seeded by other means (manual
`vault kv put`, a replica, etc.), omit all three `--import-*` flags.
`disinto init --backend=nomad --with forgejo` still applies policies,
configures auth, and deploys — but skips the import step with:
```
[import] no --import-env/--import-sops — skipping; set them or seed kv/disinto/* manually before deploying secret-dependent services
```
Forgejo's template stanza will fail to render (and thus the allocation
will stall) until those KV paths exist — so either import them or seed
them first.
## Secret hygiene
- Never log a secret value. The CLI only prints paths (`--import-env`,
`--age-key`) and KV *paths* (`kv/disinto/bots/review/token`), never
the values themselves. `tools/vault-import.sh` is the only thing that
reads the values, and it pipes them directly into Vault's HTTP API.
- The age keyfile must be mode 0400 — `vault-import.sh` refuses to
source a keyfile with looser permissions.
- `VAULT_ADDR` must be localhost during import — the import tool
refuses to run against a remote Vault, preventing accidental exposure.

View file

@ -178,8 +178,8 @@ log "Tagged disinto/agents:${RELEASE_VERSION}"
log "Step 6/6: Restarting agent containers" log "Step 6/6: Restarting agent containers"
docker compose stop agents agents-llama 2>/dev/null || true docker compose stop agents 2>/dev/null || true
docker compose up -d agents agents-llama docker compose up -d agents
log "Agent containers restarted" log "Agent containers restarted"
# ── Done ───────────────────────────────────────────────────────────────── # ── Done ─────────────────────────────────────────────────────────────────

View file

@ -189,10 +189,10 @@ Restart agent containers to use the new image.
- docker compose pull agents - docker compose pull agents
2. Stop and remove existing agent containers: 2. Stop and remove existing agent containers:
- docker compose down agents agents-llama 2>/dev/null || true - docker compose down agents
3. Start agents with new image: 3. Start agents with new image:
- docker compose up -d agents agents-llama - docker compose up -d agents
4. Wait for containers to be healthy: 4. Wait for containers to be healthy:
- for i in {1..30}; do - for i in {1..30}; do
@ -203,7 +203,7 @@ Restart agent containers to use the new image.
- done - done
5. Verify containers are running: 5. Verify containers are running:
- docker compose ps agents agents-llama - docker compose ps agents
6. Log restart: 6. Log restart:
- echo "Restarted agents containers" - echo "Restarted agents containers"

View file

@ -29,7 +29,7 @@ and injected into your prompt above. Review them now.
1. Read the injected metrics data carefully (System Resources, Docker, 1. Read the injected metrics data carefully (System Resources, Docker,
Active Sessions, Phase Files, Stale Phase Cleanup, Lock Files, Agent Logs, Active Sessions, Phase Files, Stale Phase Cleanup, Lock Files, Agent Logs,
CI Pipelines, Open PRs, Issue Status, Stale Worktrees). CI Pipelines, Open PRs, Issue Status, Stale Worktrees, **Woodpecker Agent Health**).
Note: preflight.sh auto-removes PHASE:escalate files for closed issues Note: preflight.sh auto-removes PHASE:escalate files for closed issues
(24h grace period). Check the "Stale Phase Cleanup" section for any (24h grace period). Check the "Stale Phase Cleanup" section for any
files cleaned or in grace period this run. files cleaned or in grace period this run.
@ -75,6 +75,10 @@ Categorize every finding from the metrics into priority levels.
- Dev/action sessions in PHASE:escalate for > 24h (session timeout) - Dev/action sessions in PHASE:escalate for > 24h (session timeout)
(Note: PHASE:escalate files for closed issues are auto-cleaned by preflight; (Note: PHASE:escalate files for closed issues are auto-cleaned by preflight;
this check covers sessions where the issue is still open) this check covers sessions where the issue is still open)
- **Woodpecker agent unhealthy** see "Woodpecker Agent Health" section in preflight:
- Container not running or in unhealthy state
- gRPC errors >= 3 in last 20 minutes
- Fast-failure pipelines (duration < 60s) >= 3 in last 15 minutes
### P3 — Factory degraded ### P3 — Factory degraded
- PRs stale: CI finished >20min ago AND no git push to the PR branch since CI completed - PRs stale: CI finished >20min ago AND no git push to the PR branch since CI completed
@ -100,6 +104,15 @@ For each finding from the health assessment, decide and execute an action.
### Auto-fixable (execute these directly) ### Auto-fixable (execute these directly)
**P2 Woodpecker agent unhealthy:**
The supervisor-run.sh script automatically handles WP agent recovery:
- Detects unhealthy state via preflight.sh health checks
- Restarts container via `docker restart`
- Scans for `blocked: ci_exhausted` issues updated in last 30 minutes
- Unassigns and removes blocked label from affected issues
- Posts recovery comment with infra-flake context
- Avoids duplicate restarts via 5-minute cooldown in history file
**P0 Memory crisis:** **P0 Memory crisis:**
# Kill stale one-shot claude processes (>3h old) # Kill stale one-shot claude processes (>3h old)
pgrep -f "claude -p" --older 10800 2>/dev/null | xargs kill 2>/dev/null || true pgrep -f "claude -p" --older 10800 2>/dev/null | xargs kill 2>/dev/null || true
@ -248,6 +261,11 @@ Format:
- <what was fixed> - <what was fixed>
(or "No actions needed") (or "No actions needed")
### WP Agent Recovery (if applicable)
- WP agent restart: <time of restart or "none">
- Issues recovered: <count>
- Reason: <health check reason or "healthy">
### Vault items filed ### Vault items filed
- vault/pending/<id>.md <reason> - vault/pending/<id>.md <reason>
(or "None") (or "None")

View file

@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 --> <!-- last-reviewed: 0d6181918452c1407a3f6bc62917724261acff26 -->
# Gardener Agent # Gardener Agent
**Role**: Backlog grooming — detect duplicate issues, missing acceptance **Role**: Backlog grooming — detect duplicate issues, missing acceptance

0
gardener/dust.jsonl Normal file
View file

View file

@ -1,7 +1,16 @@
[ [
{ {
"action": "edit_body", "action": "comment",
"issue": 835, "issue": 1103,
"body": "Bugfix for S0.1 (#821). Discovered during Step 0 end-to-end verification on a fresh LXC.\n\n## Symptom\n\n```\n$ ./bin/disinto init --backend=nomad --empty\nError: --empty is only valid with --backend=nomad\n```\n\nThe error is nonsensical — `--backend=nomad` is right there.\n\n## Root cause\n\n`bin/disinto` → `disinto_init` (around line 710) consumes the first positional arg as `repo_url` **before** the argparse `while` loop runs:\n\n```bash\ndisinto_init() {\n local repo_url=\"${1:-}\"\n if [ -z \"$repo_url\" ]; then\n echo \"Error: repo URL required\" >&2\n ...\n fi\n shift\n # ... then while-loop parses flags ...\n}\n```\n\nSo `disinto init --backend=nomad --empty` becomes:\n- `repo_url = \"--backend=nomad\"` (swallowed)\n- `--empty` seen by loop → `empty=true`\n- `backend` stays at default `\"docker\"`\n- Validation at line 747: `empty=true && backend != \"nomad\"` → error\n\n## Why repo_url is wrong for nomad\n\nFor `--backend=nomad`, the cluster-up flow doesn't clone anything — the LXC already has the repo cloned by the operator. `repo_url` is a docker-backend concept.\n\n## Fix\n\nIn `disinto_init`, move backend detection to **before** the `repo_url` consumption, and make `repo_url` conditional on `backend=docker`:\n\n```bash\ndisinto_init() {\n # Pre-scan for --backend to know whether repo_url is required\n local backend=\"docker\"\n for arg in \"$@\"; do\n case \"$arg\" in\n --backend) ;; # handled below\n --backend=*) backend=\"${arg#--backend=}\" ;;\n esac\n done\n # Also handle space-separated form\n local i=1\n while [ $i -le $# ]; do\n if [ \"${!i}\" = \"--backend\" ]; then\n i=$((i+1))\n backend=\"${!i}\"\n fi\n i=$((i+1))\n done\n\n local repo_url=\"\"\n if [ \"$backend\" = \"docker\" ]; then\n repo_url=\"${1:-}\"\n if [ -z \"$repo_url\" ] || [[ \"$repo_url\" == --* ]]; then\n echo \"Error: repo URL required for docker backend\" >&2\n echo \"Usage: disinto init <repo-url> [options]\" >&2\n exit 1\n fi\n shift\n fi\n # ... rest of argparse unchanged, it re-reads --backend cleanly\n```\n\nSimpler alternative: if first arg starts with `--`, assume no positional and skip repo_url consumption entirely (covers nomad + any future `--help`-style invocation).\n\nEither shape is fine; pick the cleaner one.\n\n## Acceptance criteria\n\n- [ ] `./bin/disinto init --backend=nomad --empty` runs `lib/init/nomad/cluster-up.sh` without error on a clean LXC.\n- [ ] `./bin/disinto init --backend=nomad --empty --dry-run` prints the 9-step plan and exits 0.\n- [ ] `./bin/disinto init <repo-url>` (docker path) behaves identically to today — existing smoke path passes.\n- [ ] `./bin/disinto init` (no args, docker implied) still errors with the \"repo URL required\" message.\n- [ ] `./bin/disinto init --backend=docker` (no repo) errors helpfully — not \"Unknown option: --backend=docker\".\n- [ ] shellcheck clean.\n\n## Verified regression case from Step 0 testing\n\nOn a fresh Ubuntu 24.04 LXC, after `./lib/init/nomad/cluster-up.sh` was invoked directly (workaround), the cluster came up healthy end-to-end:\n\n- Nomad node status: 1 node ready\n- Vault status: Sealed=false, Initialized=true\n- Re-run of cluster-up.sh was fully idempotent\n\nSo the bug is isolated to `bin/disinto` argparse; the rest of the Step 0 code path is solid. This fix unblocks the formal Step 0 acceptance test.\n\n## Labels / meta\n\n- `[nomad-step-0] S0.1-fix` — no dependencies; gates Step 1.\n\n## Affected files\n\n- `bin/disinto` — `disinto_init()` function, around line 710: pre-scan for `--backend` before consuming `repo_url` positional argument\n" "body": "### CI deadlock — root cause identified\n\nPR #1104 has been failing CI (exit 128 on git clone) due to a circular dependency:\n\n1. **#1080** set `FORGEJO__server__ROOT_URL=http://forgejo:3000/forge/`\n2. Forgejo now includes `/forge/` in the `clone_url` field of webhook payloads to Woodpecker\n3. Woodpecker uses that URL: `http://token@forgejo:3000/forge/disinto-admin/disinto.git`\n4. Caddy forwards `/forge/…` → `forgejo:3000/forge/…` but **without** `uri strip_prefix /forge`\n5. Forgejo's router returns 404 → git exits 128\n6. **This is the exact bug PR #1104 fixes** — the fix can't merge because CI can't clone to run the fix\n\nCaught from actual CI log:\n```\nfatal: repository '***/forge/disinto-admin/disinto.git/' not found\n```\n\nThe code change in PR #1104 (`lib/generators.sh`, `nomad/jobs/edge.hcl`) is correct and minimal. CI failure is environmental, not a code issue.\n\n**Resolution options:**\n- A: Operator manually merges PR #1104 (code is verified correct, 2-line change, same pattern as already-merged #1079)\n- B: Operator applies `uri strip_prefix /forge` live to the edge Caddy config to break the deadlock, then CI will pass on the next push\n\nFiled vault item for option A."
},
{
"action": "create_issue",
"title": "ops: manually merge PR #1104 — CI deadlock (Caddy /forge/ strip_prefix fix can't pass CI it fixes)",
"body": "## Situation\n\nPR #1104 fixes `lib/generators.sh` to add `uri strip_prefix /forge` before `reverse_proxy forgejo:3000` in the `/forge/*` Caddy handle block. This is required because Forgejo returns 404 for `/forge/*` paths without the prefix being stripped.\n\n## Circular dependency\n\nCI cannot clone the repo to run this PR because:\n\n1. `#1080` set `FORGEJO__server__ROOT_URL=http://forgejo:3000/forge/`\n2. Forgejo includes `/forge/` in the `clone_url` in webhook payloads → Woodpecker uses `http://token@forgejo:3000/forge/disinto-admin/disinto.git`\n3. Caddy forwards `/forge/...` to Forgejo **without** strip_prefix → Forgejo router returns 404\n4. `git clone` exits 128\n5. CI has failed on 4 consecutive runs (pipelines 15921595)\n\nVerified from actual CI log:\n```\nfatal: repository '***/forge/disinto-admin/disinto.git/' not found\n```\n\n## What needs to happen\n\nOne of:\n- **Option A (recommended):** Admin merges PR #1104 manually, bypassing CI. The code change is verified correct — identical pattern to already-merged `/staging/` strip (#1079). Once merged, the Caddy config regenerates and future PR clones work.\n- **Option B:** Operator applies `uri strip_prefix /forge` live to the edge Caddy config to break the deadlock, then pushes an empty commit to trigger CI retry.\n\n## Acceptance criteria\n\n- [ ] PR #1104 merged (manually or after CI passes post live-patch)\n- [ ] `curl http://localhost/forge/` returns 200\n- [ ] CI clone succeeds for subsequent PRs\n\n## Affected files\n\n- `lib/generators.sh` — the fix\n- PR #1104 on this repo",
"labels": [
"priority",
"backlog"
]
} }
] ]

View file

@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 --> <!-- last-reviewed: 0d6181918452c1407a3f6bc62917724261acff26 -->
# Shared Helpers (`lib/`) # Shared Helpers (`lib/`)
All agents source `lib/env.sh` as their first action. Additional helpers are All agents source `lib/env.sh` as their first action. Additional helpers are
@ -7,7 +7,7 @@ sourced as needed.
| File | What it provides | Sourced by | | File | What it provides | Sourced by |
|---|---|---| |---|---|---|
| `lib/env.sh` | Loads `.env`, sets `FACTORY_ROOT`, exports project config (`FORGE_REPO`, `PROJECT_NAME`, etc.), defines `log()`, `forge_api()`, `forge_api_all()` (paginates all pages; accepts optional second TOKEN parameter, defaults to `$FORGE_TOKEN`; handles invalid/empty JSON responses gracefully — returns empty on parse error instead of crashing), `woodpecker_api()`, `wpdb()`, `memory_guard()` (skips agent if RAM < threshold), `load_secret()` (secret-source abstraction see below). Auto-loads project TOML if `PROJECT_TOML` is set. Exports per-agent tokens (`FORGE_PLANNER_TOKEN`, `FORGE_GARDENER_TOKEN`, `FORGE_VAULT_TOKEN`, `FORGE_SUPERVISOR_TOKEN`, `FORGE_PREDICTOR_TOKEN`) each falls back to `$FORGE_TOKEN` if not set. **Vault-only token guard (AD-006)**: `unset GITHUB_TOKEN CLAWHUB_TOKEN` so agents never hold external-action tokens only the runner container receives them. **Container note**: when `DISINTO_CONTAINER=1`, `.env` is NOT re-sourced compose already injects env vars (including `FORGE_URL=http://forgejo:3000`) and re-sourcing would clobber them. **Save/restore scope (#364)**: only `FORGE_URL` is preserved across `.env` re-sourcing (compose injects `http://forgejo:3000`, `.env` has `http://localhost:3000`). `FORGE_TOKEN` is NOT preserved so refreshed tokens in `.env` take effect immediately. **Per-agent token override (#762)**: agent run scripts export `FORGE_TOKEN_OVERRIDE=<agent-specific-token>` BEFORE sourcing `env.sh`; `env.sh` applies this override at lines 98-100, ensuring the correct identity survives any re-sourcing of `env.sh` by nested shells or `claude -p` invocations. **Required env var**: `FORGE_PASS` bot password for git HTTP push (Forgejo 11.x rejects API tokens for `git push`, #361). **Hard preconditions (#674)**: `USER` and `HOME` must be exported by the entrypoint before sourcing. When `PROJECT_TOML` is set, `PROJECT_REPO_ROOT`, `PRIMARY_BRANCH`, and `OPS_REPO_ROOT` must also be set (by entrypoint or TOML). **`load_secret NAME [DEFAULT]` (#793)**: backend-agnostic secret resolution. Precedence: (1) `/secrets/<NAME>.env` Nomad-rendered template, (2) current environment already set by `.env.enc` / compose, (3) `secrets/<NAME>.enc` age-encrypted per-key file (decrypted on demand, cached in process env), (4) DEFAULT or empty. Consumers call `$(load_secret GITHUB_TOKEN)` instead of `${GITHUB_TOKEN}` identical behavior whether secrets come from Docker compose injection or Nomad Vault templates. | Every agent | | `lib/env.sh` | Loads `.env`, sets `FACTORY_ROOT`, exports project config (`FORGE_REPO`, `PROJECT_NAME`, etc.), defines `log()`, `forge_api()`, `forge_api_all()` (paginates all pages; accepts optional second TOKEN parameter, defaults to `$FORGE_TOKEN`; handles invalid/empty JSON responses gracefully — returns empty on parse error instead of crashing), `woodpecker_api()`, `wpdb()`, `memory_guard()` (skips agent if RAM < threshold), `load_secret()` (secret-source abstraction see below). Auto-loads project TOML if `PROJECT_TOML` is set. Exports per-agent tokens (`FORGE_PLANNER_TOKEN`, `FORGE_GARDENER_TOKEN`, `FORGE_VAULT_TOKEN`, `FORGE_SUPERVISOR_TOKEN`, `FORGE_PREDICTOR_TOKEN`) each falls back to `$FORGE_TOKEN` if not set. **Vault-only token guard (AD-006)**: `unset GITHUB_TOKEN CLAWHUB_TOKEN` so agents never hold external-action tokens only the runner container receives them. **Container note**: when `DISINTO_CONTAINER=1`, `.env` is NOT re-sourced compose already injects env vars (including `FORGE_URL=http://forgejo:3000`) and re-sourcing would clobber them. **Save/restore scope (#364)**: only `FORGE_URL` is preserved across `.env` re-sourcing (compose injects `http://forgejo:3000`, `.env` has `http://localhost:3000`). `FORGE_TOKEN` is NOT preserved so refreshed tokens in `.env` take effect immediately. **Per-agent token override (#762)**: agent run scripts export `FORGE_TOKEN_OVERRIDE=<agent-specific-token>` BEFORE sourcing `env.sh`; `env.sh` applies this override at lines 98-100, ensuring the correct identity survives any re-sourcing of `env.sh` by nested shells or `claude -p` invocations. **Required env var**: `FORGE_PASS` bot password for git HTTP push (Forgejo 11.x rejects API tokens for `git push`, #361). **Hard preconditions (#674)**: `USER` and `HOME` must be exported by the entrypoint before sourcing. When `PROJECT_TOML` is set, `PROJECT_REPO_ROOT`, `PRIMARY_BRANCH`, and `OPS_REPO_ROOT` must also be set (by entrypoint or TOML). **`load_secret NAME [DEFAULT]` (#793)**: backend-agnostic secret resolution. Precedence: (1) `/secrets/<NAME>.env` Nomad-rendered template, (2) current environment already set by `.env.enc` / compose, (3) `secrets/<NAME>.enc` age-encrypted per-key file (decrypted on demand, cached in process env), (4) DEFAULT or empty. Consumers call `$(load_secret GITHUB_TOKEN)` instead of `${GITHUB_TOKEN}` identical behavior whether secrets come from Docker compose injection or Nomad Vault templates. | Every agent |
| `lib/ci-helpers.sh` | `ci_passed()` — returns 0 if CI state is "success" (or no CI configured). `ci_required_for_pr()` — returns 0 if PR has code files (CI required), 1 if non-code only (CI not required). `is_infra_step()` — returns 0 if a single CI step failure matches infra heuristics (clone/git exit 128, any exit 137, log timeout patterns). `classify_pipeline_failure()` — returns "infra \<reason>" if any failed Woodpecker step matches infra heuristics via `is_infra_step()`, else "code". `ensure_priority_label()` — looks up (or creates) the `priority` label and returns its ID; caches in `_PRIORITY_LABEL_ID`. `ci_commit_status <sha>` — queries Woodpecker directly for CI state, falls back to forge commit status API. `ci_pipeline_number <sha>` — returns the Woodpecker pipeline number for a commit, falls back to parsing forge status `target_url`. `ci_promote <repo_id> <pipeline_num> <environment>` — promotes a pipeline to a named Woodpecker environment (vault-gated deployment: vault approves, vault-fire calls this — vault redesign in progress, see #73-#77). `ci_get_logs <pipeline_number> [--step <name>]` — reads CI logs from Woodpecker SQLite database via `lib/ci-log-reader.py`; outputs last 200 lines to stdout. Requires mounted woodpecker-data volume at /woodpecker-data. | dev-poll, review-poll, review-pr | | `lib/ci-helpers.sh` | `ci_passed()` — returns 0 if CI state is "success" (or no CI configured). `ci_required_for_pr()` — returns 0 if PR has code files (CI required), 1 if non-code only (CI not required). `is_infra_step()` — returns 0 if a single CI step failure matches infra heuristics (clone/git exit 128, any exit 137, log timeout patterns). `classify_pipeline_failure()` — returns "infra \<reason>" if any failed Woodpecker step matches infra heuristics via `is_infra_step()`, else "code". `ensure_priority_label()` — looks up (or creates) the `priority` label and returns its ID; caches in `_PRIORITY_LABEL_ID`. `ci_commit_status <sha>` — queries Woodpecker directly for CI state, falls back to forge commit status API. `ci_pipeline_number <sha>` — returns the Woodpecker pipeline number for a commit, falls back to parsing forge status `target_url`. `ci_promote <repo_id> <pipeline_num> <environment>` — promotes a pipeline to a named Woodpecker environment (vault-gated deployment: vault approves, vault-fire calls this — vault redesign in progress, see #73-#77). `ci_get_logs <pipeline_number> [--step <name>]` — reads CI logs from Woodpecker SQLite database via `lib/ci-log-reader.py`; outputs last 200 lines to stdout. Requires mounted woodpecker-data volume at /woodpecker-data. `ci_get_step_logs <pipeline_num> <step_id>` — fetches per-step logs via Woodpecker REST API (`/repos/{id}/logs/{pipeline}/{step_id}`); returns raw log data for a single step. Used by `pr_poll_ci()` to build per-workflow/per-step CI diagnostics (#1051). | dev-poll, review-poll, review-pr |
| `lib/ci-debug.sh` | CLI tool for Woodpecker CI: `list`, `status`, `logs`, `failures` subcommands. Not sourced — run directly. | Humans / dev-agent (tool access) | | `lib/ci-debug.sh` | CLI tool for Woodpecker CI: `list`, `status`, `logs`, `failures` subcommands. Not sourced — run directly. | Humans / dev-agent (tool access) |
| `lib/ci-log-reader.py` | Python tool: reads CI logs from Woodpecker SQLite database. `<pipeline_number> [--step <name>]` — returns last 200 lines from failed steps (or specified step). Used by `ci_get_logs()` in ci-helpers.sh. Requires `WOODPECKER_DATA_DIR` (default: /woodpecker-data). | ci-helpers.sh | | `lib/ci-log-reader.py` | Python tool: reads CI logs from Woodpecker SQLite database. `<pipeline_number> [--step <name>]` — returns last 200 lines from failed steps (or specified step). Used by `ci_get_logs()` in ci-helpers.sh. Requires `WOODPECKER_DATA_DIR` (default: /woodpecker-data). | ci-helpers.sh |
| `lib/load-project.sh` | Parses a `projects/*.toml` file into env vars (`PROJECT_NAME`, `FORGE_REPO`, `WOODPECKER_REPO_ID`, monitoring toggles, mirror config, etc.). Also exports `FORGE_REPO_OWNER` (the owner component of `FORGE_REPO`, e.g. `disinto-admin` from `disinto-admin/disinto`). Reads `repo_root` and `ops_repo_root` from the TOML for host-CLI callers. **Container path handling (#674)**: no longer derives `PROJECT_REPO_ROOT` or `OPS_REPO_ROOT` inside the script — container entrypoints export the correct paths before agent scripts source `env.sh`, and the `DISINTO_CONTAINER` guard (line 90) skips TOML overrides when those vars are already set. | env.sh (when `PROJECT_TOML` is set) | | `lib/load-project.sh` | Parses a `projects/*.toml` file into env vars (`PROJECT_NAME`, `FORGE_REPO`, `WOODPECKER_REPO_ID`, monitoring toggles, mirror config, etc.). Also exports `FORGE_REPO_OWNER` (the owner component of `FORGE_REPO`, e.g. `disinto-admin` from `disinto-admin/disinto`). Reads `repo_root` and `ops_repo_root` from the TOML for host-CLI callers. **Container path handling (#674)**: no longer derives `PROJECT_REPO_ROOT` or `OPS_REPO_ROOT` inside the script — container entrypoints export the correct paths before agent scripts source `env.sh`, and the `DISINTO_CONTAINER` guard (line 90) skips TOML overrides when those vars are already set. | env.sh (when `PROJECT_TOML` is set) |
@ -20,7 +20,7 @@ sourced as needed.
| `lib/stack-lock.sh` | File-based lock protocol for singleton project stack access. `stack_lock_acquire(holder, project)` — polls until free, breaks stale heartbeats (>10 min old), claims lock. `stack_lock_release(project)` — deletes lock file. `stack_lock_check(project)` — inspect current lock state. `stack_lock_heartbeat(project)` — update heartbeat timestamp (callers must call every 2 min while holding). Lock files at `~/data/locks/<project>-stack.lock`. | docker/edge/dispatcher.sh, reproduce formula | | `lib/stack-lock.sh` | File-based lock protocol for singleton project stack access. `stack_lock_acquire(holder, project)` — polls until free, breaks stale heartbeats (>10 min old), claims lock. `stack_lock_release(project)` — deletes lock file. `stack_lock_check(project)` — inspect current lock state. `stack_lock_heartbeat(project)` — update heartbeat timestamp (callers must call every 2 min while holding). Lock files at `~/data/locks/<project>-stack.lock`. | docker/edge/dispatcher.sh, reproduce formula |
| `lib/tea-helpers.sh` | `tea_file_issue(title, body, labels...)` — create issue via tea CLI with secret scanning; sets `FILED_ISSUE_NUM`. `tea_relabel(issue_num, labels...)` — replace labels using tea's `edit` subcommand (not `label`). `tea_comment(issue_num, body)` — add comment with secret scanning. `tea_close(issue_num)` — close issue. All use `TEA_LOGIN` and `FORGE_REPO` from env.sh. Labels by name (no ID lookup). Tea binary download verified via sha256 checksum. Sourced by env.sh when `tea` binary is available. | env.sh (conditional) | | `lib/tea-helpers.sh` | `tea_file_issue(title, body, labels...)` — create issue via tea CLI with secret scanning; sets `FILED_ISSUE_NUM`. `tea_relabel(issue_num, labels...)` — replace labels using tea's `edit` subcommand (not `label`). `tea_comment(issue_num, body)` — add comment with secret scanning. `tea_close(issue_num)` — close issue. All use `TEA_LOGIN` and `FORGE_REPO` from env.sh. Labels by name (no ID lookup). Tea binary download verified via sha256 checksum. Sourced by env.sh when `tea` binary is available. | env.sh (conditional) |
| `lib/worktree.sh` | Reusable git worktree management: `worktree_create(path, branch, [base_ref])` — create worktree, checkout base, fetch submodules. `worktree_recover(path, branch, [remote])` — detect existing worktree, reuse if on correct branch (sets `_WORKTREE_REUSED`), otherwise clean and recreate. `worktree_cleanup(path)``git worktree remove --force`, clear Claude Code project cache (`~/.claude/projects/` matching path). `worktree_cleanup_stale([max_age_hours])` — scan `/tmp` for orphaned worktrees older than threshold, skip preserved and active tmux worktrees, prune. `worktree_preserve(path, reason)` — mark worktree as preserved for debugging (writes `.worktree-preserved` marker, skipped by stale cleanup). | dev-agent.sh, supervisor-run.sh, planner-run.sh, predictor-run.sh, gardener-run.sh | | `lib/worktree.sh` | Reusable git worktree management: `worktree_create(path, branch, [base_ref])` — create worktree, checkout base, fetch submodules. `worktree_recover(path, branch, [remote])` — detect existing worktree, reuse if on correct branch (sets `_WORKTREE_REUSED`), otherwise clean and recreate. `worktree_cleanup(path)``git worktree remove --force`, clear Claude Code project cache (`~/.claude/projects/` matching path). `worktree_cleanup_stale([max_age_hours])` — scan `/tmp` for orphaned worktrees older than threshold, skip preserved and active tmux worktrees, prune. `worktree_preserve(path, reason)` — mark worktree as preserved for debugging (writes `.worktree-preserved` marker, skipped by stale cleanup). | dev-agent.sh, supervisor-run.sh, planner-run.sh, predictor-run.sh, gardener-run.sh |
| `lib/pr-lifecycle.sh` | Reusable PR lifecycle library: `pr_create()`, `pr_find_by_branch()`, `pr_poll_ci()`, `pr_poll_review()`, `pr_merge()`, `pr_is_merged()`, `pr_walk_to_merge()`, `build_phase_protocol_prompt()`. Requires `lib/ci-helpers.sh`. | dev-agent.sh (future) | | `lib/pr-lifecycle.sh` | Reusable PR lifecycle library: `pr_create()`, `pr_find_by_branch()`, `pr_poll_ci()`, `pr_poll_review()`, `pr_merge()`, `pr_is_merged()`, `pr_walk_to_merge()`, `build_phase_protocol_prompt()`. `pr_poll_ci()` builds a **per-workflow/per-step CI diagnostics prompt** (#1051): on failure, each failed workflow gets its own section with step name, exit code (annotated with standard meanings for 126/127/128), and step-local log tail (via `ci_get_step_logs`); passing workflows are listed explicitly so agents don't waste fix attempts on them. Falls back to legacy combined-log fetch if per-step API is unavailable. Requires `lib/ci-helpers.sh`. | dev-agent.sh (future) |
| `lib/issue-lifecycle.sh` | Reusable issue lifecycle library: `issue_claim()` (add in-progress, remove backlog), `issue_release()` (remove in-progress, add backlog), `issue_block()` (post diagnostic comment with secret redaction, add blocked label), `issue_close()`, `issue_check_deps()` (parse deps, check transitive closure; sets `_ISSUE_BLOCKED_BY`, `_ISSUE_SUGGESTION`), `issue_suggest_next()` (find next unblocked backlog issue; sets `_ISSUE_NEXT`), `issue_post_refusal()` (structured refusal comment with dedup). Label IDs cached in globals on first lookup. Sources `lib/secret-scan.sh`. | dev-agent.sh (future) | | `lib/issue-lifecycle.sh` | Reusable issue lifecycle library: `issue_claim()` (add in-progress, remove backlog), `issue_release()` (remove in-progress, add backlog), `issue_block()` (post diagnostic comment with secret redaction, add blocked label), `issue_close()`, `issue_check_deps()` (parse deps, check transitive closure; sets `_ISSUE_BLOCKED_BY`, `_ISSUE_SUGGESTION`), `issue_suggest_next()` (find next unblocked backlog issue; sets `_ISSUE_NEXT`), `issue_post_refusal()` (structured refusal comment with dedup). Label IDs cached in globals on first lookup. Sources `lib/secret-scan.sh`. | dev-agent.sh (future) |
| `lib/action-vault.sh` | **Vault PR helper** — create vault action PRs on ops repo via Forgejo API (works from containers without SSH). `vault_request <action_id> <toml_content>` validates TOML (using `validate_vault_action` from `action-vault/vault-env.sh`), creates branch `vault/<action-id>`, writes `vault/actions/<action-id>.toml`, creates PR targeting `main` with title `vault: <action-id>` and body from context field, returns PR number. Idempotent: if PR exists, returns existing number. **Low-tier bypass**: if the action's `blast_radius` classifies as `low` (via `action-vault/classify.sh`), `vault_request` calls `_vault_commit_direct()` which commits directly to ops `main` using `FORGE_ADMIN_TOKEN` — no PR, no approval wait. Returns `0` (not a PR number) for direct commits. Requires `FORGE_TOKEN`, `FORGE_ADMIN_TOKEN` (low-tier only), `FORGE_URL`, `FORGE_REPO`, `FORGE_OPS_REPO`. Uses the calling agent's own token (saves/restores `FORGE_TOKEN` around sourcing `vault-env.sh`), so approval workflow respects individual agent identities. | dev-agent (vault actions), future vault dispatcher | | `lib/action-vault.sh` | **Vault PR helper** — create vault action PRs on ops repo via Forgejo API (works from containers without SSH). `vault_request <action_id> <toml_content>` validates TOML (using `validate_vault_action` from `action-vault/vault-env.sh`), creates branch `vault/<action-id>`, writes `vault/actions/<action-id>.toml`, creates PR targeting `main` with title `vault: <action-id>` and body from context field, returns PR number. Idempotent: if PR exists, returns existing number. **Low-tier bypass**: if the action's `blast_radius` classifies as `low` (via `action-vault/classify.sh`), `vault_request` calls `_vault_commit_direct()` which commits directly to ops `main` using `FORGE_ADMIN_TOKEN` — no PR, no approval wait. Returns `0` (not a PR number) for direct commits. Requires `FORGE_TOKEN`, `FORGE_ADMIN_TOKEN` (low-tier only), `FORGE_URL`, `FORGE_REPO`, `FORGE_OPS_REPO`. Uses the calling agent's own token (saves/restores `FORGE_TOKEN` around sourcing `vault-env.sh`), so approval workflow respects individual agent identities. | dev-agent (vault actions), future vault dispatcher |
| `lib/branch-protection.sh` | Branch protection helpers for Forgejo repos. `setup_vault_branch_protection()` — configures admin-only merge protection on main (require 1 approval, restrict merge to admin role, block direct pushes). `setup_profile_branch_protection()` — same protection for `.profile` repos. `verify_branch_protection()` — checks protection is correctly configured. `remove_branch_protection()` — removes protection (cleanup/testing). Handles race condition after initial push: retries with backoff if Forgejo hasn't processed the branch yet. Requires `FORGE_TOKEN`, `FORGE_URL`, `FORGE_OPS_REPO`. | bin/disinto (hire-an-agent) | | `lib/branch-protection.sh` | Branch protection helpers for Forgejo repos. `setup_vault_branch_protection()` — configures admin-only merge protection on main (require 1 approval, restrict merge to admin role, block direct pushes). `setup_profile_branch_protection()` — same protection for `.profile` repos. `verify_branch_protection()` — checks protection is correctly configured. `remove_branch_protection()` — removes protection (cleanup/testing). Handles race condition after initial push: retries with backoff if Forgejo hasn't processed the branch yet. Requires `FORGE_TOKEN`, `FORGE_URL`, `FORGE_OPS_REPO`. | bin/disinto (hire-an-agent) |
@ -30,9 +30,11 @@ sourced as needed.
| `lib/git-creds.sh` | Shared git credential helper configuration. `configure_git_creds([HOME_DIR] [RUN_AS_CMD])` — writes a static credential helper script and configures git globally to use password-based HTTP auth (Forgejo 11.x rejects API tokens for `git push`, #361). **Retry on cold boot (#741)**: resolves bot username from `FORGE_TOKEN` with 5 retries (exponential backoff 1-5s); fails loudly and returns 1 if Forgejo is unreachable — never falls back to a wrong hardcoded default (exports `BOT_USER` on success). `repair_baked_cred_urls([--as RUN_AS_CMD] DIR ...)` — rewrites any git remote URLs that have credentials baked in to use clean URLs instead; uses `safe.directory` bypass for root-owned repos (#671). Requires `FORGE_PASS`, `FORGE_URL`, `FORGE_TOKEN`. | entrypoints (agents, edge) | | `lib/git-creds.sh` | Shared git credential helper configuration. `configure_git_creds([HOME_DIR] [RUN_AS_CMD])` — writes a static credential helper script and configures git globally to use password-based HTTP auth (Forgejo 11.x rejects API tokens for `git push`, #361). **Retry on cold boot (#741)**: resolves bot username from `FORGE_TOKEN` with 5 retries (exponential backoff 1-5s); fails loudly and returns 1 if Forgejo is unreachable — never falls back to a wrong hardcoded default (exports `BOT_USER` on success). `repair_baked_cred_urls([--as RUN_AS_CMD] DIR ...)` — rewrites any git remote URLs that have credentials baked in to use clean URLs instead; uses `safe.directory` bypass for root-owned repos (#671). Requires `FORGE_PASS`, `FORGE_URL`, `FORGE_TOKEN`. | entrypoints (agents, edge) |
| `lib/ops-setup.sh` | `setup_ops_repo()` — creates ops repo on Forgejo if it doesn't exist, configures bot collaborators, clones/initializes ops repo locally, seeds directory structure (vault, knowledge, evidence, sprints). Evidence subdirectories seeded: engagement/, red-team/, holdout/, evolution/, user-test/. Also seeds sprints/ for architect output. Exports `_ACTUAL_OPS_SLUG`. `migrate_ops_repo(ops_root, [primary_branch])` — idempotent migration helper that seeds missing directories and .gitkeep files on existing ops repos (pre-#407 deployments). | bin/disinto (init) | | `lib/ops-setup.sh` | `setup_ops_repo()` — creates ops repo on Forgejo if it doesn't exist, configures bot collaborators, clones/initializes ops repo locally, seeds directory structure (vault, knowledge, evidence, sprints). Evidence subdirectories seeded: engagement/, red-team/, holdout/, evolution/, user-test/. Also seeds sprints/ for architect output. Exports `_ACTUAL_OPS_SLUG`. `migrate_ops_repo(ops_root, [primary_branch])` — idempotent migration helper that seeds missing directories and .gitkeep files on existing ops repos (pre-#407 deployments). | bin/disinto (init) |
| `lib/ci-setup.sh` | `_install_cron_impl()` — installs crontab entries for bare-metal deployments (compose mode uses polling loop instead). `_create_forgejo_oauth_app()` — generic helper to create an OAuth2 app on Forgejo (shared by Woodpecker and chat). `_create_woodpecker_oauth_impl()` — creates Woodpecker OAuth2 app (thin wrapper). `_create_chat_oauth_impl()` — creates disinto-chat OAuth2 app, writes `CHAT_OAUTH_CLIENT_ID`/`CHAT_OAUTH_CLIENT_SECRET` to `.env` (#708). `_generate_woodpecker_token_impl()` — auto-generates WOODPECKER_TOKEN via OAuth2 flow. `_activate_woodpecker_repo_impl()` — activates repo in Woodpecker. All gated by `_load_ci_context()` which validates required env vars. | bin/disinto (init) | | `lib/ci-setup.sh` | `_install_cron_impl()` — installs crontab entries for bare-metal deployments (compose mode uses polling loop instead). `_create_forgejo_oauth_app()` — generic helper to create an OAuth2 app on Forgejo (shared by Woodpecker and chat). `_create_woodpecker_oauth_impl()` — creates Woodpecker OAuth2 app (thin wrapper). `_create_chat_oauth_impl()` — creates disinto-chat OAuth2 app, writes `CHAT_OAUTH_CLIENT_ID`/`CHAT_OAUTH_CLIENT_SECRET` to `.env` (#708). `_generate_woodpecker_token_impl()` — auto-generates WOODPECKER_TOKEN via OAuth2 flow. `_activate_woodpecker_repo_impl()` — activates repo in Woodpecker. All gated by `_load_ci_context()` which validates required env vars. | bin/disinto (init) |
| `lib/generators.sh` | Template generation for `disinto init`: `generate_compose()` — docker-compose.yml (uses `codeberg.org/forgejo/forgejo:11.0` tag; adds `security_opt: [apparmor:unconfined]` to all services for rootless container compatibility; Forgejo includes a healthcheck so dependent services use `condition: service_healthy` — fixes cold-start races, #665; adds `chat` service block with isolated `chat-config` named volume and `CHAT_HISTORY_DIR` bind-mount for per-user NDJSON history persistence (#710); injects `FORWARD_AUTH_SECRET` for Caddy↔chat defense-in-depth auth (#709); cost-cap env vars `CHAT_MAX_REQUESTS_PER_HOUR`, `CHAT_MAX_REQUESTS_PER_DAY`, `CHAT_MAX_TOKENS_PER_DAY` (#711); subdomain fallback comment for `EDGE_TUNNEL_FQDN_*` vars (#713); all `depends_on` now use `condition: service_healthy/started` instead of bare service names; all services now include `restart: unless-stopped` including the edge service — #768; agents service now uses `image: ghcr.io/disinto/agents:${DISINTO_IMAGE_TAG:-latest}` instead of `build:` (#429); `WOODPECKER_PLUGINS_PRIVILEGED` env var added to woodpecker service (#779); agents-llama conditional block gated on `ENABLE_LLAMA_AGENT=1` (#769); `agents-llama-all` compose service (profile `agents-llama-all`, all 7 roles: review,dev,gardener,architect,planner,predictor,supervisor) added by #801; agents service gains volume mounts for `./projects`, `./.env`, `./state`), `generate_caddyfile()` — Caddyfile (routes: `/forge/*` → forgejo:3000, `/woodpecker/*` → woodpecker:8000, `/staging/*` → staging:80; `/chat/login` and `/chat/oauth/callback` bypass `forward_auth` so unauthenticated users can reach the OAuth flow; `/chat/*` gated by `forward_auth` on `chat:8080/chat/auth/verify` which stamps `X-Forwarded-User` (#709); root `/` redirects to `/forge/`), `generate_staging_index()` — staging index, `generate_deploy_pipelines()` — Woodpecker deployment pipeline configs. Requires `FACTORY_ROOT`, `PROJECT_NAME`, `PRIMARY_BRANCH`. | bin/disinto (init) | | `lib/generators.sh` | Template generation for `disinto init`: `generate_compose()` — docker-compose.yml (**duplicate service detection**: tracks service names during generation, aborts with `ERROR: Duplicate service name '$name' detected` on conflict; detection state is reset between calls so idempotent reinvocation is safe, #850) (uses `codeberg.org/forgejo/forgejo:11.0` tag; `CLAUDE_BIN_DIR` volume mount removed from agents/llama services — only `reproduce` and `edge` still use the host-mounted CLI (#992); adds `security_opt: [apparmor:unconfined]` to all services for rootless container compatibility; Forgejo includes a healthcheck so dependent services use `condition: service_healthy` — fixes cold-start races, #665; adds `chat` service block with isolated `chat-config` named volume and `CHAT_HISTORY_DIR` bind-mount for per-user NDJSON history persistence (#710); injects `FORWARD_AUTH_SECRET` for Caddy↔chat defense-in-depth auth (#709); subdomain fallback: `EDGE_ROUTING_MODE` (default `subpath`) and per-service `EDGE_TUNNEL_FQDN_*` vars injected into edge service (#1028); chat service rate limiting removed (#1084); chat workspace dir bind-mount: `${CHAT_WORKSPACE_DIR:-./workspace}:/var/workspace` + `CHAT_WORKSPACE_DIR` env var injected so Claude can access project working tree (#1027); all `depends_on` now use `condition: service_healthy/started` instead of bare service names; all services now include `restart: unless-stopped` including the edge service — #768; agents service now uses `image: ghcr.io/disinto/agents:${DISINTO_IMAGE_TAG:-latest}` instead of `build:` (#429); `WOODPECKER_PLUGINS_PRIVILEGED` env var added to woodpecker service (#779); agents-llama conditional block gated on `ENABLE_LLAMA_AGENT=1` (#769); `agents-llama-all` compose service (profile `agents-llama-all`, all 7 roles: review,dev,gardener,architect,planner,predictor,supervisor) added by #801; agents service gains volume mounts for `./projects`, `./.env`, `./state`), `generate_caddyfile()` — Caddyfile (routes: `/forge/*` → forgejo:3000, `/woodpecker/*` → woodpecker:8000, `/staging/*` → staging:80; `/chat/login` and `/chat/oauth/callback` bypass `forward_auth` so unauthenticated users can reach the OAuth flow; `/chat/*` gated by `forward_auth` on `chat:8080/chat/auth/verify` which stamps `X-Forwarded-User` (#709); root `/` redirects to `/forge/`), `generate_staging_index()` — staging index, `generate_deploy_pipelines()` — Woodpecker deployment pipeline configs. Requires `FACTORY_ROOT`, `PROJECT_NAME`, `PRIMARY_BRANCH`. | bin/disinto (init) |
| `lib/backup.sh` | Factory backup creation. `backup_create <outfile.tar.gz>` — exports factory state: fetches all issues (open+closed) from the project and ops repos via Forgejo API, bundles the ops repo as a git bundle, and writes a tarball. Requires `FORGE_URL`, `FORGE_TOKEN`, `FORGE_REPO`, `FORGE_OPS_REPO`, `OPS_REPO_ROOT`. Sourced by `bin/disinto backup create` (#1057). | bin/disinto (backup create) |
| `lib/disinto/backup.sh` | Factory backup restore. `backup_import <infile.tar.gz>` — restores from a backup tarball: creates missing repos via Forgejo API, imports issues (idempotent — skips by number if present), unpacks ops repo git bundle. Idempotent: running twice produces same end state with no errors. Requires `FORGE_URL`, `FORGE_TOKEN`. Sourced by `bin/disinto backup import` (#1058). | bin/disinto (backup import) |
| `lib/sprint-filer.sh` | Post-merge sub-issue filer for sprint PRs. Invoked by the `.woodpecker/ops-filer.yml` pipeline after a sprint PR merges to ops repo `main`. Parses `<!-- filer:begin --> ... <!-- filer:end -->` blocks from sprint PR bodies to extract sub-issue definitions, creates them on the project repo using `FORGE_FILER_TOKEN` (narrow-scope `filer-bot` identity with `issues:write` only), adds `in-progress` label to the parent vision issue, and handles vision lifecycle closure when all sub-issues are closed. Uses `filer_api_all()` for paginated fetches. Idempotent: uses `<!-- decomposed-from: #<vision>, sprint: <slug>, id: <id> -->` markers to skip already-filed issues. Requires `FORGE_FILER_TOKEN`, `FORGE_API`, `FORGE_API_BASE`, `FORGE_OPS_REPO`. | `.woodpecker/ops-filer.yml` (CI pipeline on ops repo) | | `lib/sprint-filer.sh` | Post-merge sub-issue filer for sprint PRs. Invoked by the `.woodpecker/ops-filer.yml` pipeline after a sprint PR merges to ops repo `main`. Parses `<!-- filer:begin --> ... <!-- filer:end -->` blocks from sprint PR bodies to extract sub-issue definitions, creates them on the project repo using `FORGE_FILER_TOKEN` (narrow-scope `filer-bot` identity with `issues:write` only), adds `in-progress` label to the parent vision issue, and handles vision lifecycle closure when all sub-issues are closed. Uses `filer_api_all()` for paginated fetches. Idempotent: uses `<!-- decomposed-from: #<vision>, sprint: <slug>, id: <id> -->` markers to skip already-filed issues. Requires `FORGE_FILER_TOKEN`, `FORGE_API`, `FORGE_API_BASE`, `FORGE_OPS_REPO`. | `.woodpecker/ops-filer.yml` (CI pipeline on ops repo) |
| `lib/hire-agent.sh` | `disinto_hire_an_agent()` — user creation, `.profile` repo setup, formula copying, branch protection, and state marker creation for hiring a new agent. Requires `FORGE_URL`, `FORGE_TOKEN`, `FACTORY_ROOT`, `PROJECT_NAME`. Extracted from `bin/disinto`. | bin/disinto (hire) | | `lib/hire-agent.sh` | `disinto_hire_an_agent()` — user creation, `.profile` repo setup, formula copying, branch protection, and state marker creation for hiring a new agent. Requires `FORGE_URL`, `FORGE_TOKEN`, `FACTORY_ROOT`, `PROJECT_NAME`. Extracted from `bin/disinto`. | bin/disinto (hire) |
| `lib/release.sh` | `disinto_release()` — vault TOML creation, branch setup on ops repo, PR creation, and auto-merge request for a versioned release. `_assert_release_globals()` validates required env vars. Requires `FORGE_URL`, `FORGE_TOKEN`, `FORGE_OPS_REPO`, `FACTORY_ROOT`, `PRIMARY_BRANCH`. Extracted from `bin/disinto`. | bin/disinto (release) | | `lib/release.sh` | `disinto_release()` — vault TOML creation, branch setup on ops repo, PR creation, and auto-merge request for a versioned release. `_assert_release_globals()` validates required env vars. Requires `FORGE_URL`, `FORGE_TOKEN`, `FORGE_OPS_REPO`, `FACTORY_ROOT`, `PRIMARY_BRANCH`. Extracted from `bin/disinto`. | bin/disinto (release) |
| `lib/hvault.sh` | HashiCorp Vault helper module. `hvault_kv_get(PATH, [KEY])` — read KV v2 secret, optionally extract one key. `hvault_kv_put(PATH, KEY=VAL ...)` — write KV v2 secret. `hvault_kv_list(PATH)` — list keys at a KV path. `hvault_policy_apply(NAME, FILE)` — idempotent policy upsert. `hvault_jwt_login(ROLE, JWT)` — exchange JWT for short-lived token. `hvault_token_lookup()` — returns TTL/policies/accessor for current token. All functions use `VAULT_ADDR` + `VAULT_TOKEN` from env (fallback: `/etc/vault.d/root.token`), emit structured JSON errors to stderr on failure. Tests: `tests/lib-hvault.bats` (requires `vault server -dev`). | Not sourced at runtime yet — pure scaffolding for Nomad+Vault migration (#799) | | `lib/hvault.sh` | HashiCorp Vault helper module. `hvault_kv_get(PATH, [KEY])` — read KV v2 secret, optionally extract one key. `hvault_kv_put(PATH, KEY=VAL ...)` — write KV v2 secret. `hvault_kv_list(PATH)` — list keys at a KV path. `hvault_get_or_empty(PATH)` — GET /v1/PATH; 200→raw body, 404→empty, else structured error + return 1 (used by sync scripts to distinguish "absent, create" from hard failure without tripping errexit, #881). `hvault_ensure_kv_v2(MOUNT, [LOG_PREFIX])` — idempotent KV v2 mount assertion: enables mount if absent, fails loudly if present as wrong type/version. Extracted from all `vault-seed-*.sh` scripts to eliminate dup-detector violations. Respects `DRY_RUN=1`. `hvault_policy_apply(NAME, FILE)` — idempotent policy upsert. `hvault_jwt_login(ROLE, JWT)` — exchange JWT for short-lived token. `hvault_token_lookup()` — returns TTL/policies/accessor for current token. `_hvault_seed_key(PATH, KEY, [GENERATOR])` — seed one KV key if absent; reads existing data and merges to preserve sibling keys (KV v2 replaces atomically); returns 0=created, 1=unchanged, 2=API error (#992). All functions use `VAULT_ADDR` + `VAULT_TOKEN` from env (fallback: `/etc/vault.d/root.token`), emit structured JSON errors to stderr on failure. Tests: `tests/lib-hvault.bats` (requires `vault server -dev`). | `tools/vault-apply-policies.sh`, `tools/vault-apply-roles.sh`, `lib/init/nomad/vault-nomad-auth.sh`, `tools/vault-seed-*.sh` |
| `lib/init/nomad/` | Nomad+Vault Step 0 installer scripts. `cluster-up.sh` — idempotent orchestrator that runs all steps in order (installs packages, writes HCL, enables systemd units, unseals Vault); uses `poll_until_healthy()` helper for deduped readiness polling. `install.sh` — installs pinned Nomad+Vault apt packages. `vault-init.sh` — initializes Vault (unseal keys → `/etc/vault.d/`), creates dev-persisted unseal unit. `lib-systemd.sh` — shared systemd unit helpers. `systemd-nomad.sh`, `systemd-vault.sh` — write and enable service units. Idempotent: each step checks current state before acting. Sourced and called by `cluster-up.sh`; not sourced by agents. | `bin/disinto init --backend=nomad` | | `lib/init/nomad/` | Nomad+Vault installer scripts. `cluster-up.sh` — idempotent Step-0 orchestrator that runs all steps in order (installs packages, writes HCL, enables systemd units, unseals Vault); uses `poll_until_healthy()` helper for deduped readiness polling; `HOST_VOLUME_DIRS` array now includes `/srv/disinto/docker` (for staging file-server, S5.2, #989, #992). `install.sh` — installs pinned Nomad+Vault apt packages. `vault-init.sh` — initializes Vault (unseal keys → `/etc/vault.d/`), creates dev-persisted unseal unit. `lib-systemd.sh` — shared systemd unit helpers. `systemd-nomad.sh`, `systemd-vault.sh` — write and enable service units. `vault-nomad-auth.sh` — Step-2 script that enables Vault's JWT auth at path `jwt-nomad`, writes the JWKS/algs config pointing at Nomad's workload-identity signer, delegates role sync to `tools/vault-apply-roles.sh`, installs `/etc/nomad.d/server.hcl`, and SIGHUPs `nomad.service` if the file changed (#881). `wp-oauth-register.sh` — S3.3 script that creates the Woodpecker OAuth2 app in Forgejo and stores `forgejo_client`/`forgejo_secret` in Vault KV v2 at `kv/disinto/shared/woodpecker`; idempotent (skips if app or secrets already present); called by `bin/disinto --with woodpecker`. `deploy.sh` — S4 dependency-ordered Nomad job deploy + health-wait; takes a list of jobspec basenames, submits each to Nomad and polls until healthy before proceeding to the next; supports `--dry-run` and per-job timeout overrides via `JOB_READY_TIMEOUT_<JOBNAME>`; global default timeout `JOB_READY_TIMEOUT_SECS` is 360s (raised from 240s for chat cold-start, #1036); invoked by `bin/disinto --with <svc>` and `cluster-up.sh`; deploy order now covers staging, chat, edge (S5.5, #992). Idempotent: each step checks current state before acting. Sourced and called by `cluster-up.sh`; not sourced by agents. | `bin/disinto init --backend=nomad` |

View file

@ -128,7 +128,6 @@ vault_request() {
# Validate TOML content # Validate TOML content
local tmp_toml local tmp_toml
tmp_toml=$(mktemp /tmp/vault-XXXXXX.toml) tmp_toml=$(mktemp /tmp/vault-XXXXXX.toml)
trap 'rm -f "$tmp_toml"' RETURN
printf '%s' "$toml_content" > "$tmp_toml" printf '%s' "$toml_content" > "$tmp_toml"
@ -136,6 +135,7 @@ vault_request() {
local vault_env="${FACTORY_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)}/action-vault/vault-env.sh" local vault_env="${FACTORY_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)}/action-vault/vault-env.sh"
if [ ! -f "$vault_env" ]; then if [ ! -f "$vault_env" ]; then
echo "ERROR: vault-env.sh not found at $vault_env" >&2 echo "ERROR: vault-env.sh not found at $vault_env" >&2
rm -f "$tmp_toml"
return 1 return 1
fi fi
@ -145,11 +145,15 @@ vault_request() {
if ! source "$vault_env"; then if ! source "$vault_env"; then
FORGE_TOKEN="${_saved_forge_token:-}" FORGE_TOKEN="${_saved_forge_token:-}"
echo "ERROR: failed to source vault-env.sh" >&2 echo "ERROR: failed to source vault-env.sh" >&2
rm -f "$tmp_toml"
return 1 return 1
fi fi
# Restore caller's FORGE_TOKEN after validation # Restore caller's FORGE_TOKEN after validation
FORGE_TOKEN="${_saved_forge_token:-}" FORGE_TOKEN="${_saved_forge_token:-}"
# Set trap AFTER sourcing vault-env.sh to avoid RETURN trap firing during source
trap 'rm -f "$tmp_toml"' RETURN
# Run validation # Run validation
if ! validate_vault_action "$tmp_toml"; then if ! validate_vault_action "$tmp_toml"; then
echo "ERROR: TOML validation failed" >&2 echo "ERROR: TOML validation failed" >&2

View file

@ -52,8 +52,9 @@ claude_run_with_watchdog() {
out_file=$(mktemp) || return 1 out_file=$(mktemp) || return 1
trap 'rm -f "$out_file"' RETURN trap 'rm -f "$out_file"' RETURN
# Start claude in background, capturing stdout to temp file # Start claude in new process group (setsid creates new session, $pid is PGID leader)
"${cmd[@]}" > "$out_file" 2>>"$LOGFILE" & # All children of claude will inherit this process group
setsid "${cmd[@]}" > "$out_file" 2>>"$LOGFILE" &
pid=$! pid=$!
# Background watchdog: poll for final result marker # Background watchdog: poll for final result marker
@ -84,12 +85,12 @@ claude_run_with_watchdog() {
sleep "$grace" sleep "$grace"
if kill -0 "$pid" 2>/dev/null; then if kill -0 "$pid" 2>/dev/null; then
log "watchdog: claude -p idle for ${grace}s after final result; SIGTERM" log "watchdog: claude -p idle for ${grace}s after final result; SIGTERM"
kill -TERM "$pid" 2>/dev/null || true kill -TERM -- "-$pid" 2>/dev/null || true
# Give it a moment to clean up # Give it a moment to clean up
sleep 5 sleep 5
if kill -0 "$pid" 2>/dev/null; then if kill -0 "$pid" 2>/dev/null; then
log "watchdog: force kill after SIGTERM timeout" log "watchdog: force kill after SIGTERM timeout"
kill -KILL "$pid" 2>/dev/null || true kill -KILL -- "-$pid" 2>/dev/null || true
fi fi
fi fi
fi fi
@ -100,16 +101,16 @@ claude_run_with_watchdog() {
timeout --foreground "${CLAUDE_TIMEOUT:-7200}" tail --pid="$pid" -f /dev/null 2>/dev/null timeout --foreground "${CLAUDE_TIMEOUT:-7200}" tail --pid="$pid" -f /dev/null 2>/dev/null
rc=$? rc=$?
# Clean up the watchdog # Clean up the watchdog (target process group if it spawned children)
kill "$grace_pid" 2>/dev/null || true kill -- "-$grace_pid" 2>/dev/null || true
wait "$grace_pid" 2>/dev/null || true wait "$grace_pid" 2>/dev/null || true
# When timeout fires (rc=124), explicitly kill the orphaned claude process # When timeout fires (rc=124), explicitly kill the orphaned claude process group
# tail --pid is a passive waiter, not a supervisor # tail --pid is a passive waiter, not a supervisor
if [ "$rc" -eq 124 ]; then if [ "$rc" -eq 124 ]; then
kill "$pid" 2>/dev/null || true kill -TERM -- "-$pid" 2>/dev/null || true
sleep 1 sleep 1
kill -KILL "$pid" 2>/dev/null || true kill -KILL -- "-$pid" 2>/dev/null || true
fi fi
# Output the captured stdout # Output the captured stdout

136
lib/backup.sh Normal file
View file

@ -0,0 +1,136 @@
#!/usr/bin/env bash
# =============================================================================
# disinto backup — export factory state for migration
#
# Usage: source this file, then call backup_create <outfile.tar.gz>
# Requires: FORGE_URL, FORGE_TOKEN, FORGE_REPO, FORGE_OPS_REPO, OPS_REPO_ROOT
# =============================================================================
set -euo pipefail
# Fetch all issues (open + closed) for a repo slug and emit the normalized JSON array.
# Usage: _backup_fetch_issues <org/repo>
_backup_fetch_issues() {
local repo_slug="$1"
local api_url="${FORGE_API_BASE}/repos/${repo_slug}"
local all_issues="[]"
for state in open closed; do
local page=1
while true; do
local page_items
page_items=$(curl -sf -X GET \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${api_url}/issues?state=${state}&type=issues&limit=50&page=${page}") || {
echo "ERROR: failed to fetch ${state} issues from ${repo_slug} (page ${page})" >&2
return 1
}
local count
count=$(printf '%s' "$page_items" | jq 'length' 2>/dev/null) || count=0
[ -z "$count" ] && count=0
[ "$count" -eq 0 ] && break
all_issues=$(printf '%s\n%s' "$all_issues" "$page_items" | jq -s 'add')
[ "$count" -lt 50 ] && break
page=$((page + 1))
done
done
# Normalize to the schema: number, title, body, labels, state
printf '%s' "$all_issues" | jq '[.[] | {
number: .number,
title: .title,
body: .body,
labels: [.labels[]?.name],
state: .state
}] | sort_by(.number)'
}
# Create a backup tarball of factory state.
# Usage: backup_create <outfile.tar.gz>
backup_create() {
local outfile="${1:-}"
if [ -z "$outfile" ]; then
echo "Error: output file required" >&2
echo "Usage: disinto backup create <outfile.tar.gz>" >&2
return 1
fi
# Resolve to absolute path before cd-ing into tmpdir
case "$outfile" in
/*) ;;
*) outfile="$(pwd)/${outfile}" ;;
esac
# Validate required env
: "${FORGE_URL:?FORGE_URL must be set}"
: "${FORGE_TOKEN:?FORGE_TOKEN must be set}"
: "${FORGE_REPO:?FORGE_REPO must be set}"
local forge_ops_repo="${FORGE_OPS_REPO:-${FORGE_REPO}-ops}"
local ops_repo_root="${OPS_REPO_ROOT:-}"
if [ -z "$ops_repo_root" ] || [ ! -d "$ops_repo_root/.git" ]; then
echo "Error: OPS_REPO_ROOT (${ops_repo_root:-<unset>}) is not a valid git repo" >&2
return 1
fi
local tmpdir
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
local project_name="${FORGE_REPO##*/}"
echo "=== disinto backup create ==="
echo "Forge: ${FORGE_URL}"
echo "Repos: ${FORGE_REPO}, ${forge_ops_repo}"
# ── 1. Export issues ──────────────────────────────────────────────────────
mkdir -p "${tmpdir}/issues"
echo "Fetching issues for ${FORGE_REPO}..."
_backup_fetch_issues "$FORGE_REPO" > "${tmpdir}/issues/${project_name}.json"
local main_count
main_count=$(jq 'length' "${tmpdir}/issues/${project_name}.json")
echo " ${main_count} issues exported"
echo "Fetching issues for ${forge_ops_repo}..."
_backup_fetch_issues "$forge_ops_repo" > "${tmpdir}/issues/${project_name}-ops.json"
local ops_count
ops_count=$(jq 'length' "${tmpdir}/issues/${project_name}-ops.json")
echo " ${ops_count} issues exported"
# ── 2. Git bundle of ops repo ────────────────────────────────────────────
mkdir -p "${tmpdir}/repos"
echo "Creating git bundle for ${forge_ops_repo}..."
git -C "$ops_repo_root" bundle create "${tmpdir}/repos/${project_name}-ops.bundle" --all 2>&1
echo " bundle created ($(du -h "${tmpdir}/repos/${project_name}-ops.bundle" | cut -f1))"
# ── 3. Metadata ──────────────────────────────────────────────────────────
local created_at
created_at=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
jq -n \
--arg created_at "$created_at" \
--arg source_host "$(hostname)" \
--argjson schema_version 1 \
--arg forgejo_url "$FORGE_URL" \
'{
created_at: $created_at,
source_host: $source_host,
schema_version: $schema_version,
forgejo_url: $forgejo_url
}' > "${tmpdir}/metadata.json"
# ── 4. Pack tarball ──────────────────────────────────────────────────────
echo "Creating tarball: ${outfile}"
tar -czf "$outfile" -C "$tmpdir" metadata.json issues repos
local size
size=$(du -h "$outfile" | cut -f1)
echo "=== Backup complete: ${outfile} (${size}) ==="
# Clean up before returning — the EXIT trap references the local $tmpdir
# which goes out of scope after return, causing 'unbound variable' under set -u.
trap - EXIT
rm -rf "$tmpdir"
}

View file

@ -247,6 +247,31 @@ ci_promote() {
echo "$new_num" echo "$new_num"
} }
# ci_get_step_logs <pipeline_num> <step_id>
# Fetches logs for a single CI step via the Woodpecker API.
# Requires: WOODPECKER_REPO_ID, woodpecker_api() (from env.sh)
# Returns: 0 on success, 1 on failure. Outputs log text to stdout.
#
# Usage:
# ci_get_step_logs 1423 5 # Get logs for step ID 5 in pipeline 1423
ci_get_step_logs() {
local pipeline_num="$1" step_id="$2"
if [ -z "$pipeline_num" ] || [ -z "$step_id" ]; then
echo "Usage: ci_get_step_logs <pipeline_num> <step_id>" >&2
return 1
fi
if [ -z "${WOODPECKER_REPO_ID:-}" ] || [ "${WOODPECKER_REPO_ID}" = "0" ]; then
echo "ERROR: WOODPECKER_REPO_ID not set or zero" >&2
return 1
fi
woodpecker_api "/repos/${WOODPECKER_REPO_ID}/logs/${pipeline_num}/${step_id}" \
--max-time 15 2>/dev/null \
| jq -r '.[].data // empty' 2>/dev/null
}
# ci_get_logs <pipeline_number> [--step <step_name>] # ci_get_logs <pipeline_number> [--step <step_name>]
# Reads CI logs from the Woodpecker SQLite database. # Reads CI logs from the Woodpecker SQLite database.
# Requires: WOODPECKER_DATA_DIR env var or mounted volume at /woodpecker-data # Requires: WOODPECKER_DATA_DIR env var or mounted volume at /woodpecker-data

View file

@ -142,6 +142,7 @@ _create_forgejo_oauth_app() {
# Set up Woodpecker CI to use Forgejo as its forge backend. # Set up Woodpecker CI to use Forgejo as its forge backend.
# Creates an OAuth2 app on Forgejo for Woodpecker, activates the repo. # Creates an OAuth2 app on Forgejo for Woodpecker, activates the repo.
# Respects EDGE_ROUTING_MODE: in subdomain mode, uses EDGE_TUNNEL_FQDN_CI for redirect URI.
# Usage: create_woodpecker_oauth <forge_url> <repo_slug> # Usage: create_woodpecker_oauth <forge_url> <repo_slug>
_create_woodpecker_oauth_impl() { _create_woodpecker_oauth_impl() {
local forge_url="$1" local forge_url="$1"
@ -150,7 +151,13 @@ _create_woodpecker_oauth_impl() {
echo "" echo ""
echo "── Woodpecker OAuth2 setup ────────────────────────────" echo "── Woodpecker OAuth2 setup ────────────────────────────"
_create_forgejo_oauth_app "woodpecker-ci" "http://localhost:8000/authorize" || return 0 local wp_redirect_uri="http://localhost:8000/authorize"
local routing_mode="${EDGE_ROUTING_MODE:-subpath}"
if [ "$routing_mode" = "subdomain" ] && [ -n "${EDGE_TUNNEL_FQDN_CI:-}" ]; then
wp_redirect_uri="https://${EDGE_TUNNEL_FQDN_CI}/authorize"
fi
_create_forgejo_oauth_app "woodpecker-ci" "$wp_redirect_uri" || return 0
local client_id="${_OAUTH_CLIENT_ID}" local client_id="${_OAUTH_CLIENT_ID}"
local client_secret="${_OAUTH_CLIENT_SECRET}" local client_secret="${_OAUTH_CLIENT_SECRET}"
@ -158,10 +165,15 @@ _create_woodpecker_oauth_impl() {
# WP_FORGEJO_CLIENT/SECRET match the docker-compose.yml variable references # WP_FORGEJO_CLIENT/SECRET match the docker-compose.yml variable references
# WOODPECKER_HOST must be host-accessible URL to match OAuth2 redirect_uri # WOODPECKER_HOST must be host-accessible URL to match OAuth2 redirect_uri
local env_file="${FACTORY_ROOT}/.env" local env_file="${FACTORY_ROOT}/.env"
local wp_host="http://localhost:8000"
if [ "$routing_mode" = "subdomain" ] && [ -n "${EDGE_TUNNEL_FQDN_CI:-}" ]; then
wp_host="https://${EDGE_TUNNEL_FQDN_CI}"
fi
local wp_vars=( local wp_vars=(
"WOODPECKER_FORGEJO=true" "WOODPECKER_FORGEJO=true"
"WOODPECKER_FORGEJO_URL=${forge_url}" "WOODPECKER_FORGEJO_URL=${forge_url}"
"WOODPECKER_HOST=http://localhost:8000" "WOODPECKER_HOST=${wp_host}"
) )
if [ -n "${client_id:-}" ]; then if [ -n "${client_id:-}" ]; then
wp_vars+=("WP_FORGEJO_CLIENT=${client_id}") wp_vars+=("WP_FORGEJO_CLIENT=${client_id}")

391
lib/disinto/backup.sh Normal file
View file

@ -0,0 +1,391 @@
#!/usr/bin/env bash
# =============================================================================
# backup.sh — backup/restore utilities for disinto factory state
#
# Subcommands:
# create <outfile.tar.gz> Create backup of factory state
# import <infile.tar.gz> Restore factory state from backup
#
# Usage:
# source "${FACTORY_ROOT}/lib/disinto/backup.sh"
# backup_import <tarball>
#
# Environment:
# FORGE_URL - Forgejo instance URL (target)
# FORGE_TOKEN - Admin token for target Forgejo
#
# Idempotency:
# - Repos: created via API if missing
# - Issues: check if exists by number, skip if present
# - Runs twice = same end state, no errors
# =============================================================================
set -euo pipefail
# ── Helper: log with timestamp ───────────────────────────────────────────────
backup_log() {
local msg="$1"
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $msg"
}
# ── Helper: create repo if it doesn't exist ─────────────────────────────────
# Usage: backup_create_repo_if_missing <slug>
# Returns: 0 if repo exists or was created, 1 on error
backup_create_repo_if_missing() {
local slug="$1"
local org_name="${slug%%/*}"
local repo_name="${slug##*/}"
# Check if repo exists
if curl -sf --max-time 5 \
-H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_URL}/api/v1/repos/${slug}" >/dev/null 2>&1; then
backup_log "Repo ${slug} already exists"
return 0
fi
backup_log "Creating repo ${slug}..."
# Create org if needed
curl -sf -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_URL}/api/v1/orgs" \
-d "{\"username\":\"${org_name}\",\"visibility\":\"public\"}" >/dev/null 2>&1 || true
# Create repo
local response
response=$(curl -sf -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_URL}/api/v1/orgs/${org_name}/repos" \
-d "{\"name\":\"${repo_name}\",\"auto_init\":false,\"default_branch\":\"main\"}" 2>/dev/null) \
|| response=""
if [ -n "$response" ] && echo "$response" | grep -q '"id":\|[0-9]'; then
backup_log "Created repo ${slug}"
BACKUP_CREATED_REPOS=$((BACKUP_CREATED_REPOS + 1))
return 0
fi
# Fallback: admin endpoint
response=$(curl -sf -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_URL}/api/v1/admin/users/${org_name}/repos" \
-d "{\"name\":\"${repo_name}\",\"auto_init\":false,\"default_branch\":\"main\"}" 2>/dev/null) \
|| response=""
if [ -n "$response" ] && echo "$response" | grep -q '"id":\|[0-9]'; then
backup_log "Created repo ${slug} (via admin API)"
BACKUP_CREATED_REPOS=$((BACKUP_CREATED_REPOS + 1))
return 0
fi
backup_log "ERROR: failed to create repo ${slug}" >&2
return 1
}
# ── Helper: check if issue exists by number ──────────────────────────────────
# Usage: backup_issue_exists <slug> <issue_number>
# Returns: 0 if exists, 1 if not
backup_issue_exists() {
local slug="$1"
local issue_num="$2"
curl -sf --max-time 5 \
-H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_URL}/api/v1/repos/${slug}/issues/${issue_num}" >/dev/null 2>&1
}
# ── Helper: create issue with specific number (if Forgejo supports it) ───────
# Note: Forgejo API auto-assigns next integer; we accept renumbering and log mapping
# Usage: backup_create_issue <slug> <original_number> <title> <body> [labels...]
# Returns: new_issue_number on success, 0 on failure
backup_create_issue() {
local slug="$1"
local original_num="$2"
local title="$3"
local body="$4"
shift 4
# Build labels array
local -a labels=()
for label in "$@"; do
# Resolve label name to ID
local label_id
label_id=$(curl -sf --max-time 5 \
-H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_URL}/api/v1/repos/${slug}/labels" 2>/dev/null \
| jq -r ".[] | select(.name == \"${label}\") | .id" 2>/dev/null) || label_id=""
if [ -n "$label_id" ] && [ "$label_id" != "null" ]; then
labels+=("$label_id")
fi
done
# Build payload
local payload
if [ ${#labels[@]} -gt 0 ]; then
payload=$(jq -n \
--arg title "$title" \
--arg body "$body" \
--argjson labels "$(printf '%s\n' "${labels[@]}" | jq -R . | jq -s .)" \
'{title: $title, body: $body, labels: $labels}')
else
payload=$(jq -n --arg title "$title" --arg body "$body" '{title: $title, body: $body, labels: []}')
fi
local response
response=$(curl -sf -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_URL}/api/v1/repos/${slug}/issues" \
-d "$payload" 2>/dev/null) || {
backup_log "ERROR: failed to create issue '${title}'" >&2
return 1
}
local new_num
new_num=$(printf '%s' "$response" | jq -r '.number // empty')
# Log the mapping
echo "${original_num}:${new_num}" >> "${BACKUP_MAPPING_FILE}"
backup_log "Created issue '${title}' as #${new_num} (original: #${original_num})"
echo "$new_num"
}
# ── Step 1: Unpack tarball to temp dir ───────────────────────────────────────
# Usage: backup_unpack_tarball <tarball>
# Returns: temp dir path via BACKUP_TEMP_DIR
backup_unpack_tarball() {
local tarball="$1"
if [ ! -f "$tarball" ]; then
backup_log "ERROR: tarball not found: ${tarball}" >&2
return 1
fi
BACKUP_TEMP_DIR=$(mktemp -d -t disinto-backup.XXXXXX)
backup_log "Unpacking ${tarball} to ${BACKUP_TEMP_DIR}"
if ! tar -xzf "$tarball" -C "$BACKUP_TEMP_DIR"; then
backup_log "ERROR: failed to unpack tarball" >&2
rm -rf "$BACKUP_TEMP_DIR"
return 1
fi
# Verify expected structure
if [ ! -d "${BACKUP_TEMP_DIR}/repos" ]; then
backup_log "ERROR: tarball missing 'repos/' directory" >&2
rm -rf "$BACKUP_TEMP_DIR"
return 1
fi
backup_log "Tarball unpacked successfully"
}
# ── Step 2: disinto repo — create via Forgejo API, trigger sync (manual) ─────
# Usage: backup_import_disinto_repo
# Returns: 0 on success, 1 on failure
backup_import_disinto_repo() {
backup_log "Step 2: Configuring disinto repo..."
# Create disinto repo if missing
backup_create_repo_if_missing "disinto-admin/disinto"
# Note: Manual mirror configuration recommended (avoids SSH deploy-key handling)
backup_log "Note: Configure Codeberg → Forgejo pull mirror manually"
backup_log " Run on Forgejo admin panel: Repository Settings → Repository Mirroring"
backup_log " Source: ssh://git@codeberg.org/johba/disinto.git"
backup_log " Mirror: disinto-admin/disinto"
backup_log " Or use: git clone --mirror ssh://git@codeberg.org/johba/disinto.git"
backup_log " cd disinto.git && git push --mirror ${FORGE_URL}/disinto-admin/disinto.git"
return 0
}
# ── Step 3: disinto-ops repo — create empty, push from bundle ────────────────
# Usage: backup_import_disinto_ops_repo
# Returns: 0 on success, 1 on failure
backup_import_disinto_ops_repo() {
backup_log "Step 3: Configuring disinto-ops repo from bundle..."
local bundle_path="${BACKUP_TEMP_DIR}/repos/disinto-ops.bundle"
if [ ! -f "$bundle_path" ]; then
backup_log "WARNING: Bundle not found at ${bundle_path}, skipping"
return 0
fi
# Create ops repo if missing
backup_create_repo_if_missing "disinto-admin/disinto-ops"
# Clone bundle and push to Forgejo
local clone_dir
clone_dir=$(mktemp -d -t disinto-ops-clone.XXXXXX)
backup_log "Cloning bundle to ${clone_dir}"
if ! git clone --bare "$bundle_path" "$clone_dir/disinto-ops.git"; then
backup_log "ERROR: failed to clone bundle"
rm -rf "$clone_dir"
return 1
fi
# Push all refs to Forgejo
backup_log "Pushing refs to Forgejo..."
if ! cd "$clone_dir/disinto-ops.git" && \
git push --mirror "${FORGE_URL}/disinto-admin/disinto-ops.git" 2>&1; then
backup_log "ERROR: failed to push refs"
rm -rf "$clone_dir"
return 1
fi
local ref_count
ref_count=$(cd "$clone_dir/disinto-ops.git" && git show-ref | wc -l)
BACKUP_PUSHED_REFS=$((BACKUP_PUSHED_REFS + ref_count))
backup_log "Pushed ${ref_count} refs to disinto-ops"
rm -rf "$clone_dir"
return 0
}
# ── Step 4: Import issues from backup ────────────────────────────────────────
# Usage: backup_import_issues <slug> <issues_file>
# issues_file is a JSON array of issues (per create schema)
# Returns: 0 on success
backup_import_issues() {
local slug="$1"
local issues_file="$2"
if [ ! -f "$issues_file" ]; then
backup_log "No issues file found, skipping"
return 0
fi
local count
count=$(jq 'length' "$issues_file")
backup_log "Importing ${count} issues from ${issues_file}"
local created=0
local skipped=0
for i in $(seq 0 $((count - 1))); do
local issue_num title body
issue_num=$(jq -r ".[${i}].number" "$issues_file")
title=$(jq -r ".[${i}].title" "$issues_file")
body=$(jq -r ".[${i}].body" "$issues_file")
if [ -z "$issue_num" ] || [ "$issue_num" = "null" ]; then
backup_log "WARNING: skipping issue without number at index ${i}"
continue
fi
# Check if issue already exists
if backup_issue_exists "$slug" "$issue_num"; then
backup_log "Issue #${issue_num} already exists, skipping"
skipped=$((skipped + 1))
continue
fi
# Extract labels
local -a labels=()
while IFS= read -r label; do
[ -n "$label" ] && labels+=("$label")
done < <(jq -r ".[${i}].labels[]? // empty" "$issues_file")
# Create issue
local new_num
if new_num=$(backup_create_issue "$slug" "$issue_num" "$title" "$body" "${labels[@]}"); then
created=$((created + 1))
fi
done
BACKUP_CREATED_ISSUES=$((BACKUP_CREATED_ISSUES + created))
BACKUP_SKIPPED_ISSUES=$((BACKUP_SKIPPED_ISSUES + skipped))
backup_log "Created ${created} issues, skipped ${skipped}"
}
# ── Main: import subcommand ──────────────────────────────────────────────────
# Usage: backup_import <tarball>
backup_import() {
local tarball="$1"
# Validate required environment
[ -n "${FORGE_URL:-}" ] || { echo "Error: FORGE_URL not set" >&2; exit 1; }
[ -n "${FORGE_TOKEN:-}" ] || { echo "Error: FORGE_TOKEN not set" >&2; exit 1; }
backup_log "=== Backup Import Started ==="
backup_log "Target: ${FORGE_URL}"
backup_log "Tarball: ${tarball}"
# Initialize counters
BACKUP_CREATED_REPOS=0
BACKUP_PUSHED_REFS=0
BACKUP_CREATED_ISSUES=0
BACKUP_SKIPPED_ISSUES=0
# Create temp dir for mapping file
BACKUP_MAPPING_FILE=$(mktemp -t disinto-mapping.XXXXXX.json)
echo '{"mappings": []}' > "$BACKUP_MAPPING_FILE"
# Step 1: Unpack tarball
if ! backup_unpack_tarball "$tarball"; then
exit 1
fi
# Step 2: disinto repo
if ! backup_import_disinto_repo; then
exit 1
fi
# Step 3: disinto-ops repo
if ! backup_import_disinto_ops_repo; then
exit 1
fi
# Step 4: Import issues — iterate issues/<slug>.json files, each is a JSON array
for issues_file in "${BACKUP_TEMP_DIR}/issues"/*.json; do
[ -f "$issues_file" ] || continue
local slug_filename
slug_filename=$(basename "$issues_file" .json)
# Map slug-filename → forgejo-slug: "disinto" → "disinto-admin/disinto",
# "disinto-ops" → "disinto-admin/disinto-ops"
local slug
case "$slug_filename" in
"disinto") slug="${FORGE_REPO}" ;;
"disinto-ops") slug="${FORGE_OPS_REPO}" ;;
*) slug="disinto-admin/${slug_filename}" ;;
esac
backup_log "Processing issues from ${slug_filename}.json (${slug})"
backup_import_issues "$slug" "$issues_file"
done
# Summary
backup_log "=== Backup Import Complete ==="
backup_log "Created ${BACKUP_CREATED_REPOS} repos"
backup_log "Pushed ${BACKUP_PUSHED_REFS} refs"
backup_log "Imported ${BACKUP_CREATED_ISSUES} issues"
backup_log "Skipped ${BACKUP_SKIPPED_ISSUES} (already present)"
backup_log "Issue mapping saved to: ${BACKUP_MAPPING_FILE}"
# Cleanup
rm -rf "$BACKUP_TEMP_DIR"
exit 0
}
# ── Entry point: if sourced, don't run; if executed directly, run import ────
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
if [ $# -lt 1 ]; then
echo "Usage: $0 <tarball>" >&2
exit 1
fi
backup_import "$1"
fi

View file

@ -356,16 +356,6 @@ setup_forge() {
[predictor-bot]="FORGE_PREDICTOR_PASS" [predictor-bot]="FORGE_PREDICTOR_PASS"
[architect-bot]="FORGE_ARCHITECT_PASS" [architect-bot]="FORGE_ARCHITECT_PASS"
) )
# Llama bot users (local-model agents) — separate from main agents
# Each llama agent gets its own Forgejo user, token, and password
local -A llama_token_vars=(
[dev-qwen]="FORGE_TOKEN_LLAMA"
[dev-qwen-nightly]="FORGE_TOKEN_LLAMA_NIGHTLY"
)
local -A llama_pass_vars=(
[dev-qwen]="FORGE_PASS_LLAMA"
[dev-qwen-nightly]="FORGE_PASS_LLAMA_NIGHTLY"
)
local bot_user bot_pass token token_var pass_var local bot_user bot_pass token token_var pass_var
@ -515,159 +505,12 @@ setup_forge() {
fi fi
done done
# Create llama bot users and tokens (local-model agents)
# These are separate from the main agents and get their own credentials
echo ""
echo "── Setting up llama bot users ────────────────────────────"
local llama_user llama_pass llama_token llama_token_var llama_pass_var
for llama_user in "${!llama_token_vars[@]}"; do
llama_token_var="${llama_token_vars[$llama_user]}"
llama_pass_var="${llama_pass_vars[$llama_user]}"
# Check if token already exists in .env
local token_exists=false
if _token_exists_in_env "$llama_token_var" "$env_file"; then
token_exists=true
fi
# Check if password already exists in .env
local pass_exists=false
if _pass_exists_in_env "$llama_pass_var" "$env_file"; then
pass_exists=true
fi
# Check if llama bot user exists on Forgejo
local llama_user_exists=false
if curl -sf --max-time 5 \
-H "Authorization: token ${admin_token}" \
"${forge_url}/api/v1/users/${llama_user}" >/dev/null 2>&1; then
llama_user_exists=true
fi
# Skip token/password regeneration if both exist in .env and not forcing rotation
if [ "$token_exists" = true ] && [ "$pass_exists" = true ] && [ "$rotate_tokens" = false ]; then
echo " ${llama_user} token and password preserved (use --rotate-tokens to force)"
# Still export the existing token for use within this run
local existing_token existing_pass
existing_token=$(grep "^${llama_token_var}=" "$env_file" | head -1 | cut -d= -f2-)
existing_pass=$(grep "^${llama_pass_var}=" "$env_file" | head -1 | cut -d= -f2-)
export "${llama_token_var}=${existing_token}"
export "${llama_pass_var}=${existing_pass}"
continue
fi
# Generate new credentials if:
# - Token doesn't exist (first run)
# - Password doesn't exist (first run)
# - --rotate-tokens flag is set (explicit rotation)
if [ "$llama_user_exists" = false ]; then
# User doesn't exist - create it
llama_pass="llama-$(head -c 16 /dev/urandom | base64 | tr -dc 'a-zA-Z0-9' | head -c 20)"
echo "Creating llama bot user: ${llama_user}"
local create_output
if ! create_output=$(_forgejo_exec forgejo admin user create \
--username "${llama_user}" \
--password "${llama_pass}" \
--email "${llama_user}@disinto.local" \
--must-change-password=false 2>&1); then
echo "Error: failed to create llama bot user '${llama_user}':" >&2
echo " ${create_output}" >&2
exit 1
fi
# Forgejo 11.x ignores --must-change-password=false on create;
# explicitly clear the flag so basic-auth token creation works.
_forgejo_exec forgejo admin user change-password \
--username "${llama_user}" \
--password "${llama_pass}" \
--must-change-password=false
# Verify llama bot user was actually created
if ! curl -sf --max-time 5 \
-H "Authorization: token ${admin_token}" \
"${forge_url}/api/v1/users/${llama_user}" >/dev/null 2>&1; then
echo "Error: llama bot user '${llama_user}' not found after creation" >&2
exit 1
fi
echo " ${llama_user} user created"
else
# User exists - reset password if needed
echo " ${llama_user} user exists"
if [ "$rotate_tokens" = true ] || [ "$pass_exists" = false ]; then
llama_pass="llama-$(head -c 16 /dev/urandom | base64 | tr -dc 'a-zA-Z0-9' | head -c 20)"
_forgejo_exec forgejo admin user change-password \
--username "${llama_user}" \
--password "${llama_pass}" \
--must-change-password=false || {
echo "Error: failed to reset password for existing llama bot user '${llama_user}'" >&2
exit 1
}
echo " ${llama_user} password reset for token generation"
else
# Password exists, get it from .env
llama_pass=$(grep "^${llama_pass_var}=" "$env_file" | head -1 | cut -d= -f2-)
fi
fi
# Generate token via API (basic auth as the llama user)
# First, delete any existing tokens to avoid name collision
local existing_llama_token_ids
existing_llama_token_ids=$(curl -sf \
-u "${llama_user}:${llama_pass}" \
"${forge_url}/api/v1/users/${llama_user}/tokens" 2>/dev/null \
| jq -r '.[].id // empty' 2>/dev/null) || existing_llama_token_ids=""
# Delete any existing tokens for this user
if [ -n "$existing_llama_token_ids" ]; then
while IFS= read -r tid; do
[ -n "$tid" ] && curl -sf -X DELETE \
-u "${llama_user}:${llama_pass}" \
"${forge_url}/api/v1/users/${llama_user}/tokens/${tid}" >/dev/null 2>&1 || true
done <<< "$existing_llama_token_ids"
fi
llama_token=$(curl -sf -X POST \
-u "${llama_user}:${llama_pass}" \
-H "Content-Type: application/json" \
"${forge_url}/api/v1/users/${llama_user}/tokens" \
-d "{\"name\":\"disinto-${llama_user}-token\",\"scopes\":[\"all\"]}" 2>/dev/null \
| jq -r '.sha1 // empty') || llama_token=""
if [ -z "$llama_token" ]; then
echo "Error: failed to create API token for '${llama_user}'" >&2
exit 1
fi
# Store token in .env under the llama-specific variable name
if grep -q "^${llama_token_var}=" "$env_file" 2>/dev/null; then
sed -i "s|^${llama_token_var}=.*|${llama_token_var}=${llama_token}|" "$env_file"
else
printf '%s=%s\n' "$llama_token_var" "$llama_token" >> "$env_file"
fi
export "${llama_token_var}=${llama_token}"
echo " ${llama_user} token generated and saved (${llama_token_var})"
# Store password in .env for git HTTP push (#361)
# Forgejo 11.x API tokens don't work for git push; password auth does.
if grep -q "^${llama_pass_var}=" "$env_file" 2>/dev/null; then
sed -i "s|^${llama_pass_var}=.*|${llama_pass_var}=${llama_pass}|" "$env_file"
else
printf '%s=%s\n' "$llama_pass_var" "$llama_pass" >> "$env_file"
fi
export "${llama_pass_var}=${llama_pass}"
echo " ${llama_user} password saved (${llama_pass_var})"
done
# Create .profile repos for all bot users (if they don't already exist) # Create .profile repos for all bot users (if they don't already exist)
# This runs the same logic as hire-an-agent Step 2-3 for idempotent setup # This runs the same logic as hire-an-agent Step 2-3 for idempotent setup
echo "" echo ""
echo "── Setting up .profile repos ────────────────────────────" echo "── Setting up .profile repos ────────────────────────────"
local -a bot_users=(dev-bot review-bot planner-bot gardener-bot vault-bot supervisor-bot predictor-bot architect-bot) local -a bot_users=(dev-bot review-bot planner-bot gardener-bot vault-bot supervisor-bot predictor-bot architect-bot)
# Add llama bot users to .profile repo creation
for llama_user in "${!llama_token_vars[@]}"; do
bot_users+=("$llama_user")
done
local bot_user local bot_user
for bot_user in "${bot_users[@]}"; do for bot_user in "${bot_users[@]}"; do
@ -775,15 +618,6 @@ setup_forge() {
-d "{\"permission\":\"${bot_perm}\"}" >/dev/null 2>&1 || true -d "{\"permission\":\"${bot_perm}\"}" >/dev/null 2>&1 || true
done done
# Add llama bot users as write collaborators for local-model agents
for llama_user in "${!llama_token_vars[@]}"; do
curl -sf -X PUT \
-H "Authorization: token ${admin_token:-${FORGE_TOKEN}}" \
-H "Content-Type: application/json" \
"${forge_url}/api/v1/repos/${repo_slug}/collaborators/${llama_user}" \
-d '{"permission":"write"}' >/dev/null 2>&1 || true
done
# Add disinto-admin as admin collaborator # Add disinto-admin as admin collaborator
curl -sf -X PUT \ curl -sf -X PUT \
-H "Authorization: token ${admin_token:-${FORGE_TOKEN}}" \ -H "Authorization: token ${admin_token:-${FORGE_TOKEN}}" \

View file

@ -26,6 +26,28 @@ PROJECT_NAME="${PROJECT_NAME:-project}"
# PRIMARY_BRANCH defaults to main (env.sh may have set it to 'master') # PRIMARY_BRANCH defaults to main (env.sh may have set it to 'master')
PRIMARY_BRANCH="${PRIMARY_BRANCH:-main}" PRIMARY_BRANCH="${PRIMARY_BRANCH:-main}"
# Track service names for duplicate detection
declare -A _seen_services
declare -A _service_sources
# Record a service name and its source; return 0 if unique, 1 if duplicate
_record_service() {
local service_name="$1"
local source="$2"
if [ -n "${_seen_services[$service_name]:-}" ]; then
local original_source="${_service_sources[$service_name]}"
echo "ERROR: Duplicate service name '$service_name' detected —" >&2
echo " '$service_name' emitted twice — from $original_source and from $source" >&2
echo " Remove one of the conflicting activations to proceed." >&2
return 1
fi
_seen_services[$service_name]=1
_service_sources[$service_name]="$source"
return 0
}
# Helper: extract woodpecker_repo_id from a project TOML file # Helper: extract woodpecker_repo_id from a project TOML file
# Returns empty string if not found or file doesn't exist # Returns empty string if not found or file doesn't exist
_get_woodpecker_repo_id() { _get_woodpecker_repo_id() {
@ -97,6 +119,16 @@ _generate_local_model_services() {
POLL_INTERVAL) poll_interval_val="$value" ;; POLL_INTERVAL) poll_interval_val="$value" ;;
---) ---)
if [ -n "$service_name" ] && [ -n "$base_url" ]; then if [ -n "$service_name" ] && [ -n "$base_url" ]; then
# Record service for duplicate detection using the full service name
local full_service_name="agents-${service_name}"
local toml_basename
toml_basename=$(basename "$toml")
if ! _record_service "$full_service_name" "[agents.$service_name] in projects/$toml_basename"; then
# Duplicate detected — clean up and abort
rm -f "$temp_file"
return 1
fi
# Per-agent FORGE_TOKEN / FORGE_PASS lookup (#834 Gap 3). # Per-agent FORGE_TOKEN / FORGE_PASS lookup (#834 Gap 3).
# Two hired llama agents must not share the same Forgejo identity, # Two hired llama agents must not share the same Forgejo identity,
# so we key the env-var lookup by forge_user (which hire-agent.sh # so we key the env-var lookup by forge_user (which hire-agent.sh
@ -137,7 +169,6 @@ _generate_local_model_services() {
- project-repos-${service_name}:/home/agent/repos - project-repos-${service_name}:/home/agent/repos
- \${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:\${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared} - \${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:\${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- \${CLAUDE_CONFIG_FILE:-\${HOME}/.claude.json}:/home/agent/.claude.json:ro - \${CLAUDE_CONFIG_FILE:-\${HOME}/.claude.json}:/home/agent/.claude.json:ro
- \${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- \${AGENT_SSH_DIR:-\${HOME}/.ssh}:/home/agent/.ssh:ro - \${AGENT_SSH_DIR:-\${HOME}/.ssh}:/home/agent/.ssh:ro
- ./projects:/home/agent/disinto/projects:ro - ./projects:/home/agent/disinto/projects:ro
- ./.env:/home/agent/disinto/.env:ro - ./.env:/home/agent/disinto/.env:ro
@ -282,6 +313,20 @@ _generate_compose_impl() {
return 0 return 0
fi fi
# Reset duplicate detection state for fresh run
_seen_services=()
_service_sources=()
# Initialize duplicate detection with base services defined in the template
_record_service "forgejo" "base compose template" || return 1
_record_service "woodpecker" "base compose template" || return 1
_record_service "woodpecker-agent" "base compose template" || return 1
_record_service "agents" "base compose template" || return 1
_record_service "runner" "base compose template" || return 1
_record_service "edge" "base compose template" || return 1
_record_service "staging" "base compose template" || return 1
_record_service "staging-deploy" "base compose template" || return 1
# Extract primary woodpecker_repo_id from project TOML files # Extract primary woodpecker_repo_id from project TOML files
local wp_repo_id local wp_repo_id
wp_repo_id=$(_get_primary_woodpecker_repo_id) wp_repo_id=$(_get_primary_woodpecker_repo_id)
@ -359,6 +404,9 @@ services:
WOODPECKER_SERVER: localhost:9000 WOODPECKER_SERVER: localhost:9000
WOODPECKER_AGENT_SECRET: ${WOODPECKER_AGENT_SECRET:-} WOODPECKER_AGENT_SECRET: ${WOODPECKER_AGENT_SECRET:-}
WOODPECKER_GRPC_SECURE: "false" WOODPECKER_GRPC_SECURE: "false"
WOODPECKER_GRPC_KEEPALIVE_TIME: "10s"
WOODPECKER_GRPC_KEEPALIVE_TIMEOUT: "20s"
WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS: "true"
WOODPECKER_HEALTHCHECK_ADDR: ":3333" WOODPECKER_HEALTHCHECK_ADDR: ":3333"
WOODPECKER_BACKEND_DOCKER_NETWORK: ${WOODPECKER_CI_NETWORK:-disinto_disinto-net} WOODPECKER_BACKEND_DOCKER_NETWORK: ${WOODPECKER_CI_NETWORK:-disinto_disinto-net}
WOODPECKER_MAX_WORKFLOWS: 1 WOODPECKER_MAX_WORKFLOWS: 1
@ -382,7 +430,6 @@ services:
- project-repos:/home/agent/repos - project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared} - ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro - ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro - ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro - ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro - woodpecker-data:/woodpecker-data:ro
@ -439,18 +486,15 @@ services:
COMPOSEEOF COMPOSEEOF
# ── Conditional agents-llama block (ENABLE_LLAMA_AGENT=1) ────────────── # ── Conditional agents-llama block (ENABLE_LLAMA_AGENT=1) ──────────────
# Local-Qwen dev agent — gated on ENABLE_LLAMA_AGENT so factories without # This legacy flag was removed in #846 but kept for duplicate detection testing
# a local llama endpoint don't try to start it. See docs/agents-llama.md.
if [ "${ENABLE_LLAMA_AGENT:-0}" = "1" ]; then if [ "${ENABLE_LLAMA_AGENT:-0}" = "1" ]; then
cat >> "$compose_file" <<'LLAMAEOF' if ! _record_service "agents-llama" "ENABLE_LLAMA_AGENT=1"; then
return 1
fi
cat >> "$compose_file" <<'COMPOSEEOF'
agents-llama: agents-llama:
build: image: ghcr.io/disinto/agents:${DISINTO_IMAGE_TAG:-latest}
context: .
dockerfile: docker/agents/Dockerfile
# Rebuild on every up (#887): makes docker/agents/ source changes reach this
# container without a manual \`docker compose build\`. Cache-fast when clean.
pull_policy: build
container_name: disinto-agents-llama container_name: disinto-agents-llama
restart: unless-stopped restart: unless-stopped
security_opt: security_opt:
@ -460,69 +504,15 @@ COMPOSEEOF
- project-repos:/home/agent/repos - project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared} - ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro - ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro - ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro - woodpecker-data:/woodpecker-data:ro
- ./projects:/home/agent/disinto/projects:ro
- ./.env:/home/agent/disinto/.env:ro
- ./state:/home/agent/disinto/state
environment: environment:
FORGE_URL: http://forgejo:3000 FORGE_URL: http://forgejo:3000
FORGE_REPO: ${FORGE_REPO:-disinto-admin/disinto} FORGE_REPO: ${FORGE_REPO:-disinto-admin/disinto}
FORGE_TOKEN: ${FORGE_TOKEN_LLAMA:-} FORGE_TOKEN: ${FORGE_TOKEN:-}
FORGE_PASS: ${FORGE_PASS_LLAMA:-}
FORGE_BOT_USERNAMES: ${FORGE_BOT_USERNAMES:-}
WOODPECKER_TOKEN: ${WOODPECKER_TOKEN:-}
CLAUDE_TIMEOUT: ${CLAUDE_TIMEOUT:-7200}
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC: ${CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC:-1}
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: "60"
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
ANTHROPIC_BASE_URL: ${ANTHROPIC_BASE_URL:-}
FORGE_ADMIN_PASS: ${FORGE_ADMIN_PASS:-}
DISINTO_CONTAINER: "1"
PROJECT_NAME: ${PROJECT_NAME:-project}
PROJECT_REPO_ROOT: /home/agent/repos/${PROJECT_NAME:-project}
WOODPECKER_DATA_DIR: /woodpecker-data
WOODPECKER_REPO_ID: "PLACEHOLDER_WP_REPO_ID"
CLAUDE_CONFIG_DIR: ${CLAUDE_CONFIG_DIR:-/var/lib/disinto/claude-shared/config}
POLL_INTERVAL: ${POLL_INTERVAL:-300}
AGENT_ROLES: dev
healthcheck:
test: ["CMD", "pgrep", "-f", "entrypoint.sh"]
interval: 60s
timeout: 5s
retries: 3
start_period: 30s
depends_on:
forgejo:
condition: service_healthy
networks:
- disinto-net
agents-llama-all:
build:
context: .
dockerfile: docker/agents/Dockerfile
# Rebuild on every up (#887): makes docker/agents/ source changes reach this
# container without a manual \`docker compose build\`. Cache-fast when clean.
pull_policy: build
container_name: disinto-agents-llama-all
restart: unless-stopped
profiles: ["agents-llama-all"]
security_opt:
- apparmor=unconfined
volumes:
- agent-data:/home/agent/data
- project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro
environment:
FORGE_URL: http://forgejo:3000
FORGE_REPO: ${FORGE_REPO:-disinto-admin/disinto}
FORGE_TOKEN: ${FORGE_TOKEN_LLAMA:-}
FORGE_PASS: ${FORGE_PASS_LLAMA:-}
FORGE_REVIEW_TOKEN: ${FORGE_REVIEW_TOKEN:-} FORGE_REVIEW_TOKEN: ${FORGE_REVIEW_TOKEN:-}
FORGE_PLANNER_TOKEN: ${FORGE_PLANNER_TOKEN:-} FORGE_PLANNER_TOKEN: ${FORGE_PLANNER_TOKEN:-}
FORGE_GARDENER_TOKEN: ${FORGE_GARDENER_TOKEN:-} FORGE_GARDENER_TOKEN: ${FORGE_GARDENER_TOKEN:-}
@ -530,16 +520,14 @@ COMPOSEEOF
FORGE_SUPERVISOR_TOKEN: ${FORGE_SUPERVISOR_TOKEN:-} FORGE_SUPERVISOR_TOKEN: ${FORGE_SUPERVISOR_TOKEN:-}
FORGE_PREDICTOR_TOKEN: ${FORGE_PREDICTOR_TOKEN:-} FORGE_PREDICTOR_TOKEN: ${FORGE_PREDICTOR_TOKEN:-}
FORGE_ARCHITECT_TOKEN: ${FORGE_ARCHITECT_TOKEN:-} FORGE_ARCHITECT_TOKEN: ${FORGE_ARCHITECT_TOKEN:-}
FORGE_FILER_TOKEN: ${FORGE_FILER_TOKEN:-}
FORGE_BOT_USERNAMES: ${FORGE_BOT_USERNAMES:-} FORGE_BOT_USERNAMES: ${FORGE_BOT_USERNAMES:-}
WOODPECKER_TOKEN: ${WOODPECKER_TOKEN:-} WOODPECKER_TOKEN: ${WOODPECKER_TOKEN:-}
CLAUDE_TIMEOUT: ${CLAUDE_TIMEOUT:-7200} CLAUDE_TIMEOUT: ${CLAUDE_TIMEOUT:-7200}
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC: ${CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC:-1} CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC: ${CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC:-1}
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: "60"
CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-} ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
ANTHROPIC_BASE_URL: ${ANTHROPIC_BASE_URL:-} FORGE_PASS: ${FORGE_PASS:-}
FORGE_ADMIN_PASS: ${FORGE_ADMIN_PASS:-} FORGE_ADMIN_PASS: ${FORGE_ADMIN_PASS:-}
FACTORY_REPO: ${FORGE_REPO:-disinto-admin/disinto}
DISINTO_CONTAINER: "1" DISINTO_CONTAINER: "1"
PROJECT_NAME: ${PROJECT_NAME:-project} PROJECT_NAME: ${PROJECT_NAME:-project}
PROJECT_REPO_ROOT: /home/agent/repos/${PROJECT_NAME:-project} PROJECT_REPO_ROOT: /home/agent/repos/${PROJECT_NAME:-project}
@ -550,8 +538,6 @@ COMPOSEEOF
GARDENER_INTERVAL: ${GARDENER_INTERVAL:-21600} GARDENER_INTERVAL: ${GARDENER_INTERVAL:-21600}
ARCHITECT_INTERVAL: ${ARCHITECT_INTERVAL:-21600} ARCHITECT_INTERVAL: ${ARCHITECT_INTERVAL:-21600}
PLANNER_INTERVAL: ${PLANNER_INTERVAL:-43200} PLANNER_INTERVAL: ${PLANNER_INTERVAL:-43200}
SUPERVISOR_INTERVAL: ${SUPERVISOR_INTERVAL:-1200}
AGENT_ROLES: review,dev,gardener,architect,planner,predictor,supervisor
healthcheck: healthcheck:
test: ["CMD", "pgrep", "-f", "entrypoint.sh"] test: ["CMD", "pgrep", "-f", "entrypoint.sh"]
interval: 60s interval: 60s
@ -565,7 +551,8 @@ COMPOSEEOF
condition: service_started condition: service_started
networks: networks:
- disinto-net - disinto-net
LLAMAEOF
COMPOSEEOF
fi fi
# Resume the rest of the compose file (runner onward) # Resume the rest of the compose file (runner onward)
@ -619,11 +606,21 @@ LLAMAEOF
- EDGE_TUNNEL_USER=${EDGE_TUNNEL_USER:-tunnel} - EDGE_TUNNEL_USER=${EDGE_TUNNEL_USER:-tunnel}
- EDGE_TUNNEL_PORT=${EDGE_TUNNEL_PORT:-} - EDGE_TUNNEL_PORT=${EDGE_TUNNEL_PORT:-}
- EDGE_TUNNEL_FQDN=${EDGE_TUNNEL_FQDN:-} - EDGE_TUNNEL_FQDN=${EDGE_TUNNEL_FQDN:-}
# Subdomain fallback (#713): if subpath routing (#704/#708) fails, add: # Subdomain fallback (#1028): per-service FQDNs for subdomain routing mode.
# EDGE_TUNNEL_FQDN_FORGE, EDGE_TUNNEL_FQDN_CI, EDGE_TUNNEL_FQDN_CHAT # Set EDGE_ROUTING_MODE=subdomain to activate. See docs/edge-routing-fallback.md.
# See docs/edge-routing-fallback.md for the full pivot plan. - EDGE_ROUTING_MODE=${EDGE_ROUTING_MODE:-subpath}
- EDGE_TUNNEL_FQDN_FORGE=${EDGE_TUNNEL_FQDN_FORGE:-}
- EDGE_TUNNEL_FQDN_CI=${EDGE_TUNNEL_FQDN_CI:-}
- EDGE_TUNNEL_FQDN_CHAT=${EDGE_TUNNEL_FQDN_CHAT:-}
# Shared secret for Caddy ↔ chat forward_auth (#709) # Shared secret for Caddy ↔ chat forward_auth (#709)
- FORWARD_AUTH_SECRET=${FORWARD_AUTH_SECRET:-} - FORWARD_AUTH_SECRET=${FORWARD_AUTH_SECRET:-}
# Chat env vars (merged from chat container into edge, #1083)
- CHAT_HOST=127.0.0.1
- CHAT_PORT=8080
- CHAT_OAUTH_CLIENT_ID=${CHAT_OAUTH_CLIENT_ID:-}
- CHAT_OAUTH_CLIENT_SECRET=${CHAT_OAUTH_CLIENT_SECRET:-}
- DISINTO_CHAT_ALLOWED_USERS=${DISINTO_CHAT_ALLOWED_USERS:-}
# Rate limiting removed (#1084)
volumes: volumes:
- ./docker/Caddyfile:/etc/caddy/Caddyfile - ./docker/Caddyfile:/etc/caddy/Caddyfile
- caddy_data:/data - caddy_data:/data
@ -631,6 +628,8 @@ LLAMAEOF
- ./secrets/tunnel_key:/run/secrets/tunnel_key:ro - ./secrets/tunnel_key:/run/secrets/tunnel_key:ro
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared} - ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro - ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
# Chat history persistence (merged from chat container, #1083)
- ${CHAT_HISTORY_DIR:-./state/chat-history}:/var/lib/chat/history
healthcheck: healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost:2019/config/"] test: ["CMD", "curl", "-fsS", "http://localhost:2019/config/"]
interval: 30s interval: 30s
@ -682,6 +681,7 @@ LLAMAEOF
# Chat container — Claude chat UI backend (#705) # Chat container — Claude chat UI backend (#705)
# Internal service only; edge proxy routes to chat:8080 # Internal service only; edge proxy routes to chat:8080
# Sandbox hardened per #706 — no docker.sock, read-only rootfs, minimal caps # Sandbox hardened per #706 — no docker.sock, read-only rootfs, minimal caps
# Rate limiting removed (#1084)
chat: chat:
build: build:
context: ./docker/chat context: ./docker/chat
@ -705,6 +705,9 @@ LLAMAEOF
- chat-config:/var/chat/config - chat-config:/var/chat/config
# Chat history persistence: per-user NDJSON files on bind-mounted host volume # Chat history persistence: per-user NDJSON files on bind-mounted host volume
- ${CHAT_HISTORY_DIR:-./state/chat-history}:/var/lib/chat/history - ${CHAT_HISTORY_DIR:-./state/chat-history}:/var/lib/chat/history
# Workspace directory: bind-mounted project working tree for Claude access (#1027)
# Mounted when CHAT_WORKSPACE_DIR is set (defaults to ./workspace)
- ${CHAT_WORKSPACE_DIR:-./workspace}:/var/workspace
environment: environment:
CHAT_HOST: "0.0.0.0" CHAT_HOST: "0.0.0.0"
CHAT_PORT: "8080" CHAT_PORT: "8080"
@ -712,13 +715,14 @@ LLAMAEOF
CHAT_OAUTH_CLIENT_ID: ${CHAT_OAUTH_CLIENT_ID:-} CHAT_OAUTH_CLIENT_ID: ${CHAT_OAUTH_CLIENT_ID:-}
CHAT_OAUTH_CLIENT_SECRET: ${CHAT_OAUTH_CLIENT_SECRET:-} CHAT_OAUTH_CLIENT_SECRET: ${CHAT_OAUTH_CLIENT_SECRET:-}
EDGE_TUNNEL_FQDN: ${EDGE_TUNNEL_FQDN:-} EDGE_TUNNEL_FQDN: ${EDGE_TUNNEL_FQDN:-}
EDGE_TUNNEL_FQDN_CHAT: ${EDGE_TUNNEL_FQDN_CHAT:-}
EDGE_ROUTING_MODE: ${EDGE_ROUTING_MODE:-subpath}
DISINTO_CHAT_ALLOWED_USERS: ${DISINTO_CHAT_ALLOWED_USERS:-} DISINTO_CHAT_ALLOWED_USERS: ${DISINTO_CHAT_ALLOWED_USERS:-}
# Shared secret for Caddy forward_auth verify endpoint (#709) # Shared secret for Caddy forward_auth verify endpoint (#709)
FORWARD_AUTH_SECRET: ${FORWARD_AUTH_SECRET:-} FORWARD_AUTH_SECRET: ${FORWARD_AUTH_SECRET:-}
# Cost caps / rate limiting (#711) # Rate limiting removed (#1084)
CHAT_MAX_REQUESTS_PER_HOUR: ${CHAT_MAX_REQUESTS_PER_HOUR:-60} # Workspace directory for Claude code access (#1027)
CHAT_MAX_REQUESTS_PER_DAY: ${CHAT_MAX_REQUESTS_PER_DAY:-500} CHAT_WORKSPACE_DIR: ${CHAT_WORKSPACE_DIR:-./workspace}
CHAT_MAX_TOKENS_PER_DAY: ${CHAT_MAX_TOKENS_PER_DAY:-1000000}
healthcheck: healthcheck:
test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"] test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"]
interval: 30s interval: 30s
@ -734,7 +738,6 @@ volumes:
agent-data: agent-data:
project-repos: project-repos:
caddy_data: caddy_data:
chat-config:
networks: networks:
disinto-net: disinto-net:
@ -763,16 +766,19 @@ COMPOSEEOF
fi fi
# Append local-model agent services if any are configured # Append local-model agent services if any are configured
_generate_local_model_services "$compose_file" if ! _generate_local_model_services "$compose_file"; then
echo "ERROR: Failed to generate local-model agent services. See errors above." >&2
return 1
fi
# Resolve the Claude CLI binary path and persist as CLAUDE_BIN_DIR in .env. # Resolve the Claude CLI binary path and persist as CLAUDE_BIN_DIR in .env.
# docker-compose.yml references ${CLAUDE_BIN_DIR} so the value must be set. # Only used by reproduce and edge services which still use host-mounted CLI.
local claude_bin local claude_bin
claude_bin="$(command -v claude 2>/dev/null || true)" claude_bin="$(command -v claude 2>/dev/null || true)"
if [ -n "$claude_bin" ]; then if [ -n "$claude_bin" ]; then
claude_bin="$(readlink -f "$claude_bin")" claude_bin="$(readlink -f "$claude_bin")"
else else
echo "Warning: claude CLI not found in PATH — set CLAUDE_BIN_DIR in .env manually" >&2 echo "Warning: claude CLI not found in PATH — reproduce/edge services will fail to start" >&2
claude_bin="/usr/local/bin/claude" claude_bin="/usr/local/bin/claude"
fi fi
# Persist CLAUDE_BIN_DIR into .env so docker-compose can resolve it. # Persist CLAUDE_BIN_DIR into .env so docker-compose can resolve it.
@ -789,9 +795,8 @@ COMPOSEEOF
# In build mode, replace image: with build: for locally-built images # In build mode, replace image: with build: for locally-built images
if [ "$use_build" = true ]; then if [ "$use_build" = true ]; then
sed -i 's|^\( agents:\)|\1|' "$compose_file" sed -i '/^ image: ghcr\.io\/disinto\/agents:/{s|image: ghcr\.io/disinto/agents:.*|build:\n context: .\n dockerfile: docker/agents/Dockerfile\n pull_policy: build|}' "$compose_file"
sed -i '/^ image: ghcr\.io\/disinto\/agents:/{s|image: ghcr\.io/disinto/agents:.*|build:\n context: .\n dockerfile: docker/agents/Dockerfile|}' "$compose_file" sed -i '/^ image: ghcr\.io\/disinto\/edge:/{s|image: ghcr\.io/disinto/edge:.*|build:\n context: .\n dockerfile: docker/edge/Dockerfile\n pull_policy: build|}' "$compose_file"
sed -i '/^ image: ghcr\.io\/disinto\/edge:/{s|image: ghcr\.io/disinto/edge:.*|build: ./docker/edge|}' "$compose_file"
fi fi
echo "Created: ${compose_file}" echo "Created: ${compose_file}"
@ -815,6 +820,11 @@ _generate_agent_docker_impl() {
# Output path: ${FACTORY_ROOT}/docker/Caddyfile (gitignored — generated artifact). # Output path: ${FACTORY_ROOT}/docker/Caddyfile (gitignored — generated artifact).
# The edge compose service mounts this path as /etc/caddy/Caddyfile. # The edge compose service mounts this path as /etc/caddy/Caddyfile.
# On a fresh clone, `disinto init` calls generate_caddyfile before first `disinto up`. # On a fresh clone, `disinto init` calls generate_caddyfile before first `disinto up`.
#
# Routing mode (EDGE_ROUTING_MODE env var):
# subpath — (default) all services under <project>.disinto.ai/{forge,ci,chat,staging}
# subdomain — per-service subdomains: forge.<project>, ci.<project>, chat.<project>
# See docs/edge-routing-fallback.md for the full pivot plan.
_generate_caddyfile_impl() { _generate_caddyfile_impl() {
local docker_dir="${FACTORY_ROOT}/docker" local docker_dir="${FACTORY_ROOT}/docker"
local caddyfile="${docker_dir}/Caddyfile" local caddyfile="${docker_dir}/Caddyfile"
@ -824,8 +834,22 @@ _generate_caddyfile_impl() {
return return
fi fi
local routing_mode="${EDGE_ROUTING_MODE:-subpath}"
if [ "$routing_mode" = "subdomain" ]; then
_generate_caddyfile_subdomain "$caddyfile"
else
_generate_caddyfile_subpath "$caddyfile"
fi
echo "Created: ${caddyfile} (routing_mode=${routing_mode})"
}
# Subpath Caddyfile: all services under a single :80 block with path-based routing.
_generate_caddyfile_subpath() {
local caddyfile="$1"
cat > "$caddyfile" <<'CADDYFILEEOF' cat > "$caddyfile" <<'CADDYFILEEOF'
# Caddyfile — edge proxy configuration # Caddyfile — edge proxy configuration (subpath mode)
# IP-only binding at bootstrap; domain + TLS added later via vault resource request # IP-only binding at bootstrap; domain + TLS added later via vault resource request
:80 { :80 {
@ -846,30 +870,73 @@ _generate_caddyfile_impl() {
# Reverse proxy to staging # Reverse proxy to staging
handle /staging/* { handle /staging/* {
uri strip_prefix /staging
reverse_proxy staging:80 reverse_proxy staging:80
} }
# Chat service — reverse proxy to disinto-chat backend (#705) # Chat service — reverse proxy to in-process chat server (#705, #1083)
# OAuth routes bypass forward_auth — unauthenticated users need these (#709) # OAuth routes bypass forward_auth — unauthenticated users need these (#709)
handle /chat/login { handle /chat/login {
reverse_proxy chat:8080 reverse_proxy 127.0.0.1:8080
} }
handle /chat/oauth/callback { handle /chat/oauth/callback {
reverse_proxy chat:8080 reverse_proxy 127.0.0.1:8080
} }
# Defense-in-depth: forward_auth stamps X-Forwarded-User from session (#709) # Defense-in-depth: forward_auth stamps X-Forwarded-User from session (#709)
handle /chat/* { handle /chat/* {
forward_auth chat:8080 { forward_auth 127.0.0.1:8080 {
uri /chat/auth/verify uri /chat/auth/verify
copy_headers X-Forwarded-User copy_headers X-Forwarded-User
header_up X-Forward-Auth-Secret {$FORWARD_AUTH_SECRET} header_up X-Forward-Auth-Secret {$FORWARD_AUTH_SECRET}
} }
reverse_proxy chat:8080 reverse_proxy 127.0.0.1:8080
} }
} }
CADDYFILEEOF CADDYFILEEOF
}
echo "Created: ${caddyfile}" # Subdomain Caddyfile: four host blocks per docs/edge-routing-fallback.md.
# Uses env vars EDGE_TUNNEL_FQDN_FORGE, EDGE_TUNNEL_FQDN_CI, EDGE_TUNNEL_FQDN_CHAT,
# and EDGE_TUNNEL_FQDN (main project domain → staging).
_generate_caddyfile_subdomain() {
local caddyfile="$1"
cat > "$caddyfile" <<'CADDYFILEEOF'
# Caddyfile — edge proxy configuration (subdomain mode)
# Per-service subdomains; see docs/edge-routing-fallback.md
# Main project domain — staging / landing
{$EDGE_TUNNEL_FQDN} {
reverse_proxy staging:80
}
# Forgejo — root path, no subpath rewrite needed
{$EDGE_TUNNEL_FQDN_FORGE} {
reverse_proxy forgejo:3000
}
# Woodpecker CI — root path
{$EDGE_TUNNEL_FQDN_CI} {
reverse_proxy woodpecker:8000
}
# Chat — with forward_auth (#709, on its own host)
{$EDGE_TUNNEL_FQDN_CHAT} {
handle /login {
reverse_proxy 127.0.0.1:8080
}
handle /oauth/callback {
reverse_proxy 127.0.0.1:8080
}
handle /* {
forward_auth 127.0.0.1:8080 {
uri /auth/verify
copy_headers X-Forwarded-User
header_up X-Forward-Auth-Secret {$FORWARD_AUTH_SECRET}
}
reverse_proxy 127.0.0.1:8080
}
}
CADDYFILEEOF
} }
# Generate docker/index.html default page. # Generate docker/index.html default page.

View file

@ -38,6 +38,30 @@ _hvault_resolve_token() {
return 1 return 1
} }
# _hvault_default_env — set the local-cluster Vault env if unset
#
# Idempotent helper used by every Vault-touching script that runs during
# `disinto init` (S2). On the local-cluster common case, operators (and
# the init dispatcher in bin/disinto) have not exported VAULT_ADDR or
# VAULT_TOKEN — the server is reachable on localhost:8200 and the root
# token lives at /etc/vault.d/root.token. Scripts must Just Work in that
# shape.
#
# - If VAULT_ADDR is unset, defaults to http://127.0.0.1:8200.
# - If VAULT_TOKEN is unset, resolves from /etc/vault.d/root.token via
# _hvault_resolve_token. A missing token file is not an error here —
# downstream hvault_token_lookup() probes connectivity and emits the
# operator-facing "VAULT_ADDR + VAULT_TOKEN" diagnostic.
#
# Centralised to keep the defaulting stanza in one place — copy-pasting
# the 5-line block into each init script trips the repo-wide 5-line
# sliding-window duplicate detector (.woodpecker/detect-duplicates.py).
_hvault_default_env() {
VAULT_ADDR="${VAULT_ADDR:-http://127.0.0.1:8200}"
export VAULT_ADDR
_hvault_resolve_token || :
}
# _hvault_check_prereqs — validate VAULT_ADDR and VAULT_TOKEN are set # _hvault_check_prereqs — validate VAULT_ADDR and VAULT_TOKEN are set
# Args: caller function name # Args: caller function name
_hvault_check_prereqs() { _hvault_check_prereqs() {
@ -100,6 +124,65 @@ _hvault_request() {
# ── Public API ─────────────────────────────────────────────────────────────── # ── Public API ───────────────────────────────────────────────────────────────
# VAULT_KV_MOUNT — KV v2 mount point (default: "kv")
# Override with: export VAULT_KV_MOUNT=secret
# Used by: hvault_kv_get, hvault_kv_put, hvault_kv_list
: "${VAULT_KV_MOUNT:=kv}"
# hvault_ensure_kv_v2 MOUNT [LOG_PREFIX]
# Assert that the given KV mount is present and KV v2. If absent, enable
# it. If present as wrong type/version, exit 1. Callers must have already
# checked VAULT_ADDR / VAULT_TOKEN.
#
# DRY_RUN (env, default 0): when 1, log intent without writing.
# LOG_PREFIX (optional): label for log lines, e.g. "[vault-seed-forgejo]".
#
# Extracted here because every vault-seed-*.sh script needs this exact
# sequence, and the 5-line sliding-window dup detector flags the
# copy-paste. One place, one implementation.
hvault_ensure_kv_v2() {
local mount="${1:?hvault_ensure_kv_v2: MOUNT required}"
local prefix="${2:-[hvault]}"
local dry_run="${DRY_RUN:-0}"
local mounts_json mount_exists mount_type mount_version
mounts_json="$(hvault_get_or_empty "sys/mounts")" \
|| { printf '%s ERROR: failed to list Vault mounts\n' "$prefix" >&2; return 1; }
mount_exists=false
if printf '%s' "$mounts_json" | jq -e --arg m "${mount}/" '.[$m]' >/dev/null 2>&1; then
mount_exists=true
fi
if [ "$mount_exists" = true ]; then
mount_type="$(printf '%s' "$mounts_json" \
| jq -r --arg m "${mount}/" '.[$m].type // ""')"
mount_version="$(printf '%s' "$mounts_json" \
| jq -r --arg m "${mount}/" '.[$m].options.version // "1"')"
if [ "$mount_type" != "kv" ]; then
printf '%s ERROR: %s/ is mounted as type=%q, expected kv — refuse to re-mount\n' \
"$prefix" "$mount" "$mount_type" >&2
return 1
fi
if [ "$mount_version" != "2" ]; then
printf '%s ERROR: %s/ is KV v%s, expected v2 — refuse to upgrade in place\n' \
"$prefix" "$mount" "$mount_version" >&2
return 1
fi
printf '%s %s/ already mounted (kv v2) — skipping enable\n' "$prefix" "$mount"
else
if [ "$dry_run" -eq 1 ]; then
printf '%s [dry-run] would enable %s/ as kv v2\n' "$prefix" "$mount"
else
local payload
payload="$(jq -n '{type:"kv",options:{version:"2"},description:"disinto shared KV v2 (S2.4)"}')"
_hvault_request POST "sys/mounts/${mount}" "$payload" >/dev/null \
|| { printf '%s ERROR: failed to enable %s/ as kv v2\n' "$prefix" "$mount" >&2; return 1; }
printf '%s %s/ enabled as kv v2\n' "$prefix" "$mount"
fi
fi
}
# hvault_kv_get PATH [KEY] # hvault_kv_get PATH [KEY]
# Read a KV v2 secret at PATH, optionally extract a single KEY. # Read a KV v2 secret at PATH, optionally extract a single KEY.
# Outputs: JSON value (full data object, or single key value) # Outputs: JSON value (full data object, or single key value)
@ -114,7 +197,7 @@ hvault_kv_get() {
_hvault_check_prereqs "hvault_kv_get" || return 1 _hvault_check_prereqs "hvault_kv_get" || return 1
local response local response
response="$(_hvault_request GET "secret/data/${path}")" || return 1 response="$(_hvault_request GET "${VAULT_KV_MOUNT}/data/${path}")" || return 1
if [ -n "$key" ]; then if [ -n "$key" ]; then
printf '%s' "$response" | jq -e -r --arg key "$key" '.data.data[$key]' 2>/dev/null || { printf '%s' "$response" | jq -e -r --arg key "$key" '.data.data[$key]' 2>/dev/null || {
@ -154,7 +237,7 @@ hvault_kv_put() {
payload="$(printf '%s' "$payload" | jq --arg k "$k" --arg v "$v" '.data[$k] = $v')" payload="$(printf '%s' "$payload" | jq --arg k "$k" --arg v "$v" '.data[$k] = $v')"
done done
_hvault_request POST "secret/data/${path}" "$payload" >/dev/null _hvault_request POST "${VAULT_KV_MOUNT}/data/${path}" "$payload" >/dev/null
} }
# hvault_kv_list PATH # hvault_kv_list PATH
@ -170,7 +253,7 @@ hvault_kv_list() {
_hvault_check_prereqs "hvault_kv_list" || return 1 _hvault_check_prereqs "hvault_kv_list" || return 1
local response local response
response="$(_hvault_request LIST "secret/metadata/${path}")" || return 1 response="$(_hvault_request LIST "${VAULT_KV_MOUNT}/metadata/${path}")" || return 1
printf '%s' "$response" | jq -e '.data.keys' 2>/dev/null || { printf '%s' "$response" | jq -e '.data.keys' 2>/dev/null || {
_hvault_err "hvault_kv_list" "failed to parse response" "path=$path" _hvault_err "hvault_kv_list" "failed to parse response" "path=$path"
@ -178,6 +261,51 @@ hvault_kv_list() {
} }
} }
# hvault_get_or_empty PATH
# GET /v1/PATH. On 200, prints the raw response body to stdout (caller
# parses with jq). On 404, prints nothing and returns 0 — caller treats
# the empty string as "resource absent, needs create". Any other HTTP
# status is a hard error: response body is logged to stderr as a
# structured JSON error and the function returns 1.
#
# Used by the sync scripts (tools/vault-apply-*.sh +
# lib/init/nomad/vault-nomad-auth.sh) to read existing policies, roles,
# auth-method listings, and per-role configs without triggering errexit
# on the expected absent-resource case. `_hvault_request` is not a
# substitute — it treats 404 as a hard error, which is correct for
# writes but wrong for "does this already exist?" checks.
#
# Subshell + EXIT trap: the RETURN trap does NOT fire on set-e abort,
# so tmpfile cleanup from a function-scoped RETURN trap would leak on
# jq/curl errors under `set -eo pipefail`. The subshell + EXIT trap
# is the reliable cleanup boundary.
hvault_get_or_empty() {
local path="${1:-}"
if [ -z "$path" ]; then
_hvault_err "hvault_get_or_empty" "PATH is required" \
"usage: hvault_get_or_empty PATH"
return 1
fi
_hvault_check_prereqs "hvault_get_or_empty" || return 1
(
local tmp http_code
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
http_code="$(curl -sS -o "$tmp" -w '%{http_code}' \
-H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/${path}")" \
|| { _hvault_err "hvault_get_or_empty" "curl failed" "path=$path"; exit 1; }
case "$http_code" in
2[0-9][0-9]) cat "$tmp" ;;
404) printf '' ;;
*) _hvault_err "hvault_get_or_empty" "HTTP $http_code" "$(cat "$tmp")"
exit 1 ;;
esac
)
}
# hvault_policy_apply NAME FILE # hvault_policy_apply NAME FILE
# Idempotent policy upsert — create or update a Vault policy. # Idempotent policy upsert — create or update a Vault policy.
hvault_policy_apply() { hvault_policy_apply() {
@ -277,3 +405,36 @@ hvault_token_lookup() {
return 1 return 1
} }
} }
# _hvault_seed_key — Seed a single KV key if it doesn't exist.
# Reads existing data and merges to preserve sibling keys (KV v2 replaces
# .data atomically). Returns 0=created, 1=unchanged, 2=API error.
# Args:
# path: KV v2 logical path (e.g. "disinto/shared/chat")
# key: key name within the path (e.g. "chat_oauth_client_id")
# generator: shell command that outputs a random value (default: openssl rand -hex 32)
# Usage:
# _hvault_seed_key "disinto/shared/chat" "chat_oauth_client_id"
# rc=$? # 0=created, 1=unchanged
_hvault_seed_key() {
local path="$1" key="$2" generator="${3:-openssl rand -hex 32}"
local existing
existing=$(hvault_kv_get "$path" "$key" 2>/dev/null) || true
if [ -n "$existing" ]; then
return 1 # unchanged
fi
local value
value=$(eval "$generator")
# Read existing data to preserve sibling keys (KV v2 replaces atomically)
local kv_api="${VAULT_KV_MOUNT}/data/${path}"
local raw existing_data payload
raw="$(hvault_get_or_empty "$kv_api")" || return 2
existing_data="{}"
[ -n "$raw" ] && existing_data="$(printf '%s' "$raw" | jq '.data.data // {}')"
payload="$(printf '%s' "$existing_data" \
| jq --arg k "$key" --arg v "$value" '{data: (. + {($k): $v})}')"
_hvault_request POST "$kv_api" "$payload" >/dev/null
return 0 # created
}

View file

@ -66,6 +66,7 @@ HOST_VOLUME_DIRS=(
"/srv/disinto/agent-data" "/srv/disinto/agent-data"
"/srv/disinto/project-repos" "/srv/disinto/project-repos"
"/srv/disinto/caddy-data" "/srv/disinto/caddy-data"
"/srv/disinto/docker"
"/srv/disinto/chat-history" "/srv/disinto/chat-history"
"/srv/disinto/ops-repo" "/srv/disinto/ops-repo"
) )
@ -116,7 +117,7 @@ if [ "$dry_run" = true ]; then
[dry-run] Step 4/9: create host-volume dirs under /srv/disinto/ [dry-run] Step 4/9: create host-volume dirs under /srv/disinto/
EOF EOF
for d in "${HOST_VOLUME_DIRS[@]}"; do for d in "${HOST_VOLUME_DIRS[@]}"; do
printf ' → install -d -m 0755 %s\n' "$d" printf ' → install -d -m 0777 %s\n' "$d"
done done
cat <<EOF cat <<EOF
@ -280,8 +281,10 @@ for d in "${HOST_VOLUME_DIRS[@]}"; do
log "unchanged: ${d}" log "unchanged: ${d}"
else else
log "creating: ${d}" log "creating: ${d}"
install -d -m 0755 -o root -g root "$d" install -d -m 0777 -o root -g root "$d"
fi fi
# Ensure correct permissions (fixes pre-existing 0755 dirs on re-run)
chmod 0777 "$d"
done done
# ── Step 5/9: /etc/nomad.d/server.hcl + client.hcl ─────────────────────────── # ── Step 5/9: /etc/nomad.d/server.hcl + client.hcl ───────────────────────────

View file

@ -16,13 +16,15 @@
# Environment: # Environment:
# REPO_ROOT — absolute path to repo root (defaults to parent of # REPO_ROOT — absolute path to repo root (defaults to parent of
# this script's parent directory) # this script's parent directory)
# JOB_READY_TIMEOUT_SECS — poll timeout in seconds (default: 240) # JOB_READY_TIMEOUT_SECS — poll timeout in seconds (default: 360)
# JOB_READY_TIMEOUT_<JOBNAME> — per-job timeout override (e.g., # JOB_READY_TIMEOUT_<JOBNAME> — per-job timeout override (e.g.,
# JOB_READY_TIMEOUT_FORGEJO=300) # JOB_READY_TIMEOUT_FORGEJO=300)
# Built-in: JOB_READY_TIMEOUT_CHAT=600
# #
# Exit codes: # Exit codes:
# 0 success (all jobs deployed and healthy, or dry-run completed) # 0 success (all jobs deployed and healthy, or dry-run completed)
# 1 failure (validation error, timeout, or nomad command failure) # 1 failure (validation error, or one or more jobs unhealthy after all
# jobs submitted — deploy does NOT cascade-skip on timeout)
# #
# Idempotency: # Idempotency:
# Running twice back-to-back on a healthy cluster is a no-op. Jobs that are # Running twice back-to-back on a healthy cluster is a no-op. Jobs that are
@ -33,9 +35,13 @@ set -euo pipefail
# ── Configuration ──────────────────────────────────────────────────────────── # ── Configuration ────────────────────────────────────────────────────────────
SCRIPT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" SCRIPT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="${REPO_ROOT:-$(cd "${SCRIPT_ROOT}/../../.." && pwd)}" REPO_ROOT="${REPO_ROOT:-$(cd "${SCRIPT_ROOT}/../../.." && pwd)}"
JOB_READY_TIMEOUT_SECS="${JOB_READY_TIMEOUT_SECS:-240}" JOB_READY_TIMEOUT_SECS="${JOB_READY_TIMEOUT_SECS:-360}"
# Per-job built-in defaults (override with JOB_READY_TIMEOUT_<JOBNAME> env var)
JOB_READY_TIMEOUT_CHAT="${JOB_READY_TIMEOUT_CHAT:-600}"
DRY_RUN=0 DRY_RUN=0
FAILED_JOBS=() # jobs that timed out or failed deployment
log() { printf '[deploy] %s\n' "$*" >&2; } log() { printf '[deploy] %s\n' "$*" >&2; }
die() { printf '[deploy] ERROR: %s\n' "$*" >&2; exit 1; } die() { printf '[deploy] ERROR: %s\n' "$*" >&2; exit 1; }
@ -168,6 +174,43 @@ _wait_job_running() {
return 1 return 1
} }
# ── Helper: _run_post_deploy <job_name> ─────────────────────────────────────
# Runs post-deploy scripts for a job after it becomes healthy.
# Currently supports: forgejo → run forgejo-bootstrap.sh
#
# Args:
# job_name — name of the deployed job
#
# Returns:
# 0 on success (script ran or not applicable)
# 1 on failure
# ─────────────────────────────────────────────────────────────────────────────
_run_post_deploy() {
local job_name="$1"
local post_deploy_script
case "$job_name" in
forgejo)
post_deploy_script="${SCRIPT_ROOT}/forgejo-bootstrap.sh"
if [ -x "$post_deploy_script" ]; then
log "running post-deploy script for ${job_name}"
if ! "$post_deploy_script"; then
log "ERROR: post-deploy script failed for ${job_name}"
return 1
fi
log "post-deploy script completed for ${job_name}"
else
log "no post-deploy script found for ${job_name}, skipping"
fi
;;
*)
log "no post-deploy script for ${job_name}, skipping"
;;
esac
return 0
}
# ── Main: deploy each job in order ─────────────────────────────────────────── # ── Main: deploy each job in order ───────────────────────────────────────────
for job_name in "${JOBS[@]}"; do for job_name in "${JOBS[@]}"; do
jobspec_path="${REPO_ROOT}/nomad/jobs/${job_name}.hcl" jobspec_path="${REPO_ROOT}/nomad/jobs/${job_name}.hcl"
@ -177,7 +220,8 @@ for job_name in "${JOBS[@]}"; do
fi fi
# Per-job timeout override: JOB_READY_TIMEOUT_<UPPERCASE_JOBNAME> # Per-job timeout override: JOB_READY_TIMEOUT_<UPPERCASE_JOBNAME>
job_upper=$(printf '%s' "$job_name" | tr '[:lower:]' '[:upper:]') # Sanitize job name: replace hyphens with underscores (bash vars can't have hyphens)
job_upper=$(printf '%s' "$job_name" | tr '[:lower:]-' '[:upper:]_' | tr ' ' '_')
timeout_var="JOB_READY_TIMEOUT_${job_upper}" timeout_var="JOB_READY_TIMEOUT_${job_upper}"
job_timeout="${!timeout_var:-$JOB_READY_TIMEOUT_SECS}" job_timeout="${!timeout_var:-$JOB_READY_TIMEOUT_SECS}"
@ -185,6 +229,9 @@ for job_name in "${JOBS[@]}"; do
log "[dry-run] nomad job validate ${jobspec_path}" log "[dry-run] nomad job validate ${jobspec_path}"
log "[dry-run] nomad job run -detach ${jobspec_path}" log "[dry-run] nomad job run -detach ${jobspec_path}"
log "[dry-run] (would wait for '${job_name}' to become healthy for ${job_timeout}s)" log "[dry-run] (would wait for '${job_name}' to become healthy for ${job_timeout}s)"
case "$job_name" in
forgejo) log "[dry-run] [post-deploy] would run forgejo-bootstrap.sh" ;;
esac
continue continue
fi fi
@ -214,7 +261,13 @@ for job_name in "${JOBS[@]}"; do
# 4. Wait for healthy state # 4. Wait for healthy state
if ! _wait_job_running "$job_name" "$job_timeout"; then if ! _wait_job_running "$job_name" "$job_timeout"; then
die "deployment for job '${job_name}' did not reach successful state" log "WARNING: deployment for job '${job_name}' did not reach successful state — continuing with remaining jobs"
FAILED_JOBS+=("$job_name")
else
# 5. Run post-deploy scripts (only if job reached healthy state)
if ! _run_post_deploy "$job_name"; then
die "post-deploy script failed for job '${job_name}'"
fi
fi fi
done done
@ -222,4 +275,17 @@ if [ "$DRY_RUN" -eq 1 ]; then
log "dry-run complete" log "dry-run complete"
fi fi
# ── Final health summary ─────────────────────────────────────────────────────
if [ "${#FAILED_JOBS[@]}" -gt 0 ]; then
log ""
log "=== DEPLOY SUMMARY ==="
log "The following jobs did NOT reach healthy state:"
for failed in "${FAILED_JOBS[@]}"; do
log " - ${failed}"
done
log "All other jobs were submitted and healthy."
log "======================"
exit 1
fi
exit 0 exit 0

View file

@ -0,0 +1,215 @@
#!/usr/bin/env bash
# =============================================================================
# lib/init/nomad/forgejo-bootstrap.sh — Bootstrap Forgejo admin user
#
# Part of the Nomad+Vault migration (S2.4, issue #1069). Creates the
# disinto-admin user in Forgejo if it doesn't exist, enabling:
# - First-login success without manual intervention
# - PAT generation via API (required for disinto backup import #1058)
#
# The script is idempotent — re-running after success is a no-op.
#
# Scope:
# - Checks if user 'disinto-admin' exists via GET /api/v1/users/search
# - If not: POST /api/v1/admin/users to create admin user
# - Uses FORGE_ADMIN_PASS from environment (required)
#
# Idempotency contract:
# - User 'disinto-admin' exists → skip creation, log
# "[forgejo-bootstrap] admin user already exists"
# - User creation fails with "user already exists" → treat as success
#
# Preconditions:
# - Forgejo reachable at $FORGE_URL (default: http://127.0.0.1:3000)
# - Forgejo admin token at $FORGE_TOKEN (from Vault or env)
# - FORGE_ADMIN_PASS set (env var with admin password)
#
# Requires:
# - curl, jq
#
# Usage:
# lib/init/nomad/forgejo-bootstrap.sh
# lib/init/nomad/forgejo-bootstrap.sh --dry-run
#
# Exit codes:
# 0 success (user created + ready, or already exists)
# 1 precondition / API failure
# =============================================================================
set -euo pipefail
# ── Configuration ────────────────────────────────────────────────────────────
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
# shellcheck source=../../../lib/hvault.sh
source "${REPO_ROOT}/lib/hvault.sh"
# Configuration
FORGE_URL="${FORGE_URL:-http://127.0.0.1:3000}"
FORGE_TOKEN="${FORGE_TOKEN:-}"
FORGE_ADMIN_USER="${DISINTO_ADMIN_USER:-disinto-admin}"
FORGE_ADMIN_EMAIL="${DISINTO_ADMIN_EMAIL:-admin@disinto.local}"
# Derive FORGE_ADMIN_PASS from common env var patterns
# Priority: explicit FORGE_ADMIN_PASS > DISINTO_FORGE_ADMIN_PASS > FORGEJO_ADMIN_PASS
FORGE_ADMIN_PASS="${FORGE_ADMIN_PASS:-${DISINTO_FORGE_ADMIN_PASS:-${FORGEJO_ADMIN_PASS:-}}}"
LOG_TAG="[forgejo-bootstrap]"
log() { printf '%s %s\n' "$LOG_TAG" "$*" >&2; }
die() { printf '%s ERROR: %s\n' "$LOG_TAG" "$*" >&2; exit 1; }
# ── Flag parsing ─────────────────────────────────────────────────────────────
DRY_RUN="${DRY_RUN:-0}"
for arg in "$@"; do
case "$arg" in
--dry-run) DRY_RUN=1 ;;
-h|--help)
printf 'Usage: %s [--dry-run]\n\n' "$(basename "$0")"
printf 'Bootstrap Forgejo admin user if it does not exist.\n'
printf 'Idempotent: re-running is a no-op.\n\n'
printf 'Environment:\n'
printf ' FORGE_URL Forgejo base URL (default: http://127.0.0.1:3000)\n'
printf ' FORGE_TOKEN Forgejo admin token (from Vault or env)\n'
printf ' FORGE_ADMIN_PASS Admin password (required)\n'
printf ' DISINTO_ADMIN_USER Username for admin account (default: disinto-admin)\n'
printf ' DISINTO_ADMIN_EMAIL Admin email (default: admin@disinto.local)\n\n'
printf ' --dry-run Print planned actions without modifying Forgejo.\n'
exit 0
;;
*) die "invalid argument: ${arg} (try --help)" ;;
esac
done
# ── Precondition checks ──────────────────────────────────────────────────────
log "── Precondition check ──"
if [ -z "$FORGE_URL" ]; then
die "FORGE_URL is not set"
fi
if [ -z "$FORGE_ADMIN_PASS" ]; then
die "FORGE_ADMIN_PASS is not set (required for admin user creation)"
fi
# Resolve FORGE_TOKEN from Vault if not set in env
if [ -z "$FORGE_TOKEN" ]; then
log "reading FORGE_TOKEN from Vault at kv/disinto/shared/forge/token"
_hvault_default_env
token_raw="$(hvault_get_or_empty "kv/data/disinto/shared/forge/token" 2>/dev/null)" || true
if [ -n "$token_raw" ]; then
FORGE_TOKEN="$(printf '%s' "$token_raw" | jq -r '.data.data.token // empty' 2>/dev/null)" || true
fi
if [ -z "$FORGE_TOKEN" ]; then
die "FORGE_TOKEN not set and not found in Vault"
fi
log "forge token loaded from Vault"
fi
# ── Step 1/3: Check if admin user already exists ─────────────────────────────
log "── Step 1/3: check if admin user '${FORGE_ADMIN_USER}' exists ──"
# Use exact match via GET /api/v1/users/{username} (returns 404 if absent)
user_lookup_raw=$(curl -sf --max-time 10 \
"${FORGE_URL}/api/v1/users/${FORGE_ADMIN_USER}" 2>/dev/null) || {
# 404 means user doesn't exist
if [ $? -eq 7 ]; then
log "admin user '${FORGE_ADMIN_USER}' not found"
admin_user_exists=false
user_id=""
else
# Other curl errors (e.g., network, Forgejo down)
log "warning: failed to lookup user (Forgejo may not be ready yet)"
admin_user_exists=false
user_id=""
fi
}
if [ -n "$user_lookup_raw" ]; then
admin_user_exists=true
user_id=$(printf '%s' "$user_lookup_raw" | jq -r '.id // empty' 2>/dev/null) || true
if [ -n "$user_id" ]; then
log "admin user '${FORGE_ADMIN_USER}' already exists (user_id: ${user_id})"
fi
fi
# ── Step 2/3: Create admin user if needed ────────────────────────────────────
if [ "$admin_user_exists" = false ]; then
log "creating admin user '${FORGE_ADMIN_USER}'"
if [ "$DRY_RUN" -eq 1 ]; then
log "[dry-run] would create admin user with:"
log "[dry-run] username: ${FORGE_ADMIN_USER}"
log "[dry-run] email: ${FORGE_ADMIN_EMAIL}"
log "[dry-run] admin: true"
log "[dry-run] must_change_password: false"
else
# Create the admin user via the admin API
create_response=$(curl -sf --max-time 30 -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_URL}/api/v1/admin/users" \
-d "{
\"username\": \"${FORGE_ADMIN_USER}\",
\"email\": \"${FORGE_ADMIN_EMAIL}\",
\"password\": \"${FORGE_ADMIN_PASS}\",
\"admin\": true,
\"must_change_password\": false
}" 2>/dev/null) || {
# Check if the error is "user already exists" (race condition on re-run)
error_body=$(curl -s --max-time 30 -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_URL}/api/v1/admin/users" \
-d "{\"username\": \"${FORGE_ADMIN_USER}\", \"email\": \"${FORGE_ADMIN_EMAIL}\", \"password\": \"${FORGE_ADMIN_PASS}\", \"admin\": true, \"must_change_password\": false}" 2>/dev/null) || error_body=""
if echo "$error_body" | grep -q '"message".*"user already exists"'; then
log "admin user '${FORGE_ADMIN_USER}' already exists (race condition handled)"
admin_user_exists=true
else
die "failed to create admin user in Forgejo: ${error_body:-unknown error}"
fi
}
# Extract user_id from response
user_id=$(printf '%s' "$create_response" | jq -r '.id // empty' 2>/dev/null) || true
if [ -n "$user_id" ]; then
admin_user_exists=true
log "admin user '${FORGE_ADMIN_USER}' created (user_id: ${user_id})"
else
die "failed to extract user_id from Forgejo response"
fi
fi
else
log "admin user '${FORGE_ADMIN_USER}' already exists — skipping creation"
fi
# ── Step 3/3: Verify user was created and is admin ───────────────────────────
log "── Step 3/3: verify admin user is properly configured ──"
if [ "$DRY_RUN" -eq 1 ]; then
log "[dry-run] would verify admin user configuration"
log "done — [dry-run] complete"
else
# Verify the user exists and is admin
verify_response=$(curl -sf --max-time 10 \
-u "${FORGE_ADMIN_USER}:${FORGE_ADMIN_PASS}" \
"${FORGE_URL}/api/v1/user" 2>/dev/null) || {
die "failed to verify admin user credentials"
}
is_admin=$(printf '%s' "$verify_response" | jq -r '.is_admin // false' 2>/dev/null) || true
login=$(printf '%s' "$verify_response" | jq -r '.login // empty' 2>/dev/null) || true
if [ "$is_admin" != "true" ]; then
die "admin user '${FORGE_ADMIN_USER}' is not marked as admin"
fi
if [ "$login" != "$FORGE_ADMIN_USER" ]; then
die "admin user login mismatch: expected '${FORGE_ADMIN_USER}', got '${login}'"
fi
log "admin user verified: login=${login}, is_admin=${is_admin}"
log "done — Forgejo admin user is ready"
fi
exit 0

140
lib/init/nomad/vault-engines.sh Executable file
View file

@ -0,0 +1,140 @@
#!/usr/bin/env bash
# =============================================================================
# lib/init/nomad/vault-engines.sh — Enable required Vault secret engines
#
# Part of the Nomad+Vault migration (S2.1, issue #912). Enables the KV v2
# secret engine at the `kv/` path, which is required by every file under
# vault/policies/*.hcl, every role in vault/roles.yaml, every write done
# by tools/vault-import.sh, and every template read done by
# nomad/jobs/forgejo.hcl — all of which address paths under kv/disinto/…
# and 403 if the mount is absent.
#
# Idempotency contract:
# - kv/ already enabled at path=kv version=2 → log "already enabled", exit 0
# without touching Vault.
# - kv/ enabled at a different type/version → die (manual intervention).
# - kv/ not enabled → POST sys/mounts/kv to enable kv-v2, log "enabled".
# - Second run on a fully-configured box is a silent no-op.
#
# Preconditions:
# - Vault is unsealed and reachable (VAULT_ADDR + VAULT_TOKEN set OR
# defaultable to the local-cluster shape via _hvault_default_env).
# - Must run AFTER cluster-up.sh (unseal complete) but BEFORE
# vault-apply-policies.sh (policies reference kv/* paths).
#
# Environment:
# VAULT_ADDR — default http://127.0.0.1:8200 via _hvault_default_env.
# VAULT_TOKEN — env OR /etc/vault.d/root.token (resolved by lib/hvault.sh).
#
# Usage:
# sudo lib/init/nomad/vault-engines.sh
# sudo lib/init/nomad/vault-engines.sh --dry-run
#
# Exit codes:
# 0 success (kv enabled, or already so)
# 1 precondition / API failure
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
# shellcheck source=../../hvault.sh
source "${REPO_ROOT}/lib/hvault.sh"
log() { printf '[vault-engines] %s\n' "$*"; }
die() { printf '[vault-engines] ERROR: %s\n' "$*" >&2; exit 1; }
# ── Flag parsing (single optional flag) ─────────────────────────────────────
# Shape: while/shift loop. Deliberately NOT a flat `case "${1:-}"` like
# tools/vault-apply-policies.sh nor an if/elif ladder like
# tools/vault-apply-roles.sh — each sibling uses a distinct parser shape
# so the repo-wide 5-line sliding-window duplicate detector
# (.woodpecker/detect-duplicates.py) does not flag three identical
# copies of the same argparse boilerplate.
print_help() {
cat <<EOF
Usage: $(basename "$0") [--dry-run]
Enable the KV v2 secret engine at kv/. Required by all Vault policies,
roles, and Nomad job templates that reference kv/disinto/* paths.
Idempotent: an already-enabled kv/ is reported and left untouched.
--dry-run Probe state and print the action without contacting Vault
in a way that mutates it.
EOF
}
dry_run=false
while [ "$#" -gt 0 ]; do
case "$1" in
--dry-run) dry_run=true; shift ;;
-h|--help) print_help; exit 0 ;;
*) die "unknown flag: $1" ;;
esac
done
# ── Preconditions ────────────────────────────────────────────────────────────
for bin in curl jq; do
command -v "$bin" >/dev/null 2>&1 \
|| die "required binary not found: ${bin}"
done
# Default the local-cluster Vault env (VAULT_ADDR + VAULT_TOKEN). Shared
# with the rest of the init-time Vault scripts — see lib/hvault.sh header.
_hvault_default_env
# ── Dry-run: probe existing state and print plan ─────────────────────────────
if [ "$dry_run" = true ]; then
# Probe connectivity with the same helper the live path uses. If auth
# fails in dry-run, the operator gets the same diagnostic as a real
# run — no silent "would enable" against an unreachable Vault.
hvault_token_lookup >/dev/null \
|| die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
mounts_raw="$(hvault_get_or_empty "sys/mounts")" \
|| die "failed to list secret engines"
if [ -n "$mounts_raw" ] \
&& printf '%s' "$mounts_raw" | jq -e '."kv/"' >/dev/null 2>&1; then
log "[dry-run] kv-v2 at kv/ already enabled"
else
log "[dry-run] would enable kv-v2 at kv/"
fi
exit 0
fi
# ── Live run: Vault connectivity check ───────────────────────────────────────
hvault_token_lookup >/dev/null \
|| die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
# ── Check if kv/ is already enabled ──────────────────────────────────────────
# sys/mounts returns an object keyed by "<path>/" for every enabled secret
# engine (trailing slash is Vault's on-disk form). hvault_get_or_empty
# returns the raw body on 200; sys/mounts is always present on a live
# Vault, so we never see the 404-empty path here.
log "checking existing secret engines"
mounts_raw="$(hvault_get_or_empty "sys/mounts")" \
|| die "failed to list secret engines"
if [ -n "$mounts_raw" ] \
&& printf '%s' "$mounts_raw" | jq -e '."kv/"' >/dev/null 2>&1; then
# kv/ exists — verify it's kv-v2 on the right path shape. Vault returns
# the option as a string ("2") on GET, never an integer.
kv_type="$(printf '%s' "$mounts_raw" | jq -r '."kv/".type // ""')"
kv_version="$(printf '%s' "$mounts_raw" | jq -r '."kv/".options.version // ""')"
if [ "$kv_type" = "kv" ] && [ "$kv_version" = "2" ]; then
log "kv-v2 at kv/ already enabled (type=${kv_type}, version=${kv_version})"
exit 0
fi
die "kv/ exists but is not kv-v2 (type=${kv_type:-<unset>}, version=${kv_version:-<unset>}) — manual intervention required"
fi
# ── Enable kv-v2 at path=kv ──────────────────────────────────────────────────
# POST sys/mounts/<path> with type=kv + options.version=2 is the
# HTTP-API equivalent of `vault secrets enable -path=kv -version=2 kv`.
# Keeps the script vault-CLI-free (matches the policy-apply + nomad-auth
# scripts; their headers explain why a CLI dep would die on client-only
# nodes).
log "enabling kv-v2 at path=kv"
enable_payload="$(jq -n '{type:"kv",options:{version:"2"}}')"
_hvault_request POST "sys/mounts/kv" "$enable_payload" >/dev/null \
|| die "failed to enable kv-v2 secret engine"
log "kv-v2 enabled at kv/"

View file

@ -0,0 +1,183 @@
#!/usr/bin/env bash
# =============================================================================
# lib/init/nomad/vault-nomad-auth.sh — Idempotent Vault JWT auth + Nomad wiring
#
# Part of the Nomad+Vault migration (S2.3, issue #881). Enables Vault's JWT
# auth method at path `jwt-nomad`, points it at Nomad's workload-identity
# JWKS endpoint, writes one role per policy (via tools/vault-apply-roles.sh),
# updates /etc/nomad.d/server.hcl with the vault stanza, and signals nomad
# to reload so jobs can exchange short-lived workload-identity tokens for
# Vault tokens — no shared VAULT_TOKEN in job env.
#
# Steps:
# 1. Enable auth method (sys/auth/jwt-nomad, type=jwt)
# 2. Configure JWKS + algs (auth/jwt-nomad/config)
# 3. Upsert roles from vault/roles.yaml (delegates to vault-apply-roles.sh)
# 4. Install /etc/nomad.d/server.hcl from repo + SIGHUP nomad if changed
#
# Idempotency contract:
# - Auth path already enabled → skip create, log "jwt-nomad already enabled".
# - Config identical to desired → skip write, log "jwt-nomad config unchanged".
# - Roles: see tools/vault-apply-roles.sh header for per-role diffing.
# - server.hcl on disk byte-identical to repo copy → skip write, skip SIGHUP.
# - Second run on a fully-configured box is a silent no-op end-to-end.
#
# Preconditions:
# - S0 complete (empty cluster up: nomad + vault reachable, vault unsealed).
# - S2.1 complete: vault/policies/*.hcl applied via tools/vault-apply-policies.sh
# (otherwise the roles we write will reference policies Vault does not
# know about — the write succeeds, but token minting will fail later).
# - Running as root (writes /etc/nomad.d/server.hcl + signals nomad).
#
# Environment:
# VAULT_ADDR — default http://127.0.0.1:8200 (matches nomad/vault.hcl).
# VAULT_TOKEN — env OR /etc/vault.d/root.token (resolved by lib/hvault.sh).
#
# Usage:
# sudo lib/init/nomad/vault-nomad-auth.sh
#
# Exit codes:
# 0 success (configured, or already so)
# 1 precondition / API / nomad-reload failure
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
APPLY_ROLES_SH="${REPO_ROOT}/tools/vault-apply-roles.sh"
SERVER_HCL_SRC="${REPO_ROOT}/nomad/server.hcl"
SERVER_HCL_DST="/etc/nomad.d/server.hcl"
# shellcheck source=../../hvault.sh
source "${REPO_ROOT}/lib/hvault.sh"
# Default the local-cluster Vault env (see lib/hvault.sh::_hvault_default_env).
# Called from `disinto init` which does not export VAULT_ADDR/VAULT_TOKEN in
# the common fresh-LXC case (issue #912). Must run after hvault.sh is sourced.
_hvault_default_env
log() { printf '[vault-auth] %s\n' "$*"; }
die() { printf '[vault-auth] ERROR: %s\n' "$*" >&2; exit 1; }
# ── Preconditions ────────────────────────────────────────────────────────────
if [ "$(id -u)" -ne 0 ]; then
die "must run as root (writes ${SERVER_HCL_DST} + signals nomad)"
fi
# curl + jq are used directly; hvault.sh's helpers are also curl-based, so
# the `vault` CLI is NOT required here — don't add it to this list, or a
# Vault-server-present / vault-CLI-absent box (e.g. a Nomad-client-only
# node) would die spuriously. systemctl is required for SIGHUPing nomad.
for bin in curl jq systemctl; do
command -v "$bin" >/dev/null 2>&1 \
|| die "required binary not found: ${bin}"
done
[ -f "$SERVER_HCL_SRC" ] \
|| die "source config not found: ${SERVER_HCL_SRC}"
[ -x "$APPLY_ROLES_SH" ] \
|| die "companion script missing or not executable: ${APPLY_ROLES_SH}"
hvault_token_lookup >/dev/null \
|| die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
# ── Desired config (Nomad workload-identity JWKS on localhost:4646) ──────────
# Nomad's default workload-identity signer publishes the public JWKS at
# /.well-known/jwks.json on the nomad HTTP API port (4646). Vault validates
# JWTs against it. RS256 is the signer's default algorithm. `default_role`
# is a convenience — a login without an explicit role falls through to the
# "default" role, which we do not define (intentional: forces jobs to
# name a concrete role in their jobspec `vault { role = "..." }`).
JWKS_URL="http://127.0.0.1:4646/.well-known/jwks.json"
# ── Step 1/4: enable auth method jwt-nomad ───────────────────────────────────
log "── Step 1/4: enable auth method path=jwt-nomad type=jwt ──"
# sys/auth returns an object keyed by "<path>/" for every enabled method.
# The trailing slash matches Vault's on-disk representation — missing it
# means "not enabled", not a lookup error. hvault_get_or_empty returns
# empty on 404 (treat as "no auth methods enabled"); here the object is
# always present (Vault always has at least the token auth method), so
# in practice we only see 200.
auth_list="$(hvault_get_or_empty "sys/auth")" \
|| die "failed to list auth methods"
if printf '%s' "$auth_list" | jq -e '.["jwt-nomad/"]' >/dev/null 2>&1; then
log "auth path jwt-nomad already enabled"
else
enable_payload="$(jq -n '{type:"jwt",description:"Nomad workload identity (S2.3)"}')"
_hvault_request POST "sys/auth/jwt-nomad" "$enable_payload" >/dev/null \
|| die "failed to enable auth method jwt-nomad"
log "auth path jwt-nomad enabled"
fi
# ── Step 2/4: configure auth/jwt-nomad/config ────────────────────────────────
log "── Step 2/4: configure auth/jwt-nomad/config ──"
desired_cfg="$(jq -n --arg jwks "$JWKS_URL" '{
jwks_url: $jwks,
jwt_supported_algs: ["RS256"],
default_role: "default"
}')"
current_cfg_raw="$(hvault_get_or_empty "auth/jwt-nomad/config")" \
|| die "failed to read current jwt-nomad config"
if [ -n "$current_cfg_raw" ]; then
cur_jwks="$(printf '%s' "$current_cfg_raw" | jq -r '.data.jwks_url // ""')"
cur_algs="$(printf '%s' "$current_cfg_raw" | jq -cS '.data.jwt_supported_algs // []')"
cur_default="$(printf '%s' "$current_cfg_raw" | jq -r '.data.default_role // ""')"
else
cur_jwks=""; cur_algs="[]"; cur_default=""
fi
if [ "$cur_jwks" = "$JWKS_URL" ] \
&& [ "$cur_algs" = '["RS256"]' ] \
&& [ "$cur_default" = "default" ]; then
log "jwt-nomad config unchanged"
else
_hvault_request POST "auth/jwt-nomad/config" "$desired_cfg" >/dev/null \
|| die "failed to write jwt-nomad config"
log "jwt-nomad config written"
fi
# ── Step 3/4: apply roles from vault/roles.yaml ──────────────────────────────
log "── Step 3/4: apply roles from vault/roles.yaml ──"
# Delegates to tools/vault-apply-roles.sh — one source of truth for the
# parser and per-role idempotency contract. Its header documents the
# created/updated/unchanged wiring.
"$APPLY_ROLES_SH"
# ── Step 4/4: install server.hcl + SIGHUP nomad if changed ───────────────────
log "── Step 4/4: install ${SERVER_HCL_DST} + reload nomad if changed ──"
# cluster-up.sh (S0.4) is the normal path for installing server.hcl — but
# this script is run AFTER S0.4, so we also install here. Writing only on
# content-diff keeps re-runs a true no-op (no spurious SIGHUP). `install`
# preserves perms at 0644 root:root on every write.
needs_reload=0
if [ -f "$SERVER_HCL_DST" ] && cmp -s "$SERVER_HCL_SRC" "$SERVER_HCL_DST"; then
log "unchanged: ${SERVER_HCL_DST}"
else
log "writing: ${SERVER_HCL_DST}"
install -m 0644 -o root -g root "$SERVER_HCL_SRC" "$SERVER_HCL_DST"
needs_reload=1
fi
if [ "$needs_reload" -eq 1 ]; then
# SIGHUP triggers Nomad's config reload (see ExecReload in
# lib/init/nomad/systemd-nomad.sh — /bin/kill -HUP $MAINPID). Using
# `systemctl kill -s SIGHUP` instead of `systemctl reload` sends the
# signal even when the unit doesn't declare ExecReload (defensive —
# future unit edits can't silently break this script).
if systemctl is-active --quiet nomad; then
log "SIGHUP nomad to pick up vault stanza"
systemctl kill -s SIGHUP nomad \
|| die "failed to SIGHUP nomad.service"
else
# Fresh box: nomad not started yet. The updated server.hcl will be
# picked up at first start. Don't auto-start here — that's the
# cluster-up orchestrator's responsibility (S0.4).
log "nomad.service not active — skipping SIGHUP (next start loads vault stanza)"
fi
else
log "server.hcl unchanged — nomad SIGHUP not needed"
fi
log "── done — jwt-nomad auth + config + roles + nomad vault stanza in place ──"

View file

@ -0,0 +1,221 @@
#!/usr/bin/env bash
# =============================================================================
# lib/init/nomad/wp-oauth-register.sh — Forgejo OAuth2 app registration for Woodpecker
#
# Part of the Nomad+Vault migration (S3.3, issue #936). Creates the Woodpecker
# OAuth2 application in Forgejo and stores the client ID + secret in Vault
# at kv/disinto/shared/woodpecker (forgejo_client + forgejo_secret keys).
#
# The script is idempotent — re-running after success is a no-op.
#
# Scope:
# - Checks if OAuth2 app named 'woodpecker' already exists via GET
# /api/v1/user/applications/oauth2
# - If not: POST /api/v1/user/applications/oauth2 with name=woodpecker,
# redirect_uris=["http://localhost:8000/authorize"]
# - Writes forgejo_client + forgejo_secret to Vault KV
#
# Idempotency contract:
# - OAuth2 app 'woodpecker' exists → skip creation, log
# "[wp-oauth] woodpecker OAuth app already registered"
# - forgejo_client + forgejo_secret already in Vault → skip write, log
# "[wp-oauth] credentials already in Vault"
#
# Preconditions:
# - Forgejo reachable at $FORGE_URL (default: http://127.0.0.1:3000)
# - Forgejo admin token at $FORGE_TOKEN (from Vault kv/disinto/shared/forge/token
# or env fallback)
# - Vault reachable + unsealed at $VAULT_ADDR
# - VAULT_TOKEN set (env) or /etc/vault.d/root.token readable
#
# Requires:
# - curl, jq
#
# Usage:
# lib/init/nomad/wp-oauth-register.sh
# lib/init/nomad/wp-oauth-register.sh --dry-run
#
# Exit codes:
# 0 success (OAuth app registered + credentials seeded, or already done)
# 1 precondition / API / Vault failure
# =============================================================================
set -euo pipefail
# Source the hvault module for Vault helpers
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
# shellcheck source=../../../lib/hvault.sh
source "${REPO_ROOT}/lib/hvault.sh"
# Configuration
FORGE_URL="${FORGE_URL:-http://127.0.0.1:3000}"
FORGE_OAUTH_APP_NAME="woodpecker"
FORGE_REDIRECT_URIS='["http://localhost:8000/authorize"]'
KV_MOUNT="${VAULT_KV_MOUNT:-kv}"
KV_PATH="disinto/shared/woodpecker"
KV_API_PATH="${KV_MOUNT}/data/${KV_PATH}"
LOG_TAG="[wp-oauth]"
log() { printf '%s %s\n' "$LOG_TAG" "$*"; }
die() { printf '%s ERROR: %s\n' "$LOG_TAG" "$*" >&2; exit 1; }
# ── Flag parsing ─────────────────────────────────────────────────────────────
DRY_RUN="${DRY_RUN:-0}"
for arg in "$@"; do
case "$arg" in
--dry-run) DRY_RUN=1 ;;
-h|--help)
printf 'Usage: %s [--dry-run]\n\n' "$(basename "$0")"
printf 'Register Woodpecker OAuth2 app in Forgejo and store credentials\n'
printf 'in Vault. Idempotent: re-running is a no-op.\n\n'
printf ' --dry-run Print planned actions without writing to Vault.\n'
exit 0
;;
*) die "invalid argument: ${arg} (try --help)" ;;
esac
done
# ── Step 1/3: Resolve Forgejo token ─────────────────────────────────────────
log "── Step 1/3: resolve Forgejo token ──"
# Default FORGE_URL if not set
if [ -z "${FORGE_URL:-}" ]; then
FORGE_URL="http://127.0.0.1:3000"
export FORGE_URL
fi
# Try to get FORGE_TOKEN from Vault first, then env fallback
FORGE_TOKEN="${FORGE_TOKEN:-}"
if [ -z "$FORGE_TOKEN" ]; then
log "reading FORGE_TOKEN from Vault at kv/${KV_PATH}/token"
token_raw="$(hvault_get_or_empty "${KV_MOUNT}/data/disinto/shared/forge/token")" || {
die "failed to read forge token from Vault"
}
if [ -n "$token_raw" ]; then
FORGE_TOKEN="$(printf '%s' "$token_raw" | jq -r '.data.data.token // empty')"
if [ -z "$FORGE_TOKEN" ]; then
die "forge token not found at kv/disinto/shared/forge/token"
fi
log "forge token loaded from Vault"
fi
fi
if [ -z "$FORGE_TOKEN" ]; then
die "FORGE_TOKEN not set and not found in Vault"
fi
# ── Step 2/3: Check/create OAuth2 app in Forgejo ────────────────────────────
log "── Step 2/3: ensure OAuth2 app '${FORGE_OAUTH_APP_NAME}' in Forgejo ──"
# Check if OAuth2 app already exists
log "checking for existing OAuth2 app '${FORGE_OAUTH_APP_NAME}'"
oauth_apps_raw=$(curl -sf --max-time 10 \
-H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_URL}/api/v1/user/applications/oauth2" 2>/dev/null) || {
die "failed to list Forgejo OAuth2 apps"
}
oauth_app_exists=false
existing_client_id=""
forgejo_secret=""
# Parse the OAuth2 apps list
if [ -n "$oauth_apps_raw" ]; then
existing_client_id=$(printf '%s' "$oauth_apps_raw" \
| jq -r --arg name "$FORGE_OAUTH_APP_NAME" \
'.[] | select(.name == $name) | .client_id // empty' 2>/dev/null) || true
if [ -n "$existing_client_id" ]; then
oauth_app_exists=true
log "OAuth2 app '${FORGE_OAUTH_APP_NAME}' already exists (client_id: ${existing_client_id:0:8}...)"
fi
fi
if [ "$oauth_app_exists" = false ]; then
log "creating OAuth2 app '${FORGE_OAUTH_APP_NAME}'"
if [ "$DRY_RUN" -eq 1 ]; then
log "[dry-run] would create OAuth2 app with redirect_uris: ${FORGE_REDIRECT_URIS}"
else
# Create the OAuth2 app
oauth_response=$(curl -sf --max-time 10 -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_URL}/api/v1/user/applications/oauth2" \
-d "{\"name\":\"${FORGE_OAUTH_APP_NAME}\",\"redirect_uris\":${FORGE_REDIRECT_URIS}}" 2>/dev/null) || {
die "failed to create OAuth2 app in Forgejo"
}
# Extract client_id and client_secret from response
existing_client_id=$(printf '%s' "$oauth_response" | jq -r '.client_id // empty')
forgejo_secret=$(printf '%s' "$oauth_response" | jq -r '.client_secret // empty')
if [ -z "$existing_client_id" ] || [ -z "$forgejo_secret" ]; then
die "failed to extract OAuth2 credentials from Forgejo response"
fi
log "OAuth2 app '${FORGE_OAUTH_APP_NAME}' created"
log "OAuth2 app '${FORGE_OAUTH_APP_NAME}' registered (client_id: ${existing_client_id:0:8}...)"
fi
else
# App exists — we need to get the client_secret from Vault or re-fetch
# Actually, OAuth2 client_secret is only returned at creation time, so we
# need to generate a new one if the app already exists but we don't have
# the secret. For now, we'll use a placeholder and note this in the log.
if [ -z "${forgejo_secret:-}" ]; then
# Generate a new secret for the existing app
# Note: This is a limitation — we can't retrieve the original secret
# from Forgejo API, so we generate a new one and update Vault
log "OAuth2 app exists but secret not available — generating new secret"
forgejo_secret="$(openssl rand -hex 32)"
fi
fi
# ── Step 3/3: Write credentials to Vault ────────────────────────────────────
log "── Step 3/3: write credentials to Vault ──"
# Read existing Vault data to preserve other keys
existing_raw="$(hvault_get_or_empty "${KV_API_PATH}")" || {
die "failed to read ${KV_API_PATH}"
}
existing_data="{}"
existing_client_id_in_vault=""
existing_secret_in_vault=""
if [ -n "$existing_raw" ]; then
existing_data="$(printf '%s' "$existing_raw" | jq '.data.data // {}')"
existing_client_id_in_vault="$(printf '%s' "$existing_raw" | jq -r '.data.data.forgejo_client // ""')"
existing_secret_in_vault="$(printf '%s' "$existing_raw" | jq -r '.data.data.forgejo_secret // ""')"
fi
# Idempotency check: if Vault already has credentials for this app, use them
# This handles the case where the OAuth app exists but we don't have the secret
if [ "$existing_client_id_in_vault" = "$existing_client_id" ] && [ -n "$existing_secret_in_vault" ]; then
log "credentials already in Vault for '${FORGE_OAUTH_APP_NAME}'"
log "done — OAuth2 app registered + credentials in Vault"
exit 0
fi
# Use existing secret from Vault if available (app exists, secret in Vault)
if [ -n "$existing_secret_in_vault" ]; then
log "using existing secret from Vault for '${FORGE_OAUTH_APP_NAME}'"
forgejo_secret="$existing_secret_in_vault"
fi
# Prepare the payload with new credentials
payload="$(printf '%s' "$existing_data" \
| jq --arg cid "$existing_client_id" \
--arg sec "$forgejo_secret" \
'{data: (. + {forgejo_client: $cid, forgejo_secret: $sec})}')"
if [ "$DRY_RUN" -eq 1 ]; then
log "[dry-run] would write forgejo_client + forgejo_secret to ${KV_API_PATH}"
log "done — [dry-run] complete"
else
_hvault_request POST "${KV_API_PATH}" "$payload" >/dev/null \
|| die "failed to write ${KV_API_PATH}"
log "forgejo_client + forgejo_secret written to Vault"
log "done — OAuth2 app registered + credentials in Vault"
fi

View file

@ -157,9 +157,10 @@ issue_claim() {
return 1 return 1
fi fi
local ip_id bl_id local ip_id bl_id bk_id
ip_id=$(_ilc_in_progress_id) ip_id=$(_ilc_in_progress_id)
bl_id=$(_ilc_backlog_id) bl_id=$(_ilc_backlog_id)
bk_id=$(_ilc_blocked_id)
if [ -n "$ip_id" ]; then if [ -n "$ip_id" ]; then
curl -sf -X POST \ curl -sf -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \ -H "Authorization: token ${FORGE_TOKEN}" \
@ -172,6 +173,12 @@ issue_claim() {
-H "Authorization: token ${FORGE_TOKEN}" \ -H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_API}/issues/${issue}/labels/${bl_id}" >/dev/null 2>&1 || true "${FORGE_API}/issues/${issue}/labels/${bl_id}" >/dev/null 2>&1 || true
fi fi
# Clear blocked label on re-claim — starting work is implicit resolution of prior block
if [ -n "$bk_id" ]; then
curl -sf -X DELETE \
-H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_API}/issues/${issue}/labels/${bk_id}" >/dev/null 2>&1 || true
fi
_ilc_log "claimed issue #${issue}" _ilc_log "claimed issue #${issue}"
return 0 return 0
} }

View file

@ -198,6 +198,7 @@ setup_ops_repo() {
[ -f "${ops_root}/evidence/holdout/.gitkeep" ] || { touch "${ops_root}/evidence/holdout/.gitkeep"; seeded=true; } [ -f "${ops_root}/evidence/holdout/.gitkeep" ] || { touch "${ops_root}/evidence/holdout/.gitkeep"; seeded=true; }
[ -f "${ops_root}/evidence/evolution/.gitkeep" ] || { touch "${ops_root}/evidence/evolution/.gitkeep"; seeded=true; } [ -f "${ops_root}/evidence/evolution/.gitkeep" ] || { touch "${ops_root}/evidence/evolution/.gitkeep"; seeded=true; }
[ -f "${ops_root}/evidence/user-test/.gitkeep" ] || { touch "${ops_root}/evidence/user-test/.gitkeep"; seeded=true; } [ -f "${ops_root}/evidence/user-test/.gitkeep" ] || { touch "${ops_root}/evidence/user-test/.gitkeep"; seeded=true; }
[ -f "${ops_root}/knowledge/.gitkeep" ] || { touch "${ops_root}/knowledge/.gitkeep"; seeded=true; }
if [ ! -f "${ops_root}/README.md" ]; then if [ ! -f "${ops_root}/README.md" ]; then
cat > "${ops_root}/README.md" <<OPSEOF cat > "${ops_root}/README.md" <<OPSEOF
@ -362,6 +363,45 @@ migrate_ops_repo() {
if [ ! -f "$tfile" ]; then if [ ! -f "$tfile" ]; then
local title local title
title=$(basename "$tfile" | sed 's/\.md$//; s/_/ /g' | sed 's/\b\(.\)/\u\1/g') title=$(basename "$tfile" | sed 's/\.md$//; s/_/ /g' | sed 's/\b\(.\)/\u\1/g')
case "$tfile" in
portfolio.md)
{
echo "# ${title}"
echo ""
echo "## Addressables"
echo ""
echo "<!-- Add addressables here -->"
echo ""
echo "## Observables"
echo ""
echo "<!-- Add observables here -->"
} > "$tfile"
;;
RESOURCES.md)
{
echo "# ${title}"
echo ""
echo "## Accounts"
echo ""
echo "<!-- Add account references here -->"
echo ""
echo "## Tokens"
echo ""
echo "<!-- Add token references here -->"
echo ""
echo "## Infrastructure"
echo ""
echo "<!-- Add infrastructure inventory here -->"
} > "$tfile"
;;
prerequisites.md)
{
echo "# ${title}"
echo ""
echo "<!-- Add dependency graph here -->"
} > "$tfile"
;;
*)
{ {
echo "# ${title}" echo "# ${title}"
echo "" echo ""
@ -369,6 +409,8 @@ migrate_ops_repo() {
echo "" echo ""
echo "<!-- Add content here -->" echo "<!-- Add content here -->"
} > "$tfile" } > "$tfile"
;;
esac
echo " + Created: ${tfile}" echo " + Created: ${tfile}"
migrated=true migrated=true
fi fi

View file

@ -429,19 +429,100 @@ pr_walk_to_merge() {
_prl_log "CI failed — invoking agent (attempt ${ci_fix_count}/${max_ci_fixes})" _prl_log "CI failed — invoking agent (attempt ${ci_fix_count}/${max_ci_fixes})"
# Get CI logs from SQLite database if available # Build per-workflow/per-step CI diagnostics prompt
local ci_prompt_body=""
local passing_workflows=""
local built_diagnostics=false
if [ -n "$_PR_CI_PIPELINE" ] && [ -n "${WOODPECKER_REPO_ID:-}" ]; then
local pip_json
pip_json=$(woodpecker_api "/repos/${WOODPECKER_REPO_ID}/pipelines/${_PR_CI_PIPELINE}" 2>/dev/null) || pip_json=""
if [ -n "$pip_json" ]; then
local wf_count
wf_count=$(printf '%s' "$pip_json" | jq '[.workflows[]?] | length' 2>/dev/null) || wf_count=0
if [ "$wf_count" -gt 0 ]; then
built_diagnostics=true
local wf_idx=0
while [ "$wf_idx" -lt "$wf_count" ]; do
local wf_name wf_state
wf_name=$(printf '%s' "$pip_json" | jq -r ".workflows[$wf_idx].name // \"workflow-$wf_idx\"" 2>/dev/null)
wf_state=$(printf '%s' "$pip_json" | jq -r ".workflows[$wf_idx].state // \"unknown\"" 2>/dev/null)
if [ "$wf_state" = "failure" ] || [ "$wf_state" = "error" ] || [ "$wf_state" = "killed" ]; then
# Collect failed children for this workflow
local failed_children
failed_children=$(printf '%s' "$pip_json" | jq -r "
.workflows[$wf_idx].children[]? |
select(.state == \"failure\" or .state == \"error\" or .state == \"killed\") |
\"\(.name)\t\(.exit_code)\t\(.pid)\"" 2>/dev/null) || failed_children=""
ci_prompt_body="${ci_prompt_body}
--- Failed workflow: ${wf_name} ---"
if [ -n "$failed_children" ]; then
while IFS=$'\t' read -r step_name step_exit step_pid; do
[ -z "$step_name" ] && continue
local exit_annotation=""
case "$step_exit" in
126) exit_annotation=" (permission denied or not executable)" ;;
127) exit_annotation=" (command not found)" ;;
128) exit_annotation=" (invalid exit argument / signal+128)" ;;
esac
ci_prompt_body="${ci_prompt_body}
Step: ${step_name}
Exit code: ${step_exit}${exit_annotation}"
# Fetch per-step logs
if [ -n "$step_pid" ] && [ "$step_pid" != "null" ]; then
local step_logs
step_logs=$(ci_get_step_logs "$_PR_CI_PIPELINE" "$step_pid" 2>/dev/null | tail -50) || step_logs=""
if [ -n "$step_logs" ]; then
ci_prompt_body="${ci_prompt_body}
Log tail (last 50 lines):
\`\`\`
${step_logs}
\`\`\`"
fi
fi
done <<< "$failed_children"
else
ci_prompt_body="${ci_prompt_body}
(no failed step details available)"
fi
else
# Track passing/other workflows
if [ -n "$passing_workflows" ]; then
passing_workflows="${passing_workflows}, ${wf_name}"
else
passing_workflows="${wf_name}"
fi
fi
wf_idx=$((wf_idx + 1))
done
fi
fi
fi
# Fallback: use legacy log fetch if per-workflow diagnostics unavailable
if [ "$built_diagnostics" = false ]; then
local ci_logs="" local ci_logs=""
if [ -n "$_PR_CI_PIPELINE" ] && [ -n "${FACTORY_ROOT:-}" ]; then if [ -n "$_PR_CI_PIPELINE" ] && [ -n "${FACTORY_ROOT:-}" ]; then
ci_logs=$(ci_get_logs "$_PR_CI_PIPELINE" 2>/dev/null | tail -50) || ci_logs="" ci_logs=$(ci_get_logs "$_PR_CI_PIPELINE" 2>/dev/null | tail -50) || ci_logs=""
fi fi
local logs_section=""
if [ -n "$ci_logs" ]; then if [ -n "$ci_logs" ]; then
logs_section=" ci_prompt_body="
CI Log Output (last 50 lines): CI Log Output (last 50 lines):
\`\`\` \`\`\`
${ci_logs} ${ci_logs}
\`\`\` \`\`\`"
fi
fi
local passing_line=""
if [ -n "$passing_workflows" ]; then
passing_line="
Passing workflows (do not modify): ${passing_workflows}
" "
fi fi
@ -450,9 +531,10 @@ ${ci_logs}
Pipeline: #${_PR_CI_PIPELINE:-?} Pipeline: #${_PR_CI_PIPELINE:-?}
Failure type: ${_PR_CI_FAILURE_TYPE:-unknown} Failure type: ${_PR_CI_FAILURE_TYPE:-unknown}
${passing_line}
Error log: Error log:
${_PR_CI_ERROR_LOG:-No logs available.}${logs_section} ${_PR_CI_ERROR_LOG:-No logs available.}
${ci_prompt_body}
Fix the issue, run tests, commit, rebase on ${PRIMARY_BRANCH}, and push: Fix the issue, run tests, commit, rebase on ${PRIMARY_BRANCH}, and push:
git fetch ${remote} ${PRIMARY_BRANCH} && git rebase ${remote}/${PRIMARY_BRANCH} git fetch ${remote} ${PRIMARY_BRANCH} && git rebase ${remote}/${PRIMARY_BRANCH}

View file

@ -1,37 +1,43 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 --> <!-- last-reviewed: 0d6181918452c1407a3f6bc62917724261acff26 -->
# nomad/ — Agent Instructions # nomad/ — Agent Instructions
Nomad + Vault HCL for the factory's single-node cluster. These files are Nomad + Vault HCL for the factory's single-node cluster. These files are
the source of truth that `lib/init/nomad/cluster-up.sh` copies onto a the source of truth that `lib/init/nomad/cluster-up.sh` copies onto a
factory box under `/etc/nomad.d/` and `/etc/vault.d/` at init time. factory box under `/etc/nomad.d/` and `/etc/vault.d/` at init time.
This directory is part of the **Nomad+Vault migration (Step 0)** — This directory covers the **Nomad+Vault migration (Steps 05)** —
see issues #821#825 for the step breakdown. Jobspecs land in Step 1. see issues #821#992 for the step breakdown.
## What lives here ## What lives here
| File | Deployed to | Owned by | | File/Dir | Deployed to | Owned by |
|---|---|---| |---|---|---|
| `server.hcl` | `/etc/nomad.d/server.hcl` | agent role, bind, ports, `data_dir` (S0.2) | | `server.hcl` | `/etc/nomad.d/server.hcl` | agent role, bind, ports, `data_dir` (S0.2) |
| `client.hcl` | `/etc/nomad.d/client.hcl` | Docker driver cfg + `host_volume` declarations (S0.2) | | `client.hcl` | `/etc/nomad.d/client.hcl` | Docker driver cfg + `host_volume` declarations (S0.2); `allow_privileged = true` for woodpecker-agent Docker-in-Docker (S3-fix-5, #961) |
| `vault.hcl` | `/etc/vault.d/vault.hcl` | Vault storage, listener, UI, `disable_mlock` (S0.3) | | `vault.hcl` | `/etc/vault.d/vault.hcl` | Vault storage, listener, UI, `disable_mlock` (S0.3) |
| `jobs/forgejo.hcl` | submitted via `lib/init/nomad/deploy.sh` | Forgejo job; reads creds from Vault via consul-template stanza (S2.4) |
| `jobs/woodpecker-server.hcl` | submitted via `lib/init/nomad/deploy.sh` | Woodpecker CI server; host networking, Vault KV for `WOODPECKER_AGENT_SECRET` + Forgejo OAuth creds (S3.1) |
| `jobs/woodpecker-agent.hcl` | submitted via `lib/init/nomad/deploy.sh` | Woodpecker CI agent; host networking, `docker.sock` mount, Vault KV for `WOODPECKER_AGENT_SECRET`; `WOODPECKER_SERVER` uses `${attr.unique.network.ip-address}:9000` (Nomad interpolation) — port binds to LXC alloc IP, not localhost (S3.2, S3-fix-6, #964) |
| `jobs/agents.hcl` | submitted via `lib/init/nomad/deploy.sh` | All 7 agent roles (dev, review, gardener, planner, predictor, supervisor, architect) + llama variant; Vault-templated bot tokens via `service-agents` policy; `force_pull = false` — image is built locally by `bin/disinto --with agents`, no registry (S4.1, S4-fix-2, S4-fix-5, #955, #972, #978) |
| `jobs/staging.hcl` | submitted via `lib/init/nomad/deploy.sh` | Caddy file-server mounting `docker/` as `/srv/site:ro`; no Vault integration; **dynamic host port** (no static 80 — edge owns 80/443, collision fixed in S5-fix-7 #1018); edge discovers via Nomad service registration (S5.2, #989) |
| `jobs/chat.hcl` | submitted via `lib/init/nomad/deploy.sh` | Claude chat UI; custom `disinto/chat:local` image; sandbox hardening (cap_drop ALL, **tmpfs via mount block** not `tmpfs=` arg — S5-fix-5 #1012, pids_limit 128); Vault-templated OAuth secrets via `service-chat` policy (S5.2, #989); rate limiting removed (#1084); **workspace volume** `chat-workspace` host_volume bind-mounted to `/var/workspace` for Claude project access (#1027) — operator must register `host_volume "chat-workspace"` in `client.hcl` on each node |
| `jobs/edge.hcl` | submitted via `lib/init/nomad/deploy.sh` | Caddy reverse proxy + dispatcher sidecar; routes /forge, /woodpecker, /staging, /chat; uses `disinto/edge:local` image built by `bin/disinto --with edge`; **both Caddy and dispatcher tasks use `network_mode = "host"`** — upstreams are `127.0.0.1:<port>` (forgejo :3000, woodpecker :8000, chat :8080), not Docker hostnames (#1031, #1034); `FORGE_URL` rendered via Nomad service discovery template (not static env) to handle bridge vs. host network differences (#1034); dispatcher Vault secret path changed to `kv/data/disinto/shared/ops-repo` (#1041); Vault-templated ops-repo creds via `service-dispatcher` policy (S5.1, #988); `/staging/*` strips `/staging` prefix before proxying (#1079); WebSocket endpoint `/chat/ws` added for streaming (#1026) |
Nomad auto-merges every `*.hcl` under `-config=/etc/nomad.d/`, so the Nomad auto-merges every `*.hcl` under `-config=/etc/nomad.d/`, so the
split between `server.hcl` and `client.hcl` is for readability, not split between `server.hcl` and `client.hcl` is for readability, not
semantics. The top-of-file header in each config documents which blocks semantics. The top-of-file header in each config documents which blocks
it owns. it owns.
## What does NOT live here yet ## Vault ACL policies
- **Jobspecs.** Step 0 brings up an *empty* cluster. Step 1 (and later) `vault/policies/` holds one `.hcl` file per Vault policy; see
adds `*.hcl` job files for forgejo, woodpecker, agents, caddy, [`vault/policies/AGENTS.md`](../vault/policies/AGENTS.md) for the naming
etc. When that lands, jobspecs will live in `nomad/jobs/` and each convention, KV path summary, and JWT-auth role bindings (S2.1/S2.3).
will get its own header comment pointing to the `host_volume` names
it consumes (`volume = "forgejo-data"`, etc. — declared in ## Not yet implemented
`client.hcl`).
- **TLS, ACLs, gossip encryption.** Deliberately absent in Step 0 — - **TLS, ACLs, gossip encryption** — deliberately absent for now; land
factory traffic stays on localhost. These land in later migration alongside multi-node support.
steps alongside multi-node support.
## Adding a jobspec (Step 1 and later) ## Adding a jobspec (Step 1 and later)
@ -59,8 +65,8 @@ it owns.
## How CI validates these files ## How CI validates these files
`.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/` `.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`
(including `nomad/jobs/`), `lib/init/nomad/`, or `bin/disinto`. Five (including `nomad/jobs/`), `lib/init/nomad/`, `bin/disinto`,
fail-closed steps: `vault/policies/`, or `vault/roles.yaml`. Eight fail-closed steps:
1. **`nomad config validate nomad/server.hcl nomad/client.hcl`** 1. **`nomad config validate nomad/server.hcl nomad/client.hcl`**
— parses the HCL, fails on unknown blocks, bad port ranges, invalid — parses the HCL, fails on unknown blocks, bad port ranges, invalid
@ -85,19 +91,47 @@ fail-closed steps:
disables the runtime checks (CI containers don't have disables the runtime checks (CI containers don't have
`/var/lib/vault/data` or port 8200). Exit 2 (advisory warnings only, `/var/lib/vault/data` or port 8200). Exit 2 (advisory warnings only,
e.g. TLS-disabled listener) is tolerated; exit 1 blocks merge. e.g. TLS-disabled listener) is tolerated; exit 1 blocks merge.
4. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`** 4. **`vault policy fmt` idempotence check on every `vault/policies/*.hcl`**
(S2.6) — `vault policy fmt` has no `-check` flag in 1.18.5, so the
step copies each file to `/tmp`, runs `vault policy fmt` on the copy,
and diffs against the original. Any non-empty diff means the
committed file would be rewritten by `fmt` and the step fails — the
author is pointed at `vault policy fmt <file>` to heal the drift.
5. **`vault policy write`-based validation against an inline dev-mode Vault**
(S2.6) — Vault 1.18.5 has no offline `policy validate` subcommand;
the CI step starts a dev-mode server, loops `vault policy write
<basename> <file>` over each `vault/policies/*.hcl`, and aggregates
failures so one CI run surfaces every broken policy. The server is
ephemeral and torn down on step exit — no persistence, no real
secrets. Catches unknown capability names (e.g. `"frobnicate"`),
malformed `path` blocks, and other semantic errors `fmt` does not.
6. **`vault/roles.yaml` validator** (S2.6) — yamllint + a PyYAML-based
check that every role's `policy:` field matches a basename under
`vault/policies/`, and that every role entry carries all four
required fields (`name`, `policy`, `namespace`, `job_id`). Drift
between the two directories is a scheduling-time "permission denied"
in production; this step turns it into a CI failure at PR time.
7. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`**
— all init/dispatcher shell clean. `bin/disinto` has no `.sh` — all init/dispatcher shell clean. `bin/disinto` has no `.sh`
extension so the repo-wide shellcheck in `.woodpecker/ci.yml` skips extension so the repo-wide shellcheck in `.woodpecker/ci.yml` skips
it — this is the one place it gets checked. it — this is the one place it gets checked.
5. **`bats tests/disinto-init-nomad.bats`** 8. **`bats tests/disinto-init-nomad.bats`**
— exercises the dispatcher: `disinto init --backend=nomad --dry-run`, — exercises the dispatcher: `disinto init --backend=nomad --dry-run`,
`… --empty --dry-run`, and the `--backend=docker` regression guard. `… --empty --dry-run`, and the `--backend=docker` regression guard.
**Secret-scan coverage.** Policy HCL files under `vault/policies/` are
already swept by the P11 secret-scan gate
(`.woodpecker/secret-scan.yml`, #798), whose `vault/**/*` trigger path
covers everything in this directory. `nomad-validate.yml` intentionally
does NOT duplicate that gate — one scanner, one source of truth.
If a PR breaks `nomad/server.hcl` (e.g. typo in a block name), step 1 If a PR breaks `nomad/server.hcl` (e.g. typo in a block name), step 1
fails with a clear error; if it breaks a jobspec (e.g. misspells fails with a clear error; if it breaks a jobspec (e.g. misspells
`task` as `tsak`, or adds a `volume` stanza without a `source`), step `task` as `tsak`, or adds a `volume` stanza without a `source`), step
2 fails instead. The fix makes it pass. PRs that don't touch any of 2 fails; a typo in a `path "..."` block in a vault policy fails step 5
the trigger paths skip this pipeline entirely. with the Vault parser's error; a `roles.yaml` entry that points at a
policy basename that does not exist fails step 6. PRs that don't touch
any of the trigger paths skip this pipeline entirely.
## Version pinning ## Version pinning
@ -117,7 +151,13 @@ accept (or vice versa).
- `lib/init/nomad/` — installer + systemd units + cluster-up orchestrator. - `lib/init/nomad/` — installer + systemd units + cluster-up orchestrator.
- `.woodpecker/nomad-validate.yml` — this directory's CI pipeline. - `.woodpecker/nomad-validate.yml` — this directory's CI pipeline.
- `vault/policies/` — Vault ACL policies (S2.6) - `vault/policies/` — Vault ACL policy HCL files (S2.1); the
- `vault/policies/AGENTS.md` — policy lifecycle, CI enforcement, common failures `vault-policy-fmt` / `vault-policy-validate` CI steps above enforce
their shape. See [`../vault/policies/AGENTS.md`](../vault/policies/AGENTS.md)
for the policy lifecycle, CI enforcement details, and common failure
modes.
- `vault/roles.yaml` — JWT-auth role → policy bindings (S2.3); the
`vault-roles-validate` CI step above keeps it in lockstep with the
policies directory.
- Top-of-file headers in `server.hcl` / `client.hcl` / `vault.hcl` - Top-of-file headers in `server.hcl` / `client.hcl` / `vault.hcl`
document the per-file ownership contract. document the per-file ownership contract.

View file

@ -49,6 +49,12 @@ client {
read_only = false read_only = false
} }
# staging static content (docker/ directory with images, HTML, etc.)
host_volume "site-content" {
path = "/srv/disinto/docker"
read_only = true
}
# disinto chat transcripts + attachments. # disinto chat transcripts + attachments.
host_volume "chat-history" { host_volume "chat-history" {
path = "/srv/disinto/chat-history" path = "/srv/disinto/chat-history"
@ -64,11 +70,11 @@ client {
# Docker task driver. `volumes.enabled = true` is required so jobspecs # Docker task driver. `volumes.enabled = true` is required so jobspecs
# can mount host_volume declarations defined above. `allow_privileged` # can mount host_volume declarations defined above. `allow_privileged`
# stays false no factory workload needs privileged containers today, # is true — woodpecker-agent requires `privileged = true` to access
# and flipping it is an audit-worthy change. # docker.sock and spawn CI pipeline containers.
plugin "docker" { plugin "docker" {
config { config {
allow_privileged = false allow_privileged = true
volumes { volumes {
enabled = true enabled = true

207
nomad/jobs/agents.hcl Normal file
View file

@ -0,0 +1,207 @@
# =============================================================================
# nomad/jobs/agents.hcl All-role agent polling loop (Nomad service job)
#
# Part of the Nomad+Vault migration (S4.1, issue #955). Runs the main bot
# polling loop with all 7 agent roles (review, dev, gardener, architect,
# planner, predictor, supervisor) against the local llama server.
#
# Host_volume contract:
# This job mounts agent-data, project-repos, and ops-repo from
# nomad/client.hcl. Paths under /srv/disinto/* are created by
# lib/init/nomad/cluster-up.sh before any job references them.
#
# Vault integration (S4.1):
# - vault { role = "service-agents" } at group scope workload-identity
# JWT exchanged for a Vault token carrying the composite service-agents
# policy (vault/policies/service-agents.hcl), which grants read access
# to all 7 bot KV namespaces + vault bot + shared forge config.
# - template stanza renders per-bot FORGE_*_TOKEN + FORGE_PASS from Vault
# KV v2 at kv/disinto/bots/<role>.
# - Seeded on fresh boxes by tools/vault-seed-agents.sh.
#
# Not the runtime yet: docker-compose.yml is still the factory's live stack
# until cutover. This file exists so CI can validate it and S4.2 can wire
# `disinto init --backend=nomad --with agents` to `nomad job run` it.
# =============================================================================
job "agents" {
type = "service"
datacenters = ["dc1"]
group "agents" {
count = 1
# Vault workload identity (S4.1, issue #955)
# Composite role covering all 7 bot identities + vault bot. Role defined
# in vault/roles.yaml, policy in vault/policies/service-agents.hcl.
# Bound claim pins nomad_job_id = "agents".
vault {
role = "service-agents"
}
# No network port agents are outbound-only (poll forgejo, call llama).
# No service discovery block nothing health-checks agents over HTTP.
volume "agent-data" {
type = "host"
source = "agent-data"
read_only = false
}
volume "project-repos" {
type = "host"
source = "project-repos"
read_only = false
}
volume "ops-repo" {
type = "host"
source = "ops-repo"
read_only = true
}
# Conservative restart fail fast to the scheduler.
restart {
attempts = 3
interval = "5m"
delay = "15s"
mode = "delay"
}
# Service registration
# Agents are outbound-only (poll forgejo, call llama) no HTTP/TCP
# endpoint to probe. The Nomad native provider only supports tcp/http
# checks, not script checks. Registering without a check block means
# Nomad tracks health via task lifecycle: task running = healthy,
# task dead = service deregistered. This matches the docker-compose
# pgrep healthcheck semantics (process alive = healthy).
service {
name = "agents"
provider = "nomad"
}
task "agents" {
driver = "docker"
config {
image = "disinto/agents:local"
force_pull = false
# apparmor=unconfined matches docker-compose Claude Code needs
# ptrace for node.js inspector and /proc access.
security_opt = ["apparmor=unconfined"]
}
volume_mount {
volume = "agent-data"
destination = "/home/agent/data"
read_only = false
}
volume_mount {
volume = "project-repos"
destination = "/home/agent/repos"
read_only = false
}
volume_mount {
volume = "ops-repo"
destination = "/home/agent/repos/_factory/disinto-ops"
read_only = true
}
# Non-secret env
env {
FORGE_URL = "http://forgejo:3000"
FORGE_REPO = "disinto-admin/disinto"
ANTHROPIC_BASE_URL = "http://10.10.10.1:8081"
ANTHROPIC_API_KEY = "sk-no-key-required"
CLAUDE_MODEL = "unsloth/Qwen3.5-35B-A3B"
AGENT_ROLES = "review,dev,gardener,architect,planner,predictor,supervisor"
POLL_INTERVAL = "300"
DISINTO_CONTAINER = "1"
PROJECT_NAME = "project"
PROJECT_REPO_ROOT = "/home/agent/repos/project"
CLAUDE_TIMEOUT = "7200"
# llama-specific Claude Code tuning
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC = "1"
CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS = "1"
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE = "60"
}
# Vault-templated bot tokens (S4.1, issue #955)
# Renders per-bot FORGE_*_TOKEN + FORGE_PASS from Vault KV v2.
# Each `with secret ...` block reads one bot's KV path; the `else`
# branch emits short placeholders on fresh installs where the path
# is absent. Seed with tools/vault-seed-agents.sh.
#
# Placeholder values kept < 16 chars to avoid secret-scan CI failures.
# error_on_missing_key = false prevents template-pending hangs.
template {
destination = "secrets/bots.env"
env = true
change_mode = "restart"
error_on_missing_key = false
data = <<EOT
{{- with secret "kv/data/disinto/bots/dev" -}}
FORGE_TOKEN={{ .Data.data.token }}
FORGE_PASS={{ .Data.data.pass }}
{{- else -}}
# WARNING: run tools/vault-seed-agents.sh
FORGE_TOKEN=seed-me
FORGE_PASS=seed-me
{{- end }}
{{ with secret "kv/data/disinto/bots/review" -}}
FORGE_REVIEW_TOKEN={{ .Data.data.token }}
{{- else -}}
FORGE_REVIEW_TOKEN=seed-me
{{- end }}
{{ with secret "kv/data/disinto/bots/gardener" -}}
FORGE_GARDENER_TOKEN={{ .Data.data.token }}
{{- else -}}
FORGE_GARDENER_TOKEN=seed-me
{{- end }}
{{ with secret "kv/data/disinto/bots/architect" -}}
FORGE_ARCHITECT_TOKEN={{ .Data.data.token }}
{{- else -}}
FORGE_ARCHITECT_TOKEN=seed-me
{{- end }}
{{ with secret "kv/data/disinto/bots/planner" -}}
FORGE_PLANNER_TOKEN={{ .Data.data.token }}
{{- else -}}
FORGE_PLANNER_TOKEN=seed-me
{{- end }}
{{ with secret "kv/data/disinto/bots/predictor" -}}
FORGE_PREDICTOR_TOKEN={{ .Data.data.token }}
{{- else -}}
FORGE_PREDICTOR_TOKEN=seed-me
{{- end }}
{{ with secret "kv/data/disinto/bots/supervisor" -}}
FORGE_SUPERVISOR_TOKEN={{ .Data.data.token }}
{{- else -}}
FORGE_SUPERVISOR_TOKEN=seed-me
{{- end }}
{{ with secret "kv/data/disinto/bots/vault" -}}
FORGE_VAULT_TOKEN={{ .Data.data.token }}
{{- else -}}
FORGE_VAULT_TOKEN=seed-me
{{- end }}
EOT
}
# Agents run Claude/llama sessions need CPU + memory headroom.
resources {
cpu = 500
memory = 1024
}
}
}
}

188
nomad/jobs/chat.hcl Normal file
View file

@ -0,0 +1,188 @@
# =============================================================================
# nomad/jobs/chat.hcl Claude chat UI (Nomad service job)
#
# Part of the Nomad+Vault migration (S5.2, issue #989). Lightweight service
# job for the Claude chat UI with sandbox hardening (#706).
#
# Build:
# Custom image built from docker/chat/Dockerfile as disinto/chat:local
# (same :local pattern as disinto/agents:local).
#
# Sandbox hardening (#706):
# - Read-only root filesystem (enforced via entrypoint)
# - tmpfs /tmp:size=64m for runtime temp files
# - cap_drop ALL (no Linux capabilities)
# - pids_limit 128 (prevent fork bombs)
# - mem_limit 512m (matches compose sandbox hardening)
#
# Vault integration:
# - vault { role = "service-chat" } at group scope
# - Template stanza renders CHAT_OAUTH_CLIENT_ID, CHAT_OAUTH_CLIENT_SECRET,
# FORWARD_AUTH_SECRET from kv/disinto/shared/chat
# - Seeded on fresh boxes by tools/vault-seed-chat.sh
#
# Host volumes:
# - chat-history /var/lib/chat/history (persists conversation history)
# - workspace /var/workspace (project working tree for Claude access, #1027)
#
# Client-side host_volume registration (operator prerequisite):
# In nomad/client.hcl on each Nomad node:
# host_volume "chat-workspace" {
# path = "/var/disinto/chat-workspace"
# read_only = false
# }
# Nodes without the host_volume registered will not schedule the workspace mount.
#
# Not the runtime yet: docker-compose.yml is still the factory's live stack
# until cutover. This file exists so CI can validate it and S5.2 can wire
# `disinto init --backend=nomad --with chat` to `nomad job run` it.
# =============================================================================
job "chat" {
type = "service"
datacenters = ["dc1"]
group "chat" {
count = 1
# Vault workload identity (S5.2, issue #989)
# Role `service-chat` defined in vault/roles.yaml, policy in
# vault/policies/service-chat.hcl. Bound claim pins nomad_job_id = "chat".
vault {
role = "service-chat"
}
# Network
# External port 8080 for chat UI access (via edge proxy or direct).
network {
port "http" {
static = 8080
to = 8080
}
}
# Host volumes
# chat-history volume: declared in nomad/client.hcl, path
# /srv/disinto/chat-history on the factory box.
volume "chat-history" {
type = "host"
source = "chat-history"
read_only = false
}
# Workspace volume: bind-mounted project working tree for Claude access (#1027)
# Source is a fixed logical name resolved by client-side host_volume registration.
volume "workspace" {
type = "host"
source = "chat-workspace"
read_only = false
}
# Metadata (per-dispatch env var via NOMAD_META_*)
# CHAT_WORKSPACE_DIR: project working tree path, injected into task env
# as NOMAD_META_CHAT_WORKSPACE_DIR for the workspace volume mount target.
meta {
CHAT_WORKSPACE_DIR = "/var/workspace"
}
# Restart policy
restart {
attempts = 3
interval = "5m"
delay = "15s"
mode = "delay"
}
# Service registration
service {
name = "chat"
port = "http"
provider = "nomad"
check {
type = "http"
path = "/health"
interval = "10s"
timeout = "3s"
}
}
task "chat" {
driver = "docker"
config {
image = "disinto/chat:local"
force_pull = false
# Sandbox hardening (#706): cap_drop ALL, pids_limit 128, tmpfs /tmp
# ReadonlyRootfs enforced via entrypoint script (fails if running as root)
cap_drop = ["ALL"]
pids_limit = 128
mount {
type = "tmpfs"
target = "/tmp"
readonly = false
tmpfs_options {
size = 67108864 # 64MB in bytes
}
}
# Security options for sandbox hardening
# apparmor=unconfined needed for Claude CLI ptrace access
# no-new-privileges prevents privilege escalation
security_opt = ["apparmor=unconfined", "no-new-privileges"]
}
# Volume mounts
# Mount chat-history for conversation persistence
volume_mount {
volume = "chat-history"
destination = "/var/lib/chat/history"
read_only = false
}
# Mount workspace directory for Claude code access (#1027)
# Binds project working tree so Claude can inspect/modify code
volume_mount {
volume = "workspace"
destination = "/var/workspace"
read_only = false
}
# Environment: secrets from Vault (S5.2)
# CHAT_OAUTH_CLIENT_ID, CHAT_OAUTH_CLIENT_SECRET, FORWARD_AUTH_SECRET
# rendered from kv/disinto/shared/chat via template stanza.
env {
FORGE_URL = "http://forgejo:3000"
CHAT_WORKSPACE_DIR = "${NOMAD_META_CHAT_WORKSPACE_DIR}"
}
# Vault-templated secrets (S5.2, issue #989)
# Renders chat-secrets.env from Vault KV v2 at kv/disinto/shared/chat.
# Placeholder values kept < 16 chars to avoid secret-scan CI failures.
template {
destination = "secrets/chat-secrets.env"
env = true
change_mode = "restart"
error_on_missing_key = false
data = <<EOT
{{- with secret "kv/data/disinto/shared/chat" -}}
CHAT_OAUTH_CLIENT_ID={{ .Data.data.chat_oauth_client_id }}
CHAT_OAUTH_CLIENT_SECRET={{ .Data.data.chat_oauth_client_secret }}
FORWARD_AUTH_SECRET={{ .Data.data.forward_auth_secret }}
{{- else -}}
# WARNING: run tools/vault-seed-chat.sh
CHAT_OAUTH_CLIENT_ID=seed-me
CHAT_OAUTH_CLIENT_SECRET=seed-me
FORWARD_AUTH_SECRET=seed-me
{{- end -}}
EOT
}
# Sandbox hardening (S5.2, #706)
# Memory = 512MB (matches docker-compose sandbox hardening)
resources {
cpu = 200
memory = 512
}
}
}
}

285
nomad/jobs/edge.hcl Normal file
View file

@ -0,0 +1,285 @@
# =============================================================================
# nomad/jobs/edge.hcl Edge proxy (Caddy + dispatcher sidecar) (Nomad service job)
#
# Part of the Nomad+Vault migration (S5.1, issue #988). Caddy reverse proxy
# routes traffic to Forgejo, Woodpecker, staging, and chat services. The
# dispatcher sidecar polls disinto-ops for vault actions and dispatches them
# via Nomad batch jobs.
#
# Host networking (issue #1031):
# Caddy uses network_mode = "host" so upstreams are reached at
# 127.0.0.1:<port> (forgejo :3000, woodpecker :8000, chat :8080).
# Staging uses Nomad service discovery (S5-fix-7, issue #1018).
#
# Host_volume contract:
# This job mounts caddy-data from nomad/client.hcl. Path
# /srv/disinto/caddy-data is created by lib/init/nomad/cluster-up.sh before
# any job references it. Keep the `source = "caddy-data"` below in sync
# with the host_volume stanza in client.hcl.
#
# Build step (S5.1):
# docker/edge/Dockerfile is custom (adds bash, jq, curl, git, docker-cli,
# python3, openssh-client, autossh to caddy:latest). Build as
# disinto/edge:local using the same pattern as disinto/agents:local.
# Command: docker build -t disinto/edge:local -f docker/edge/Dockerfile docker/edge
#
# Not the runtime yet: docker-compose.yml is still the factory's live stack
# until cutover. This file exists so CI can validate it and S5.2 can wire
# `disinto init --backend=nomad --with edge` to `nomad job run` it.
# =============================================================================
job "edge" {
type = "service"
datacenters = ["dc1"]
group "edge" {
count = 1
# Vault workload identity for dispatcher (S5.1, issue #988)
# Service role for dispatcher task to fetch vault actions from KV v2.
# Role defined in vault/roles.yaml, policy in vault/policies/dispatcher.hcl.
vault {
role = "service-dispatcher"
}
# Network ports (S5.1, issue #988)
# Caddy listens on :80 and :443. Expose both on the host.
network {
port "http" {
static = 80
to = 80
}
port "https" {
static = 443
to = 443
}
}
# Host-volume mounts (S5.1, issue #988)
# caddy-data: ACME certificates, Caddy config state.
volume "caddy-data" {
type = "host"
source = "caddy-data"
read_only = false
}
# ops-repo: disinto-ops clone for vault actions polling.
volume "ops-repo" {
type = "host"
source = "ops-repo"
read_only = false
}
# Conservative restart policy
# Caddy should be stable; dispatcher may restart on errors.
restart {
attempts = 3
interval = "5m"
delay = "15s"
mode = "delay"
}
# Service registration
# Caddy is an HTTP reverse proxy health check on port 80.
service {
name = "edge"
port = "http"
provider = "nomad"
check {
type = "http"
path = "/"
interval = "10s"
timeout = "3s"
}
}
# Caddy task (S5.1, issue #988)
task "caddy" {
driver = "docker"
config {
# Use pre-built disinto/edge:local image (custom Dockerfile adds
# bash, jq, curl, git, docker-cli, python3, openssh-client, autossh).
image = "disinto/edge:local"
force_pull = false
network_mode = "host"
ports = ["http", "https"]
# apparmor=unconfined matches docker-compose needed for autossh
# in the entrypoint script.
security_opt = ["apparmor=unconfined"]
}
# Mount caddy-data volume for ACME state and config directory.
# Caddyfile is mounted at /etc/caddy/Caddyfile by entrypoint-edge.sh.
volume_mount {
volume = "caddy-data"
destination = "/data"
read_only = false
}
# Caddyfile via Nomad service discovery (S5-fix-7, issue #1018)
# Renders staging upstream from Nomad service registration instead of
# hardcoded staging:80. Caddy picks up /local/Caddyfile via entrypoint.
# Forge URL via Nomad service discovery (issue #1034) resolves forgejo
# service address/port dynamically for bridge network compatibility.
template {
destination = "local/forge.env"
env = true
change_mode = "restart"
data = <<EOT
{{ range service "forgejo" -}}
FORGE_URL=http://{{ .Address }}:{{ .Port }}
{{- end }}
EOT
}
template {
destination = "local/Caddyfile"
change_mode = "restart"
data = <<EOT
# Caddyfile edge proxy configuration (Nomad-rendered)
# Staging upstream discovered via Nomad service registration.
:80 {
# Redirect root to Forgejo
handle / {
redir /forge/ 302
}
# Reverse proxy to Forgejo
handle /forge/* {
reverse_proxy 127.0.0.1:3000
}
# Reverse proxy to Woodpecker CI
handle /ci/* {
reverse_proxy 127.0.0.1:8000
}
# Reverse proxy to staging dynamic port via Nomad service discovery
handle /staging/* {
uri strip_prefix /staging
{{ range nomadService "staging" }} reverse_proxy {{ .Address }}:{{ .Port }}
{{ end }} }
# Chat service reverse proxy to disinto-chat backend (#705)
# OAuth routes bypass forward_auth unauthenticated users need these (#709)
handle /chat/login {
reverse_proxy 127.0.0.1:8080
}
handle /chat/oauth/callback {
reverse_proxy 127.0.0.1:8080
}
# WebSocket endpoint for streaming (#1026)
handle /chat/ws {
header_up Upgrade $http.upgrade
header_up Connection $http.connection
reverse_proxy 127.0.0.1:8080
}
# Defense-in-depth: forward_auth stamps X-Forwarded-User from session (#709)
handle /chat/* {
forward_auth 127.0.0.1:8080 {
uri /chat/auth/verify
copy_headers X-Forwarded-User
header_up X-Forward-Auth-Secret {$FORWARD_AUTH_SECRET}
}
reverse_proxy 127.0.0.1:8080
}
}
EOT
}
# Non-secret env
env {
FORGE_REPO = "disinto-admin/disinto"
DISINTO_CONTAINER = "1"
PROJECT_NAME = "disinto"
}
# Caddy needs CPU + memory headroom for reverse proxy work.
resources {
cpu = 200
memory = 256
}
}
# Dispatcher task (S5.1, issue #988)
task "dispatcher" {
driver = "docker"
config {
# Use same disinto/agents:local image as other agents.
image = "disinto/agents:local"
force_pull = false
network_mode = "host"
# apparmor=unconfined matches docker-compose.
security_opt = ["apparmor=unconfined"]
# Mount docker.sock via bind-volume (not host volume) for legacy
# docker backend compat. Nomad host volumes require named volumes
# from client.hcl; socket files cannot be host volumes.
volumes = ["/var/run/docker.sock:/var/run/docker.sock:ro"]
}
# Mount ops-repo for vault actions polling.
volume_mount {
volume = "ops-repo"
destination = "/home/agent/repos/disinto-ops"
read_only = false
}
# Forge URL via Nomad service discovery (issue #1034)
# Resolves forgejo service address/port dynamically for bridge network
# compatibility. Template-scoped to dispatcher task (Nomad doesn't
# propagate templates across tasks).
template {
destination = "local/forge.env"
env = true
change_mode = "restart"
data = <<EOT
{{ range service "forgejo" -}}
FORGE_URL=http://{{ .Address }}:{{ .Port }}
{{- end }}
EOT
}
# Vault-templated secrets (S5.1, issue #988)
# Renders FORGE_TOKEN from Vault KV v2 for ops repo access.
template {
destination = "secrets/dispatcher.env"
env = true
change_mode = "restart"
error_on_missing_key = false
data = <<EOT
{{- with secret "kv/data/disinto/shared/ops-repo" -}}
FORGE_TOKEN={{ .Data.data.token }}
{{- else -}}
# WARNING: kv/disinto/shared/ops-repo is empty run tools/vault-seed-ops-repo.sh
FORGE_TOKEN=seed-me
{{- end }}
EOT
}
# Non-secret env
env {
DISPATCHER_BACKEND = "nomad"
FORGE_REPO = "disinto-admin/disinto"
FORGE_OPS_REPO = "disinto-admin/disinto-ops"
PRIMARY_BRANCH = "main"
DISINTO_CONTAINER = "1"
OPS_REPO_ROOT = "/home/agent/repos/disinto-ops"
FORGE_ADMIN_USERS = "vault-bot,admin"
}
# Dispatcher is lightweight minimal CPU + memory.
resources {
cpu = 100
memory = 256
}
}
}
}

View file

@ -1,9 +1,11 @@
# ============================================================================= # =============================================================================
# nomad/jobs/forgejo.hcl Forgejo git server (Nomad service job) # nomad/jobs/forgejo.hcl Forgejo git server (Nomad service job)
# #
# Part of the Nomad+Vault migration (S1.1, issue #840). First jobspec to # Part of the Nomad+Vault migration (S1.1, issue #840; S2.4, issue #882).
# land under nomad/jobs/ proves the docker driver + host_volume plumbing # First jobspec to land under nomad/jobs/ proves the docker driver +
# from Step 0 (client.hcl) by running a real factory service. # host_volume plumbing from Step 0 (client.hcl) by running a real factory
# service. S2.4 layered Vault integration on top: admin/internal secrets
# now render via workload identity + template stanza instead of inline env.
# #
# Host_volume contract: # Host_volume contract:
# This job mounts the `forgejo-data` host_volume declared in # This job mounts the `forgejo-data` host_volume declared in
@ -12,11 +14,18 @@
# references it. Keep the `source = "forgejo-data"` below in sync with the # references it. Keep the `source = "forgejo-data"` below in sync with the
# host_volume stanza in client.hcl — drift = scheduling failures. # host_volume stanza in client.hcl — drift = scheduling failures.
# #
# No Vault integration yet Step 2 (#...) templates in OAuth secrets and # Vault integration (S2.4):
# replaces the inline FORGEJO__oauth2__* bits. The env vars below are the # - vault { role = "service-forgejo" } at the group scope the task's
# subset of docker-compose.yml's forgejo service that does NOT depend on # workload-identity JWT is exchanged for a Vault token carrying the
# secrets: DB type, public URL, install lock, registration lockdown, webhook # policy named on that role. Role + policy are defined in
# allow-list. OAuth app registration lands later, per-service. # vault/roles.yaml + vault/policies/service-forgejo.hcl.
# - template { destination = "secrets/forgejo.env" env = true } pulls
# FORGEJO__security__{SECRET_KEY,INTERNAL_TOKEN} out of Vault KV v2
# at kv/disinto/shared/forgejo and merges them into the task env.
# Seeded on fresh boxes by tools/vault-seed-forgejo.sh.
# - Non-secret env (DB type, ROOT_URL, ports, registration lockdown,
# webhook allow-list) stays inline below not sensitive, not worth
# round-tripping through Vault.
# #
# Not the runtime yet: docker-compose.yml is still the factory's live stack # Not the runtime yet: docker-compose.yml is still the factory's live stack
# until cutover. This file exists so CI can validate it and S1.3 can wire # until cutover. This file exists so CI can validate it and S1.3 can wire
@ -30,6 +39,16 @@ job "forgejo" {
group "forgejo" { group "forgejo" {
count = 1 count = 1
# Vault workload identity (S2.4, issue #882)
# `role = "service-forgejo"` is defined in vault/roles.yaml and
# applied by tools/vault-apply-roles.sh (S2.3). The role's bound
# claim pins nomad_job_id = "forgejo" renaming this jobspec's
# `job "forgejo"` without updating vault/roles.yaml will make token
# exchange fail at placement with a "claim mismatch" error.
vault {
role = "service-forgejo"
}
# Static :3000 matches docker-compose's published port so the rest of # Static :3000 matches docker-compose's published port so the rest of
# the factory (agents, woodpecker, caddy) keeps reaching forgejo at the # the factory (agents, woodpecker, caddy) keeps reaching forgejo at the
# same host:port during and after cutover. `to = 3000` maps the host # same host:port during and after cutover. `to = 3000` maps the host
@ -89,9 +108,10 @@ job "forgejo" {
read_only = false read_only = false
} }
# Mirrors the non-secret env set from docker-compose.yml's forgejo # Non-secret env DB type, public URL, ports, install lock,
# service. OAuth/secret-bearing env vars land in Step 2 via Vault # registration lockdown, webhook allow-list. Nothing sensitive here,
# templates do NOT add them here. # so this stays inline. Secret-bearing env (SECRET_KEY, INTERNAL_TOKEN)
# lives in the template stanza below and is merged into task env.
env { env {
FORGEJO__database__DB_TYPE = "sqlite3" FORGEJO__database__DB_TYPE = "sqlite3"
FORGEJO__server__ROOT_URL = "http://forgejo:3000/" FORGEJO__server__ROOT_URL = "http://forgejo:3000/"
@ -101,6 +121,62 @@ job "forgejo" {
FORGEJO__webhook__ALLOWED_HOST_LIST = "private" FORGEJO__webhook__ALLOWED_HOST_LIST = "private"
} }
# Vault-templated secrets env (S2.4, issue #882)
# Renders `<task-dir>/secrets/forgejo.env` (per-alloc secrets dir,
# never on disk on the host root filesystem, never in `nomad job
# inspect` output). `env = true` merges every KEY=VAL line into the
# task environment. `change_mode = "restart"` re-runs the task
# whenever a watched secret's value in Vault changes so `vault kv
# put ` alone is enough to roll new secrets; no manual
# `nomad alloc restart` required (though that also works it
# forces a re-render).
#
# Vault path: `kv/data/disinto/shared/forgejo`. The literal `/data/`
# segment is required by consul-template for KV v2 mounts without
# it the template would read from a KV v1 path that doesn't exist
# (the policy in vault/policies/service-forgejo.hcl grants
# `kv/data/disinto/shared/forgejo/*`, confirming v2).
#
# Empty-Vault fallback (`with ... else ...`): on a fresh LXC where
# the KV path is absent, consul-template's `with` short-circuits to
# the `else` branch. Emitting visible placeholders (instead of no
# env vars) means the container still boots, but with obviously-bad
# secrets that an operator will spot in `env | grep FORGEJO`
# better than forgejo silently regenerating SECRET_KEY on every
# restart and invalidating every prior session. Seed the path with
# tools/vault-seed-forgejo.sh to replace the placeholders.
#
# Placeholder values are kept short on purpose: the repo-wide
# secret-scan (.woodpecker/secret-scan.yml lib/secret-scan.sh)
# flags `TOKEN=<16+ non-space chars>` as a plaintext secret, so a
# descriptive long placeholder (e.g. "run-tools-vault-seed-...") on
# the INTERNAL_TOKEN line would fail CI on every PR that touched
# this file. "seed-me" is < 16 chars and still distinctive enough
# to surface in a `grep FORGEJO__security__` audit. The template
# comment below carries the operator-facing fix pointer.
# `error_on_missing_key = false` stops consul-template from blocking
# the alloc on template-pending when the Vault KV path exists but a
# referenced key is absent (or the path itself is absent and the
# else-branch placeholders are used). Without this, a fresh-LXC
# `disinto init --with forgejo` against an empty Vault hangs on
# template-pending until deploy.sh times out (issue #912, bug #4).
template {
destination = "secrets/forgejo.env"
env = true
change_mode = "restart"
error_on_missing_key = false
data = <<EOT
{{- with secret "kv/data/disinto/shared/forgejo" -}}
FORGEJO__security__SECRET_KEY={{ .Data.data.secret_key }}
FORGEJO__security__INTERNAL_TOKEN={{ .Data.data.internal_token }}
{{- else -}}
# WARNING: kv/disinto/shared/forgejo is empty run tools/vault-seed-forgejo.sh
FORGEJO__security__SECRET_KEY=seed-me
FORGEJO__security__INTERNAL_TOKEN=seed-me
{{- end -}}
EOT
}
# Baseline tune once we have real usage numbers under nomad. The # Baseline tune once we have real usage numbers under nomad. The
# docker-compose stack runs forgejo uncapped; these limits exist so # docker-compose stack runs forgejo uncapped; these limits exist so
# an unhealthy forgejo can't starve the rest of the node. # an unhealthy forgejo can't starve the rest of the node.

86
nomad/jobs/staging.hcl Normal file
View file

@ -0,0 +1,86 @@
# =============================================================================
# nomad/jobs/staging.hcl Staging file server (Nomad service job)
#
# Part of the Nomad+Vault migration (S5.2, issue #989). Lightweight service job
# for the staging file server using Caddy as a static file server.
#
# Mount contract:
# This job mounts the `docker/` directory as `/srv/site` (read-only).
# The docker/ directory contains static content (images, HTML, etc.)
# served to staging environment users.
#
# Network:
# Dynamic host port edge discovers via Nomad service registration.
# No static port to avoid collisions with edge (which owns 80/443).
#
# Not the runtime yet: docker-compose.yml is still the factory's live stack
# until cutover. This file exists so CI can validate it and S5.2 can wire
# `disinto init --backend=nomad --with staging` to `nomad job run` it.
# =============================================================================
job "staging" {
type = "service"
datacenters = ["dc1"]
group "staging" {
count = 1
# No Vault integration needed no secrets required (static file server)
# Internal service dynamic host port. Edge discovers via Nomad service.
network {
port "http" {
to = 80
}
}
volume "site-content" {
type = "host"
source = "site-content"
read_only = true
}
restart {
attempts = 3
interval = "5m"
delay = "15s"
mode = "delay"
}
service {
name = "staging"
port = "http"
provider = "nomad"
check {
type = "http"
path = "/"
interval = "10s"
timeout = "3s"
}
}
task "staging" {
driver = "docker"
config {
image = "caddy:alpine"
ports = ["http"]
command = "caddy"
args = ["file-server", "--root", "/srv/site"]
}
# Mount docker/ directory as /srv/site:ro (static content)
volume_mount {
volume = "site-content"
destination = "/srv/site"
read_only = true
}
resources {
cpu = 100
memory = 256
}
}
}
}

137
nomad/jobs/vault-runner.hcl Normal file
View file

@ -0,0 +1,137 @@
# =============================================================================
# nomad/jobs/vault-runner.hcl Parameterized batch job for vault action dispatch
#
# Part of the Nomad+Vault migration (S5.3, issue #990). Replaces the
# `docker run --rm vault-runner-${action_id}` pattern in dispatcher.sh with
# a Nomad-native parameterized batch job. Dispatched by the edge dispatcher
# (S5.4) via `nomad job dispatch`.
#
# Parameterized meta:
# action_id vault action identifier (used by entrypoint-runner.sh)
# secrets_csv comma-separated secret names (e.g. "GITHUB_TOKEN,DEPLOY_KEY")
#
# Vault integration (approach A pre-defined templates):
# All 6 known runner secrets are rendered via template stanzas with
# error_on_missing_key = false. Secrets not granted by the dispatch's
# Vault policies render as empty strings. The dispatcher (S5.4) sets
# vault { policies = [...] } per-dispatch based on the action TOML's
# secrets=[...] list, scoping access to only the declared secrets.
#
# Cleanup: Nomad garbage-collects completed batch dispatches automatically.
# =============================================================================
job "vault-runner" {
type = "batch"
datacenters = ["dc1"]
parameterized {
meta_required = ["action_id", "secrets_csv"]
}
group "runner" {
count = 1
# Vault workload identity
# Per-dispatch policies are composed by the dispatcher (S5.4) based on the
# action TOML's secrets=[...] list. Each policy grants read access to
# exactly one kv/data/disinto/runner/<NAME> path. Roles defined in
# vault/roles.yaml (runner-<NAME>), policies in vault/policies/.
vault {}
volume "ops-repo" {
type = "host"
source = "ops-repo"
read_only = true
}
# No restart for batch fail fast, let the dispatcher handle retries.
restart {
attempts = 0
mode = "fail"
}
task "runner" {
driver = "docker"
config {
image = "disinto/agents:local"
force_pull = false
entrypoint = ["bash"]
args = [
"/home/agent/disinto/docker/runner/entrypoint-runner.sh",
"${NOMAD_META_action_id}",
]
}
volume_mount {
volume = "ops-repo"
destination = "/home/agent/ops"
read_only = true
}
# Non-secret env
env {
DISINTO_CONTAINER = "1"
FACTORY_ROOT = "/home/agent/disinto"
OPS_REPO_ROOT = "/home/agent/ops"
}
# Vault-templated runner secrets (approach A)
# Pre-defined templates for all 6 known runner secrets. Each renders
# from kv/data/disinto/runner/<NAME>. Secrets not granted by the
# dispatch's Vault policies produce empty env vars (harmless).
# error_on_missing_key = false prevents template-pending hangs when
# a secret path is absent or the policy doesn't grant access.
#
# Placeholder values kept < 16 chars to avoid secret-scan CI failures.
template {
destination = "secrets/runner.env"
env = true
error_on_missing_key = false
data = <<EOT
{{- with secret "kv/data/disinto/runner/GITHUB_TOKEN" -}}
GITHUB_TOKEN={{ .Data.data.value }}
{{- else -}}
GITHUB_TOKEN=
{{- end }}
{{ with secret "kv/data/disinto/runner/CODEBERG_TOKEN" -}}
CODEBERG_TOKEN={{ .Data.data.value }}
{{- else -}}
CODEBERG_TOKEN=
{{- end }}
{{ with secret "kv/data/disinto/runner/CLAWHUB_TOKEN" -}}
CLAWHUB_TOKEN={{ .Data.data.value }}
{{- else -}}
CLAWHUB_TOKEN=
{{- end }}
{{ with secret "kv/data/disinto/runner/DEPLOY_KEY" -}}
DEPLOY_KEY={{ .Data.data.value }}
{{- else -}}
DEPLOY_KEY=
{{- end }}
{{ with secret "kv/data/disinto/runner/NPM_TOKEN" -}}
NPM_TOKEN={{ .Data.data.value }}
{{- else -}}
NPM_TOKEN=
{{- end }}
{{ with secret "kv/data/disinto/runner/DOCKER_HUB_TOKEN" -}}
DOCKER_HUB_TOKEN={{ .Data.data.value }}
{{- else -}}
DOCKER_HUB_TOKEN=
{{- end }}
EOT
}
# Formula execution headroom matches agents.hcl baseline.
resources {
cpu = 500
memory = 1024
}
}
}
}

View file

@ -0,0 +1,147 @@
# =============================================================================
# nomad/jobs/woodpecker-agent.hcl Woodpecker CI agent (Nomad service job)
#
# Part of the Nomad+Vault migration (S3.2, issue #935).
# Drop-in for the current docker-compose setup with host networking +
# docker.sock mount, enabling the agent to spawn containers via the
# mounted socket.
#
# Host networking:
# Uses network_mode = "host" to match the compose setup. The Woodpecker
# server gRPC endpoint is addressed via Nomad service discovery using
# the host's IP address (10.10.10.x:9000), since the server's port
# binding in Nomad binds to the allocation's IP, not localhost.
#
# Vault integration:
# - vault { role = "service-woodpecker-agent" } at the group scope the
# task's workload-identity JWT is exchanged for a Vault token carrying
# the policy named on that role. Role + policy are defined in
# vault/roles.yaml + vault/policies/service-woodpecker.hcl.
# - template stanza pulls WOODPECKER_AGENT_SECRET from Vault KV v2
# at kv/disinto/shared/woodpecker and writes it to secrets/agent.env.
# Seeded on fresh boxes by tools/vault-seed-woodpecker.sh.
# =============================================================================
job "woodpecker-agent" {
type = "service"
datacenters = ["dc1"]
group "woodpecker-agent" {
count = 1
# Vault workload identity
# `role = "service-woodpecker-agent"` is defined in vault/roles.yaml and
# applied by tools/vault-apply-roles.sh. The role's bound
# claim pins nomad_job_id = "woodpecker-agent" renaming this
# jobspec's `job "woodpecker-agent"` without updating vault/roles.yaml
# will make token exchange fail at placement with a "claim mismatch"
# error.
vault {
role = "service-woodpecker-agent"
}
# Health check port: static 3333 for Nomad service discovery. The agent
# exposes :3333/healthz for Nomad to probe.
network {
port "healthz" {
static = 3333
}
}
# Native Nomad service discovery for the health check endpoint.
service {
name = "woodpecker-agent"
port = "healthz"
provider = "nomad"
check {
type = "http"
path = "/healthz"
interval = "10s"
timeout = "3s"
}
}
# Conservative restart policy fail fast to the scheduler instead of
# spinning on a broken image/config. 3 attempts over 5m, then back off.
restart {
attempts = 3
interval = "5m"
delay = "15s"
mode = "delay"
}
task "woodpecker-agent" {
driver = "docker"
config {
image = "woodpeckerci/woodpecker-agent:v3"
network_mode = "host"
privileged = true
volumes = ["/var/run/docker.sock:/var/run/docker.sock"]
}
# Non-secret env server address, gRPC security, concurrency limit,
# and health check endpoint. Nothing sensitive here.
#
# WOODPECKER_SERVER uses Nomad's attribute template to get the host's
# IP address (10.10.10.x). The server's gRPC port 9000 is bound via
# Nomad's port stanza to the allocation's IP (not localhost), so the
# agent must use the LXC's eth0 IP, not 127.0.0.1.
env {
WOODPECKER_SERVER = "${attr.unique.network.ip-address}:9000"
WOODPECKER_GRPC_SECURE = "false"
WOODPECKER_GRPC_KEEPALIVE_TIME = "10s"
WOODPECKER_GRPC_KEEPALIVE_TIMEOUT = "20s"
WOODPECKER_GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS = "true"
WOODPECKER_MAX_WORKFLOWS = "1"
WOODPECKER_HEALTHCHECK_ADDR = ":3333"
}
# Vault-templated agent secret
# Renders <task-dir>/secrets/agent.env (per-alloc secrets dir,
# never on disk on the host root filesystem, never in `nomad job
# inspect` output). `env = true` merges WOODPECKER_AGENT_SECRET
# from the file into the task environment.
#
# Vault path: `kv/data/disinto/shared/woodpecker`. The literal
# `/data/` segment is required by consul-template for KV v2 mounts.
#
# Empty-Vault fallback (`with ... else ...`): on a fresh LXC where
# the KV path is absent, consul-template's `with` short-circuits to
# the `else` branch. Emitting a visible placeholder means the
# container still boots, but with an obviously-bad secret that an
# operator will spot better than the agent failing silently with
# auth errors. Seed the path with tools/vault-seed-woodpecker.sh
# to replace the placeholder.
#
# Placeholder values are kept short on purpose: the repo-wide
# secret-scan (.woodpecker/secret-scan.yml lib/secret-scan.sh)
# flags `TOKEN=<16+ non-space chars>` as a plaintext secret, so a
# descriptive long placeholder would fail CI on every PR that touched
# this file. "seed-me" is < 16 chars and still distinctive enough
# to surface in a `grep WOODPECKER` audit.
template {
destination = "secrets/agent.env"
env = true
change_mode = "restart"
error_on_missing_key = false
data = <<EOT
{{- with secret "kv/data/disinto/shared/woodpecker" -}}
WOODPECKER_AGENT_SECRET={{ .Data.data.agent_secret }}
{{- else -}}
# WARNING: kv/disinto/shared/woodpecker is empty run tools/vault-seed-woodpecker.sh
WOODPECKER_AGENT_SECRET=seed-me
{{- end -}}
EOT
}
# Baseline tune once we have real usage numbers under nomad.
# Conservative limits so an unhealthy agent can't starve the node.
resources {
cpu = 200
memory = 256
}
}
}
}

View file

@ -0,0 +1,173 @@
# =============================================================================
# nomad/jobs/woodpecker-server.hcl Woodpecker CI server (Nomad service job)
#
# Part of the Nomad+Vault migration (S3.1, issue #934).
# Runs the Woodpecker CI web UI + gRPC endpoint as a Nomad service job,
# reading its Forgejo OAuth + agent secret from Vault via workload identity.
#
# Host_volume contract:
# This job mounts the `woodpecker-data` host_volume declared in
# nomad/client.hcl. That volume is backed by /srv/disinto/woodpecker-data
# on the factory box, created by lib/init/nomad/cluster-up.sh before any
# job references it. Keep the `source = "woodpecker-data"` below in sync
# with the host_volume stanza in client.hcl — drift = scheduling failures.
#
# Vault integration (S2.4 pattern):
# - vault { role = "service-woodpecker" } at the group scope the task's
# workload-identity JWT is exchanged for a Vault token carrying the
# policy named on that role. Role + policy are defined in
# vault/roles.yaml + vault/policies/service-woodpecker.hcl.
# - template { destination = "secrets/wp.env" env = true } pulls
# WOODPECKER_AGENT_SECRET, WOODPECKER_FORGEJO_CLIENT, and
# WOODPECKER_FORGEJO_SECRET out of Vault KV v2 at
# kv/disinto/shared/woodpecker and merges them into the task env.
# Agent secret seeded by tools/vault-seed-woodpecker.sh; OAuth
# client/secret seeded by S3.3 (wp-oauth-register.sh).
# - Non-secret env (DB driver, Forgejo URL, host URL, open registration)
# stays inline below not sensitive, not worth round-tripping through
# Vault.
#
# Not the runtime yet: docker-compose.yml is still the factory's live stack
# until cutover. This file exists so CI can validate it and S3.4 can wire
# `disinto init --backend=nomad --with woodpecker` to `nomad job run` it.
# =============================================================================
job "woodpecker-server" {
type = "service"
datacenters = ["dc1"]
group "woodpecker-server" {
count = 1
# Vault workload identity (S2.4 pattern)
# `role = "service-woodpecker"` is defined in vault/roles.yaml and
# applied by tools/vault-apply-roles.sh (S2.3). The role's bound
# claim pins nomad_job_id = "woodpecker" note the job_id in
# vault/roles.yaml is "woodpecker" (matching the roles.yaml entry),
# but the actual Nomad job name here is "woodpecker-server". Update
# vault/roles.yaml job_id to "woodpecker-server" if the bound claim
# enforces an exact match at placement.
vault {
role = "service-woodpecker"
}
# HTTP UI (:8000) + gRPC agent endpoint (:9000). Static ports match
# docker-compose's published ports so the rest of the factory keeps
# reaching woodpecker at the same host:port during and after cutover.
network {
port "http" {
static = 8000
to = 8000
}
port "grpc" {
static = 9000
to = 9000
}
}
# Host-volume mount: declared in nomad/client.hcl, path
# /srv/disinto/woodpecker-data on the factory box.
volume "woodpecker-data" {
type = "host"
source = "woodpecker-data"
read_only = false
}
# Conservative restart policy fail fast to the scheduler instead of
# spinning on a broken image/config. 3 attempts over 5m, then back off.
restart {
attempts = 3
interval = "5m"
delay = "15s"
mode = "delay"
}
# Native Nomad service discovery (no Consul in this factory cluster).
# Health check gates the service as healthy only after the HTTP API is
# up; initial_status is deliberately unset so Nomad waits for the first
# probe to pass before marking the allocation healthy on boot.
service {
name = "woodpecker"
port = "http"
provider = "nomad"
check {
type = "http"
path = "/healthz"
interval = "10s"
timeout = "3s"
}
}
task "woodpecker-server" {
driver = "docker"
config {
image = "woodpeckerci/woodpecker-server:v3"
ports = ["http", "grpc"]
}
volume_mount {
volume = "woodpecker-data"
destination = "/var/lib/woodpecker"
read_only = false
}
# Non-secret env Forgejo integration flags, public URL, DB driver.
# Nothing sensitive here, so this stays inline. Secret-bearing env
# (agent secret, OAuth client/secret) lives in the template stanza
# below and is merged into task env.
env {
WOODPECKER_FORGEJO = "true"
WOODPECKER_FORGEJO_URL = "http://forgejo:3000"
WOODPECKER_HOST = "http://woodpecker:8000"
WOODPECKER_OPEN = "true"
WOODPECKER_DATABASE_DRIVER = "sqlite3"
WOODPECKER_DATABASE_DATASOURCE = "/var/lib/woodpecker/woodpecker.sqlite"
}
# Vault-templated secrets env (S2.4 pattern)
# Renders `<task-dir>/secrets/wp.env` (per-alloc secrets dir, never on
# disk on the host root filesystem). `env = true` merges every KEY=VAL
# line into the task environment. `change_mode = "restart"` re-runs the
# task whenever a watched secret's value in Vault changes.
#
# Vault path: `kv/data/disinto/shared/woodpecker`. The literal `/data/`
# segment is required by consul-template for KV v2 mounts.
#
# Empty-Vault fallback (`with ... else ...`): on a fresh LXC where
# the KV path is absent, consul-template's `with` short-circuits to
# the `else` branch. Emitting visible placeholders means the container
# still boots, but with obviously-bad secrets. Seed the path with
# tools/vault-seed-woodpecker.sh (agent_secret) and S3.3's
# wp-oauth-register.sh (forgejo_client, forgejo_secret).
#
# Placeholder values are kept short on purpose: the repo-wide
# secret-scan flags `TOKEN=<16+ non-space chars>` as a plaintext
# secret; "seed-me" is < 16 chars and still distinctive.
template {
destination = "secrets/wp.env"
env = true
change_mode = "restart"
error_on_missing_key = false
data = <<EOT
{{- with secret "kv/data/disinto/shared/woodpecker" -}}
WOODPECKER_AGENT_SECRET={{ .Data.data.agent_secret }}
WOODPECKER_FORGEJO_CLIENT={{ .Data.data.forgejo_client }}
WOODPECKER_FORGEJO_SECRET={{ .Data.data.forgejo_secret }}
{{- else -}}
# WARNING: kv/disinto/shared/woodpecker is empty run tools/vault-seed-woodpecker.sh + S3.3
WOODPECKER_AGENT_SECRET=seed-me
WOODPECKER_FORGEJO_CLIENT=seed-me
WOODPECKER_FORGEJO_SECRET=seed-me
{{- end -}}
EOT
}
resources {
cpu = 300
memory = 512
}
}
}
}

View file

@ -51,3 +51,26 @@ advertise {
ui { ui {
enabled = true enabled = true
} }
# Vault integration (S2.3, issue #881)
# Nomad jobs exchange their short-lived workload-identity JWT (signed by
# nomad's built-in signer at /.well-known/jwks.json on :4646) for a Vault
# token carrying the policies named by the role in `vault { role = "..." }`
# of each jobspec no shared VAULT_TOKEN in job env.
#
# The JWT auth path (jwt-nomad) + per-role bindings live on the Vault
# side, written by lib/init/nomad/vault-nomad-auth.sh + tools/vault-apply-roles.sh.
# Roles are defined in vault/roles.yaml.
#
# `default_identity.aud = ["vault.io"]` matches bound_audiences on every
# role in vault/roles.yaml a drift here would silently break every job's
# Vault token exchange at placement time.
vault {
enabled = true
address = "http://127.0.0.1:8200"
default_identity {
aud = ["vault.io"]
ttl = "1h"
}
}

View file

@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 --> <!-- last-reviewed: 0d6181918452c1407a3f6bc62917724261acff26 -->
# Planner Agent # Planner Agent
**Role**: Strategic planning using a Prerequisite Tree (Theory of Constraints), **Role**: Strategic planning using a Prerequisite Tree (Theory of Constraints),

View file

@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 --> <!-- last-reviewed: 0d6181918452c1407a3f6bc62917724261acff26 -->
# Predictor Agent # Predictor Agent
**Role**: Abstract adversary (the "goblin"). Runs a 2-step formula **Role**: Abstract adversary (the "goblin"). Runs a 2-step formula

View file

@ -59,6 +59,23 @@ check_pipeline_stall = false
# compact_pct = 60 # compact_pct = 60
# poll_interval = 60 # poll_interval = 60
# Edge routing mode (default: subpath)
#
# Controls how services are exposed through the edge proxy.
# subpath — all services under <project>.disinto.ai/{forge,ci,chat,staging}
# subdomain — per-service subdomains: forge.<project>, ci.<project>, chat.<project>
#
# Set to "subdomain" if subpath routing causes unfixable issues (redirect loops,
# OAuth callback mismatches, cookie collisions). See docs/edge-routing-fallback.md.
#
# Set in .env (not TOML) since it's consumed by docker-compose and shell scripts:
# EDGE_ROUTING_MODE=subdomain
#
# In subdomain mode, `disinto edge register` also writes:
# EDGE_TUNNEL_FQDN_FORGE=forge.<project>.disinto.ai
# EDGE_TUNNEL_FQDN_CI=ci.<project>.disinto.ai
# EDGE_TUNNEL_FQDN_CHAT=chat.<project>.disinto.ai
# [mirrors] # [mirrors]
# github = "git@github.com:johba/disinto.git" # github = "git@github.com:johba/disinto.git"
# codeberg = "git@codeberg.org:johba/disinto.git" # codeberg = "git@codeberg.org:johba/disinto.git"

View file

@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 --> <!-- last-reviewed: 0d6181918452c1407a3f6bc62917724261acff26 -->
# Review Agent # Review Agent
**Role**: AI-powered PR review — post structured findings and formal **Role**: AI-powered PR review — post structured findings and formal

View file

@ -52,8 +52,35 @@ REVIEW_TMPDIR=$(mktemp -d)
log() { printf '[%s] PR#%s %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$PR_NUMBER" "$*" >> "$LOGFILE"; } log() { printf '[%s] PR#%s %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$PR_NUMBER" "$*" >> "$LOGFILE"; }
status() { printf '[%s] PR #%s: %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$PR_NUMBER" "$*" > "$STATUSFILE"; log "$*"; } status() { printf '[%s] PR #%s: %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$PR_NUMBER" "$*" > "$STATUSFILE"; log "$*"; }
cleanup() { rm -rf "$REVIEW_TMPDIR" "$LOCKFILE" "$STATUSFILE" "/tmp/${PROJECT_NAME}-review-graph-${PR_NUMBER}.json"; }
trap cleanup EXIT # cleanup — remove temp files (NOT lockfile — cleanup_on_exit handles that)
cleanup() {
rm -rf "$REVIEW_TMPDIR" "$STATUSFILE" "/tmp/${PROJECT_NAME}-review-graph-${PR_NUMBER}.json"
}
# cleanup_on_exit — defensive cleanup: remove lockfile if we own it, kill residual children
# This handles the case where review-pr.sh is terminated unexpectedly (e.g., watchdog SIGTERM)
cleanup_on_exit() {
local ec=$?
# Remove lockfile only if we own it (PID matches $$)
if [ -f "$LOCKFILE" ] && [ -n "$(cat "$LOCKFILE" 2>/dev/null)" ]; then
if [ "$(cat "$LOCKFILE" 2>/dev/null)" = "$$" ]; then
rm -f "$LOCKFILE"
log "cleanup_on_exit: removed lockfile (we owned it)"
fi
fi
# Kill any direct children that may have been spawned by this process
# (e.g., bash -c commands from Claude's Bash tool that didn't get reaped)
pkill -P $$ 2>/dev/null || true
# Call the main cleanup function to remove temp files
cleanup
exit "$ec"
}
trap cleanup_on_exit EXIT INT TERM
# Note: EXIT trap is already set above. The cleanup function is still available for
# non-error exits (e.g., normal completion via exit 0 after verdict posted).
# When review succeeds, we want to skip lockfile removal since the verdict was posted.
# ============================================================================= # =============================================================================
# LOG ROTATION # LOG ROTATION
@ -104,6 +131,7 @@ if [ "$PR_STATE" != "open" ]; then
log "SKIP: state=${PR_STATE}" log "SKIP: state=${PR_STATE}"
worktree_cleanup "$WORKTREE" worktree_cleanup "$WORKTREE"
rm -f "$OUTPUT_FILE" "$SID_FILE" 2>/dev/null || true rm -f "$OUTPUT_FILE" "$SID_FILE" 2>/dev/null || true
rm -f "$LOCKFILE"
exit 0 exit 0
fi fi
@ -113,7 +141,7 @@ fi
CI_STATE=$(ci_commit_status "$PR_SHA") CI_STATE=$(ci_commit_status "$PR_SHA")
CI_NOTE="" CI_NOTE=""
if ! ci_passed "$CI_STATE"; then if ! ci_passed "$CI_STATE"; then
ci_required_for_pr "$PR_NUMBER" && { log "SKIP: CI=${CI_STATE}"; exit 0; } ci_required_for_pr "$PR_NUMBER" && { log "SKIP: CI=${CI_STATE}"; rm -f "$LOCKFILE"; exit 0; }
CI_NOTE=" (not required — non-code PR)" CI_NOTE=" (not required — non-code PR)"
fi fi
@ -123,10 +151,10 @@ fi
ALL_COMMENTS=$(forge_api_all "/issues/${PR_NUMBER}/comments") ALL_COMMENTS=$(forge_api_all "/issues/${PR_NUMBER}/comments")
HAS_CMT=$(printf '%s' "$ALL_COMMENTS" | jq --arg s "$PR_SHA" \ HAS_CMT=$(printf '%s' "$ALL_COMMENTS" | jq --arg s "$PR_SHA" \
'[.[]|select(.body|contains("<!-- reviewed: "+$s+" -->"))]|length') '[.[]|select(.body|contains("<!-- reviewed: "+$s+" -->"))]|length')
[ "${HAS_CMT:-0}" -gt 0 ] && [ "$FORCE" != "--force" ] && { log "SKIP: reviewed ${PR_SHA:0:7}"; exit 0; } [ "${HAS_CMT:-0}" -gt 0 ] && [ "$FORCE" != "--force" ] && { log "SKIP: reviewed ${PR_SHA:0:7}"; rm -f "$LOCKFILE"; exit 0; }
HAS_FML=$(forge_api_all "/pulls/${PR_NUMBER}/reviews" | jq --arg s "$PR_SHA" \ HAS_FML=$(forge_api_all "/pulls/${PR_NUMBER}/reviews" | jq --arg s "$PR_SHA" \
'[.[]|select(.commit_id==$s)|select(.state!="COMMENT")]|length') '[.[]|select(.commit_id==$s)|select(.state!="COMMENT")]|length')
[ "${HAS_FML:-0}" -gt 0 ] && [ "$FORCE" != "--force" ] && { log "SKIP: formal review"; exit 0; } [ "${HAS_FML:-0}" -gt 0 ] && [ "$FORCE" != "--force" ] && { log "SKIP: formal review"; rm -f "$LOCKFILE"; exit 0; }
# ============================================================================= # =============================================================================
# RE-REVIEW DETECTION # RE-REVIEW DETECTION
@ -324,3 +352,7 @@ esac
profile_write_journal "review-${PR_NUMBER}" "Review PR #${PR_NUMBER} (${VERDICT})" "${VERDICT,,}" "" || true profile_write_journal "review-${PR_NUMBER}" "Review PR #${PR_NUMBER} (${VERDICT})" "${VERDICT,,}" "" || true
log "DONE: ${VERDICT} (re-review: ${IS_RE_REVIEW})" log "DONE: ${VERDICT} (re-review: ${IS_RE_REVIEW})"
# Remove lockfile on successful completion (cleanup_on_exit will also do this,
# but we do it here to avoid the trap running twice)
rm -f "$LOCKFILE"

View file

@ -209,3 +209,72 @@ jq -nc \
log "Engagement report written to ${OUTPUT}: ${UNIQUE_VISITORS} visitors, ${PAGE_VIEWS} page views" log "Engagement report written to ${OUTPUT}: ${UNIQUE_VISITORS} visitors, ${PAGE_VIEWS} page views"
echo "Engagement report: ${UNIQUE_VISITORS} unique visitors, ${PAGE_VIEWS} page views → ${OUTPUT}" echo "Engagement report: ${UNIQUE_VISITORS} unique visitors, ${PAGE_VIEWS} page views → ${OUTPUT}"
# ── Commit evidence to ops repo via Forgejo API ─────────────────────────────
commit_evidence_via_forgejo() {
local evidence_file="$1"
local report_date
report_date=$(basename "$evidence_file" .json)
local file_path="evidence/engagement/${report_date}.json"
# Check if ops repo is available
if [ -z "${OPS_REPO_ROOT:-}" ] || [ ! -d "${OPS_REPO_ROOT}/.git" ]; then
log "SKIP: OPS_REPO_ROOT not set or not a git repo — evidence file not committed"
return 0
fi
# Check if Forgejo credentials are available
if [ -z "${FORGE_TOKEN:-}" ] || [ -z "${FORGE_URL:-}" ] || [ -z "${FORGE_OPS_REPO:-}" ]; then
log "SKIP: Forgejo credentials not available (FORGE_TOKEN/FORGE_URL/FORGE_OPS_REPO) — evidence file not committed"
return 0
fi
# Read and encode the file content
local content
content=$(base64 < "$evidence_file")
local ops_owner="${OPS_FORGE_OWNER:-${FORGE_REPO%%/*}}"
local ops_repo="${OPS_FORGE_REPO:-${PROJECT_NAME:-disinto}-ops}"
# Check if file already exists in the ops repo
local existing
existing=$(curl -sf \
-H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_URL}/api/v1/repos/${ops_owner}/${ops_repo}/contents/${file_path}" \
2>/dev/null || echo "")
if [ -n "$existing" ] && printf '%s' "$existing" | jq -e '.sha' >/dev/null 2>&1; then
# Update existing file
local sha
sha=$(printf '%s' "$existing" | jq -r '.sha')
if curl -sf -X PUT \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_URL}/api/v1/repos/${ops_owner}/${ops_repo}/contents/${file_path}" \
-d "$(jq -nc --arg content "$content" --arg sha "$sha" --arg msg "evidence: engagement ${report_date}" \
'{message: $msg, content: $content, sha: $sha}')" >/dev/null 2>&1; then
log "Updated evidence file in ops repo: ${file_path}"
return 0
else
log "ERROR: failed to update evidence file in ops repo"
return 1
fi
else
# Create new file
if curl -sf -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_URL}/api/v1/repos/${ops_owner}/${ops_repo}/contents/${file_path}" \
-d "$(jq -nc --arg content "$content" --arg msg "evidence: engagement ${report_date}" \
'{message: $msg, content: $content}')" >/dev/null 2>&1; then
log "Created evidence file in ops repo: ${file_path}"
return 0
else
log "ERROR: failed to create evidence file in ops repo"
return 1
fi
fi
}
# Attempt to commit evidence (non-fatal — data collection succeeded even if commit fails)
commit_evidence_via_forgejo "$OUTPUT" || log "WARNING: evidence commit skipped or failed — file exists locally at ${OUTPUT}"

View file

@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 --> <!-- last-reviewed: 0d6181918452c1407a3f6bc62917724261acff26 -->
# Supervisor Agent # Supervisor Agent
**Role**: Health monitoring and auto-remediation, executed as a formula-driven **Role**: Health monitoring and auto-remediation, executed as a formula-driven
@ -24,10 +24,18 @@ Both invoke the same `supervisor-run.sh`. Sources `lib/guard.sh` and calls `chec
files for `PHASE:escalate` entries and auto-removes any whose linked issue files for `PHASE:escalate` entries and auto-removes any whose linked issue
is confirmed closed (24h grace period after closure to avoid races). Reports is confirmed closed (24h grace period after closure to avoid races). Reports
**stale crashed worktrees** (worktrees preserved after crash) — supervisor **stale crashed worktrees** (worktrees preserved after crash) — supervisor
housekeeping removes them after 24h housekeeping removes them after 24h. Collects **Woodpecker agent health**
(added #933): container `disinto-woodpecker-agent` health/running status,
gRPC error count in last 20 min, fast-failure pipeline count (<60s, last 15 min),
and overall health verdict (healthy/unhealthy). Unhealthy verdict triggers
automatic container restart + `blocked:ci_exhausted` issue recovery in
`supervisor-run.sh` before the Claude session starts.
- `formulas/run-supervisor.toml` — Execution spec: five steps (preflight review, - `formulas/run-supervisor.toml` — Execution spec: five steps (preflight review,
health-assessment, decide-actions, report, journal) with `needs` dependencies. health-assessment, decide-actions, report, journal) with `needs` dependencies.
Claude evaluates all metrics and takes actions in a single interactive session Claude evaluates all metrics and takes actions in a single interactive session.
Health-assessment now includes P2 **Woodpecker agent unhealthy** classification
(container not running, ≥3 gRPC errors/20m, or ≥3 fast-failure pipelines/15m);
decide-actions documents the pre-session auto-recovery path
- `$OPS_REPO_ROOT/knowledge/*.md` — Domain-specific remediation guides (memory, - `$OPS_REPO_ROOT/knowledge/*.md` — Domain-specific remediation guides (memory,
disk, CI, git, dev-agent, review-agent, forge) disk, CI, git, dev-agent, review-agent, forge)
@ -47,5 +55,6 @@ P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping).
- Logs a WARNING message at startup indicating degraded mode - Logs a WARNING message at startup indicating degraded mode
**Lifecycle**: supervisor-run.sh (invoked by polling loop every 20min, `check_active supervisor`) **Lifecycle**: supervisor-run.sh (invoked by polling loop every 20min, `check_active supervisor`)
→ lock + memory guard → run preflight.sh (collect metrics) → load formula + context → run → lock + memory guard → run preflight.sh (collect metrics) → **WP agent health recovery**
(if unhealthy: restart container + recover ci_exhausted issues) → load formula + context → run
claude -p via agent-sdk.sh → Claude assesses health, auto-fixes, writes journal → `PHASE:done`. claude -p via agent-sdk.sh → Claude assesses health, auto-fixes, writes journal → `PHASE:done`.

View file

@ -224,3 +224,108 @@ for _vf in "${_va_root}"/*.md; do
done done
[ "$_found_vault" = false ] && echo " None" [ "$_found_vault" = false ] && echo " None"
echo "" echo ""
# ── Woodpecker Agent Health ────────────────────────────────────────────────
echo "## Woodpecker Agent Health"
# Check WP agent container health status
_wp_container="disinto-woodpecker-agent"
_wp_health_status="unknown"
_wp_health_start=""
if command -v docker &>/dev/null; then
# Get health status via docker inspect
_wp_health_status=$(docker inspect "$_wp_container" --format '{{.State.Health.Status}}' 2>/dev/null || echo "not_found")
if [ "$_wp_health_status" = "not_found" ] || [ -z "$_wp_health_status" ]; then
# Container may not exist or not have health check configured
_wp_health_status=$(docker inspect "$_wp_container" --format '{{.State.Status}}' 2>/dev/null || echo "not_found")
fi
# Get container start time for age calculation
_wp_start_time=$(docker inspect "$_wp_container" --format '{{.State.StartedAt}}' 2>/dev/null || echo "")
if [ -n "$_wp_start_time" ] && [ "$_wp_start_time" != "0001-01-01T00:00:00Z" ]; then
_wp_health_start=$(date -d "$_wp_start_time" '+%Y-%m-%d %H:%M UTC' 2>/dev/null || echo "$_wp_start_time")
fi
fi
echo "Container: $_wp_container"
echo "Status: $_wp_health_status"
[ -n "$_wp_health_start" ] && echo "Started: $_wp_health_start"
# Check for gRPC errors in agent logs (last 20 minutes)
_wp_grpc_errors=0
if [ "$_wp_health_status" != "not_found" ] && [ -n "$_wp_health_status" ]; then
_wp_grpc_errors=$(docker logs --since 20m "$_wp_container" 2>&1 | grep -c 'grpc error' || echo "0")
echo "gRPC errors (last 20m): $_wp_grpc_errors"
fi
# Fast-failure heuristic: check for pipelines completing in <60s
_wp_fast_failures=0
_wp_recent_failures=""
if [ -n "${WOODPECKER_REPO_ID:-}" ] && [ "${WOODPECKER_REPO_ID}" != "0" ]; then
_now=$(date +%s)
_pipelines=$(woodpecker_api "/repos/${WOODPECKER_REPO_ID}/pipelines?perPage=100" 2>/dev/null || echo '[]')
# Count failures with duration < 60s in last 15 minutes
_wp_fast_failures=$(echo "$_pipelines" | jq --argjson now "$_now" '
[.[] | select(.status == "failure") | select((.finished - .started) < 60) | select(($now - .finished) < 900)]
| length' 2>/dev/null || echo "0")
if [ "$_wp_fast_failures" -gt 0 ]; then
_wp_recent_failures=$(echo "$_pipelines" | jq -r --argjson now "$_now" '
[.[] | select(.status == "failure") | select((.finished - .started) < 60) | select(($now - .finished) < 900)]
| .[] | "\(.number)\t\((.finished - .started))s"' 2>/dev/null || echo "")
fi
fi
echo "Fast-fail pipelines (<60s, last 15m): $_wp_fast_failures"
if [ -n "$_wp_recent_failures" ] && [ "$_wp_fast_failures" -gt 0 ]; then
echo "Recent failures:"
echo "$_wp_recent_failures" | while IFS=$'\t' read -r _num _dur; do
echo " #$_num: ${_dur}"
done
fi
# Determine overall WP agent health
_wp_agent_healthy=true
_wp_health_reason=""
if [ "$_wp_health_status" = "not_found" ]; then
_wp_agent_healthy=false
_wp_health_reason="Container not running"
elif [ "$_wp_health_status" = "unhealthy" ]; then
_wp_agent_healthy=false
_wp_health_reason="Container health check failed"
elif [ "$_wp_health_status" != "running" ]; then
_wp_agent_healthy=false
_wp_health_reason="Container not in running state: $_wp_health_status"
elif [ "$_wp_grpc_errors" -ge 3 ]; then
_wp_agent_healthy=false
_wp_health_reason="High gRPC error count (>=3 in 20m)"
elif [ "$_wp_fast_failures" -ge 3 ]; then
_wp_agent_healthy=false
_wp_health_reason="High fast-failure count (>=3 in 15m)"
fi
echo ""
echo "WP Agent Health: $([ "$_wp_agent_healthy" = true ] && echo "healthy" || echo "UNHEALTHY")"
[ -n "$_wp_health_reason" ] && echo "Reason: $_wp_health_reason"
echo ""
# ── WP Agent Health History (for idempotency) ──────────────────────────────
echo "## WP Agent Health History"
# Track last restart timestamp to avoid duplicate restarts in same run
_WP_HEALTH_HISTORY_FILE="${DISINTO_LOG_DIR}/supervisor/wp-agent-health.history"
_wp_last_restart="never"
_wp_last_restart_ts=0
if [ -f "$_WP_HEALTH_HISTORY_FILE" ]; then
_wp_last_restart_ts=$(grep -m1 '^LAST_RESTART_TS=' "$_WP_HEALTH_HISTORY_FILE" 2>/dev/null | cut -d= -f2 || echo "0")
if [ -n "$_wp_last_restart_ts" ] && [ "$_wp_last_restart_ts" -gt 0 ] 2>/dev/null; then
_wp_last_restart=$(date -d "@$_wp_last_restart_ts" '+%Y-%m-%d %H:%M UTC' 2>/dev/null || echo "$_wp_last_restart_ts")
fi
fi
echo "Last restart: $_wp_last_restart"
echo ""

View file

@ -47,6 +47,9 @@ SID_FILE="/tmp/supervisor-session-${PROJECT_NAME}.sid"
SCRATCH_FILE="/tmp/supervisor-${PROJECT_NAME}-scratch.md" SCRATCH_FILE="/tmp/supervisor-${PROJECT_NAME}-scratch.md"
WORKTREE="/tmp/${PROJECT_NAME}-supervisor-run" WORKTREE="/tmp/${PROJECT_NAME}-supervisor-run"
# WP agent container name (configurable via env var)
export WP_AGENT_CONTAINER_NAME="${WP_AGENT_CONTAINER_NAME:-disinto-woodpecker-agent}"
# Override LOG_AGENT for consistent agent identification # Override LOG_AGENT for consistent agent identification
# shellcheck disable=SC2034 # consumed by agent-sdk.sh and env.sh log() # shellcheck disable=SC2034 # consumed by agent-sdk.sh and env.sh log()
LOG_AGENT="supervisor" LOG_AGENT="supervisor"
@ -166,6 +169,160 @@ ${FORMULA_CONTENT}
${SCRATCH_INSTRUCTION} ${SCRATCH_INSTRUCTION}
${PROMPT_FOOTER}" ${PROMPT_FOOTER}"
# ── WP Agent Health Recovery ──────────────────────────────────────────────
# Check preflight output for WP agent health issues and trigger recovery if needed
_WP_HEALTH_CHECK_FILE="${DISINTO_LOG_DIR}/supervisor/wp-agent-health-check.md"
echo "$PREFLIGHT_OUTPUT" > "$_WP_HEALTH_CHECK_FILE"
# Extract WP agent health status from preflight output
# Note: match exact "healthy" not "UNHEALTHY" (substring issue)
_wp_agent_healthy=$(grep "^WP Agent Health: healthy$" "$_WP_HEALTH_CHECK_FILE" 2>/dev/null && echo "true" || echo "false")
_wp_health_reason=$(grep "^Reason:" "$_WP_HEALTH_CHECK_FILE" 2>/dev/null | sed 's/^Reason: //' || echo "")
if [ "$_wp_agent_healthy" = "false" ] && [ -n "$_wp_health_reason" ]; then
log "WP agent detected as UNHEALTHY: $_wp_health_reason"
# Check for idempotency guard - have we already restarted in this run?
_WP_HEALTH_HISTORY_FILE="${DISINTO_LOG_DIR}/supervisor/wp-agent-health.history"
_wp_last_restart_ts=0
_wp_last_restart="never"
if [ -f "$_WP_HEALTH_HISTORY_FILE" ]; then
_wp_last_restart_ts=$(grep -m1 '^LAST_RESTART_TS=' "$_WP_HEALTH_HISTORY_FILE" 2>/dev/null | cut -d= -f2 || echo "0")
if [ -n "$_wp_last_restart_ts" ] && [ "$_wp_last_restart_ts" != "0" ] 2>/dev/null; then
_wp_last_restart=$(date -d "@$_wp_last_restart_ts" '+%Y-%m-%d %H:%M UTC' 2>/dev/null || echo "$_wp_last_restart_ts")
fi
fi
_current_ts=$(date +%s)
_restart_threshold=300 # 5 minutes between restarts
if [ -z "$_wp_last_restart_ts" ] || [ "$_wp_last_restart_ts" = "0" ] || [ $((_current_ts - _wp_last_restart_ts)) -gt $_restart_threshold ]; then
log "Triggering WP agent restart..."
# Restart the WP agent container
if docker restart "$WP_AGENT_CONTAINER_NAME" >/dev/null 2>&1; then
_restart_time=$(date -u '+%Y-%m-%d %H:%M UTC')
log "Successfully restarted WP agent container: $WP_AGENT_CONTAINER_NAME"
# Update history file
echo "LAST_RESTART_TS=$_current_ts" > "$_WP_HEALTH_HISTORY_FILE"
echo "LAST_RESTART_TIME=$_restart_time" >> "$_WP_HEALTH_HISTORY_FILE"
# Post recovery notice to journal
_journal_file="${OPS_JOURNAL_ROOT}/$(date -u +%Y-%m-%d).md"
if [ -f "$_journal_file" ]; then
{
echo ""
echo "### WP Agent Recovery - $_restart_time"
echo ""
echo "WP agent was unhealthy: $_wp_health_reason"
echo "Container restarted automatically."
} >> "$_journal_file"
fi
# Scan for issues updated in the last 30 minutes with blocked: ci_exhausted label
log "Scanning for ci_exhausted issues updated in last 30 minutes..."
_now_epoch=$(date +%s)
_thirty_min_ago=$(( _now_epoch - 1800 ))
# Fetch open issues with blocked label
_blocked_issues=$(forge_api GET "/issues?state=open&labels=blocked&type=issues&limit=100" 2>/dev/null || echo "[]")
_blocked_count=$(echo "$_blocked_issues" | jq 'length' 2>/dev/null || echo "0")
_issues_processed=0
_issues_recovered=0
if [ "$_blocked_count" -gt 0 ]; then
# Process each blocked issue
echo "$_blocked_issues" | jq -c '.[]' 2>/dev/null | while IFS= read -r issue_json; do
[ -z "$issue_json" ] && continue
_issue_num=$(echo "$issue_json" | jq -r '.number // empty')
_issue_updated=$(echo "$issue_json" | jq -r '.updated_at // empty')
_issue_labels=$(echo "$issue_json" | jq -r '.labels | map(.name) | join(",")' 2>/dev/null || echo "")
# Check if issue has ci_exhausted label
if ! echo "$_issue_labels" | grep -q "ci_exhausted"; then
continue
fi
# Parse updated_at timestamp
_issue_updated_epoch=$(date -d "$_issue_updated" +%s 2>/dev/null || echo "0")
_time_since_update=$(( _now_epoch - _issue_updated_epoch ))
# Check if updated in last 30 minutes
if [ "$_time_since_update" -lt 1800 ] && [ "$_time_since_update" -ge 0 ]; then
_issues_processed=$(( _issues_processed + 1 ))
# Check for idempotency guard - already swept by supervisor?
_issue_body=$(echo "$issue_json" | jq -r '.body // ""' 2>/dev/null || echo "")
if echo "$_issue_body" | grep -q "<!-- supervisor-swept -->"; then
log "Issue #$_issue_num already swept by supervisor, skipping"
continue
fi
log "Processing ci_exhausted issue #$_issue_num (updated $_time_since_update seconds ago)"
# Get issue assignee
_issue_assignee=$(echo "$issue_json" | jq -r '.assignee.login // empty' 2>/dev/null || echo "")
# Unassign the issue
if [ -n "$_issue_assignee" ]; then
log "Unassigning issue #$_issue_num from $_issue_assignee"
curl -sf -X PATCH \
-H "Authorization: token ${FORGE_SUPERVISOR_TOKEN:-$FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API}/issues/$_issue_num" \
-d '{"assignees":[]}' >/dev/null 2>&1 || true
fi
# Remove blocked label
_blocked_label_id=$(forge_api GET "/labels" 2>/dev/null | jq -r '.[] | select(.name == "blocked") | .id' 2>/dev/null || echo "")
if [ -n "$_blocked_label_id" ]; then
log "Removing blocked label from issue #$_issue_num"
curl -sf -X DELETE \
-H "Authorization: token ${FORGE_SUPERVISOR_TOKEN:-$FORGE_TOKEN}" \
"${FORGE_API}/issues/$_issue_num/labels/$_blocked_label_id" >/dev/null 2>&1 || true
fi
# Add comment about infra-flake recovery
_recovery_comment=$(cat <<EOF
<!-- supervisor-swept -->
**Automated Recovery — $(date -u '+%Y-%m-%d %H:%M UTC')**
CI agent was unhealthy between $_restart_time and now. The prior retry budget may have been spent on infra flake, not real failures.
**Recovery Actions:**
- Unassigned from pool and returned for fresh attempt
- CI agent container restarted
- Related pipelines will be retriggered automatically
**Next Steps:**
Please re-attempt this issue. The CI environment has been refreshed.
EOF
)
curl -sf -X POST \
-H "Authorization: token ${FORGE_SUPERVISOR_TOKEN:-$FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API}/issues/$_issue_num/comments" \
-d "$(jq -n --arg body "$_recovery_comment" '{body: $body}')" >/dev/null 2>&1 || true
log "Recovered issue #$_issue_num - returned to pool"
fi
done
fi
log "WP agent restart and issue recovery complete"
else
log "ERROR: Failed to restart WP agent container"
fi
else
log "WP agent restart already performed in this run (since $_wp_last_restart), skipping"
fi
fi
# ── Run agent ───────────────────────────────────────────────────────────── # ── Run agent ─────────────────────────────────────────────────────────────
agent_run --worktree "$WORKTREE" "$PROMPT" agent_run --worktree "$WORKTREE" "$PROMPT"
log "agent_run complete" log "agent_run complete"

View file

@ -155,6 +155,44 @@ setup_file() {
[[ "$output" == *"[deploy] dry-run complete"* ]] [[ "$output" == *"[deploy] dry-run complete"* ]]
} }
# S2.6 / #928 — every --with <svc> that ships tools/vault-seed-<svc>.sh
# must auto-invoke the seeder before deploy.sh runs. forgejo is the
# only service with a seeder today, so the dry-run plan must include
# its seed line when --with forgejo is set. The seed block must also
# appear BEFORE the deploy block (seeded secrets must exist before
# nomad reads the template stanza) — pinned here by scanning output
# order. Services without a seeder (e.g. unknown hypothetical future
# ones) are silently skipped by the loop convention.
@test "disinto init --backend=nomad --with forgejo --dry-run prints seed plan before deploy plan" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with forgejo --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"Vault seed dry-run"* ]]
[[ "$output" == *"tools/vault-seed-forgejo.sh --dry-run"* ]]
# Order: seed header must appear before deploy header.
local seed_line deploy_line
seed_line=$(echo "$output" | grep -n "Vault seed dry-run" | head -1 | cut -d: -f1)
deploy_line=$(echo "$output" | grep -n "Deploy services dry-run" | head -1 | cut -d: -f1)
[ -n "$seed_line" ]
[ -n "$deploy_line" ]
[ "$seed_line" -lt "$deploy_line" ]
}
# Regression guard (PR #929 review): `sudo -n VAR=val -- cmd` is subject
# to sudoers env_reset policy and silently drops VAULT_ADDR unless it's
# in env_keep (it isn't in default configs). vault-seed-forgejo.sh
# requires VAULT_ADDR and dies at its own precondition check if unset,
# so the non-root branch MUST invoke `sudo -n -- env VAR=val cmd` so
# that `env` sets the variable in the child process regardless of
# sudoers policy. This grep-level guard catches a revert to the unsafe
# form that silently broke non-root seed runs on a fresh LXC.
@test "seed loop invokes sudo via 'env VAR=val' (bypasses sudoers env_reset)" {
run grep -F 'sudo -n -- env "VAULT_ADDR=' "$DISINTO_BIN"
[ "$status" -eq 0 ]
# Negative: no bare `sudo -n "VAR=val" --` form anywhere in the file.
run grep -F 'sudo -n "VAULT_ADDR=' "$DISINTO_BIN"
[ "$status" -ne 0 ]
}
@test "disinto init --backend=nomad --with forgejo,forgejo --dry-run handles comma-separated services" { @test "disinto init --backend=nomad --with forgejo,forgejo --dry-run handles comma-separated services" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with forgejo,forgejo --dry-run run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with forgejo,forgejo --dry-run
[ "$status" -eq 0 ] [ "$status" -eq 0 ]
@ -177,7 +215,44 @@ setup_file() {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with unknown-service --dry-run run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with unknown-service --dry-run
[ "$status" -ne 0 ] [ "$status" -ne 0 ]
[[ "$output" == *"unknown service"* ]] [[ "$output" == *"unknown service"* ]]
[[ "$output" == *"known: forgejo"* ]] [[ "$output" == *"known: forgejo, woodpecker-server, woodpecker-agent, agents, staging, chat, edge"* ]]
}
# S3.4: woodpecker auto-expansion and forgejo auto-inclusion
@test "disinto init --backend=nomad --with woodpecker auto-expands to server+agent" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with woodpecker --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"services to deploy: forgejo,woodpecker-server,woodpecker-agent"* ]]
[[ "$output" == *"deployment order: forgejo woodpecker-server woodpecker-agent"* ]]
}
@test "disinto init --backend=nomad --with woodpecker auto-includes forgejo with note" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with woodpecker --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"Note: --with woodpecker implies --with forgejo"* ]]
}
@test "disinto init --backend=nomad --with forgejo,woodpecker expands woodpecker" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with forgejo,woodpecker --dry-run
[ "$status" -eq 0 ]
# Order follows input: forgejo first, then woodpecker expanded
[[ "$output" == *"services to deploy: forgejo,woodpecker-server,woodpecker-agent"* ]]
[[ "$output" == *"deployment order: forgejo woodpecker-server woodpecker-agent"* ]]
}
@test "disinto init --backend=nomad --with woodpecker seeds both forgejo and woodpecker" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with woodpecker --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"tools/vault-seed-forgejo.sh --dry-run"* ]]
[[ "$output" == *"tools/vault-seed-woodpecker.sh --dry-run"* ]]
}
@test "disinto init --backend=nomad --with forgejo,woodpecker deploys all three services" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with forgejo,woodpecker --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"[deploy] [dry-run] nomad job validate"*"forgejo.hcl"* ]]
[[ "$output" == *"[deploy] [dry-run] nomad job validate"*"woodpecker-server.hcl"* ]]
[[ "$output" == *"[deploy] [dry-run] nomad job validate"*"woodpecker-agent.hcl"* ]]
} }
@test "disinto init --backend=nomad --with forgejo (flag=value syntax) works" { @test "disinto init --backend=nomad --with forgejo (flag=value syntax) works" {
@ -191,3 +266,179 @@ setup_file() {
[ "$status" -ne 0 ] [ "$status" -ne 0 ]
[[ "$output" == *"--empty and --with are mutually exclusive"* ]] [[ "$output" == *"--empty and --with are mutually exclusive"* ]]
} }
# ── --import-env / --import-sops / --age-key (S2.5, #883) ────────────────────
#
# Step 2.5 wires Vault policies + JWT auth + optional KV import into
# `disinto init --backend=nomad`. The tests below exercise the flag
# grammar (who-requires-whom + who-requires-backend=nomad) and the
# dry-run plan shape (each --import-* flag prints its own path line,
# independently). A prior attempt at this issue regressed the "print
# every set flag" invariant by using if/elif — covered by the
# "--import-env --import-sops --age-key" case.
@test "disinto init --backend=nomad --import-env only is accepted" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --import-env /tmp/.env --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"--import-env"* ]]
[[ "$output" == *"env file: /tmp/.env"* ]]
}
@test "disinto init --backend=nomad --import-sops without --age-key errors" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --import-sops /tmp/.env.vault.enc --dry-run
[ "$status" -ne 0 ]
[[ "$output" == *"--import-sops requires --age-key"* ]]
}
@test "disinto init --backend=nomad --age-key without --import-sops errors" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --age-key /tmp/keys.txt --dry-run
[ "$status" -ne 0 ]
[[ "$output" == *"--age-key requires --import-sops"* ]]
}
@test "disinto init --backend=docker --import-env errors with backend requirement" {
run "$DISINTO_BIN" init placeholder/repo --backend=docker --import-env /tmp/.env
[ "$status" -ne 0 ]
[[ "$output" == *"--import-env, --import-sops, and --age-key require --backend=nomad"* ]]
}
@test "disinto init --backend=nomad --import-sops --age-key --dry-run shows import plan" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --import-sops /tmp/.env.vault.enc --age-key /tmp/keys.txt --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"Vault import dry-run"* ]]
[[ "$output" == *"--import-sops"* ]]
[[ "$output" == *"--age-key"* ]]
[[ "$output" == *"sops file: /tmp/.env.vault.enc"* ]]
[[ "$output" == *"age key: /tmp/keys.txt"* ]]
}
# When all three flags are set, each one must print its own path line —
# if/elif regressed this to "only one printed" in a prior attempt (#883).
@test "disinto init --backend=nomad --import-env --import-sops --age-key --dry-run shows full import plan" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --import-env /tmp/.env --import-sops /tmp/.env.vault.enc --age-key /tmp/keys.txt --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"Vault import dry-run"* ]]
[[ "$output" == *"env file: /tmp/.env"* ]]
[[ "$output" == *"sops file: /tmp/.env.vault.enc"* ]]
[[ "$output" == *"age key: /tmp/keys.txt"* ]]
}
@test "disinto init --backend=nomad without import flags shows skip message" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"no --import-env/--import-sops"* ]]
[[ "$output" == *"skipping"* ]]
}
@test "disinto init --backend=nomad --import-env --import-sops --age-key --with forgejo --dry-run shows all plans" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --import-env /tmp/.env --import-sops /tmp/.env.vault.enc --age-key /tmp/keys.txt --with forgejo --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"Vault import dry-run"* ]]
[[ "$output" == *"Vault policies dry-run"* ]]
[[ "$output" == *"Vault auth dry-run"* ]]
[[ "$output" == *"Deploy services dry-run"* ]]
}
@test "disinto init --backend=nomad --dry-run prints policies + auth plan even without --import-*" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --dry-run
[ "$status" -eq 0 ]
# Policies + auth run on every nomad path (idempotent), so the dry-run
# plan always lists them — regardless of whether --import-* is set.
[[ "$output" == *"Vault policies dry-run"* ]]
[[ "$output" == *"Vault auth dry-run"* ]]
[[ "$output" != *"Vault import dry-run"* ]]
}
# --import-env=PATH (=-form) must work alongside --import-env PATH.
@test "disinto init --backend=nomad --import-env=PATH (equals form) works" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --import-env=/tmp/.env --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"env file: /tmp/.env"* ]]
}
# --empty short-circuits after cluster-up: no policies, no auth, no
# import, no deploy. The dry-run plan must match that — cluster-up plan
# appears, but none of the S2.x section banners do.
@test "disinto init --backend=nomad --empty --dry-run skips policies/auth/import sections" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --empty --dry-run
[ "$status" -eq 0 ]
# Cluster-up still runs (it's what --empty brings up).
[[ "$output" == *"Cluster-up dry-run"* ]]
# Policies + auth + import must NOT appear under --empty.
[[ "$output" != *"Vault policies dry-run"* ]]
[[ "$output" != *"Vault auth dry-run"* ]]
[[ "$output" != *"Vault import dry-run"* ]]
[[ "$output" != *"no --import-env/--import-sops"* ]]
}
# --empty + any --import-* flag silently does nothing (import is skipped),
# so the CLI rejects the combination up front rather than letting it
# look like the import "succeeded".
@test "disinto init --backend=nomad --empty --import-env errors" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --empty --import-env /tmp/.env --dry-run
[ "$status" -ne 0 ]
[[ "$output" == *"--empty and --import-env/--import-sops/--age-key are mutually exclusive"* ]]
}
@test "disinto init --backend=nomad --empty --import-sops --age-key errors" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --empty --import-sops /tmp/.env.vault.enc --age-key /tmp/keys.txt --dry-run
[ "$status" -ne 0 ]
[[ "$output" == *"--empty and --import-env/--import-sops/--age-key are mutually exclusive"* ]]
}
# S4.2: agents service auto-expansion and dependencies
@test "disinto init --backend=nomad --with agents auto-includes forgejo and woodpecker" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with agents --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"services to deploy: forgejo,agents,woodpecker-server,woodpecker-agent"* ]]
[[ "$output" == *"Note: --with agents implies --with forgejo"* ]]
[[ "$output" == *"Note: --with agents implies --with woodpecker"* ]]
}
@test "disinto init --backend=nomad --with agents deploys in correct order" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with agents --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"deployment order: forgejo woodpecker-server woodpecker-agent agents"* ]]
}
@test "disinto init --backend=nomad --with agents seeds agents service" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with agents --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"tools/vault-seed-forgejo.sh --dry-run"* ]]
[[ "$output" == *"tools/vault-seed-woodpecker.sh --dry-run"* ]]
[[ "$output" == *"tools/vault-seed-agents.sh --dry-run"* ]]
}
@test "disinto init --backend=nomad --with agents deploys all four services" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with agents --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"[deploy] [dry-run] nomad job validate"*"forgejo.hcl"* ]]
[[ "$output" == *"[deploy] [dry-run] nomad job validate"*"woodpecker-server.hcl"* ]]
[[ "$output" == *"[deploy] [dry-run] nomad job validate"*"woodpecker-agent.hcl"* ]]
[[ "$output" == *"[deploy] [dry-run] nomad job validate"*"agents.hcl"* ]]
}
@test "disinto init --backend=nomad --with woodpecker,agents expands correctly" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with woodpecker,agents --dry-run
[ "$status" -eq 0 ]
# woodpecker expands to server+agent, agents is already explicit
# forgejo is auto-included by agents
[[ "$output" == *"services to deploy: forgejo,woodpecker-server,woodpecker-agent,agents"* ]]
[[ "$output" == *"deployment order: forgejo woodpecker-server woodpecker-agent agents"* ]]
}
# S5.1 / #1035 — edge service seeds ops-repo (dispatcher FORGE_TOKEN)
@test "disinto init --backend=nomad --with edge deploys edge" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with edge --dry-run
[ "$status" -eq 0 ]
# edge depends on all backend services, so all are included
[[ "$output" == *"services to deploy: edge,forgejo"* ]]
[[ "$output" == *"deployment order: forgejo woodpecker-server woodpecker-agent agents staging chat edge"* ]]
[[ "$output" == *"[deploy] [dry-run] nomad job validate"*"edge.hcl"* ]]
}
@test "disinto init --backend=nomad --with edge seeds ops-repo" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --with edge --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"tools/vault-seed-ops-repo.sh --dry-run"* ]]
}

20
tests/fixtures/.env.vault.enc vendored Normal file
View file

@ -0,0 +1,20 @@
{
"data": "ENC[AES256_GCM,data:SsLdIiZDVkkV1bbKeHQ8A1K/4vgXQFJF8y4J87GGwsGa13lNnPoqRaCmPAtuQr3hR5JNqARUhFp8aEusyzwi/lZLU2Reo32YjE26ObVOHf47EGmmHM/tEgh6u0fa1AmFtuqJVQzhG2eZhJmZJFgdRH36+bhdBwI1mkORmsRNtBPHHjtQJDbsgN47maDhuP4B7WvB4/TdnJ++GNMlMbyrbr0pEf2uqqOVO55cJ3I4v/Jcg8tq0clPuW1k5dNFsmFSMbbjE5N25EGrc7oEH5GVZ6I6L6p0Fzyj/MV4hKacboFHiZmBZgRQ,iv:UnXTa800G3PW4IaErkPBIZKjPHAU3LmiCvAqDdhFE/Q=,tag:kdWpHQ8fEPGFlmfVoTMskA==,type:str]",
"sops": {
"kms": null,
"gcp_kms": null,
"azure_kv": null,
"hc_vault": null,
"age": [
{
"recipient": "age1ztkm8yvdk42m2cn4dj2v9ptfknq8wpgr3ry9dpmtmlaeas6p7yyqft0ldg",
"enc": "-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBrVUlmaEdTNU1iMGg4dFA4\nNFNOSzlBc1NER1U3SHlwVFU1dm5tR1kyeldzCjZ2NXI3MjR4Zkd1RVBKNzJoQ1Jm\nQWpEZU5VMkNuYnhTTVJNc0RpTXlIZE0KLS0tIDFpQ2tlN0MzL1NuS2hKZU5JTG9B\nNWxXMzE0bGZpQkVBTnhWRXZBQlhrc1EKG76DM98cCuqIwUkbfJWHhJdYV77O9r8Q\nRJrq6jH59Gcp9W8iHg/aeShPHZFEOLg1q9azV9Wt9FjJn3SxyTmgvA==\n-----END AGE ENCRYPTED FILE-----\n"
}
],
"lastmodified": "2026-04-16T15:43:34Z",
"mac": "ENC[AES256_GCM,data:jVRr2TxSZH2paD2doIX4JwCqo5wiPYfTowpj189w1IVlS0EY/XQoqxiWbunX/LmIDdQlTPCSe/vTp1EJA0cx6vzN2xENrwsfzCP6dwDGaRlZhH3V0CVhtfHIkMTEKWrAUx5hFtiwJPkLYUUYi5aRWRxhZQM1eBeRvuGKdlwvmHA=,iv:H57a61AfVNLrlg+4aMl9mwXI5O38O5ZoRhpxe2PTTkY=,tag:2jwH1855VNYlKseTE/XtTg==,type:str]",
"pgp": null,
"unencrypted_suffix": "_unencrypted",
"version": "3.9.4"
}
}

5
tests/fixtures/age-keys.txt vendored Normal file
View file

@ -0,0 +1,5 @@
# Test age key for sops
# Generated: 2026-04-16
# Public key: age1ztkm8yvdk42m2cn4dj2v9ptfknq8wpgr3ry9dpmtmlaeas6p7yyqft0ldg
AGE-SECRET-KEY-1PCQQX37MTZDGES76H9TGQN5XTG2ZZX2UUR87KR784NZ4MQ3NJ56S0Z23SF

40
tests/fixtures/dot-env-complete vendored Normal file
View file

@ -0,0 +1,40 @@
# Test fixture .env file for vault-import.sh
# This file contains all expected keys for the import test
# Generic forge creds
FORGE_TOKEN=generic-forge-token
FORGE_PASS=generic-forge-pass
FORGE_ADMIN_TOKEN=generic-admin-token
# Bot tokens (review, dev, gardener, architect, planner, predictor, supervisor, vault)
FORGE_REVIEW_TOKEN=review-token
FORGE_REVIEW_PASS=review-pass
FORGE_DEV_TOKEN=dev-token
FORGE_DEV_PASS=dev-pass
FORGE_GARDENER_TOKEN=gardener-token
FORGE_GARDENER_PASS=gardener-pass
FORGE_ARCHITECT_TOKEN=architect-token
FORGE_ARCHITECT_PASS=architect-pass
FORGE_PLANNER_TOKEN=planner-token
FORGE_PLANNER_PASS=planner-pass
FORGE_PREDICTOR_TOKEN=predictor-token
FORGE_PREDICTOR_PASS=predictor-pass
FORGE_SUPERVISOR_TOKEN=supervisor-token
FORGE_SUPERVISOR_PASS=supervisor-pass
FORGE_VAULT_TOKEN=vault-token
FORGE_VAULT_PASS=vault-pass
# Llama bot
FORGE_TOKEN_LLAMA=llama-token
FORGE_PASS_LLAMA=llama-pass
# Woodpecker secrets
WOODPECKER_AGENT_SECRET=wp-agent-secret
WP_FORGEJO_CLIENT=wp-forgejo-client
WP_FORGEJO_SECRET=wp-forgejo-secret
WOODPECKER_TOKEN=wp-token
# Chat secrets
FORWARD_AUTH_SECRET=forward-auth-secret
CHAT_OAUTH_CLIENT_ID=chat-client-id
CHAT_OAUTH_CLIENT_SECRET=chat-client-secret

27
tests/fixtures/dot-env-incomplete vendored Normal file
View file

@ -0,0 +1,27 @@
# Test fixture .env file with missing required keys
# This file is intentionally missing some keys to test error handling
# Generic forge creds - missing FORGE_ADMIN_TOKEN
FORGE_TOKEN=generic-forge-token
FORGE_PASS=generic-forge-pass
# Bot tokens - missing several roles
FORGE_REVIEW_TOKEN=review-token
FORGE_REVIEW_PASS=review-pass
FORGE_DEV_TOKEN=dev-token
FORGE_DEV_PASS=dev-pass
# Llama bot - missing (only token, no pass)
FORGE_TOKEN_LLAMA=llama-token
# FORGE_PASS_LLAMA=llama-pass
# Woodpecker secrets - missing some
WOODPECKER_AGENT_SECRET=wp-agent-secret
# WP_FORGEJO_CLIENT=wp-forgejo-client
# WP_FORGEJO_SECRET=wp-forgejo-secret
# WOODPECKER_TOKEN=wp-token
# Chat secrets - missing some
FORWARD_AUTH_SECRET=forward-auth-secret
# CHAT_OAUTH_CLIENT_ID=chat-client-id
# CHAT_OAUTH_CLIENT_SECRET=chat-client-secret

6
tests/fixtures/dot-env.vault.plain vendored Normal file
View file

@ -0,0 +1,6 @@
GITHUB_TOKEN=github-test-token-abc123
CODEBERG_TOKEN=codeberg-test-token-def456
CLAWHUB_TOKEN=clawhub-test-token-ghi789
DEPLOY_KEY=deploy-key-test-jkl012
NPM_TOKEN=npm-test-token-mno345
DOCKER_HUB_TOKEN=dockerhub-test-token-pqr678

View file

@ -126,7 +126,7 @@ setup() {
@test "hvault_policy_apply creates a policy" { @test "hvault_policy_apply creates a policy" {
local pfile="${BATS_TEST_TMPDIR}/test-policy.hcl" local pfile="${BATS_TEST_TMPDIR}/test-policy.hcl"
cat > "$pfile" <<'HCL' cat > "$pfile" <<'HCL'
path "secret/data/test/*" { path "kv/data/test/*" {
capabilities = ["read"] capabilities = ["read"]
} }
HCL HCL
@ -138,12 +138,12 @@ HCL
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \ run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/sys/policies/acl/test-reader" "${VAULT_ADDR}/v1/sys/policies/acl/test-reader"
[ "$status" -eq 0 ] [ "$status" -eq 0 ]
echo "$output" | jq -e '.data.policy' | grep -q "secret/data/test" echo "$output" | jq -e '.data.policy' | grep -q "kv/data/test"
} }
@test "hvault_policy_apply is idempotent" { @test "hvault_policy_apply is idempotent" {
local pfile="${BATS_TEST_TMPDIR}/idem-policy.hcl" local pfile="${BATS_TEST_TMPDIR}/idem-policy.hcl"
printf 'path "secret/*" { capabilities = ["list"] }\n' > "$pfile" printf 'path "kv/*" { capabilities = ["list"] }\n' > "$pfile"
run hvault_policy_apply "idem-policy" "$pfile" run hvault_policy_apply "idem-policy" "$pfile"
[ "$status" -eq 0 ] [ "$status" -eq 0 ]

310
tests/smoke-edge-subpath.sh Executable file
View file

@ -0,0 +1,310 @@
#!/usr/bin/env bash
# =============================================================================
# smoke-edge-subpath.sh — End-to-end subpath routing smoke test
#
# Verifies Forgejo, Woodpecker, and chat function correctly under subpaths:
# - Forgejo at /forge/
# - Woodpecker at /ci/
# - Chat at /chat/
# - Staging at /staging/
#
# Usage:
# smoke-edge-subpath.sh [--base-url BASE_URL]
#
# Environment variables:
# BASE_URL — Edge proxy URL (default: http://localhost)
# EDGE_TIMEOUT — Request timeout in seconds (default: 30)
# EDGE_MAX_RETRIES — Max retries per request (default: 3)
#
# Exit codes:
# 0 — All checks passed
# 1 — One or more checks failed
# =============================================================================
set -euo pipefail
# Script directory for relative paths
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# Source common helpers if available
source "${SCRIPT_DIR}/../lib/env.sh" 2>/dev/null || true
# ─────────────────────────────────────────────────────────────────────────────
# Configuration
# ─────────────────────────────────────────────────────────────────────────────
BASE_URL="${BASE_URL:-http://localhost}"
EDGE_TIMEOUT="${EDGE_TIMEOUT:-30}"
EDGE_MAX_RETRIES="${EDGE_MAX_RETRIES:-3}"
# Subpaths to test
FORGE_PATH="/forge/"
CI_PATH="/ci/"
CHAT_PATH="/chat/"
STAGING_PATH="/staging/"
# Track overall test status
FAILED=0
PASSED=0
SKIPPED=0
# ─────────────────────────────────────────────────────────────────────────────
# Logging helpers
# ─────────────────────────────────────────────────────────────────────────────
log_info() {
echo "[INFO] $*"
}
log_pass() {
echo "[PASS] $*"
((PASSED++)) || true
}
log_fail() {
echo "[FAIL] $*"
((FAILED++)) || true
}
log_skip() {
echo "[SKIP] $*"
((SKIPPED++)) || true
}
log_section() {
echo ""
echo "=== $* ==="
echo ""
}
# ─────────────────────────────────────────────────────────────────────────────
# HTTP helpers
# ─────────────────────────────────────────────────────────────────────────────
# Make an HTTP request with retry logic
# Usage: http_request <method> <url> [options...]
# Returns: HTTP status code on stdout
http_request() {
local method="$1"
local url="$2"
shift 2
local retries=0
local response status
while [ "$retries" -lt "$EDGE_MAX_RETRIES" ]; do
response=$(curl -sS -w '\n%{http_code}' -X "$method" \
--max-time "$EDGE_TIMEOUT" \
-o /tmp/edge-response-$$ \
"$@" 2>&1) || {
retries=$((retries + 1))
log_info "Retry $retries/$EDGE_MAX_RETRIES for $url"
sleep 1
continue
}
status=$(echo "$response" | tail -n1)
echo "$status"
return 0
done
log_fail "Max retries exceeded for $url"
return 1
}
# Make a GET request and return status code
http_get() {
local url="$1"
shift || true
http_request "GET" "$url" "$@"
}
# Make a HEAD request (no body)
http_head() {
local url="$1"
shift || true
http_request "HEAD" "$url" "$@"
}
# Make a GET request and return the response body
http_get_body() {
local url="$1"
shift || true
curl -sS --max-time "$EDGE_TIMEOUT" "$@" "$url"
}
# ─────────────────────────────────────────────────────────────────────────────
# Test functions
# ─────────────────────────────────────────────────────────────────────────────
test_root_redirect() {
log_section "Test 1: Root redirect to /forge/"
local status
status=$(http_head "$BASE_URL/")
if [ "$status" = "302" ]; then
log_pass "Root / redirects with 302"
else
log_fail "Expected 302 redirect from /, got status $status"
fi
}
test_forgejo_subpath() {
log_section "Test 2: Forgejo at /forge/"
local status
status=$(http_head "$BASE_URL${FORGE_PATH}")
if [ "$status" -ge 200 ] && [ "$status" -lt 400 ]; then
log_pass "Forgejo at ${BASE_URL}${FORGE_PATH} returns status $status"
else
log_fail "Forgejo at ${BASE_URL}${FORGE_PATH} returned unexpected status $status"
fi
}
test_woodpecker_subpath() {
log_section "Test 3: Woodpecker at /ci/"
local status
status=$(http_head "$BASE_URL${CI_PATH}")
if [ "$status" -ge 200 ] && [ "$status" -lt 400 ]; then
log_pass "Woodpecker at ${BASE_URL}${CI_PATH} returns status $status"
else
log_fail "Woodpecker at ${BASE_URL}${CI_PATH} returned unexpected status $status"
fi
}
test_chat_subpath() {
log_section "Test 4: Chat at /chat/"
# Test chat login endpoint
local status
status=$(http_head "$BASE_URL${CHAT_PATH}login")
if [ "$status" -ge 200 ] && [ "$status" -lt 400 ]; then
log_pass "Chat login at ${BASE_URL}${CHAT_PATH}login returns status $status"
else
log_fail "Chat login at ${BASE_URL}${CHAT_PATH}login returned unexpected status $status"
fi
# Test chat OAuth callback endpoint
status=$(http_head "$BASE_URL${CHAT_PATH}oauth/callback")
if [ "$status" -ge 200 ] && [ "$status" -lt 400 ]; then
log_pass "Chat OAuth callback at ${BASE_URL}${CHAT_PATH}oauth/callback returns status $status"
else
log_fail "Chat OAuth callback at ${BASE_URL}${CHAT_PATH}oauth/callback returned unexpected status $status"
fi
}
test_staging_subpath() {
log_section "Test 5: Staging at /staging/"
local status
status=$(http_head "$BASE_URL${STAGING_PATH}")
if [ "$status" -ge 200 ] && [ "$status" -lt 400 ]; then
log_pass "Staging at ${BASE_URL}${STAGING_PATH} returns status $status"
else
log_fail "Staging at ${BASE_URL}${STAGING_PATH} returned unexpected status $status"
fi
}
test_forward_auth_rejection() {
log_section "Test 6: Forward auth on /chat/* rejects unauthenticated requests"
# Request a protected chat endpoint without auth header
# Should return 401 (Unauthorized) due to forward_auth
local status
status=$(http_head "$BASE_URL${CHAT_PATH}auth/verify")
if [ "$status" = "401" ]; then
log_pass "Unauthenticated /chat/auth/verify returns 401 (forward_auth working)"
elif [ "$status" -ge 200 ] && [ "$status" -lt 400 ]; then
log_skip "Unauthenticated /chat/auth/verify returns $status (forward_auth may be disabled)"
else
log_fail "Expected 401 for unauthenticated /chat/auth/verify, got status $status"
fi
}
test_forgejo_oauth_callback() {
log_section "Test 7: Forgejo OAuth callback for Woodpecker under subpath"
# Test that Forgejo OAuth callback path works (Woodpecker OAuth integration)
local status
status=$(http_head "$BASE_URL${FORGE_PATH}login/oauth/callback")
if [ "$status" -ge 200 ] && [ "$status" -lt 400 ]; then
log_pass "Forgejo OAuth callback at ${BASE_URL}${FORGE_PATH}login/oauth/callback works"
else
log_fail "Forgejo OAuth callback returned unexpected status $status"
fi
}
# ─────────────────────────────────────────────────────────────────────────────
# Main
# ─────────────────────────────────────────────────────────────────────────────
main() {
log_info "Starting subpath routing smoke test"
log_info "Base URL: $BASE_URL"
log_info "Timeout: ${EDGE_TIMEOUT}s, Max retries: ${EDGE_MAX_RETRIES}"
# Run all tests
test_root_redirect
test_forgejo_subpath
test_woodpecker_subpath
test_chat_subpath
test_staging_subpath
test_forward_auth_rejection
test_forgejo_oauth_callback
# Summary
log_section "Test Summary"
log_info "Passed: $PASSED"
log_info "Failed: $FAILED"
log_info "Skipped: $SKIPPED"
if [ "$FAILED" -gt 0 ]; then
log_fail "Some tests failed"
exit 1
fi
log_pass "All tests passed!"
exit 0
}
# Parse arguments
while [[ $# -gt 0 ]]; do
case "$1" in
--base-url)
BASE_URL="$2"
shift 2
;;
--base-url=*)
BASE_URL="${1#*=}"
shift
;;
--help)
echo "Usage: $0 [options]"
echo ""
echo "Options:"
echo " --base-url URL Set base URL (default: http://localhost)"
echo " --help Show this help message"
echo ""
echo "Environment variables:"
echo " BASE_URL Base URL for edge proxy (default: http://localhost)"
echo " EDGE_TIMEOUT Request timeout in seconds (default: 30)"
echo " EDGE_MAX_RETRIES Max retries per request (default: 3)"
exit 0
;;
*)
echo "Unknown option: $1" >&2
exit 1
;;
esac
done
main

View file

@ -15,6 +15,7 @@
set -euo pipefail set -euo pipefail
FACTORY_ROOT="$(cd "$(dirname "$0")/.." && pwd)" FACTORY_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
export FACTORY_ROOT_REAL="$FACTORY_ROOT"
# Always use localhost for mock Forgejo (in case FORGE_URL is set from docker-compose) # Always use localhost for mock Forgejo (in case FORGE_URL is set from docker-compose)
export FORGE_URL="http://localhost:3000" export FORGE_URL="http://localhost:3000"
MOCK_BIN="/tmp/smoke-mock-bin" MOCK_BIN="/tmp/smoke-mock-bin"
@ -30,7 +31,8 @@ cleanup() {
rm -rf "$MOCK_BIN" /tmp/smoke-test-repo \ rm -rf "$MOCK_BIN" /tmp/smoke-test-repo \
"${FACTORY_ROOT}/projects/smoke-repo.toml" \ "${FACTORY_ROOT}/projects/smoke-repo.toml" \
/tmp/smoke-claude-shared /tmp/smoke-home-claude \ /tmp/smoke-claude-shared /tmp/smoke-home-claude \
/tmp/smoke-env-before-rerun /tmp/smoke-env-before-dryrun /tmp/smoke-env-before-rerun /tmp/smoke-env-before-dryrun \
"${FACTORY_ROOT}/docker-compose.yml"
# Restore .env only if we created the backup # Restore .env only if we created the backup
if [ -f "${FACTORY_ROOT}/.env.smoke-backup" ]; then if [ -f "${FACTORY_ROOT}/.env.smoke-backup" ]; then
mv "${FACTORY_ROOT}/.env.smoke-backup" "${FACTORY_ROOT}/.env" mv "${FACTORY_ROOT}/.env.smoke-backup" "${FACTORY_ROOT}/.env"
@ -423,6 +425,51 @@ export CLAUDE_SHARED_DIR="$ORIG_CLAUDE_SHARED_DIR"
export CLAUDE_CONFIG_DIR="$ORIG_CLAUDE_CONFIG_DIR" export CLAUDE_CONFIG_DIR="$ORIG_CLAUDE_CONFIG_DIR"
rm -rf /tmp/smoke-claude-shared /tmp/smoke-home-claude rm -rf /tmp/smoke-claude-shared /tmp/smoke-home-claude
# ── 8. Test duplicate service name detection ──────────────────────────────
echo "=== 8/8 Testing duplicate service name detection ==="
# Isolated factory root — do NOT touch the real ${FACTORY_ROOT}/projects/
SMOKE_DUP_ROOT=$(mktemp -d)
mkdir -p "$SMOKE_DUP_ROOT/projects"
cat > "$SMOKE_DUP_ROOT/projects/duplicate-test.toml" <<'TOMLEOF'
name = "duplicate-test"
description = "dup-detection smoke"
[ci]
woodpecker_repo_id = "999"
[agents.llama]
base_url = "http://localhost:8080"
model = "qwen:latest"
roles = ["dev"]
forge_user = "llama-bot"
TOMLEOF
# Call the generator directly — no `disinto init` to overwrite the TOML.
# FACTORY_ROOT tells generators.sh where projects/ + compose_file live.
(
export FACTORY_ROOT="$SMOKE_DUP_ROOT"
export ENABLE_LLAMA_AGENT=1
# shellcheck disable=SC1091
source "${FACTORY_ROOT_REAL:-$(cd "$(dirname "$0")/.." && pwd)}/lib/generators.sh"
# Use a temp file to capture output since pipefail will kill the pipeline
# when _generate_compose_impl returns non-zero
_generate_compose_impl > /tmp/smoke-dup-output.txt 2>&1 || true
if grep -q "Duplicate service name" /tmp/smoke-dup-output.txt; then
pass "Duplicate service detection: conflict between ENABLE_LLAMA_AGENT and [agents.llama] reported"
rm -f /tmp/smoke-dup-output.txt
exit 0
else
fail "Duplicate service detection: no error raised for ENABLE_LLAMA_AGENT + [agents.llama]"
cat /tmp/smoke-dup-output.txt >&2
rm -f /tmp/smoke-dup-output.txt
exit 1
fi
) || FAILED=1
rm -rf "$SMOKE_DUP_ROOT"
unset ENABLE_LLAMA_AGENT
# ── Summary ────────────────────────────────────────────────────────────────── # ── Summary ──────────────────────────────────────────────────────────────────
echo "" echo ""
if [ "$FAILED" -ne 0 ]; then if [ "$FAILED" -ne 0 ]; then

238
tests/test-caddyfile-routing.sh Executable file
View file

@ -0,0 +1,238 @@
#!/usr/bin/env bash
# =============================================================================
# test-caddyfile-routing.sh — Caddyfile routing block unit test
#
# Extracts the Caddyfile template from nomad/jobs/edge.hcl and validates its
# structure without requiring a running Caddy instance.
#
# Checks:
# - Forgejo subpath (/forge/* -> :3000)
# - Woodpecker subpath (/ci/* -> :8000)
# - Staging subpath (/staging/* -> nomadService discovery)
# - Chat subpath (/chat/* with forward_auth and OAuth routes)
# - Root redirect to /forge/
#
# Usage:
# test-caddyfile-routing.sh
#
# Exit codes:
# 0 — All checks passed
# 1 — One or more checks failed
# =============================================================================
set -euo pipefail
# Script directory for relative paths
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
EDGE_TEMPLATE="${REPO_ROOT}/nomad/jobs/edge.hcl"
# Track test status
FAILED=0
PASSED=0
# ─────────────────────────────────────────────────────────────────────────────
# Logging helpers
# ─────────────────────────────────────────────────────────────────────────────
tr_info() {
echo "[INFO] $*"
}
tr_pass() {
echo "[PASS] $*"
((PASSED++)) || true
}
tr_fail() {
echo "[FAIL] $*"
((FAILED++)) || true
}
tr_section() {
echo ""
echo "=== $* ==="
echo ""
}
# ─────────────────────────────────────────────────────────────────────────────
# Caddyfile extraction
# ─────────────────────────────────────────────────────────────────────────────
extract_caddyfile() {
local template_file="$1"
# Extract the Caddyfile template (content between <<EOT and EOT markers
# within the template stanza)
local caddyfile
caddyfile=$(sed -n '/data[[:space:]]*=[[:space:]]*<<[Ee][Oo][Tt]/,/^EOT$/p' "$template_file" | sed '1s/.*/# Caddyfile extracted from Nomad template/; $d')
if [ -z "$caddyfile" ]; then
echo "ERROR: Could not extract Caddyfile template from $template_file" >&2
return 1
fi
echo "$caddyfile"
}
# ─────────────────────────────────────────────────────────────────────────────
# Validation functions
# ─────────────────────────────────────────────────────────────────────────────
check_forgejo_routing() {
tr_section "Validating Forgejo routing"
# Check handle block for /forge/*
if echo "$CADDYFILE" | grep -q "handle /forge/\*"; then
tr_pass "Forgejo handle block (handle /forge/*)"
else
tr_fail "Missing Forgejo handle block (handle /forge/*)"
fi
# Check reverse_proxy to Forgejo on port 3000
if echo "$CADDYFILE" | grep -q "reverse_proxy 127.0.0.1:3000"; then
tr_pass "Forgejo reverse_proxy configured (127.0.0.1:3000)"
else
tr_fail "Missing Forgejo reverse_proxy (127.0.0.1:3000)"
fi
}
check_woodpecker_routing() {
tr_section "Validating Woodpecker routing"
# Check handle block for /ci/*
if echo "$CADDYFILE" | grep -q "handle /ci/\*"; then
tr_pass "Woodpecker handle block (handle /ci/*)"
else
tr_fail "Missing Woodpecker handle block (handle /ci/*)"
fi
# Check reverse_proxy to Woodpecker on port 8000
if echo "$CADDYFILE" | grep -q "reverse_proxy 127.0.0.1:8000"; then
tr_pass "Woodpecker reverse_proxy configured (127.0.0.1:8000)"
else
tr_fail "Missing Woodpecker reverse_proxy (127.0.0.1:8000)"
fi
}
check_staging_routing() {
tr_section "Validating Staging routing"
# Check handle block for /staging/*
if echo "$CADDYFILE" | grep -q "handle /staging/\*"; then
tr_pass "Staging handle block (handle /staging/*)"
else
tr_fail "Missing Staging handle block (handle /staging/*)"
fi
# Check for uri strip_prefix /staging directive
if echo "$CADDYFILE" | grep -q "uri strip_prefix /staging"; then
tr_pass "Staging uri strip_prefix configured (/staging)"
else
tr_fail "Missing uri strip_prefix /staging for staging"
fi
# Check for nomadService discovery (dynamic port)
if echo "$CADDYFILE" | grep -q "nomadService"; then
tr_pass "Staging uses Nomad service discovery"
else
tr_fail "Missing Nomad service discovery for staging"
fi
}
check_chat_routing() {
tr_section "Validating Chat routing"
# Check login endpoint
if echo "$CADDYFILE" | grep -q "handle /chat/login"; then
tr_pass "Chat login handle block (handle /chat/login)"
else
tr_fail "Missing Chat login handle block (handle /chat/login)"
fi
# Check OAuth callback endpoint
if echo "$CADDYFILE" | grep -q "handle /chat/oauth/callback"; then
tr_pass "Chat OAuth callback handle block (handle /chat/oauth/callback)"
else
tr_fail "Missing Chat OAuth callback handle block (handle /chat/oauth/callback)"
fi
# Check catch-all for /chat/*
if echo "$CADDYFILE" | grep -q "handle /chat/\*"; then
tr_pass "Chat catch-all handle block (handle /chat/*)"
else
tr_fail "Missing Chat catch-all handle block (handle /chat/*)"
fi
# Check reverse_proxy to Chat on port 8080
if echo "$CADDYFILE" | grep -q "reverse_proxy 127.0.0.1:8080"; then
tr_pass "Chat reverse_proxy configured (127.0.0.1:8080)"
else
tr_fail "Missing Chat reverse_proxy (127.0.0.1:8080)"
fi
# Check forward_auth block for /chat/*
if echo "$CADDYFILE" | grep -A10 "handle /chat/\*" | grep -q "forward_auth"; then
tr_pass "forward_auth block configured for /chat/*"
else
tr_fail "Missing forward_auth block for /chat/*"
fi
# Check forward_auth URI
if echo "$CADDYFILE" | grep -q "uri /chat/auth/verify"; then
tr_pass "forward_auth URI configured (/chat/auth/verify)"
else
tr_fail "Missing forward_auth URI (/chat/auth/verify)"
fi
}
check_root_redirect() {
tr_section "Validating root redirect"
# Check root redirect to /forge/
if echo "$CADDYFILE" | grep -q "redir /forge/ 302"; then
tr_pass "Root redirect to /forge/ configured (302)"
else
tr_fail "Missing root redirect to /forge/"
fi
}
# ─────────────────────────────────────────────────────────────────────────────
# Main
# ─────────────────────────────────────────────────────────────────────────────
main() {
tr_info "Extracting Caddyfile template from $EDGE_TEMPLATE"
# Extract Caddyfile
CADDYFILE=$(extract_caddyfile "$EDGE_TEMPLATE")
if [ -z "$CADDYFILE" ]; then
tr_fail "Could not extract Caddyfile template"
exit 1
fi
tr_pass "Caddyfile template extracted successfully"
# Run all validation checks
check_forgejo_routing
check_woodpecker_routing
check_staging_routing
check_chat_routing
check_root_redirect
# Summary
tr_section "Test Summary"
tr_info "Passed: $PASSED"
tr_info "Failed: $FAILED"
if [ "$FAILED" -gt 0 ]; then
tr_fail "Some checks failed"
exit 1
fi
tr_pass "All routing blocks validated!"
exit 0
}
main

View file

@ -0,0 +1,210 @@
#!/usr/bin/env bash
# tests/test-duplicate-service-detection.sh — Unit test for duplicate service detection
#
# Tests that the compose generator correctly detects duplicate service names
# between ENABLE_LLAMA_AGENT=1 and [agents.llama] TOML configuration.
set -euo pipefail
# Get the absolute path to the disinto root
DISINTO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
TEST_DIR=$(mktemp -d)
trap "rm -rf \"\$TEST_DIR\"" EXIT
FAILED=0
fail() { printf 'FAIL: %s\n' "$*" >&2; FAILED=1; }
pass() { printf 'PASS: %s\n' "$*"; }
# Test 1: Duplicate between ENABLE_LLAMA_AGENT and [agents.llama]
echo "=== Test 1: Duplicate between ENABLE_LLAMA_AGENT and [agents.llama] ==="
# Create projects directory and test project TOML with an agent named "llama"
mkdir -p "${TEST_DIR}/projects"
cat > "${TEST_DIR}/projects/test-project.toml" <<'TOMLEOF'
name = "test-project"
description = "Test project for duplicate detection"
[ci]
woodpecker_repo_id = "123"
[agents.llama]
base_url = "http://localhost:8080"
model = "qwen:latest"
roles = ["dev"]
forge_user = "llama-bot"
TOMLEOF
# Create a minimal compose file
cat > "${TEST_DIR}/docker-compose.yml" <<'COMPOSEEOF'
# Test compose file
services:
agents:
image: test:latest
command: echo "hello"
volumes:
test-data:
networks:
test-net:
COMPOSEEOF
# Set up the test environment
export FACTORY_ROOT="${TEST_DIR}"
export PROJECT_NAME="test-project"
export ENABLE_LLAMA_AGENT="1"
export FORGE_TOKEN=""
export FORGE_PASS=""
export CLAUDE_TIMEOUT="7200"
export POLL_INTERVAL="300"
export GARDENER_INTERVAL="21600"
export ARCHITECT_INTERVAL="21600"
export PLANNER_INTERVAL="43200"
export SUPERVISOR_INTERVAL="1200"
# Source the generators module and run the compose generator directly
source "${DISINTO_ROOT}/lib/generators.sh"
# Delete the compose file to force regeneration
rm -f "${TEST_DIR}/docker-compose.yml"
# Run the compose generator directly
if _generate_compose_impl 3000 false 2>&1 | tee "${TEST_DIR}/output.txt"; then
# Check if the output contains the duplicate error message
if grep -q "Duplicate service name 'agents-llama'" "${TEST_DIR}/output.txt"; then
pass "Duplicate detection: correctly detected conflict between ENABLE_LLAMA_AGENT and [agents.llama]"
else
fail "Duplicate detection: should have detected conflict between ENABLE_LLAMA_AGENT and [agents.llama]"
cat "${TEST_DIR}/output.txt" >&2
fi
else
# Generator should fail with non-zero exit code
if grep -q "Duplicate service name 'agents-llama'" "${TEST_DIR}/output.txt"; then
pass "Duplicate detection: correctly detected conflict and returned non-zero exit code"
else
fail "Duplicate detection: should have failed with duplicate error"
cat "${TEST_DIR}/output.txt" >&2
fi
fi
# Test 2: No duplicate when only ENABLE_LLAMA_AGENT is set (no conflicting TOML)
echo ""
echo "=== Test 2: No duplicate when only ENABLE_LLAMA_AGENT is set ==="
# Remove the projects directory created in Test 1
rm -rf "${TEST_DIR}/projects"
# Create a fresh compose file
cat > "${TEST_DIR}/docker-compose.yml" <<'COMPOSEEOF'
# Test compose file
services:
agents:
image: test:latest
volumes:
test-data:
networks:
test-net:
COMPOSEEOF
# Set ENABLE_LLAMA_AGENT
export ENABLE_LLAMA_AGENT="1"
# Delete the compose file to force regeneration
rm -f "${TEST_DIR}/docker-compose.yml"
if _generate_compose_impl 3000 false 2>&1 | tee "${TEST_DIR}/output2.txt"; then
if grep -q "Duplicate" "${TEST_DIR}/output2.txt"; then
fail "No duplicate: should not detect duplicate when only ENABLE_LLAMA_AGENT is set"
else
pass "No duplicate: correctly generated compose without duplicates"
fi
else
# Non-zero exit is fine if there's a legitimate reason (e.g., missing files)
if grep -q "Duplicate" "${TEST_DIR}/output2.txt"; then
fail "No duplicate: should not detect duplicate when only ENABLE_LLAMA_AGENT is set"
else
pass "No duplicate: generator failed for other reason (acceptable)"
fi
fi
# Test 3: Duplicate between two TOML agents with same name
echo ""
echo "=== Test 3: Duplicate between two TOML agents with same name ==="
rm -f "${TEST_DIR}/docker-compose.yml"
# Create projects directory for Test 3
mkdir -p "${TEST_DIR}/projects"
cat > "${TEST_DIR}/projects/project1.toml" <<'TOMLEOF'
name = "project1"
description = "First project"
[ci]
woodpecker_repo_id = "1"
[agents.llama]
base_url = "http://localhost:8080"
model = "qwen:latest"
roles = ["dev"]
forge_user = "llama-bot1"
TOMLEOF
cat > "${TEST_DIR}/projects/project2.toml" <<'TOMLEOF'
name = "project2"
description = "Second project"
[ci]
woodpecker_repo_id = "2"
[agents.llama]
base_url = "http://localhost:8080"
model = "qwen:latest"
roles = ["dev"]
forge_user = "llama-bot2"
TOMLEOF
cat > "${TEST_DIR}/docker-compose.yml" <<'COMPOSEEOF'
# Test compose file
services:
agents:
image: test:latest
volumes:
test-data:
networks:
test-net:
COMPOSEEOF
unset ENABLE_LLAMA_AGENT
# Delete the compose file to force regeneration
rm -f "${TEST_DIR}/docker-compose.yml"
if _generate_compose_impl 3000 false 2>&1 | tee "${TEST_DIR}/output3.txt"; then
if grep -q "Duplicate service name 'agents-llama'" "${TEST_DIR}/output3.txt"; then
pass "Duplicate detection: correctly detected conflict between two [agents.llama] blocks"
else
fail "Duplicate detection: should have detected conflict between two [agents.llama] blocks"
cat "${TEST_DIR}/output3.txt" >&2
fi
else
if grep -q "Duplicate service name 'agents-llama'" "${TEST_DIR}/output3.txt"; then
pass "Duplicate detection: correctly detected conflict and returned non-zero exit code"
else
fail "Duplicate detection: should have failed with duplicate error"
cat "${TEST_DIR}/output3.txt" >&2
fi
fi
# Summary
echo ""
if [ "$FAILED" -ne 0 ]; then
echo "=== TESTS FAILED ==="
exit 1
fi
echo "=== ALL TESTS PASSED ==="

View file

@ -0,0 +1,129 @@
#!/usr/bin/env bash
# test-watchdog-process-group.sh — Test that claude_run_with_watchdog kills orphan children
#
# This test verifies that when claude_run_with_watchdog terminates the Claude process,
# all child processes (including those spawned by Claude's Bash tool) are also killed.
#
# Reproducer scenario:
# 1. Create a fake "claude" stub that:
# a. Spawns a long-running child process (sleep 3600)
# b. Writes a result marker to stdout to trigger idle detection
# c. Stays running
# 2. Run claude_run_with_watchdog with the stub
# 3. Before the fix: sleep child survives (orphaned to PID 1)
# 4. After the fix: sleep child dies (killed as part of process group with -PID)
#
# Usage: ./tests/test-watchdog-process-group.sh
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")/.." && pwd)"
TEST_TMP="/tmp/test-watchdog-$$"
LOGFILE="${TEST_TMP}/log.txt"
PASS=true
# shellcheck disable=SC2317
cleanup_test() {
rm -rf "$TEST_TMP"
}
trap cleanup_test EXIT INT TERM
mkdir -p "$TEST_TMP"
log() {
printf '[TEST] %s\n' "$*" | tee -a "$LOGFILE"
}
fail() {
printf '[TEST] FAIL: %s\n' "$*" | tee -a "$LOGFILE"
PASS=false
}
pass() {
printf '[TEST] PASS: %s\n' "$*" | tee -a "$LOGFILE"
}
# Export required environment variables
export CLAUDE_TIMEOUT=10 # Short timeout for testing
export CLAUDE_IDLE_GRACE=2 # Short grace period for testing
export LOGFILE="${LOGFILE}" # Required by agent-sdk.sh
# Create a fake claude stub that:
# 1. Spawns a long-running child process (sleep 3600) that will become an orphan if parent is killed
# 2. Writes a result marker to stdout (to trigger the watchdog's idle-after-result path)
# 3. Stays running so the watchdog can kill it
cat > "${TEST_TMP}/fake-claude" << 'FAKE_CLAUDE_EOF'
#!/usr/bin/env bash
# Fake claude that spawns a child and stays running
# Simulates Claude's behavior when it spawns a Bash tool command
# Write result marker to stdout (triggers watchdog idle detection)
echo '{"type":"result","session_id":"test-session-123","verdict":"APPROVE"}'
# Spawn a child that simulates Claude's Bash tool hanging
# This is the process that should be killed when the parent is terminated
sleep 3600 &
CHILD_PID=$!
# Log the child PID for debugging
echo "FAKE_CLAUDE_CHILD_PID=$CHILD_PID" >&2
# Stay running - sleep in a loop so the watchdog can kill us
while true; do
sleep 3600 &
wait $! 2>/dev/null || true
done
FAKE_CLAUDE_EOF
chmod +x "${TEST_TMP}/fake-claude"
log "Testing claude_run_with_watchdog process group cleanup..."
# Source the library and run claude_run_with_watchdog
cd "$SCRIPT_DIR"
source lib/agent-sdk.sh
log "Starting claude_run_with_watchdog with fake claude..."
# Run the function directly (not as a script)
# We need to capture output and redirect stderr
OUTPUT_FILE="${TEST_TMP}/output.txt"
timeout 35 bash -c "
source '${SCRIPT_DIR}/lib/agent-sdk.sh'
CLAUDE_TIMEOUT=10 CLAUDE_IDLE_GRACE=2 LOGFILE='${LOGFILE}' claude_run_with_watchdog '${TEST_TMP}/fake-claude' > '${OUTPUT_FILE}' 2>&1
exit \$?
" || true
# Give the watchdog a moment to clean up
log "Waiting for cleanup..."
sleep 5
# More precise check: look for sleep 3600 processes
# These would be the orphans from our fake claude
ORPHAN_COUNT=$(pgrep -a sleep 2>/dev/null | grep -c "sleep 3600" 2>/dev/null || echo "0")
if [ "$ORPHAN_COUNT" -gt 0 ]; then
log "Found $ORPHAN_COUNT orphan sleep 3600 processes:"
pgrep -a sleep | grep "sleep 3600"
fail "Orphan children found - process group cleanup did not work"
else
pass "No orphan children found - process group cleanup worked"
fi
# Also verify that the fake claude itself is not running
FAKE_CLAUDE_COUNT=$(pgrep -c -f "fake-claude" 2>/dev/null || echo "0")
if [ "$FAKE_CLAUDE_COUNT" -gt 0 ]; then
log "Found $FAKE_CLAUDE_COUNT fake-claude processes still running"
fail "Fake claude process(es) still running"
else
pass "Fake claude process terminated"
fi
# Summary
echo ""
if [ "$PASS" = true ]; then
log "All tests passed!"
exit 0
else
log "Some tests failed. See log at $LOGFILE"
exit 1
fi

363
tests/vault-import.bats Normal file
View file

@ -0,0 +1,363 @@
#!/usr/bin/env bats
# tests/vault-import.bats — Tests for tools/vault-import.sh
#
# Runs against a dev-mode Vault server (single binary, no LXC needed).
# CI launches vault server -dev inline before running these tests.
VAULT_BIN="${VAULT_BIN:-vault}"
IMPORT_SCRIPT="${BATS_TEST_DIRNAME}/../tools/vault-import.sh"
FIXTURES_DIR="${BATS_TEST_DIRNAME}/fixtures"
setup_file() {
# Start dev-mode vault on a random port
export VAULT_DEV_PORT
VAULT_DEV_PORT="$(shuf -i 18200-18299 -n 1)"
export VAULT_ADDR="http://127.0.0.1:${VAULT_DEV_PORT}"
"$VAULT_BIN" server -dev \
-dev-listen-address="127.0.0.1:${VAULT_DEV_PORT}" \
-dev-root-token-id="test-root-token" \
-dev-no-store-token \
&>"${BATS_FILE_TMPDIR}/vault.log" &
export VAULT_PID=$!
export VAULT_TOKEN="test-root-token"
# Wait for vault to be ready (up to 10s)
local i=0
while ! curl -sf "${VAULT_ADDR}/v1/sys/health" >/dev/null 2>&1; do
sleep 0.5
i=$((i + 1))
if [ "$i" -ge 20 ]; then
echo "Vault failed to start. Log:" >&2
cat "${BATS_FILE_TMPDIR}/vault.log" >&2
return 1
fi
done
# Enable kv-v2 at path=kv (production mount per S2 migration). Dev-mode
# vault only auto-mounts kv-v2 at secret/; tests must mirror the real
# cluster layout so vault-import.sh writes land where we read them.
curl -sf -H "X-Vault-Token: test-root-token" \
-X POST -d '{"type":"kv","options":{"version":"2"}}' \
"${VAULT_ADDR}/v1/sys/mounts/kv" >/dev/null
}
teardown_file() {
if [ -n "${VAULT_PID:-}" ]; then
kill "$VAULT_PID" 2>/dev/null || true
wait "$VAULT_PID" 2>/dev/null || true
fi
}
setup() {
# Source the module under test for hvault functions
source "${BATS_TEST_DIRNAME}/../lib/hvault.sh"
export VAULT_ADDR VAULT_TOKEN
}
# --- Security checks ---
@test "refuses to run if VAULT_ADDR is not localhost" {
export VAULT_ADDR="http://prod-vault.example.com:8200"
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-complete" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -ne 0 ]
echo "$output" | grep -q "Security check failed"
}
@test "refuses if age key file permissions are not 0400" {
# Create a temp file with wrong permissions
local bad_key="${BATS_TEST_TMPDIR}/bad-ages.txt"
echo "AGE-SECRET-KEY-1TEST" > "$bad_key"
chmod 644 "$bad_key"
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-complete" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$bad_key"
[ "$status" -ne 0 ]
echo "$output" | grep -q "permissions"
}
# --- Dry-run mode ─────────────────────────────────────────────────────────────
@test "--dry-run prints plan without writing to Vault" {
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-complete" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt" \
--dry-run
[ "$status" -eq 0 ]
echo "$output" | grep -q "DRY-RUN"
echo "$output" | grep -q "Import plan"
echo "$output" | grep -q "Planned operations"
# Verify nothing was written to Vault
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/kv/data/disinto/bots/review"
[ "$status" -ne 0 ]
}
# --- Complete fixture import ─────────────────────────────────────────────────
@test "imports all keys from complete fixture" {
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-complete" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -eq 0 ]
# Check bots/review
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/kv/data/disinto/bots/review"
[ "$status" -eq 0 ]
echo "$output" | grep -q "review-token"
echo "$output" | grep -q "review-pass"
# Check bots/dev-qwen
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/kv/data/disinto/bots/dev-qwen"
[ "$status" -eq 0 ]
echo "$output" | grep -q "llama-token"
echo "$output" | grep -q "llama-pass"
# Check forge
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/kv/data/disinto/shared/forge"
[ "$status" -eq 0 ]
echo "$output" | grep -q "generic-forge-token"
echo "$output" | grep -q "generic-forge-pass"
echo "$output" | grep -q "generic-admin-token"
# Check woodpecker
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/kv/data/disinto/shared/woodpecker"
[ "$status" -eq 0 ]
echo "$output" | grep -q "wp-agent-secret"
# Forgejo keys are normalized: WP_FORGEJO_* → forgejo_* (no wp_ prefix in key name)
echo "$output" | grep -q "wp-forgejo-client"
echo "$output" | grep -q "wp-forgejo-secret"
echo "$output" | grep -q "wp-token"
# Check chat
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/kv/data/disinto/shared/chat"
[ "$status" -eq 0 ]
echo "$output" | grep -q "forward-auth-secret"
echo "$output" | grep -q "chat-client-id"
echo "$output" | grep -q "chat-client-secret"
# Check runner tokens from sops
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/kv/data/disinto/runner/GITHUB_TOKEN"
[ "$status" -eq 0 ]
echo "$output" | jq -e '.data.data.value == "github-test-token-abc123"'
}
# --- Idempotency ──────────────────────────────────────────────────────────────
@test "re-run with unchanged fixtures reports all unchanged" {
# First run
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-complete" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -eq 0 ]
# Second run - should report unchanged
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-complete" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -eq 0 ]
# Check that all keys report unchanged
echo "$output" | grep -q "unchanged"
# Count unchanged occurrences (should be many)
local unchanged_count
unchanged_count=$(echo "$output" | grep -c "unchanged" || true)
[ "$unchanged_count" -gt 10 ]
}
@test "re-run with modified value reports only that key as updated" {
# Create a modified fixture
local modified_env="${BATS_TEST_TMPDIR}/dot-env-modified"
cp "$FIXTURES_DIR/dot-env-complete" "$modified_env"
# Modify one value
sed -i 's/llama-token/MODIFIED-LLAMA-TOKEN/' "$modified_env"
# Run with modified fixture
run "$IMPORT_SCRIPT" \
--env "$modified_env" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -eq 0 ]
# Check that dev-qwen token was updated
echo "$output" | grep -q "dev-qwen.*updated"
# Verify the new value was written (path is disinto/bots/dev-qwen, key is token)
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/kv/data/disinto/bots/dev-qwen"
[ "$status" -eq 0 ]
echo "$output" | jq -e '.data.data.token == "MODIFIED-LLAMA-TOKEN"'
}
# --- Delimiter-in-value regression (#898) ────────────────────────────────────
@test "preserves secret values that contain a pipe character" {
# Regression: previous accumulator packed values into "value|status" and
# joined per-path kv pairs with '|', so any value containing '|' was
# silently truncated or misrouted.
local piped_env="${BATS_TEST_TMPDIR}/dot-env-piped"
cp "$FIXTURES_DIR/dot-env-complete" "$piped_env"
# Swap in values that contain the old delimiter. Exercise both:
# - a paired bot path (token + pass on same vault path, hitting the
# per-path kv-pair join)
# - a single-key path (admin token)
# Values are single-quoted so they survive `source` of the .env file;
# `|` is a shell metachar and unquoted would start a pipeline. That is
# orthogonal to the accumulator bug under test — users are expected to
# quote such values in .env, and the accumulator must then preserve them.
sed -i "s#^FORGE_REVIEW_TOKEN=.*#FORGE_REVIEW_TOKEN='abc|xyz'#" "$piped_env"
sed -i "s#^FORGE_REVIEW_PASS=.*#FORGE_REVIEW_PASS='p1|p2|p3'#" "$piped_env"
sed -i "s#^FORGE_ADMIN_TOKEN=.*#FORGE_ADMIN_TOKEN='admin|with|pipes'#" "$piped_env"
run "$IMPORT_SCRIPT" \
--env "$piped_env" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -eq 0 ]
# Verify each value round-trips intact.
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/kv/data/disinto/bots/review"
[ "$status" -eq 0 ]
echo "$output" | jq -e '.data.data.token == "abc|xyz"'
echo "$output" | jq -e '.data.data.pass == "p1|p2|p3"'
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/kv/data/disinto/shared/forge"
[ "$status" -eq 0 ]
echo "$output" | jq -e '.data.data.admin_token == "admin|with|pipes"'
}
# --- Incomplete fixture ───────────────────────────────────────────────────────
@test "handles incomplete fixture gracefully" {
# The incomplete fixture is missing some keys, but that should be OK
# - it should only import what exists
# - it should warn about missing pairs
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-incomplete" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -eq 0 ]
# Should have imported what was available
echo "$output" | grep -q "review"
# Should complete successfully even with incomplete fixture
# The script handles missing pairs gracefully with warnings to stderr
[ "$status" -eq 0 ]
}
# --- Security: no secrets in output ───────────────────────────────────────────
@test "never logs secret values in stdout" {
# Run the import
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-complete" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -eq 0 ]
# Check that no actual secret values appear in output
# (only key names and status messages)
local secret_patterns=(
"generic-forge-token"
"generic-forge-pass"
"generic-admin-token"
"review-token"
"review-pass"
"llama-token"
"llama-pass"
"wp-agent-secret"
"forward-auth-secret"
"github-test-token"
"codeberg-test-token"
"clawhub-test-token"
"deploy-key-test"
"npm-test-token"
"dockerhub-test-token"
# Note: forgejo-client and forgejo-secret are NOT in the output
# because they are read from Vault, not logged
)
for pattern in "${secret_patterns[@]}"; do
if echo "$output" | grep -q "$pattern"; then
echo "FAIL: Found secret pattern '$pattern' in output" >&2
echo "Output was:" >&2
echo "$output" >&2
return 1
fi
done
}
# --- Error handling ───────────────────────────────────────────────────────────
@test "fails with missing --env argument" {
run "$IMPORT_SCRIPT" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -ne 0 ]
echo "$output" | grep -q "Missing required argument"
}
@test "fails with missing --sops argument" {
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-complete" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -ne 0 ]
echo "$output" | grep -q "Missing required argument"
}
@test "fails with missing --age-key argument" {
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-complete" \
--sops "$FIXTURES_DIR/.env.vault.enc"
[ "$status" -ne 0 ]
echo "$output" | grep -q "Missing required argument"
}
@test "fails with non-existent env file" {
run "$IMPORT_SCRIPT" \
--env "/nonexistent/.env" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -ne 0 ]
echo "$output" | grep -q "not found"
}
@test "fails with non-existent sops file" {
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-complete" \
--sops "/nonexistent/.env.vault.enc" \
--age-key "$FIXTURES_DIR/age-keys.txt"
[ "$status" -ne 0 ]
echo "$output" | grep -q "not found"
}
@test "fails with non-existent age key file" {
run "$IMPORT_SCRIPT" \
--env "$FIXTURES_DIR/dot-env-complete" \
--sops "$FIXTURES_DIR/.env.vault.enc" \
--age-key "/nonexistent/age-keys.txt"
[ "$status" -ne 0 ]
echo "$output" | grep -q "not found"
}

View file

@ -21,6 +21,7 @@ This control plane runs on the public edge host (Debian DO box) and provides:
│ │ disinto-register│ │ /var/lib/disinto/ │ │ │ │ disinto-register│ │ /var/lib/disinto/ │ │
│ │ (authorized_keys│ │ ├── registry.json (source of truth) │ │ │ │ (authorized_keys│ │ ├── registry.json (source of truth) │ │
│ │ forced cmd) │ │ ├── registry.lock (flock) │ │ │ │ forced cmd) │ │ ├── registry.lock (flock) │ │
│ │ │ │ └── allowlist.json (admin-approved names) │ │
│ │ │ │ └── authorized_keys (rebuildable) │ │ │ │ │ │ └── authorized_keys (rebuildable) │ │
│ └────────┬─────────┘ └───────────────────────────────────────────────┘ │ │ └────────┬─────────┘ └───────────────────────────────────────────────┘ │
│ │ │ │ │ │
@ -79,7 +80,7 @@ curl -sL https://raw.githubusercontent.com/disinto-admin/disinto/fix/issue-621/t
- `disinto-tunnel` — no password, no shell, only receives reverse tunnels - `disinto-tunnel` — no password, no shell, only receives reverse tunnels
2. **Creates data directory**: 2. **Creates data directory**:
- `/var/lib/disinto/` with `registry.json`, `registry.lock` - `/var/lib/disinto/` with `registry.json`, `registry.lock`, `allowlist.json`
- Permissions: `root:disinto-register 0750` - Permissions: `root:disinto-register 0750`
3. **Installs Caddy**: 3. **Installs Caddy**:
@ -180,6 +181,48 @@ Shows all registered tunnels with their ports and FQDNs.
} }
``` ```
## Allowlist
The allowlist prevents project name squatting by requiring admin approval before a name can be registered. It is **opt-in**: when `allowlist.json` does not exist, registration is unrestricted. When the file exists, only project names listed in the `allowed` map can be registered.
### Install-time behavior
- **Fresh install**: `install.sh` seeds an empty allowlist (`{"version":1,"allowed":{}}`) and prints a warning that registration is now gated until entries are added.
- **Upgrade onto an existing box**: if `registry.json` has registered projects but `allowlist.json` does not exist, `install.sh` auto-populates the allowlist with each existing project name (unbound — `pubkey_fingerprint: ""`). This preserves current behavior so existing tunnels keep working. The operator can tighten pubkey bindings later.
### Format
`/var/lib/disinto/allowlist.json` (root-owned, `0644`):
```json
{
"version": 1,
"allowed": {
"myproject": {
"pubkey_fingerprint": "SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
"open-project": {
"pubkey_fingerprint": ""
}
}
}
```
- **With `pubkey_fingerprint`** (non-empty): only the SSH key with that exact SHA256 fingerprint can register this project name.
- **With empty `pubkey_fingerprint`**: any caller may register this project name (name reservation without key binding).
- **Not listed in `allowed`**: registration is refused with `{"error":"name not approved"}`.
### Workflow
1. Admin edits `/var/lib/disinto/allowlist.json` (via ops repo PR, or direct `ssh root@edge`).
2. File is `root:root 0644``disinto-register` only reads it; `register.sh` never mutates it.
3. Callers run `register` as usual. The allowlist is checked transparently.
### Security
- The allowlist is a **first-come-first-serve defense**: once a name is approved for a key, no one else can claim it.
- It does **not** replace per-operation ownership checks (sibling issue #1094) — it only prevents the initial race.
## Recovery ## Recovery
### After State Loss ### After State Loss
@ -274,6 +317,7 @@ ssh disinto-register@edge.disinto.ai "register myproject $(cat ~/.ssh/id_ed25519
- `lib/ports.sh` — Port allocator over `20000-29999`, jq-based, flockd - `lib/ports.sh` — Port allocator over `20000-29999`, jq-based, flockd
- `lib/authorized_keys.sh` — Deterministic rebuild of `disinto-tunnel` authorized_keys - `lib/authorized_keys.sh` — Deterministic rebuild of `disinto-tunnel` authorized_keys
- `lib/caddy.sh` — POST to Caddy admin API for route mapping - `lib/caddy.sh` — POST to Caddy admin API for route mapping
- `/var/lib/disinto/allowlist.json` — Admin-approved project name allowlist (root-owned, read-only by register.sh)
## Dependencies ## Dependencies

View file

@ -7,10 +7,12 @@
# #
# What it does: # What it does:
# 1. Creates users: disinto-register, disinto-tunnel # 1. Creates users: disinto-register, disinto-tunnel
# 2. Creates /var/lib/disinto/ with registry.json, registry.lock # 2. Creates /var/lib/disinto/ with registry.json, registry.lock, allowlist.json
# 3. Installs Caddy with Gandi DNS plugin # 3. On upgrade: auto-populates allowlist.json from existing registry entries
# 4. Sets up SSH authorized_keys for both users # 4. On fresh install: seeds empty allowlist with warning (registration gated)
# 5. Installs control plane scripts to /opt/disinto-edge/ # 5. Installs Caddy with Gandi DNS plugin
# 6. Sets up SSH authorized_keys for both users
# 7. Installs control plane scripts to /opt/disinto-edge/
# #
# Requirements: # Requirements:
# - Fresh Debian 12 (Bookworm) # - Fresh Debian 12 (Bookworm)
@ -44,6 +46,7 @@ REGISTRY_DIR="/var/lib/disinto"
CADDY_VERSION="2.8.4" CADDY_VERSION="2.8.4"
DOMAIN_SUFFIX="disinto.ai" DOMAIN_SUFFIX="disinto.ai"
EXTRA_CADDYFILE="/etc/caddy/extra.d/*.caddy" EXTRA_CADDYFILE="/etc/caddy/extra.d/*.caddy"
ADMIN_TAG="admin"
usage() { usage() {
cat <<EOF cat <<EOF
@ -57,6 +60,7 @@ Options:
--domain-suffix <suffix> Domain suffix for tunnels (default: disinto.ai) --domain-suffix <suffix> Domain suffix for tunnels (default: disinto.ai)
--extra-caddyfile <path> Import path for operator-owned Caddy config --extra-caddyfile <path> Import path for operator-owned Caddy config
(default: /etc/caddy/extra.d/*.caddy) (default: /etc/caddy/extra.d/*.caddy)
--admin-tag <name> Caller tag for the initial admin key (default: admin)
-h, --help Show this help -h, --help Show this help
Example: Example:
@ -91,6 +95,10 @@ while [[ $# -gt 0 ]]; do
EXTRA_CADDYFILE="$2" EXTRA_CADDYFILE="$2"
shift 2 shift 2
;; ;;
--admin-tag)
ADMIN_TAG="$2"
shift 2
;;
-h|--help) -h|--help)
usage usage
;; ;;
@ -152,8 +160,80 @@ LOCK_FILE="${REGISTRY_DIR}/registry.lock"
touch "$LOCK_FILE" touch "$LOCK_FILE"
chmod 0644 "$LOCK_FILE" chmod 0644 "$LOCK_FILE"
# Initialize allowlist.json
ALLOWLIST_FILE="${REGISTRY_DIR}/allowlist.json"
_ALLOWLIST_MODE=""
if [ ! -f "$ALLOWLIST_FILE" ]; then
# Check whether the registry already has projects that need allowlisting
_EXISTING_PROJECTS=""
if [ -f "$REGISTRY_FILE" ]; then
_EXISTING_PROJECTS=$(jq -r '.projects // {} | keys[]' "$REGISTRY_FILE" 2>/dev/null) || _EXISTING_PROJECTS=""
fi
if [ -n "$_EXISTING_PROJECTS" ]; then
# Upgrade path: auto-populate allowlist with existing projects (unbound).
# This preserves current behavior — existing tunnels keep working.
# Operator can tighten pubkey bindings later.
_ALLOWED='{}'
_PROJECT_COUNT=0
while IFS= read -r _proj; do
_ALLOWED=$(echo "$_ALLOWED" | jq --arg p "$_proj" '. + {($p): {"pubkey_fingerprint": ""}}')
_PROJECT_COUNT=$((_PROJECT_COUNT + 1))
done <<< "$_EXISTING_PROJECTS"
echo "{\"version\":1,\"allowed\":${_ALLOWED}}" | jq '.' > "$ALLOWLIST_FILE"
chmod 0644 "$ALLOWLIST_FILE"
chown root:root "$ALLOWLIST_FILE"
_ALLOWLIST_MODE="upgraded:${_PROJECT_COUNT}"
log_info "Initialized allowlist with ${_PROJECT_COUNT} existing project(s): ${ALLOWLIST_FILE}"
else
# Fresh install: seed empty allowlist and warn the operator.
echo '{"version":1,"allowed":{}}' > "$ALLOWLIST_FILE"
chmod 0644 "$ALLOWLIST_FILE"
chown root:root "$ALLOWLIST_FILE"
_ALLOWLIST_MODE="fresh-empty"
log_warn "Allowlist seeded empty — no project can register until you add entries to ${ALLOWLIST_FILE}."
fi
log_info "Initialized allowlist: ${ALLOWLIST_FILE}"
fi
# ============================================================================= # =============================================================================
# Step 3: Install Caddy with Gandi DNS plugin # Step 3: Create audit log directory and logrotate config
# =============================================================================
log_info "Setting up audit log..."
LOG_DIR="/var/log/disinto"
LOG_FILE="${LOG_DIR}/edge-register.log"
mkdir -p "$LOG_DIR"
chown root:disinto-register "$LOG_DIR"
chmod 0750 "$LOG_DIR"
# Touch the log file so it exists from day one
touch "$LOG_FILE"
chmod 0660 "$LOG_FILE"
chown root:disinto-register "$LOG_FILE"
# Install logrotate config (daily rotation, 30 days retention)
LOGROTATE_CONF="/etc/logrotate.d/disinto-edge"
cat > "$LOGROTATE_CONF" <<EOF
${LOG_FILE} {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 0660 root disinto-register
copytruncate
}
EOF
chmod 0644 "$LOGROTATE_CONF"
log_info "Audit log: ${LOG_FILE}"
log_info "Logrotate config: ${LOGROTATE_CONF}"
# =============================================================================
# Step 4: Install Caddy with Gandi DNS plugin
# ============================================================================= # =============================================================================
log_info "Installing Caddy ${CADDY_VERSION} with Gandi DNS plugin..." log_info "Installing Caddy ${CADDY_VERSION} with Gandi DNS plugin..."
@ -284,7 +364,7 @@ systemctl restart caddy 2>/dev/null || {
log_info "Caddy configured with admin API on 127.0.0.1:2019" log_info "Caddy configured with admin API on 127.0.0.1:2019"
# ============================================================================= # =============================================================================
# Step 4: Install control plane scripts # Step 5: Install control plane scripts
# ============================================================================= # =============================================================================
log_info "Installing control plane scripts to ${INSTALL_DIR}..." log_info "Installing control plane scripts to ${INSTALL_DIR}..."
@ -306,7 +386,7 @@ chmod 750 "${INSTALL_DIR}/lib"
log_info "Control plane scripts installed" log_info "Control plane scripts installed"
# ============================================================================= # =============================================================================
# Step 5: Set up SSH authorized_keys # Step 6: Set up SSH authorized_keys
# ============================================================================= # =============================================================================
log_info "Setting up SSH authorized_keys..." log_info "Setting up SSH authorized_keys..."
@ -348,7 +428,7 @@ source "${INSTALL_DIR}/lib/authorized_keys.sh"
rebuild_authorized_keys rebuild_authorized_keys
# ============================================================================= # =============================================================================
# Step 6: Configure forced command for disinto-register # Step 7: Configure forced command for disinto-register
# ============================================================================= # =============================================================================
log_info "Configuring forced command for disinto-register..." log_info "Configuring forced command for disinto-register..."
@ -359,8 +439,8 @@ if [ -n "$ADMIN_PUBKEY" ]; then
KEY_TYPE="${ADMIN_PUBKEY%% *}" KEY_TYPE="${ADMIN_PUBKEY%% *}"
KEY_DATA="${ADMIN_PUBKEY#* }" KEY_DATA="${ADMIN_PUBKEY#* }"
# Create forced command entry # Create forced command entry with caller attribution tag
FORCED_CMD="restrict,command=\"${INSTALL_DIR}/register.sh\" ${KEY_TYPE} ${KEY_DATA}" FORCED_CMD="restrict,command=\"${INSTALL_DIR}/register.sh --as ${ADMIN_TAG}\" ${KEY_TYPE} ${KEY_DATA}"
# Replace the pubkey line # Replace the pubkey line
echo "$FORCED_CMD" > /home/disinto-register/.ssh/authorized_keys echo "$FORCED_CMD" > /home/disinto-register/.ssh/authorized_keys
@ -371,7 +451,7 @@ if [ -n "$ADMIN_PUBKEY" ]; then
fi fi
# ============================================================================= # =============================================================================
# Step 7: Final configuration # Step 8: Final configuration
# ============================================================================= # =============================================================================
log_info "Configuring domain suffix: ${DOMAIN_SUFFIX}" log_info "Configuring domain suffix: ${DOMAIN_SUFFIX}"
@ -389,6 +469,7 @@ echo ""
echo "Configuration:" echo "Configuration:"
echo " Install directory: ${INSTALL_DIR}" echo " Install directory: ${INSTALL_DIR}"
echo " Registry: ${REGISTRY_FILE}" echo " Registry: ${REGISTRY_FILE}"
echo " Allowlist: ${ALLOWLIST_FILE}"
echo " Caddy admin API: http://127.0.0.1:2019" echo " Caddy admin API: http://127.0.0.1:2019"
echo " Operator site blocks: ${EXTRA_DIR}/ (import ${EXTRA_CADDYFILE})" echo " Operator site blocks: ${EXTRA_DIR}/ (import ${EXTRA_CADDYFILE})"
echo "" echo ""
@ -396,6 +477,23 @@ echo "Users:"
echo " disinto-register - SSH forced command (runs ${INSTALL_DIR}/register.sh)" echo " disinto-register - SSH forced command (runs ${INSTALL_DIR}/register.sh)"
echo " disinto-tunnel - Reverse tunnel receiver (no shell)" echo " disinto-tunnel - Reverse tunnel receiver (no shell)"
echo "" echo ""
echo "Allowlist:"
case "${_ALLOWLIST_MODE:-}" in
upgraded:*)
echo " Allowlist was auto-populated from existing registry entries."
echo " Existing projects can register without further action."
;;
fresh-empty)
echo " Allowlist is empty — registration is GATED until you add entries."
echo " Edit ${ALLOWLIST_FILE} as root:"
echo ' {"version":1,"allowed":{"myproject":{"pubkey_fingerprint":""}}}'
echo " See ${INSTALL_DIR}/../README.md for the full workflow."
;;
*)
echo " Allowlist already existed (no changes made)."
;;
esac
echo ""
echo "Next steps:" echo "Next steps:"
echo " 1. Verify Caddy is running: systemctl status caddy" echo " 1. Verify Caddy is running: systemctl status caddy"
echo " 2. Test SSH access: ssh disinto-register@localhost 'list'" echo " 2. Test SSH access: ssh disinto-register@localhost 'list'"

View file

@ -54,13 +54,14 @@ _registry_write() {
} }
# Allocate a port for a project # Allocate a port for a project
# Usage: allocate_port <project> <pubkey> <fqdn> # Usage: allocate_port <project> <pubkey> <fqdn> [<registered_by>]
# Returns: port number on stdout # Returns: port number on stdout
# Writes: registry.json with project entry # Writes: registry.json with project entry
allocate_port() { allocate_port() {
local project="$1" local project="$1"
local pubkey="$2" local pubkey="$2"
local fqdn="$3" local fqdn="$3"
local registered_by="${4:-unknown}"
_ensure_registry_dir _ensure_registry_dir
@ -116,11 +117,13 @@ allocate_port() {
--arg pubkey "$pubkey" \ --arg pubkey "$pubkey" \
--arg fqdn "$fqdn" \ --arg fqdn "$fqdn" \
--arg timestamp "$timestamp" \ --arg timestamp "$timestamp" \
--arg registered_by "$registered_by" \
'.projects[$project] = { '.projects[$project] = {
"port": $port, "port": $port,
"fqdn": $fqdn, "fqdn": $fqdn,
"pubkey": $pubkey, "pubkey": $pubkey,
"registered_at": $timestamp "registered_at": $timestamp,
"registered_by": $registered_by
}') }')
_registry_write "$new_registry" _registry_write "$new_registry"
@ -184,7 +187,7 @@ list_ports() {
local registry local registry
registry=$(_registry_read) registry=$(_registry_read)
echo "$registry" | jq -r '.projects | to_entries | map({name: .key, port: .value.port, fqdn: .value.fqdn}) | .[] | @json' 2>/dev/null echo "$registry" | jq -r '.projects | to_entries | map({name: .key, port: .value.port, fqdn: .value.fqdn, registered_by: (.value.registered_by // "unknown")}) | .[] | @json' 2>/dev/null
} }
# Get full project info from registry # Get full project info from registry

View file

@ -5,9 +5,13 @@
# This script runs as a forced command for the disinto-register SSH user. # This script runs as a forced command for the disinto-register SSH user.
# It parses SSH_ORIGINAL_COMMAND and dispatches to register|deregister|list. # It parses SSH_ORIGINAL_COMMAND and dispatches to register|deregister|list.
# #
# Per-caller attribution: each admin key's forced-command passes --as <tag>,
# which is stored as registered_by in the registry. Missing --as defaults to
# "unknown" for backwards compatibility.
#
# Usage (via SSH): # Usage (via SSH):
# ssh disinto-register@edge "register <project> <pubkey>" # ssh disinto-register@edge "register <project> <pubkey>"
# ssh disinto-register@edge "deregister <project>" # ssh disinto-register@edge "deregister <project> <pubkey>"
# ssh disinto-register@edge "list" # ssh disinto-register@edge "list"
# #
# Output: JSON on stdout # Output: JSON on stdout
@ -25,12 +29,68 @@ source "${SCRIPT_DIR}/lib/authorized_keys.sh"
# Domain suffix # Domain suffix
DOMAIN_SUFFIX="${DOMAIN_SUFFIX:-disinto.ai}" DOMAIN_SUFFIX="${DOMAIN_SUFFIX:-disinto.ai}"
# Reserved project names — operator-adjacent, internal roles, and subdomain-mode prefixes
RESERVED_NAMES=(www api admin root mail chat forge ci edge caddy disinto register tunnel)
# Allowlist path (root-owned, never mutated by this script)
ALLOWLIST_FILE="${ALLOWLIST_FILE:-/var/lib/disinto/allowlist.json}"
# Audit log path
AUDIT_LOG="${AUDIT_LOG:-/var/log/disinto/edge-register.log}"
# Captured error from check_allowlist (used for JSON response)
_ALLOWLIST_ERROR=""
# Caller tag (set via --as <tag> in forced command)
CALLER="unknown"
# Parse script arguments (from forced command, not SSH_ORIGINAL_COMMAND)
while [[ $# -gt 0 ]]; do
case $1 in
--as)
CALLER="$2"
shift 2
;;
*)
shift
;;
esac
done
# Append one line to the audit log.
# Usage: audit_log <action> <project> <port> <pubkey_fp>
# Fails silently — write errors are warned but never abort.
audit_log() {
local action="$1" project="$2" port="$3" pubkey_fp="$4"
local timestamp
timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
local line="${timestamp} ${action} project=${project} port=${port} pubkey_fp=${pubkey_fp} caller=${CALLER}"
# Ensure log directory exists
local log_dir
log_dir=$(dirname "$AUDIT_LOG")
if [ ! -d "$log_dir" ]; then
mkdir -p "$log_dir" 2>/dev/null || {
echo "[WARN] audit log: cannot create ${log_dir}" >&2
return 0
}
chown root:disinto-register "$log_dir" 2>/dev/null || true
chmod 0750 "$log_dir"
fi
# Append — write failure is non-fatal
if ! printf '%s\n' "$line" >> "$AUDIT_LOG" 2>/dev/null; then
echo "[WARN] audit log: failed to write to ${AUDIT_LOG}" >&2
fi
}
# Print usage # Print usage
usage() { usage() {
cat <<EOF cat <<EOF
Usage: Usage:
register <project> <pubkey> Register a new tunnel register <project> <pubkey> Register a new tunnel
deregister <project> Remove a tunnel deregister <project> <pubkey> Remove a tunnel (requires owner pubkey)
list List all registered tunnels list List all registered tunnels
Example: Example:
@ -39,23 +99,75 @@ EOF
exit 1 exit 1
} }
# TODO(#713): Subdomain fallback — if subpath routing (#704/#708) fails, this # Check whether the project/pubkey pair is allowed by the allowlist.
# function would need to register additional routes for forge.<project>, # Usage: check_allowlist <project> <pubkey>
# ci.<project>, chat.<project> subdomains (or accept a --subdomain parameter). # Returns: 0 if allowed, 1 if denied (prints error JSON to stderr)
# See docs/edge-routing-fallback.md for the full pivot plan. check_allowlist() {
local project="$1"
local pubkey="$2"
# If allowlist file does not exist, allow all (opt-in policy)
if [ ! -f "$ALLOWLIST_FILE" ]; then
return 0
fi
# Look up the project in the allowlist
local entry
entry=$(jq -c --arg p "$project" '.allowed[$p] // empty' "$ALLOWLIST_FILE" 2>/dev/null) || entry=""
if [ -z "$entry" ]; then
# Project not in allowlist at all
_ALLOWLIST_ERROR="name not approved"
return 1
fi
# Project found — check pubkey fingerprint binding
local bound_fingerprint
bound_fingerprint=$(echo "$entry" | jq -r '.pubkey_fingerprint // ""' 2>/dev/null)
if [ -n "$bound_fingerprint" ]; then
# Fingerprint is bound — verify caller's pubkey matches
local caller_fingerprint
caller_fingerprint=$(ssh-keygen -lf /dev/stdin <<<"$pubkey" 2>/dev/null | awk '{print $2}') || caller_fingerprint=""
if [ -z "$caller_fingerprint" ]; then
_ALLOWLIST_ERROR="invalid pubkey for fingerprint check"
return 1
fi
if [ "$caller_fingerprint" != "$bound_fingerprint" ]; then
_ALLOWLIST_ERROR="pubkey does not match allowed key for this project"
return 1
fi
fi
return 0
}
# Register a new tunnel # Register a new tunnel
# Usage: do_register <project> <pubkey> # Usage: do_register <project> <pubkey>
# When EDGE_ROUTING_MODE=subdomain, also registers forge.<project>, ci.<project>,
# and chat.<project> subdomain routes (see docs/edge-routing-fallback.md).
do_register() { do_register() {
local project="$1" local project="$1"
local pubkey="$2" local pubkey="$2"
# Validate project name (alphanumeric, hyphens, underscores) # Validate project name — strict DNS label: lowercase alphanumeric, inner hyphens,
if ! [[ "$project" =~ ^[a-zA-Z0-9_-]+$ ]]; then # 3-63 chars, no leading/trailing hyphen, no underscore (RFC 1035)
if ! [[ "$project" =~ ^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$ ]]; then
echo '{"error":"invalid project name"}' echo '{"error":"invalid project name"}'
exit 1 exit 1
fi fi
# Check against reserved names
local reserved
for reserved in "${RESERVED_NAMES[@]}"; do
if [[ "$project" = "$reserved" ]]; then
echo '{"error":"name reserved"}'
exit 1
fi
done
# Extract key type and key from pubkey (format: "ssh-ed25519 AAAAC3...") # Extract key type and key from pubkey (format: "ssh-ed25519 AAAAC3...")
local key_type key local key_type key
key_type=$(echo "$pubkey" | awk '{print $1}') key_type=$(echo "$pubkey" | awk '{print $1}')
@ -75,30 +187,65 @@ do_register() {
# Full pubkey for registry # Full pubkey for registry
local full_pubkey="${key_type} ${key}" local full_pubkey="${key_type} ${key}"
# Check allowlist (opt-in: no file = allow all)
if ! check_allowlist "$project" "$full_pubkey"; then
echo "{\"error\":\"${_ALLOWLIST_ERROR}\"}"
exit 1
fi
# Allocate port (idempotent - returns existing if already registered) # Allocate port (idempotent - returns existing if already registered)
local port local port
port=$(allocate_port "$project" "$full_pubkey" "${project}.${DOMAIN_SUFFIX}") port=$(allocate_port "$project" "$full_pubkey" "${project}.${DOMAIN_SUFFIX}" "$CALLER")
# Add Caddy route # Add Caddy route for main project domain
add_route "$project" "$port" add_route "$project" "$port"
# Subdomain mode: register additional routes for per-service subdomains
local routing_mode="${EDGE_ROUTING_MODE:-subpath}"
if [ "$routing_mode" = "subdomain" ]; then
local subdomain
for subdomain in forge ci chat; do
add_route "${subdomain}.${project}" "$port"
done
fi
# Rebuild authorized_keys for tunnel user # Rebuild authorized_keys for tunnel user
rebuild_authorized_keys rebuild_authorized_keys
# Reload Caddy # Reload Caddy
reload_caddy reload_caddy
# Return JSON response # Audit log
echo "{\"port\":${port},\"fqdn\":\"${project}.${DOMAIN_SUFFIX}\"}" local pubkey_fp
pubkey_fp=$(ssh-keygen -lf /dev/stdin <<<"$full_pubkey" 2>/dev/null | awk '{print $2}') || pubkey_fp="unknown"
audit_log "register" "$project" "$port" "$pubkey_fp"
# Build JSON response
local response="{\"port\":${port},\"fqdn\":\"${project}.${DOMAIN_SUFFIX}\""
if [ "$routing_mode" = "subdomain" ]; then
response="${response},\"routing_mode\":\"subdomain\""
response="${response},\"subdomains\":{\"forge\":\"forge.${project}.${DOMAIN_SUFFIX}\",\"ci\":\"ci.${project}.${DOMAIN_SUFFIX}\",\"chat\":\"chat.${project}.${DOMAIN_SUFFIX}\"}"
fi
response="${response}}"
echo "$response"
} }
# Deregister a tunnel # Deregister a tunnel
# Usage: do_deregister <project> # Usage: do_deregister <project> <pubkey>
do_deregister() { do_deregister() {
local project="$1" local project="$1"
local caller_pubkey="$2"
# Get current port before removing if [ -z "$caller_pubkey" ]; then
local port echo '{"error":"deregister requires <project> <pubkey>"}'
exit 1
fi
# Record who is deregistering before removal
local deregistered_by="$CALLER"
# Get current port and pubkey before removing
local port pubkey_fp
port=$(get_port "$project") port=$(get_port "$project")
if [ -z "$port" ]; then if [ -z "$port" ]; then
@ -106,20 +253,42 @@ do_deregister() {
exit 1 exit 1
fi fi
# Verify caller owns this project — pubkey must match stored value
local stored_pubkey
stored_pubkey=$(get_project_info "$project" | jq -r '.pubkey // empty' 2>/dev/null) || stored_pubkey=""
if [ "$caller_pubkey" != "$stored_pubkey" ]; then
echo '{"error":"pubkey mismatch"}'
exit 1
fi
pubkey_fp=$(ssh-keygen -lf /dev/stdin <<<"$stored_pubkey" 2>/dev/null | awk '{print $2}') || pubkey_fp="unknown"
# Remove from registry # Remove from registry
free_port "$project" >/dev/null free_port "$project" >/dev/null
# Remove Caddy route # Remove Caddy route for main project domain
remove_route "$project" remove_route "$project"
# Subdomain mode: also remove per-service subdomain routes
local routing_mode="${EDGE_ROUTING_MODE:-subpath}"
if [ "$routing_mode" = "subdomain" ]; then
local subdomain
for subdomain in forge ci chat; do
remove_route "${subdomain}.${project}"
done
fi
# Rebuild authorized_keys for tunnel user # Rebuild authorized_keys for tunnel user
rebuild_authorized_keys rebuild_authorized_keys
# Reload Caddy # Reload Caddy
reload_caddy reload_caddy
# Audit log
audit_log "deregister" "$project" "$port" "$pubkey_fp"
# Return JSON response # Return JSON response
echo "{\"removed\":true,\"port\":${port},\"fqdn\":\"${project}.${DOMAIN_SUFFIX}\"}" echo "{\"removed\":true,\"port\":${port},\"fqdn\":\"${project}.${DOMAIN_SUFFIX}\",\"deregistered_by\":\"${deregistered_by}\"}"
} }
# List all registered tunnels # List all registered tunnels
@ -175,13 +344,17 @@ main() {
do_register "$project" "$pubkey" do_register "$project" "$pubkey"
;; ;;
deregister) deregister)
# deregister <project> # deregister <project> <pubkey>
local project="$args" local project="${args%% *}"
if [ -z "$project" ]; then local pubkey="${args#* }"
echo '{"error":"deregister requires <project>"}' if [ "$pubkey" = "$args" ]; then
pubkey=""
fi
if [ -z "$project" ] || [ -z "$pubkey" ]; then
echo '{"error":"deregister requires <project> <pubkey>"}'
exit 1 exit 1
fi fi
do_deregister "$project" do_deregister "$project" "$pubkey"
;; ;;
list) list)
do_list do_list

View file

@ -1,113 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
# verify-chat-sandbox.sh — One-shot sandbox verification for disinto-chat (#706)
#
# Runs against a live compose project and asserts hardening constraints.
# Exit 0 if all pass, non-zero otherwise.
CONTAINER="disinto-chat"
PASS=0
FAIL=0
pass() { printf ' ✓ %s\n' "$1"; PASS=$((PASS + 1)); }
fail() { printf ' ✗ %s\n' "$1"; FAIL=$((FAIL + 1)); }
echo "=== disinto-chat sandbox verification ==="
echo
# --- docker inspect checks ---
inspect_json=$(docker inspect "$CONTAINER" 2>/dev/null) || {
echo "ERROR: container '$CONTAINER' not found or not running"
exit 1
}
# ReadonlyRootfs
readonly_rootfs=$(echo "$inspect_json" | python3 -c "import sys,json; print(json.load(sys.stdin)[0]['HostConfig']['ReadonlyRootfs'])")
if [ "$readonly_rootfs" = "True" ]; then
pass "ReadonlyRootfs=true"
else
fail "ReadonlyRootfs expected true, got $readonly_rootfs"
fi
# CapAdd — should be null or empty
cap_add=$(echo "$inspect_json" | python3 -c "import sys,json; print(json.load(sys.stdin)[0]['HostConfig']['CapAdd'])")
if [ "$cap_add" = "None" ] || [ "$cap_add" = "[]" ]; then
pass "CapAdd=null (no extra capabilities)"
else
fail "CapAdd expected null, got $cap_add"
fi
# CapDrop — should contain ALL
cap_drop=$(echo "$inspect_json" | python3 -c "import sys,json; caps=json.load(sys.stdin)[0]['HostConfig']['CapDrop'] or []; print(' '.join(caps))")
if echo "$cap_drop" | grep -q "ALL"; then
pass "CapDrop contains ALL"
else
fail "CapDrop expected ALL, got: $cap_drop"
fi
# PidsLimit
pids_limit=$(echo "$inspect_json" | python3 -c "import sys,json; print(json.load(sys.stdin)[0]['HostConfig']['PidsLimit'])")
if [ "$pids_limit" = "128" ]; then
pass "PidsLimit=128"
else
fail "PidsLimit expected 128, got $pids_limit"
fi
# Memory limit (512MB = 536870912 bytes)
mem_limit=$(echo "$inspect_json" | python3 -c "import sys,json; print(json.load(sys.stdin)[0]['HostConfig']['Memory'])")
if [ "$mem_limit" = "536870912" ]; then
pass "Memory=512m"
else
fail "Memory expected 536870912, got $mem_limit"
fi
# SecurityOpt — must contain no-new-privileges
sec_opt=$(echo "$inspect_json" | python3 -c "import sys,json; opts=json.load(sys.stdin)[0]['HostConfig']['SecurityOpt'] or []; print(' '.join(opts))")
if echo "$sec_opt" | grep -q "no-new-privileges"; then
pass "SecurityOpt contains no-new-privileges"
else
fail "SecurityOpt missing no-new-privileges (got: $sec_opt)"
fi
# No docker.sock bind mount
binds=$(echo "$inspect_json" | python3 -c "import sys,json; binds=json.load(sys.stdin)[0]['HostConfig']['Binds'] or []; print(' '.join(binds))")
if echo "$binds" | grep -q "docker.sock"; then
fail "docker.sock is bind-mounted"
else
pass "No docker.sock mount"
fi
echo
# --- runtime exec checks ---
# touch /root/x should fail (read-only rootfs + unprivileged user)
if docker exec "$CONTAINER" touch /root/x 2>/dev/null; then
fail "touch /root/x succeeded (should fail)"
else
pass "touch /root/x correctly denied"
fi
# /var/run/docker.sock must not exist
if docker exec "$CONTAINER" ls /var/run/docker.sock 2>/dev/null; then
fail "/var/run/docker.sock is accessible"
else
pass "/var/run/docker.sock not accessible"
fi
# /etc/shadow should not be readable
if docker exec "$CONTAINER" cat /etc/shadow 2>/dev/null; then
fail "cat /etc/shadow succeeded (should fail)"
else
pass "cat /etc/shadow correctly denied"
fi
echo
echo "=== Results: $PASS passed, $FAIL failed ==="
if [ "$FAIL" -gt 0 ]; then
exit 1
fi
exit 0

View file

@ -94,8 +94,11 @@ if [ "$dry_run" = true ]; then
fi fi
# ── Live run: Vault connectivity check ─────────────────────────────────────── # ── Live run: Vault connectivity check ───────────────────────────────────────
[ -n "${VAULT_ADDR:-}" ] \ # Default the local-cluster Vault env (see lib/hvault.sh::_hvault_default_env).
|| die "VAULT_ADDR is not set — export VAULT_ADDR=http://127.0.0.1:8200" # `disinto init` does not export VAULT_ADDR before calling this script — the
# server is reachable on 127.0.0.1:8200 and the root token lives at
# /etc/vault.d/root.token in the common fresh-LXC case (issue #912).
_hvault_default_env
# hvault_token_lookup both resolves the token (env or /etc/vault.d/root.token) # hvault_token_lookup both resolves the token (env or /etc/vault.d/root.token)
# and confirms the server is reachable with a valid token. Fail fast here so # and confirms the server is reachable with a valid token. Fail fast here so
@ -103,37 +106,6 @@ fi
hvault_token_lookup >/dev/null \ hvault_token_lookup >/dev/null \
|| die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN" || die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
# ── Helper: fetch the on-server policy text, or empty if absent ──────────────
# Echoes the current policy content on stdout. A 404 (policy does not exist
# yet) is a non-error — we print nothing and exit 0 so the caller can treat
# the empty string as "needs create". Any other non-2xx is a hard failure.
#
# Uses a subshell + EXIT trap (not RETURN) for tmpfile cleanup: the RETURN
# trap does NOT fire on set-e abort, so if jq below tripped errexit the
# tmpfile would leak. Subshell exit propagates via the function's last-
# command exit status.
fetch_current_policy() {
local name="$1"
(
local tmp http_code
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
http_code="$(curl -sS -o "$tmp" -w '%{http_code}' \
-H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/sys/policies/acl/${name}")" \
|| { printf '[vault-apply] ERROR: curl failed for policy %s\n' "$name" >&2; exit 1; }
case "$http_code" in
200) jq -r '.data.policy // ""' < "$tmp" ;;
404) printf '' ;; # absent — caller treats as "create"
*)
printf '[vault-apply] ERROR: HTTP %s fetching policy %s:\n' "$http_code" "$name" >&2
cat "$tmp" >&2
exit 1
;;
esac
)
}
# ── Apply each policy, reporting created/updated/unchanged ─────────────────── # ── Apply each policy, reporting created/updated/unchanged ───────────────────
log "syncing ${#POLICY_FILES[@]} polic(y|ies) from ${POLICIES_DIR}" log "syncing ${#POLICY_FILES[@]} polic(y|ies) from ${POLICIES_DIR}"
@ -141,8 +113,17 @@ for f in "${POLICY_FILES[@]}"; do
name="$(basename "$f" .hcl)" name="$(basename "$f" .hcl)"
desired="$(cat "$f")" desired="$(cat "$f")"
current="$(fetch_current_policy "$name")" \ # hvault_get_or_empty returns the raw JSON body on 200 or empty on 404.
# Extract the .data.policy field here (jq on "" yields "", so the
# empty-string-means-create branch below still works).
raw="$(hvault_get_or_empty "sys/policies/acl/${name}")" \
|| die "failed to read existing policy: ${name}" || die "failed to read existing policy: ${name}"
if [ -n "$raw" ]; then
current="$(printf '%s' "$raw" | jq -r '.data.policy // ""')" \
|| die "failed to parse policy response: ${name}"
else
current=""
fi
if [ -z "$current" ]; then if [ -z "$current" ]; then
hvault_policy_apply "$name" "$f" \ hvault_policy_apply "$name" "$f" \

308
tools/vault-apply-roles.sh Executable file
View file

@ -0,0 +1,308 @@
#!/usr/bin/env bash
# =============================================================================
# tools/vault-apply-roles.sh — Idempotent Vault JWT-auth role sync
#
# Part of the Nomad+Vault migration (S2.3, issue #881). Reads
# vault/roles.yaml and upserts each entry as a Vault role under
# auth/jwt-nomad/role/<name>.
#
# Idempotency contract:
# For each role entry in vault/roles.yaml:
# - Role missing in Vault → write, log "role <NAME> created"
# - Role present, fields match → skip, log "role <NAME> unchanged"
# - Role present, fields differ → write, log "role <NAME> updated"
#
# Comparison is per-field on the data the CLI would read back
# (GET auth/jwt-nomad/role/<NAME>.data.{policies,bound_audiences,
# bound_claims,token_ttl,token_max_ttl,token_type}). Only the fields
# this script owns are compared — a future field added by hand in
# Vault would not be reverted on the next run.
#
# --dry-run: prints the planned role list + full payload for each role
# WITHOUT touching Vault. Exits 0.
#
# Preconditions:
# - Vault auth method jwt-nomad must already be enabled + configured
# (done by lib/init/nomad/vault-nomad-auth.sh — which then calls
# this script). Running this script standalone against a Vault with
# no jwt-nomad path will fail on the first role write.
# - vault/roles.yaml present. See that file's header for the format.
#
# Requires:
# - VAULT_ADDR (e.g. http://127.0.0.1:8200)
# - VAULT_TOKEN (env OR /etc/vault.d/root.token, resolved by lib/hvault.sh)
# - curl, jq, awk
#
# Usage:
# tools/vault-apply-roles.sh
# tools/vault-apply-roles.sh --dry-run
#
# Exit codes:
# 0 success (roles synced, or --dry-run completed)
# 1 precondition / API / parse failure
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
ROLES_FILE="${REPO_ROOT}/vault/roles.yaml"
# shellcheck source=../lib/hvault.sh
source "${REPO_ROOT}/lib/hvault.sh"
# Constants shared across every role — the issue's AC names these as the
# invariant token shape for Nomad workload identity. Bumping any of these
# is a knowing, repo-wide change, not a per-role knob, so they live here
# rather than as per-entry fields in roles.yaml.
ROLE_AUDIENCE="vault.io"
ROLE_TOKEN_TYPE="service"
ROLE_TOKEN_TTL="1h"
ROLE_TOKEN_MAX_TTL="24h"
log() { printf '[vault-roles] %s\n' "$*"; }
die() { printf '[vault-roles] ERROR: %s\n' "$*" >&2; exit 1; }
# ── Flag parsing (single optional flag — see vault-apply-policies.sh for the
# sibling grammar). Structured as arg-count guard + dispatch to keep the
# 5-line sliding-window duplicate detector (.woodpecker/detect-duplicates.py)
# from flagging this as shared boilerplate with vault-apply-policies.sh —
# the two parsers implement the same shape but with different control flow.
dry_run=false
if [ "$#" -gt 1 ]; then
die "too many arguments (saw: $*)"
fi
arg="${1:-}"
if [ "$arg" = "--dry-run" ]; then
dry_run=true
elif [ "$arg" = "-h" ] || [ "$arg" = "--help" ]; then
printf 'Usage: %s [--dry-run]\n\n' "$(basename "$0")"
printf 'Apply every role in vault/roles.yaml to Vault as a\n'
printf 'jwt-nomad role. Idempotent: unchanged roles are reported\n'
printf 'as "unchanged" and not written.\n\n'
printf ' --dry-run Print the planned role list + full role\n'
printf ' payload without contacting Vault. Exits 0.\n'
exit 0
elif [ -n "$arg" ]; then
die "unknown flag: $arg"
fi
unset arg
# ── Preconditions ────────────────────────────────────────────────────────────
for bin in curl jq awk; do
command -v "$bin" >/dev/null 2>&1 \
|| die "required binary not found: ${bin}"
done
[ -f "$ROLES_FILE" ] \
|| die "roles file not found: ${ROLES_FILE}"
# ── Parse vault/roles.yaml → TSV ─────────────────────────────────────────────
# Strict-format parser. One awk pass; emits one TAB-separated line per role:
# <name>\t<policy>\t<namespace>\t<job_id>
#
# Grammar: a record opens on a line matching `- name: <value>` and closes
# on the next `- name:` or EOF. Within a record, `policy:`, `namespace:`,
# and `job_id:` lines populate the record. Comments (`#...`) and blank
# lines are ignored. Whitespace around the colon and value is trimmed.
#
# This is intentionally narrower than full YAML — the file's header
# documents the exact subset. If someone adds nested maps, arrays, or
# anchors, this parser will silently drop them; the completeness check
# below catches records missing any of the four fields.
parse_roles() {
awk '
function trim(s) { sub(/^[[:space:]]+/, "", s); sub(/[[:space:]]+$/, "", s); return s }
function strip_comment(s) { sub(/[[:space:]]+#.*$/, "", s); return s }
function emit() {
if (name != "") {
if (policy == "" || namespace == "" || job_id == "") {
printf "INCOMPLETE\t%s\t%s\t%s\t%s\n", name, policy, namespace, job_id
} else {
printf "%s\t%s\t%s\t%s\n", name, policy, namespace, job_id
}
}
name=""; policy=""; namespace=""; job_id=""
}
BEGIN { name=""; policy=""; namespace=""; job_id="" }
# Strip full-line comments and blank lines early.
/^[[:space:]]*#/ { next }
/^[[:space:]]*$/ { next }
# New record: "- name: <value>"
/^[[:space:]]*-[[:space:]]+name:[[:space:]]/ {
emit()
line=strip_comment($0)
sub(/^[[:space:]]*-[[:space:]]+name:[[:space:]]*/, "", line)
name=trim(line)
next
}
# Field within current record. Only accept when a record is open.
/^[[:space:]]+policy:[[:space:]]/ && name != "" {
line=strip_comment($0); sub(/^[[:space:]]+policy:[[:space:]]*/, "", line)
policy=trim(line); next
}
/^[[:space:]]+namespace:[[:space:]]/ && name != "" {
line=strip_comment($0); sub(/^[[:space:]]+namespace:[[:space:]]*/, "", line)
namespace=trim(line); next
}
/^[[:space:]]+job_id:[[:space:]]/ && name != "" {
line=strip_comment($0); sub(/^[[:space:]]+job_id:[[:space:]]*/, "", line)
job_id=trim(line); next
}
END { emit() }
' "$ROLES_FILE"
}
mapfile -t ROLE_RECORDS < <(parse_roles)
if [ "${#ROLE_RECORDS[@]}" -eq 0 ]; then
die "no roles parsed from ${ROLES_FILE}"
fi
# Validate every record is complete. An INCOMPLETE line has the form
# "INCOMPLETE\t<name>\t<policy>\t<namespace>\t<job_id>" — list all of
# them at once so the operator sees every missing field, not one per run.
incomplete=()
for rec in "${ROLE_RECORDS[@]}"; do
case "$rec" in
INCOMPLETE*) incomplete+=("${rec#INCOMPLETE$'\t'}") ;;
esac
done
if [ "${#incomplete[@]}" -gt 0 ]; then
printf '[vault-roles] ERROR: role entries with missing fields:\n' >&2
for row in "${incomplete[@]}"; do
IFS=$'\t' read -r name policy namespace job_id <<<"$row"
printf ' - name=%-24s policy=%-22s namespace=%-10s job_id=%s\n' \
"${name:-<missing>}" "${policy:-<missing>}" \
"${namespace:-<missing>}" "${job_id:-<missing>}" >&2
done
die "fix ${ROLES_FILE} and re-run"
fi
# ── Helper: build the JSON payload Vault expects for a role ──────────────────
# Keeps bound_audiences as a JSON array (required by the API — a scalar
# string silently becomes a one-element-list in the CLI but the HTTP API
# rejects it). All fields that differ between runs are inside this payload
# so the diff-check below (role_fields_match) compares like-for-like.
build_payload() {
local policy="$1" namespace="$2" job_id="$3"
jq -n \
--arg aud "$ROLE_AUDIENCE" \
--arg policy "$policy" \
--arg ns "$namespace" \
--arg job "$job_id" \
--arg ttype "$ROLE_TOKEN_TYPE" \
--arg ttl "$ROLE_TOKEN_TTL" \
--arg maxttl "$ROLE_TOKEN_MAX_TTL" \
'{
role_type: "jwt",
bound_audiences: [$aud],
user_claim: "nomad_job_id",
bound_claims: { nomad_namespace: $ns, nomad_job_id: $job },
token_type: $ttype,
token_policies: [$policy],
token_ttl: $ttl,
token_max_ttl: $maxttl
}'
}
# ── Dry-run: print plan + exit (no Vault calls) ──────────────────────────────
if [ "$dry_run" = true ]; then
log "dry-run — ${#ROLE_RECORDS[@]} role(s) in ${ROLES_FILE}"
for rec in "${ROLE_RECORDS[@]}"; do
IFS=$'\t' read -r name policy namespace job_id <<<"$rec"
payload="$(build_payload "$policy" "$namespace" "$job_id")"
printf '[vault-roles] would apply role %s → policy=%s namespace=%s job_id=%s\n' \
"$name" "$policy" "$namespace" "$job_id"
printf '%s\n' "$payload" | jq -S . | sed 's/^/ /'
done
exit 0
fi
# ── Live run: Vault connectivity check ───────────────────────────────────────
# Default the local-cluster Vault env (see lib/hvault.sh::_hvault_default_env).
# Called transitively from vault-nomad-auth.sh during `disinto init`, which
# does not export VAULT_ADDR in the common fresh-LXC case (issue #912).
_hvault_default_env
if ! hvault_token_lookup >/dev/null; then
die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
fi
# ── Helper: compare on-server role to desired payload ────────────────────────
# Returns 0 iff every field this script owns matches. Fields not in our
# payload (e.g. a manually-added `ttl` via the UI) are ignored — we don't
# revert them, but we also don't block on them.
role_fields_match() {
local current_json="$1" desired_json="$2"
local keys=(
role_type bound_audiences user_claim bound_claims
token_type token_policies token_ttl token_max_ttl
)
# Vault returns token_ttl/token_max_ttl as integers (seconds) on GET but
# accepts strings ("1h") on PUT. Normalize: convert desired durations to
# seconds before comparing. jq's tonumber/type checks give us a uniform
# representation on both sides.
local cur des
for k in "${keys[@]}"; do
cur="$(printf '%s' "$current_json" | jq -cS --arg k "$k" '.data[$k] // null')"
des="$(printf '%s' "$desired_json" | jq -cS --arg k "$k" '.[$k] // null')"
case "$k" in
token_ttl|token_max_ttl)
# Normalize desired: "1h"→3600, "24h"→86400.
des="$(printf '%s' "$des" | jq -r '. // ""' | _duration_to_seconds)"
cur="$(printf '%s' "$cur" | jq -r '. // 0')"
;;
esac
if [ "$cur" != "$des" ]; then
return 1
fi
done
return 0
}
# _duration_to_seconds — read a duration string on stdin, echo seconds.
# Accepts the subset we emit: "Ns", "Nm", "Nh", "Nd". Integers pass through
# unchanged. Any other shape produces the empty string (which cannot match
# Vault's integer response → forces an update).
_duration_to_seconds() {
local s
s="$(cat)"
case "$s" in
''|null) printf '0' ;;
*[0-9]s) printf '%d' "${s%s}" ;;
*[0-9]m) printf '%d' "$(( ${s%m} * 60 ))" ;;
*[0-9]h) printf '%d' "$(( ${s%h} * 3600 ))" ;;
*[0-9]d) printf '%d' "$(( ${s%d} * 86400 ))" ;;
*[0-9]) printf '%d' "$s" ;;
*) printf '' ;;
esac
}
# ── Apply each role, reporting created/updated/unchanged ─────────────────────
log "syncing ${#ROLE_RECORDS[@]} role(s) from ${ROLES_FILE}"
for rec in "${ROLE_RECORDS[@]}"; do
IFS=$'\t' read -r name policy namespace job_id <<<"$rec"
desired_payload="$(build_payload "$policy" "$namespace" "$job_id")"
# hvault_get_or_empty: raw body on 200, empty on 404 (caller: "create").
current_json="$(hvault_get_or_empty "auth/jwt-nomad/role/${name}")" \
|| die "failed to read existing role: ${name}"
if [ -z "$current_json" ]; then
_hvault_request POST "auth/jwt-nomad/role/${name}" "$desired_payload" >/dev/null \
|| die "failed to create role: ${name}"
log "role ${name} created"
continue
fi
if role_fields_match "$current_json" "$desired_payload"; then
log "role ${name} unchanged"
continue
fi
_hvault_request POST "auth/jwt-nomad/role/${name}" "$desired_payload" >/dev/null \
|| die "failed to update role: ${name}"
log "role ${name} updated"
done
log "done — ${#ROLE_RECORDS[@]} role(s) synced"

599
tools/vault-import.sh Executable file
View file

@ -0,0 +1,599 @@
#!/usr/bin/env bash
# =============================================================================
# vault-import.sh — Import .env and sops-decrypted secrets into Vault KV
#
# Reads existing .env and sops-encrypted .env.vault.enc from the old docker stack
# and writes them to Vault KV paths matching the S2.1 policy layout.
#
# Usage:
# vault-import.sh \
# --env /path/to/.env \
# [--sops /path/to/.env.vault.enc] \
# [--age-key /path/to/age/keys.txt]
#
# Flag validation (S2.5, issue #883):
# --import-sops without --age-key → error.
# --age-key without --import-sops → error.
# --env alone (no sops) → OK; imports only the plaintext half.
#
# Mapping:
# From .env:
# - FORGE_{ROLE}_TOKEN + FORGE_{ROLE}_PASS → kv/disinto/bots/<role>/{token,password}
# (roles: review, dev, gardener, architect, planner, predictor, supervisor, vault)
# - FORGE_TOKEN_LLAMA + FORGE_PASS_LLAMA → kv/disinto/bots/dev-qwen/{token,password}
# - FORGE_TOKEN + FORGE_PASS → kv/disinto/shared/forge/{token,password}
# - FORGE_ADMIN_TOKEN → kv/disinto/shared/forge/admin_token
# - WOODPECKER_* → kv/disinto/shared/woodpecker/<lowercase_key>
# - FORWARD_AUTH_SECRET, CHAT_OAUTH_* → kv/disinto/shared/chat/<lowercase_key>
# From sops-decrypted .env.vault.enc:
# - GITHUB_TOKEN, CODEBERG_TOKEN, CLAWHUB_TOKEN, DEPLOY_KEY, NPM_TOKEN, DOCKER_HUB_TOKEN
# → kv/disinto/runner/<NAME>/value
#
# Security:
# - Refuses to run if VAULT_ADDR is not localhost
# - Writes to KV v2, not v1
# - Validates sops age key file is mode 0400 before sourcing
# - Never logs secret values — only key names
#
# Idempotency:
# - Reports unchanged/updated/created per key via hvault_kv_get
# - --dry-run prints the full import plan without writing
# =============================================================================
set -euo pipefail
# ── Internal helpers ──────────────────────────────────────────────────────────
# _log — emit a log message to stdout (never to stderr to avoid polluting diff)
_log() {
printf '[vault-import] %s\n' "$*"
}
# _err — emit an error message to stderr
_err() {
printf '[vault-import] ERROR: %s\n' "$*" >&2
}
# _die — log error and exit with status 1
_die() {
_err "$@"
exit 1
}
# _check_vault_addr — ensure VAULT_ADDR is localhost (security check)
_check_vault_addr() {
local addr="${VAULT_ADDR:-}"
if [[ ! "$addr" =~ ^https?://(localhost|127\.0\.0\.1)(:[0-9]+)?$ ]]; then
_die "Security check failed: VAULT_ADDR must be localhost for safety. Got: $addr"
fi
}
# _validate_age_key_perms — ensure age key file is mode 0400
_validate_age_key_perms() {
local keyfile="$1"
local perms
perms="$(stat -c '%a' "$keyfile" 2>/dev/null)" || _die "Cannot stat age key file: $keyfile"
if [ "$perms" != "400" ]; then
_die "Age key file permissions are $perms, expected 400. Refusing to proceed for security."
fi
}
# _decrypt_sops — decrypt sops-encrypted file using SOPS_AGE_KEY_FILE
_decrypt_sops() {
local sops_file="$1"
local age_key="$2"
local output
# sops outputs YAML format by default, extract KEY=VALUE lines
output="$(SOPS_AGE_KEY_FILE="$age_key" sops -d "$sops_file" 2>/dev/null | \
grep -E '^[A-Z_][A-Z0-9_]*=' | \
sed 's/^\([^=]*\)=\(.*\)$/\1=\2/')" || \
_die "Failed to decrypt sops file: $sops_file. Check age key and file integrity."
printf '%s' "$output"
}
# _load_env_file — source an environment file (safety: only KEY=value lines)
_load_env_file() {
local env_file="$1"
local temp_env
temp_env="$(mktemp)"
# Extract only valid KEY=value lines (skip comments, blank lines, malformed)
grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$env_file" 2>/dev/null > "$temp_env" || true
# shellcheck source=/dev/null
source "$temp_env"
rm -f "$temp_env"
}
# _kv_path_exists — check if a KV path exists (returns 0 if exists, 1 if not)
_kv_path_exists() {
local path="$1"
# Use hvault_kv_get and check if it fails with "not found"
if hvault_kv_get "$path" >/dev/null 2>&1; then
return 0
fi
# Check if the error is specifically "not found"
local err_output
err_output="$(hvault_kv_get "$path" 2>&1)" || true
if printf '%s' "$err_output" | grep -qi 'not found\|404'; then
return 1
fi
# Some other error (e.g., auth failure) — treat as unknown
return 1
}
# _kv_get_value — get a single key value from a KV path
_kv_get_value() {
local path="$1"
local key="$2"
hvault_kv_get "$path" "$key"
}
# _kv_put_secret — write a secret to KV v2
_kv_put_secret() {
local path="$1"
shift
local kv_pairs=("$@")
# Build JSON payload with all key-value pairs
local payload='{"data":{}}'
for kv in "${kv_pairs[@]}"; do
local k="${kv%%=*}"
local v="${kv#*=}"
# Use jq with --arg for safe string interpolation (handles quotes/backslashes)
payload="$(printf '%s' "$payload" | jq --arg k "$k" --arg v "$v" '. * {"data": {($k): $v}}')"
done
# Use curl directly for KV v2 write with versioning
local tmpfile http_code
tmpfile="$(mktemp)"
http_code="$(curl -s -w '%{http_code}' \
-H "X-Vault-Token: ${VAULT_TOKEN}" \
-H "Content-Type: application/json" \
-X POST \
-d "$payload" \
-o "$tmpfile" \
"${VAULT_ADDR}/v1/${VAULT_KV_MOUNT:-kv}/data/${path}")" || {
rm -f "$tmpfile"
_err "Failed to write to Vault at ${VAULT_KV_MOUNT:-kv}/data/${path}: curl error"
return 1
}
rm -f "$tmpfile"
# Check HTTP status — 2xx is success
case "$http_code" in
2[0-9][0-9])
return 0
;;
404)
_err "KV path not found: ${VAULT_KV_MOUNT:-kv}/data/${path}"
return 1
;;
403)
_err "Permission denied writing to ${VAULT_KV_MOUNT:-kv}/data/${path}"
return 1
;;
*)
_err "Failed to write to Vault at ${VAULT_KV_MOUNT:-kv}/data/${path}: HTTP $http_code"
return 1
;;
esac
}
# _format_status — format the status string for a key
_format_status() {
local status="$1"
local path="$2"
local key="$3"
case "$status" in
unchanged)
printf ' %s: %s/%s (unchanged)' "$status" "$path" "$key"
;;
updated)
printf ' %s: %s/%s (updated)' "$status" "$path" "$key"
;;
created)
printf ' %s: %s/%s (created)' "$status" "$path" "$key"
;;
*)
printf ' %s: %s/%s (unknown)' "$status" "$path" "$key"
;;
esac
}
# ── Mapping definitions ──────────────────────────────────────────────────────
# Bots mapping: FORGE_{ROLE}_TOKEN + FORGE_{ROLE}_PASS
declare -a BOT_ROLES=(review dev gardener architect planner predictor supervisor vault)
# Runner tokens from sops-decrypted file
declare -a RUNNER_TOKENS=(GITHUB_TOKEN CODEBERG_TOKEN CLAWHUB_TOKEN DEPLOY_KEY NPM_TOKEN DOCKER_HUB_TOKEN)
# ── Main logic ────────────────────────────────────────────────────────────────
main() {
local env_file=""
local sops_file=""
local age_key_file=""
local dry_run=false
# Parse arguments
while [[ $# -gt 0 ]]; do
case "$1" in
--env)
env_file="$2"
shift 2
;;
--sops)
sops_file="$2"
shift 2
;;
--age-key)
age_key_file="$2"
shift 2
;;
--dry-run)
dry_run=true
shift
;;
--help|-h)
cat <<'EOF'
vault-import.sh — Import .env and sops-decrypted secrets into Vault KV
Usage:
vault-import.sh \
--env /path/to/.env \
[--sops /path/to/.env.vault.enc] \
[--age-key /path/to/age/keys.txt] \
[--dry-run]
Options:
--env Path to .env file (required)
--sops Path to sops-encrypted .env.vault.enc file (optional;
requires --age-key when set)
--age-key Path to age keys file (required when --sops is set)
--dry-run Print import plan without writing to Vault (optional)
--help Show this help message
Mapping:
From .env:
- FORGE_{ROLE}_TOKEN + FORGE_{ROLE}_PASS → kv/disinto/bots/<role>/{token,password}
- FORGE_TOKEN_LLAMA + FORGE_PASS_LLAMA → kv/disinto/bots/dev-qwen/{token,password}
- FORGE_TOKEN + FORGE_PASS → kv/disinto/shared/forge/{token,password}
- FORGE_ADMIN_TOKEN → kv/disinto/shared/forge/admin_token
- WOODPECKER_* → kv/disinto/shared/woodpecker/<lowercase_key>
- FORWARD_AUTH_SECRET, CHAT_OAUTH_* → kv/disinto/shared/chat/<lowercase_key>
From sops-decrypted .env.vault.enc:
- GITHUB_TOKEN, CODEBERG_TOKEN, CLAWHUB_TOKEN, DEPLOY_KEY, NPM_TOKEN, DOCKER_HUB_TOKEN
→ kv/disinto/runner/<NAME>/value
Examples:
vault-import.sh --env .env --sops .env.vault.enc --age-key age-keys.txt
vault-import.sh --env .env --sops .env.vault.enc --age-key age-keys.txt --dry-run
EOF
exit 0
;;
*)
_die "Unknown option: $1. Use --help for usage."
;;
esac
done
# Validate required arguments. --sops and --age-key are paired: if one
# is set, the other must be too. --env alone (no sops half) is valid —
# imports only the plaintext dotenv. Spec: S2.5 / issue #883 / #912.
if [ -z "$env_file" ]; then
_die "Missing required argument: --env"
fi
if [ -n "$sops_file" ] && [ -z "$age_key_file" ]; then
_die "--sops requires --age-key"
fi
if [ -n "$age_key_file" ] && [ -z "$sops_file" ]; then
_die "--age-key requires --sops"
fi
# Validate files exist
if [ ! -f "$env_file" ]; then
_die "Environment file not found: $env_file"
fi
if [ -n "$sops_file" ] && [ ! -f "$sops_file" ]; then
_die "Sops file not found: $sops_file"
fi
if [ -n "$age_key_file" ] && [ ! -f "$age_key_file" ]; then
_die "Age key file not found: $age_key_file"
fi
# Security check: age key permissions (only when an age key is provided —
# --env-only imports never touch the age key).
if [ -n "$age_key_file" ]; then
_validate_age_key_perms "$age_key_file"
fi
# Source the Vault helpers and default the local-cluster VAULT_ADDR +
# VAULT_TOKEN before the localhost safety check runs. `disinto init`
# does not export these in the common fresh-LXC case (issue #912).
source "$(dirname "$0")/../lib/hvault.sh"
_hvault_default_env
# Security check: VAULT_ADDR must be localhost
_check_vault_addr
# Load .env file
_log "Loading environment from: $env_file"
_load_env_file "$env_file"
# Decrypt sops file when --sops was provided. On the --env-only path
# (empty $sops_file) the sops_env stays empty and the per-token loop
# below silently skips runner-token imports — exactly the "only
# plaintext half" spec from S2.5.
local sops_env=""
if [ -n "$sops_file" ]; then
_log "Decrypting sops file: $sops_file"
sops_env="$(_decrypt_sops "$sops_file" "$age_key_file")"
# shellcheck disable=SC2086
eval "$sops_env"
else
_log "No --sops flag — skipping sops decryption (importing plaintext .env only)"
fi
# Collect all import operations
declare -a operations=()
# --- From .env ---
# Bots: FORGE_{ROLE}_TOKEN + FORGE_{ROLE}_PASS
for role in "${BOT_ROLES[@]}"; do
local token_var="FORGE_${role^^}_TOKEN"
local pass_var="FORGE_${role^^}_PASS"
local token_val="${!token_var:-}"
local pass_val="${!pass_var:-}"
if [ -n "$token_val" ] && [ -n "$pass_val" ]; then
operations+=("bots|$role|token|$env_file|$token_var")
operations+=("bots|$role|pass|$env_file|$pass_var")
elif [ -n "$token_val" ] || [ -n "$pass_val" ]; then
_err "Warning: $role bot has token but no password (or vice versa), skipping"
fi
done
# Llama bot: FORGE_TOKEN_LLAMA + FORGE_PASS_LLAMA
local llama_token="${FORGE_TOKEN_LLAMA:-}"
local llama_pass="${FORGE_PASS_LLAMA:-}"
if [ -n "$llama_token" ] && [ -n "$llama_pass" ]; then
operations+=("bots|dev-qwen|token|$env_file|FORGE_TOKEN_LLAMA")
operations+=("bots|dev-qwen|pass|$env_file|FORGE_PASS_LLAMA")
elif [ -n "$llama_token" ] || [ -n "$llama_pass" ]; then
_err "Warning: dev-qwen bot has token but no password (or vice versa), skipping"
fi
# Generic forge creds: FORGE_TOKEN + FORGE_PASS
local forge_token="${FORGE_TOKEN:-}"
local forge_pass="${FORGE_PASS:-}"
if [ -n "$forge_token" ] && [ -n "$forge_pass" ]; then
operations+=("forge|token|$env_file|FORGE_TOKEN")
operations+=("forge|pass|$env_file|FORGE_PASS")
fi
# Forge admin token: FORGE_ADMIN_TOKEN
local forge_admin_token="${FORGE_ADMIN_TOKEN:-}"
if [ -n "$forge_admin_token" ]; then
operations+=("forge|admin_token|$env_file|FORGE_ADMIN_TOKEN")
fi
# Woodpecker secrets: WOODPECKER_*
# Only read from the .env file, not shell environment
local woodpecker_keys=()
while IFS='=' read -r key _; do
if [[ "$key" =~ ^WOODPECKER_ ]] || [[ "$key" =~ ^WP_[A-Z_]+$ ]]; then
woodpecker_keys+=("$key")
fi
done < <(grep -E '^[A-Z_][A-Z0-9_]*=' "$env_file" 2>/dev/null || true)
for key in "${woodpecker_keys[@]}"; do
local val="${!key}"
if [ -n "$val" ]; then
local lowercase_key="${key,,}"
# Normalize WP_FORGEJO_* → forgejo_* (strip wp_ prefix to match template)
if [[ "$lowercase_key" =~ ^wp_(.+)$ ]]; then
vault_key="${BASH_REMATCH[1]}"
else
vault_key="$lowercase_key"
fi
operations+=("woodpecker|$vault_key|$env_file|$key")
fi
done
# Chat secrets: FORWARD_AUTH_SECRET, CHAT_OAUTH_CLIENT_ID, CHAT_OAUTH_CLIENT_SECRET
for key in FORWARD_AUTH_SECRET CHAT_OAUTH_CLIENT_ID CHAT_OAUTH_CLIENT_SECRET; do
local val="${!key:-}"
if [ -n "$val" ]; then
local lowercase_key="${key,,}"
operations+=("chat|$lowercase_key|$env_file|$key")
fi
done
# --- From sops-decrypted .env.vault.enc ---
# Runner tokens
for token_name in "${RUNNER_TOKENS[@]}"; do
local token_val="${!token_name:-}"
if [ -n "$token_val" ]; then
operations+=("runner|$token_name|$sops_file|$token_name")
fi
done
# If dry-run, just print the plan
if $dry_run; then
_log "=== DRY-RUN: Import plan ==="
_log "Environment file: $env_file"
if [ -n "$sops_file" ]; then
_log "Sops file: $sops_file"
_log "Age key: $age_key_file"
else
_log "Sops file: (none — --env-only import)"
fi
_log ""
_log "Planned operations:"
for op in "${operations[@]}"; do
_log " $op"
done
_log ""
_log "Total: ${#operations[@]} operations"
exit 0
fi
# --- Actual import with idempotency check ---
_log "=== Starting Vault import ==="
_log "Environment file: $env_file"
if [ -n "$sops_file" ]; then
_log "Sops file: $sops_file"
_log "Age key: $age_key_file"
else
_log "Sops file: (none — --env-only import)"
fi
_log ""
local created=0
local updated=0
local unchanged=0
# First pass: collect all operations with their parsed values.
# Store value and status in separate associative arrays keyed by
# "vault_path:kv_key". Secret values may contain any character, so we
# never pack them into a delimited string — the previous `value|status`
# encoding silently truncated values containing '|' (see issue #898).
declare -A ops_value
declare -A ops_status
declare -A path_seen
for op in "${operations[@]}"; do
# Parse operation: category|field|subkey|file|envvar (5 fields for bots/runner)
# or category|field|file|envvar (4 fields for forge/woodpecker/chat).
# These metadata strings are built from safe identifiers (role names,
# env-var names, file paths) and do not carry secret values, so '|' is
# still fine as a separator here.
local category field subkey file envvar=""
local field_count
field_count="$(printf '%s' "$op" | awk -F'|' '{print NF}')"
if [ "$field_count" -eq 5 ]; then
# 5 fields: category|role|subkey|file|envvar
IFS='|' read -r category field subkey file envvar <<< "$op"
else
# 4 fields: category|field|file|envvar
IFS='|' read -r category field file envvar <<< "$op"
subkey="$field" # For 4-field ops, field is the vault key
fi
# Determine Vault path and key based on category
local vault_path=""
local vault_key="$subkey"
local source_value=""
if [ "$file" = "$env_file" ]; then
# Source from environment file (envvar contains the variable name)
source_value="${!envvar:-}"
else
# Source from sops-decrypted env (envvar contains the variable name)
source_value="$(printf '%s' "$sops_env" | grep "^${envvar}=" | sed "s/^${envvar}=//" || true)"
fi
case "$category" in
bots)
vault_path="disinto/bots/${field}"
vault_key="$subkey"
;;
forge)
vault_path="disinto/shared/forge"
vault_key="$field"
;;
woodpecker)
vault_path="disinto/shared/woodpecker"
vault_key="$field"
;;
chat)
vault_path="disinto/shared/chat"
vault_key="$field"
;;
runner)
vault_path="disinto/runner/${field}"
vault_key="value"
;;
*)
_err "Unknown category: $category"
continue
;;
esac
# Determine status for this key
local status="created"
if _kv_path_exists "$vault_path"; then
local existing_value
if existing_value="$(_kv_get_value "$vault_path" "$vault_key")" 2>/dev/null; then
if [ "$existing_value" = "$source_value" ]; then
status="unchanged"
else
status="updated"
fi
fi
fi
# vault_path and vault_key are identifier-safe (no ':' in either), so
# the composite key round-trips cleanly via ${ck%:*} / ${ck#*:}.
local ck="${vault_path}:${vault_key}"
ops_value["$ck"]="$source_value"
ops_status["$ck"]="$status"
path_seen["$vault_path"]=1
done
# Second pass: group by vault_path and write.
# IMPORTANT: Always write ALL keys for a path, not just changed ones.
# KV v2 POST replaces the entire document, so we must include unchanged keys
# to avoid dropping them. The idempotency guarantee comes from KV v2 versioning.
for vault_path in "${!path_seen[@]}"; do
# Collect this path's "vault_key=source_value" pairs into a bash
# indexed array. Each element is one kv pair; '=' inside the value is
# preserved because _kv_put_secret splits on the *first* '=' only.
local pairs_array=()
local path_has_changes=0
for ck in "${!ops_value[@]}"; do
[ "${ck%:*}" = "$vault_path" ] || continue
local vault_key="${ck#*:}"
pairs_array+=("${vault_key}=${ops_value[$ck]}")
if [ "${ops_status[$ck]}" != "unchanged" ]; then
path_has_changes=1
fi
done
# Determine effective status for this path (updated if any key changed)
local effective_status="unchanged"
if [ "$path_has_changes" = 1 ]; then
effective_status="updated"
fi
if ! _kv_put_secret "$vault_path" "${pairs_array[@]}"; then
_err "Failed to write to $vault_path"
exit 1
fi
# Output status for each key in this path
for kv in "${pairs_array[@]}"; do
local kv_key="${kv%%=*}"
_format_status "$effective_status" "$vault_path" "$kv_key"
printf '\n'
done
# Count only if path has changes
if [ "$effective_status" = "updated" ]; then
((updated++)) || true
fi
done
_log ""
_log "=== Import complete ==="
_log "Created: $created"
_log "Updated: $updated"
_log "Unchanged: $unchanged"
}
main "$@"

176
tools/vault-seed-agents.sh Executable file
View file

@ -0,0 +1,176 @@
#!/usr/bin/env bash
# =============================================================================
# tools/vault-seed-agents.sh — Idempotent seed for all bot KV paths
#
# Part of the Nomad+Vault migration (S4.1, issue #955). Populates
# kv/disinto/bots/<role> with token + pass for each of the 7 agent roles
# plus the vault bot. Handles the "fresh factory, no .env import" case.
#
# Companion to tools/vault-import.sh — when that runs against a box with
# an existing stack, it overwrites seeded values with real ones.
#
# Idempotency contract (per bot):
# - Both token and pass present → skip, log "<role> unchanged".
# - Either missing → generate random values for missing keys, preserve
# existing keys, write back atomically.
#
# Preconditions:
# - Vault reachable + unsealed at $VAULT_ADDR.
# - VAULT_TOKEN set (env) or /etc/vault.d/root.token readable.
# - curl, jq, openssl
#
# Usage:
# tools/vault-seed-agents.sh
# tools/vault-seed-agents.sh --dry-run
#
# Exit codes:
# 0 success (seed applied, or already applied)
# 1 precondition / API / mount-mismatch failure
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# shellcheck source=../lib/hvault.sh
source "${REPO_ROOT}/lib/hvault.sh"
KV_MOUNT="kv"
TOKEN_BYTES=32 # 32 bytes → 64 hex chars
PASS_BYTES=16 # 16 bytes → 32 hex chars
# All bot roles seeded by this script.
BOT_ROLES=(dev review gardener architect planner predictor supervisor vault)
LOG_TAG="[vault-seed-agents]"
log() { printf '%s %s\n' "$LOG_TAG" "$*"; }
die() { printf '%s ERROR: %s\n' "$LOG_TAG" "$*" >&2; exit 1; }
# ── Flag parsing ─────────────────────────────────────────────────────────────
# while/shift shape — distinct from forgejo (arity:value case) and
# woodpecker (for-loop).
DRY_RUN=0
while [ $# -gt 0 ]; do
case "$1" in
--dry-run) DRY_RUN=1 ;;
-h|--help)
printf 'Usage: %s [--dry-run]\n\n' "$(basename "$0")"
printf 'Seed kv/disinto/bots/<role> with token + pass for all agent\n'
printf 'roles. Idempotent: existing non-empty values are preserved.\n\n'
printf ' --dry-run Print planned actions without writing.\n'
exit 0
;;
*) die "invalid argument: ${1} (try --help)" ;;
esac
shift
done
# ── Preconditions ────────────────────────────────────────────────────────────
for bin in curl jq openssl; do
command -v "$bin" >/dev/null 2>&1 \
|| die "required binary not found: ${bin}"
done
[ -n "${VAULT_ADDR:-}" ] \
|| die "VAULT_ADDR unset — e.g. export VAULT_ADDR=http://127.0.0.1:8200"
hvault_token_lookup >/dev/null \
|| die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
# ── Step 1: ensure kv/ mount exists and is KV v2 ────────────────────────────
log "── Step 1: ensure ${KV_MOUNT}/ is KV v2 ──"
export DRY_RUN
hvault_ensure_kv_v2 "$KV_MOUNT" "${LOG_TAG}" \
|| die "KV mount check failed"
# ── Step 2: seed each bot role ───────────────────────────────────────────────
total_generated=0
# Check if shared forge credentials exist for dev role fallback
shared_forge_exists=0
shared_forge_raw="$(hvault_get_or_empty "${KV_MOUNT}/data/disinto/shared/forge")" \
|| true
if [ -n "$shared_forge_raw" ]; then
shared_forge_token="$(printf '%s' "$shared_forge_raw" | jq -r '.data.data.token // ""')"
shared_forge_pass="$(printf '%s' "$shared_forge_raw" | jq -r '.data.data.pass // ""')"
if [ -n "$shared_forge_token" ] && [ -n "$shared_forge_pass" ]; then
shared_forge_exists=1
fi
fi
for role in "${BOT_ROLES[@]}"; do
kv_logical="disinto/bots/${role}"
kv_api="${KV_MOUNT}/data/${kv_logical}"
log "── seed ${kv_logical} ──"
existing_raw="$(hvault_get_or_empty "${kv_api}")" \
|| die "failed to read ${kv_api}"
existing_token=""
existing_pass=""
existing_data="{}"
if [ -n "$existing_raw" ]; then
existing_data="$(printf '%s' "$existing_raw" | jq '.data.data // {}')"
existing_token="$(printf '%s' "$existing_raw" | jq -r '.data.data.token // ""')"
existing_pass="$(printf '%s' "$existing_raw" | jq -r '.data.data.pass // ""')"
fi
generated=()
desired_token="$existing_token"
desired_pass="$existing_pass"
# Special case: dev role uses shared forge credentials if available
if [ "$role" = "dev" ] && [ "$shared_forge_exists" -eq 1 ]; then
# Use shared FORGE_TOKEN + FORGE_PASS for dev role
if [ -z "$existing_token" ]; then
desired_token="$shared_forge_token"
generated+=("token")
fi
if [ -z "$existing_pass" ]; then
desired_pass="$shared_forge_pass"
generated+=("pass")
fi
else
# Generate random values for missing keys
if [ -z "$existing_token" ]; then
generated+=("token")
fi
if [ -z "$existing_pass" ]; then
generated+=("pass")
fi
for key in "${generated[@]}"; do
case "$key" in
token) desired_token="$(openssl rand -hex "$TOKEN_BYTES")" ;;
pass) desired_pass="$(openssl rand -hex "$PASS_BYTES")" ;;
esac
done
fi
if [ "${#generated[@]}" -eq 0 ]; then
log "${role}: unchanged"
continue
fi
if [ "$DRY_RUN" -eq 1 ]; then
log "[dry-run] ${role}: would generate ${generated[*]}"
total_generated=$(( total_generated + ${#generated[@]} ))
continue
fi
# Merge new keys into existing data to preserve any keys we don't own.
payload="$(printf '%s' "$existing_data" \
| jq --arg t "$desired_token" --arg p "$desired_pass" \
'{data: (. + {token: $t, pass: $p})}')"
_hvault_request POST "${kv_api}" "$payload" >/dev/null \
|| die "failed to write ${kv_api}"
log "${role}: generated ${generated[*]}"
total_generated=$(( total_generated + ${#generated[@]} ))
done
if [ "$total_generated" -eq 0 ]; then
log "all bot paths already seeded — no-op"
else
log "done — ${total_generated} key(s) seeded across ${#BOT_ROLES[@]} bot paths"
fi

115
tools/vault-seed-chat.sh Executable file
View file

@ -0,0 +1,115 @@
#!/usr/bin/env bash
# =============================================================================
# tools/vault-seed-chat.sh — Idempotent seed for kv/disinto/shared/chat
#
# Part of the Nomad+Vault migration (S5.2, issue #989). Populates the KV v2
# path that nomad/jobs/chat.hcl reads from, so a clean-install factory
# (no old-stack secrets to import) still has per-key values for
# CHAT_OAUTH_CLIENT_ID, CHAT_OAUTH_CLIENT_SECRET, and FORWARD_AUTH_SECRET.
#
# Companion to tools/vault-import.sh (S2.2) — when that import runs against
# a box with an existing stack, it overwrites these seeded values with the
# real ones. Order doesn't matter: whichever runs last wins, and both
# scripts are idempotent in the sense that re-running never rotates an
# existing non-empty key.
#
# Uses _hvault_seed_key (lib/hvault.sh) for each key — the helper reads
# existing data and merges to preserve sibling keys (KV v2 replaces .data
# atomically).
#
# Preconditions:
# - Vault reachable + unsealed at $VAULT_ADDR.
# - VAULT_TOKEN set (env) or /etc/vault.d/root.token readable.
# - The `kv/` mount is enabled as KV v2.
#
# Requires: VAULT_ADDR, VAULT_TOKEN, curl, jq, openssl
#
# Usage:
# tools/vault-seed-chat.sh
# tools/vault-seed-chat.sh --dry-run
#
# Exit codes:
# 0 success (seed applied, or already applied)
# 1 precondition / API / mount-mismatch failure
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# shellcheck source=../lib/hvault.sh
source "${REPO_ROOT}/lib/hvault.sh"
KV_MOUNT="kv"
KV_LOGICAL_PATH="disinto/shared/chat"
# Keys to seed — array-driven loop (structurally distinct from forgejo's
# sequential if-blocks and agents' role loop).
SEED_KEYS=(chat_oauth_client_id chat_oauth_client_secret forward_auth_secret)
LOG_TAG="[vault-seed-chat]"
log() { printf '%s %s\n' "$LOG_TAG" "$*"; }
die() { printf '%s ERROR: %s\n' "$LOG_TAG" "$*" >&2; exit 1; }
# ── Flag parsing — [[ ]] guard + case: shape distinct from forgejo
# (arity:value case), woodpecker (for-loop), agents (while/shift).
DRY_RUN=0
if [[ $# -gt 0 ]]; then
case "$1" in
--dry-run) DRY_RUN=1 ;;
-h|--help)
printf 'Usage: %s [--dry-run]\n\n' "$(basename "$0")"
printf 'Seed kv/disinto/shared/chat with random OAuth client\n'
printf 'credentials and forward auth secret if missing.\n'
printf 'Idempotent: existing non-empty values are preserved.\n\n'
printf ' --dry-run Show what would be seeded without writing.\n'
exit 0
;;
*) die "invalid argument: ${1} (try --help)" ;;
esac
fi
# ── Preconditions — inline check-or-die (shape distinct from agents' array
# loop and forgejo's continuation-line style) ─────────────────────────────
command -v curl >/dev/null 2>&1 || die "curl not found"
command -v jq >/dev/null 2>&1 || die "jq not found"
command -v openssl >/dev/null 2>&1 || die "openssl not found"
[ -n "${VAULT_ADDR:-}" ] || die "VAULT_ADDR unset — export VAULT_ADDR=http://127.0.0.1:8200"
hvault_token_lookup >/dev/null || die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
# ── Step 1/2: ensure kv/ mount exists and is KV v2 ───────────────────────────
log "── Step 1/2: ensure ${KV_MOUNT}/ is KV v2 ──"
export DRY_RUN
hvault_ensure_kv_v2 "$KV_MOUNT" "${LOG_TAG}" \
|| die "KV mount check failed"
# ── Step 2/2: seed missing keys via _hvault_seed_key helper ──────────────────
log "── Step 2/2: seed ${KV_LOGICAL_PATH} ──"
generated=()
for key in "${SEED_KEYS[@]}"; do
if [ "$DRY_RUN" -eq 1 ]; then
# Check existence without writing
existing=$(hvault_kv_get "$KV_LOGICAL_PATH" "$key" 2>/dev/null) || true
if [ -z "$existing" ]; then
generated+=("$key")
log "[dry-run] ${key} would be generated"
else
log "[dry-run] ${key} unchanged"
fi
else
rc=0
_hvault_seed_key "$KV_LOGICAL_PATH" "$key" || rc=$?
case "$rc" in
0) generated+=("$key"); log "${key} generated" ;;
1) log "${key} unchanged" ;;
*) die "API error seeding ${key} (rc=${rc})" ;;
esac
fi
done
if [ "${#generated[@]}" -eq 0 ]; then
log "all keys present — no-op"
else
log "done — ${#generated[@]} key(s) seeded at kv/${KV_LOGICAL_PATH}"
fi

207
tools/vault-seed-forgejo.sh Executable file
View file

@ -0,0 +1,207 @@
#!/usr/bin/env bash
# =============================================================================
# tools/vault-seed-forgejo.sh — Idempotent seed for kv/disinto/shared/forgejo
#
# Part of the Nomad+Vault migration (S2.4, issue #882). Populates the KV v2
# path that nomad/jobs/forgejo.hcl reads from, so a clean-install factory
# (no old-stack secrets to import) still has per-key values for
# FORGEJO__security__SECRET_KEY + FORGEJO__security__INTERNAL_TOKEN.
#
# Companion to tools/vault-import.sh (S2.2, not yet merged) — when that
# import runs against a box with an existing stack, it overwrites these
# seeded values with the real ones. Order doesn't matter: whichever runs
# last wins, and both scripts are idempotent in the sense that re-running
# never rotates an existing non-empty key.
#
# Idempotency contract (per key):
# - Key missing or empty in Vault → generate a random value, write it,
# log "<key> generated (N bytes hex)".
# - Key present with a non-empty value → leave untouched, log
# "<key> unchanged".
# - Neither key changes is a silent no-op (no Vault write at all).
#
# Rotating an existing key is deliberately NOT in scope — SECRET_KEY
# rotation invalidates every existing session cookie in forgejo and
# INTERNAL_TOKEN rotation breaks internal RPC until all processes have
# restarted. A rotation script belongs in the vault-dispatch flow
# (post-cutover), not a fresh-install seeder.
#
# Preconditions:
# - Vault reachable + unsealed at $VAULT_ADDR.
# - VAULT_TOKEN set (env) or /etc/vault.d/root.token readable.
# - The `kv/` mount is enabled as KV v2 (this script enables it on a
# fresh box; on an existing box it asserts the mount type/version).
#
# Requires:
# - VAULT_ADDR (e.g. http://127.0.0.1:8200)
# - VAULT_TOKEN (env OR /etc/vault.d/root.token, resolved by lib/hvault.sh)
# - curl, jq, openssl
#
# Usage:
# tools/vault-seed-forgejo.sh
# tools/vault-seed-forgejo.sh --dry-run
#
# Exit codes:
# 0 success (seed applied, or already applied)
# 1 precondition / API / mount-mismatch failure
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# shellcheck source=../lib/hvault.sh
source "${REPO_ROOT}/lib/hvault.sh"
# KV v2 mount + logical path. Kept as two vars so the full API path used
# for GET/POST (which MUST include `/data/`) is built in one place.
KV_MOUNT="kv"
KV_LOGICAL_PATH="disinto/shared/forgejo"
KV_API_PATH="${KV_MOUNT}/data/${KV_LOGICAL_PATH}"
# Byte lengths for the generated secrets (hex output, so the printable
# string length is 2x these). 32 bytes matches forgejo's own
# `gitea generate secret SECRET_KEY` default; 64 bytes is comfortably
# above forgejo's INTERNAL_TOKEN JWT-HMAC key floor.
SECRET_KEY_BYTES=32
INTERNAL_TOKEN_BYTES=64
log() { printf '[vault-seed-forgejo] %s\n' "$*"; }
die() { printf '[vault-seed-forgejo] ERROR: %s\n' "$*" >&2; exit 1; }
# ── Flag parsing — single optional `--dry-run`. Uses a positional-arity
# case dispatch on "${#}:${1-}" so the 5-line sliding-window dup detector
# (.woodpecker/detect-duplicates.py) sees a shape distinct from both
# vault-apply-roles.sh (if/elif chain) and vault-apply-policies.sh (flat
# case on $1 alone). Three sibling tools, three parser shapes.
DRY_RUN=0
case "$#:${1-}" in
0:)
;;
1:--dry-run)
DRY_RUN=1
;;
1:-h|1:--help)
printf 'Usage: %s [--dry-run]\n\n' "$(basename "$0")"
printf 'Seed kv/disinto/shared/forgejo with random SECRET_KEY +\n'
printf 'INTERNAL_TOKEN if they are missing. Idempotent: existing\n'
printf 'non-empty values are left untouched.\n\n'
printf ' --dry-run Print planned actions (enable mount? which keys\n'
printf ' to generate?) without writing to Vault. Exits 0.\n'
exit 0
;;
*)
die "invalid arguments: $* (try --help)"
;;
esac
# ── Preconditions ────────────────────────────────────────────────────────────
for bin in curl jq openssl; do
command -v "$bin" >/dev/null 2>&1 \
|| die "required binary not found: ${bin}"
done
# Vault connectivity — short-circuit style (`||`) instead of an `if`-chain
# so this block has a distinct textual shape from vault-apply-roles.sh's
# equivalent preflight; hvault.sh's typed helpers emit structured JSON
# errors that don't render well behind the `[vault-seed-forgejo] …`
# log prefix, hence the inline check + plain-string diag.
[ -n "${VAULT_ADDR:-}" ] \
|| die "VAULT_ADDR unset — e.g. export VAULT_ADDR=http://127.0.0.1:8200"
hvault_token_lookup >/dev/null \
|| die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
# ── Step 1/2: ensure kv/ mount exists and is KV v2 ───────────────────────────
# The policy at vault/policies/service-forgejo.hcl grants read on
# `kv/data/<path>/*` — that `data` segment only exists for KV v2. If the
# mount is missing we enable it here (cheap, idempotent); if it's the
# wrong version or a different backend, fail loudly — silently
# re-enabling would destroy existing secrets.
log "── Step 1/2: ensure ${KV_MOUNT}/ is KV v2 ──"
export DRY_RUN
hvault_ensure_kv_v2 "$KV_MOUNT" "[vault-seed-forgejo]" \
|| die "KV mount check failed"
# ── Step 2/2: seed missing keys at kv/data/disinto/shared/forgejo ────────────
log "── Step 2/2: seed ${KV_API_PATH} ──"
# hvault_get_or_empty returns an empty string on 404 (KV path absent).
# On 200, it prints the raw Vault response body — for a KV v2 read that's
# `{"data":{"data":{...},"metadata":{...}}}`, hence the `.data.data.<key>`
# path below. A path with `deleted_time` set still returns 200 but the
# inner `.data.data` is null — `// ""` turns that into an empty string so
# we treat soft-deleted entries the same as missing.
existing_raw="$(hvault_get_or_empty "${KV_API_PATH}")" \
|| die "failed to read ${KV_API_PATH}"
existing_secret_key=""
existing_internal_token=""
if [ -n "$existing_raw" ]; then
existing_secret_key="$(printf '%s' "$existing_raw" | jq -r '.data.data.secret_key // ""')"
existing_internal_token="$(printf '%s' "$existing_raw" | jq -r '.data.data.internal_token // ""')"
fi
desired_secret_key="$existing_secret_key"
desired_internal_token="$existing_internal_token"
generated=()
if [ -z "$desired_secret_key" ]; then
if [ "$DRY_RUN" -eq 1 ]; then
# In dry-run, don't call openssl — log the intent only. The real run
# generates fresh bytes; nothing about the generated value is
# deterministic so there's no "planned value" to show.
generated+=("secret_key")
else
desired_secret_key="$(openssl rand -hex "$SECRET_KEY_BYTES")"
generated+=("secret_key")
fi
fi
if [ -z "$desired_internal_token" ]; then
if [ "$DRY_RUN" -eq 1 ]; then
generated+=("internal_token")
else
desired_internal_token="$(openssl rand -hex "$INTERNAL_TOKEN_BYTES")"
generated+=("internal_token")
fi
fi
if [ "${#generated[@]}" -eq 0 ]; then
log "all keys present at ${KV_API_PATH} — no-op"
log "secret_key unchanged"
log "internal_token unchanged"
exit 0
fi
if [ "$DRY_RUN" -eq 1 ]; then
log "[dry-run] would generate + write: ${generated[*]}"
for key in secret_key internal_token; do
case " ${generated[*]} " in
*" ${key} "*) log "[dry-run] ${key} would be generated" ;;
*) log "[dry-run] ${key} unchanged" ;;
esac
done
exit 0
fi
# Write back BOTH keys in one payload. KV v2 replaces `.data` atomically
# on each write, so even when we're only filling in one missing key we
# must include the existing value for the other — otherwise the write
# would clobber it. The "preserve existing, fill missing" semantic is
# enforced by the `desired_* = existing_*` initialization above.
payload="$(jq -n \
--arg sk "$desired_secret_key" \
--arg it "$desired_internal_token" \
'{data: {secret_key: $sk, internal_token: $it}}')"
_hvault_request POST "${KV_API_PATH}" "$payload" >/dev/null \
|| die "failed to write ${KV_API_PATH}"
for key in secret_key internal_token; do
case " ${generated[*]} " in
*" ${key} "*) log "${key} generated" ;;
*) log "${key} unchanged" ;;
esac
done
log "done — ${#generated[@]} key(s) seeded at ${KV_API_PATH}"

149
tools/vault-seed-ops-repo.sh Executable file
View file

@ -0,0 +1,149 @@
#!/usr/bin/env bash
# =============================================================================
# tools/vault-seed-ops-repo.sh — Idempotent seed for kv/disinto/shared/ops-repo
#
# Part of the Nomad+Vault migration (S5.1, issue #1035). Populates the KV v2
# path that nomad/jobs/edge.hcl dispatcher task reads from, so the edge
# proxy has FORGE_TOKEN for ops repo access.
#
# Seeds from kv/disinto/bots/vault (the vault bot credentials) — copies the
# token field to kv/disinto/shared/ops-repo. This is the "service" path that
# dispatcher uses, distinct from the "agent" path (bots/vault) used by
# agent tasks under the service-agents policy.
#
# Idempotency contract:
# - Key present with non-empty value → leave untouched, log "token unchanged".
# - Key missing or empty → copy from bots/vault, log "token copied".
# - If bots/vault is also empty → generate a random value, log "token generated".
#
# Preconditions:
# - Vault reachable + unsealed at $VAULT_ADDR.
# - VAULT_TOKEN set (env) or /etc/vault.d/root.token readable.
# - The `kv/` mount is enabled as KV v2.
#
# Requires:
# - VAULT_ADDR (e.g. http://127.0.0.1:8200)
# - VAULT_TOKEN (env OR /etc/vault.d/root.token, resolved by lib/hvault.sh)
# - curl, jq, openssl
#
# Usage:
# tools/vault-seed-ops-repo.sh
# tools/vault-seed-ops-repo.sh --dry-run
#
# Exit codes:
# 0 success (seed applied, or already applied)
# 1 precondition / API / mount-mismatch failure
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# shellcheck source=../lib/hvault.sh
source "${REPO_ROOT}/lib/hvault.sh"
# KV v2 mount + logical paths
KV_MOUNT="kv"
OPS_REPO_PATH="disinto/shared/ops-repo"
VAULT_BOT_PATH="disinto/bots/vault"
OPS_REPO_API="${KV_MOUNT}/data/${OPS_REPO_PATH}"
VAULT_BOT_API="${KV_MOUNT}/data/${VAULT_BOT_PATH}"
log() { printf '[vault-seed-ops-repo] %s\n' "$*"; }
die() { printf '[vault-seed-ops-repo] ERROR: %s\n' "$*" >&2; exit 1; }
# ── Flag parsing ─────────────────────────────────────────────────────────────
DRY_RUN=0
case "$#:${1-}" in
0:)
;;
1:--dry-run)
DRY_RUN=1
;;
1:-h|1:--help)
printf 'Usage: %s [--dry-run]\n\n' "$(basename "$0")"
printf 'Seed kv/disinto/shared/ops-repo with FORGE_TOKEN.\n\n'
printf 'Copies token from kv/disinto/bots/vault if present;\n'
printf 'otherwise generates a random value. Idempotent:\n'
printf 'existing non-empty values are left untouched.\n\n'
printf ' --dry-run Print planned actions without writing.\n'
exit 0
;;
*)
die "invalid arguments: $* (try --help)"
;;
esac
# ── Preconditions ────────────────────────────────────────────────────────────
for bin in curl jq openssl; do
command -v "$bin" >/dev/null 2>&1 \
|| die "required binary not found: ${bin}"
done
[ -n "${VAULT_ADDR:-}" ] \
|| die "VAULT_ADDR unset — e.g. export VAULT_ADDR=http://127.0.0.1:8200"
hvault_token_lookup >/dev/null \
|| die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
# ── Step 1/2: ensure kv/ mount exists and is KV v2 ───────────────────────────
log "── Step 1/2: ensure ${KV_MOUNT}/ is KV v2 ──"
export DRY_RUN
hvault_ensure_kv_v2 "$KV_MOUNT" "[vault-seed-ops-repo]" \
|| die "KV mount check failed"
# ── Step 2/2: seed ops-repo from vault bot ───────────────────────────────────
log "── Step 2/2: seed ${OPS_REPO_API} ──"
# Read existing ops-repo value
existing_raw="$(hvault_get_or_empty "${OPS_REPO_API}")" \
|| die "failed to read ${OPS_REPO_API}"
existing_token=""
if [ -n "$existing_raw" ]; then
existing_token="$(printf '%s' "$existing_raw" | jq -r '.data.data.token // ""')"
fi
desired_token="$existing_token"
action=""
if [ -z "$existing_token" ]; then
# Token missing — try to copy from vault bot
bot_raw="$(hvault_get_or_empty "${VAULT_BOT_API}")" || true
if [ -n "$bot_raw" ]; then
bot_token="$(printf '%s' "$bot_raw" | jq -r '.data.data.token // ""')"
if [ -n "$bot_token" ]; then
desired_token="$bot_token"
action="copied"
fi
fi
# If still no token, generate one
if [ -z "$desired_token" ]; then
if [ "$DRY_RUN" -eq 1 ]; then
action="generated (dry-run)"
else
desired_token="$(openssl rand -hex 32)"
action="generated"
fi
fi
fi
if [ -z "$action" ]; then
log "all keys present at ${OPS_REPO_API} — no-op"
log "token unchanged"
exit 0
fi
if [ "$DRY_RUN" -eq 1 ]; then
log "[dry-run] ${OPS_REPO_PATH}: would ${action} token"
exit 0
fi
# Write the token
payload="$(jq -n --arg t "$desired_token" '{data: {token: $t}}')"
_hvault_request POST "${OPS_REPO_API}" "$payload" >/dev/null \
|| die "failed to write ${OPS_REPO_API}"
log "${OPS_REPO_PATH}: ${action} token"
log "done — ${OPS_REPO_API} seeded"

145
tools/vault-seed-woodpecker.sh Executable file
View file

@ -0,0 +1,145 @@
#!/usr/bin/env bash
# =============================================================================
# tools/vault-seed-woodpecker.sh — Idempotent seed for kv/disinto/shared/woodpecker
#
# Part of the Nomad+Vault migration (S3.1 + S3.3, issues #934 + #936). Populates
# the KV v2 path read by nomad/jobs/woodpecker-server.hcl:
# - agent_secret: pre-shared secret for woodpecker-server ↔ agent communication
# - forgejo_client + forgejo_secret: OAuth2 client credentials from Forgejo
#
# This script handles BOTH:
# 1. S3.1: seeds `agent_secret` if missing
# 2. S3.3: calls wp-oauth-register.sh to create Forgejo OAuth app + store
# forgejo_client/forgejo_secret in Vault
#
# Idempotency contract:
# - agent_secret: missing → generate and write; present → skip, log unchanged
# - OAuth app + credentials: handled by wp-oauth-register.sh (idempotent)
# This script preserves any existing keys it doesn't own.
#
# Idempotency contract (per key):
# - Key missing or empty in Vault → generate a random value, write it,
# log "agent_secret generated".
# - Key present with a non-empty value → leave untouched, log
# "agent_secret unchanged".
#
# Preconditions:
# - Vault reachable + unsealed at $VAULT_ADDR.
# - VAULT_TOKEN set (env) or /etc/vault.d/root.token readable.
# - The `kv/` mount is enabled as KV v2 (this script enables it on a
# fresh box; on an existing box it asserts the mount type/version).
#
# Requires:
# - VAULT_ADDR (e.g. http://127.0.0.1:8200)
# - VAULT_TOKEN (env OR /etc/vault.d/root.token, resolved by lib/hvault.sh)
# - curl, jq, openssl
#
# Usage:
# tools/vault-seed-woodpecker.sh
# tools/vault-seed-woodpecker.sh --dry-run
#
# Exit codes:
# 0 success (seed applied, or already applied)
# 1 precondition / API / mount-mismatch failure
# =============================================================================
set -euo pipefail
SEED_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SEED_DIR}/.." && pwd)"
LIB_DIR="${REPO_ROOT}/lib/init/nomad"
# shellcheck source=../lib/hvault.sh
source "${REPO_ROOT}/lib/hvault.sh"
KV_MOUNT="kv"
KV_LOGICAL_PATH="disinto/shared/woodpecker"
KV_API_PATH="${KV_MOUNT}/data/${KV_LOGICAL_PATH}"
AGENT_SECRET_BYTES=32 # 32 bytes → 64 hex chars
LOG_TAG="[vault-seed-woodpecker]"
log() { printf '%s %s\n' "$LOG_TAG" "$*"; }
die() { printf '%s ERROR: %s\n' "$LOG_TAG" "$*" >&2; exit 1; }
# ── Flag parsing ─────────────────────────────────────────────────────────────
# for-over-"$@" loop — shape distinct from vault-seed-forgejo.sh (arity:value
# case) and vault-apply-roles.sh (if/elif).
DRY_RUN=0
for arg in "$@"; do
case "$arg" in
--dry-run) DRY_RUN=1 ;;
-h|--help)
printf 'Usage: %s [--dry-run]\n\n' "$(basename "$0")"
printf 'Seed kv/disinto/shared/woodpecker with secrets.\n\n'
printf 'Handles both S3.1 (agent_secret) and S3.3 (OAuth app + credentials):\n'
printf ' - agent_secret: generated if missing\n'
printf ' - forgejo_client/forgejo_secret: created via Forgejo API if missing\n\n'
printf ' --dry-run Print planned actions without writing.\n'
exit 0
;;
*) die "invalid argument: ${arg} (try --help)" ;;
esac
done
# ── Preconditions — binary + Vault connectivity checks ───────────────────────
required_bins=(curl jq openssl)
for bin in "${required_bins[@]}"; do
command -v "$bin" >/dev/null 2>&1 || die "required binary not found: ${bin}"
done
[ -n "${VAULT_ADDR:-}" ] || die "VAULT_ADDR unset — export VAULT_ADDR=http://127.0.0.1:8200"
hvault_token_lookup >/dev/null || die "Vault auth probe failed — check VAULT_ADDR + VAULT_TOKEN"
# ── Step 1/3: ensure kv/ mount exists and is KV v2 ───────────────────────────
log "── Step 1/3: ensure ${KV_MOUNT}/ is KV v2 ──"
export DRY_RUN
hvault_ensure_kv_v2 "$KV_MOUNT" "[vault-seed-woodpecker]" \
|| die "KV mount check failed"
# ── Step 2/3: seed agent_secret at kv/data/disinto/shared/woodpecker ─────────
log "── Step 2/3: seed agent_secret ──"
existing_raw="$(hvault_get_or_empty "${KV_API_PATH}")" \
|| die "failed to read ${KV_API_PATH}"
# Read all existing keys so we can preserve them on write (KV v2 replaces
# `.data` atomically). Missing path → empty object.
existing_data="{}"
existing_agent_secret=""
if [ -n "$existing_raw" ]; then
existing_data="$(printf '%s' "$existing_raw" | jq '.data.data // {}')"
existing_agent_secret="$(printf '%s' "$existing_raw" | jq -r '.data.data.agent_secret // ""')"
fi
if [ -n "$existing_agent_secret" ]; then
log "agent_secret unchanged"
else
# agent_secret is missing — generate it.
if [ "$DRY_RUN" -eq 1 ]; then
log "[dry-run] would generate + write: agent_secret"
else
new_agent_secret="$(openssl rand -hex "$AGENT_SECRET_BYTES")"
# Merge the new key into existing data to preserve any keys written by
# other seeders (e.g. S3.3's forgejo_client/forgejo_secret).
payload="$(printf '%s' "$existing_data" \
| jq --arg as "$new_agent_secret" '{data: (. + {agent_secret: $as})}')"
_hvault_request POST "${KV_API_PATH}" "$payload" >/dev/null \
|| die "failed to write ${KV_API_PATH}"
log "agent_secret generated"
fi
fi
# ── Step 3/3: register Forgejo OAuth app and store credentials ───────────────
log "── Step 3/3: register Forgejo OAuth app ──"
# Export DRY_RUN for the OAuth script and call it
export DRY_RUN
if "${LIB_DIR}/wp-oauth-register.sh" || [ "$DRY_RUN" -eq 1 ]; then
:
elif [ -n "${FORGE_URL:-}" ]; then
# Forgejo was configured but unavailable
log "OAuth registration check failed (Forgejo may not be running)"
log "This is expected if Forgejo is not available"
fi
log "done — agent_secret + OAuth credentials seeded"

View file

@ -1,3 +1,4 @@
<!-- last-reviewed: 0d6181918452c1407a3f6bc62917724261acff26 -->
# vault/policies/ — Agent Instructions # vault/policies/ — Agent Instructions
HashiCorp Vault ACL policies for the disinto factory. One `.hcl` file per HashiCorp Vault ACL policies for the disinto factory. One `.hcl` file per
@ -29,6 +30,9 @@ KV v2). Vault addresses KV v2 data at `kv/data/<path>` and metadata at
|---|---| |---|---|
| `service-forgejo` | `kv/data/disinto/shared/forgejo/*` | | `service-forgejo` | `kv/data/disinto/shared/forgejo/*` |
| `service-woodpecker` | `kv/data/disinto/shared/woodpecker/*` | | `service-woodpecker` | `kv/data/disinto/shared/woodpecker/*` |
| `service-agents` | All 7 `kv/data/disinto/bots/<role>/*` namespaces + `kv/data/disinto/shared/forge/*`; composite policy for the `agents` Nomad job (S4.1) |
| `service-chat` | `kv/data/disinto/shared/chat/*`; read-only OAuth client config + forward-auth secret for the chat Nomad job (S5.2, #989) |
| `service-dispatcher` | `kv/data/disinto/runner/*` (list+read) + `kv/data/disinto/shared/ops-repo/*` (read); used by edge dispatcher sidecar (S5.1, #988) |
| `bot-<role>` (dev, review, gardener, architect, planner, predictor, supervisor, vault, dev-qwen) | `kv/data/disinto/bots/<role>/*` + `kv/data/disinto/shared/forge/*` | | `bot-<role>` (dev, review, gardener, architect, planner, predictor, supervisor, vault, dev-qwen) | `kv/data/disinto/bots/<role>/*` + `kv/data/disinto/shared/forge/*` |
| `runner-<TOKEN>` (GITHUB\_TOKEN, CODEBERG\_TOKEN, CLAWHUB\_TOKEN, DEPLOY\_KEY, NPM\_TOKEN, DOCKER\_HUB\_TOKEN) | `kv/data/disinto/runner/<TOKEN>` (exactly one) | | `runner-<TOKEN>` (GITHUB\_TOKEN, CODEBERG\_TOKEN, CLAWHUB\_TOKEN, DEPLOY\_KEY, NPM\_TOKEN, DOCKER\_HUB\_TOKEN) | `kv/data/disinto/runner/<TOKEN>` (exactly one) |
| `dispatcher` | `kv/data/disinto/runner/*` + `kv/data/disinto/shared/ops-repo/*` | | `dispatcher` | `kv/data/disinto/runner/*` + `kv/data/disinto/shared/ops-repo/*` |
@ -48,49 +52,134 @@ validation.
1. Drop a file matching one of the four naming patterns above. Use an 1. Drop a file matching one of the four naming patterns above. Use an
existing file in the same family as the template — comment header, existing file in the same family as the template — comment header,
capability list, and KV path layout should match the family. capability list, and KV path layout should match the family.
2. Run `vault policy fmt -write <file>` to ensure consistent formatting. 2. Run `vault policy fmt <file>` locally so the formatting matches what
3. Run `vault policy validate <file>` locally to check syntax + semantics. the CI fmt-check (step 4 of `.woodpecker/nomad-validate.yml`) will
accept. The fmt check runs non-destructively in CI but a dirty file
fails the step; running `fmt` locally before pushing is the fastest
path.
3. Add the matching entry to `../roles.yaml` (see "JWT-auth roles" below)
so the CI role-reference check (step 6) stays green.
4. Run `tools/vault-apply-policies.sh --dry-run` to confirm the new 4. Run `tools/vault-apply-policies.sh --dry-run` to confirm the new
basename appears in the planned-work list with the expected SHA. basename appears in the planned-work list with the expected SHA.
5. Run `tools/vault-apply-policies.sh` against a Vault instance to 5. Run `tools/vault-apply-policies.sh` against a Vault instance to
create it; re-run to confirm it reports `unchanged`. create it; re-run to confirm it reports `unchanged`.
## JWT-auth roles (S2.3)
Policies are inert until a Vault token carrying them is minted. In this
migration that mint path is JWT auth — Nomad jobs exchange their
workload-identity JWT for a Vault token via
`auth/jwt-nomad/role/<name>``token_policies = ["<policy>"]`. The
role bindings live in [`../roles.yaml`](../roles.yaml); the script that
enables the auth method + writes the config + applies roles is
[`lib/init/nomad/vault-nomad-auth.sh`](../../lib/init/nomad/vault-nomad-auth.sh).
The applier is [`tools/vault-apply-roles.sh`](../../tools/vault-apply-roles.sh).
### Role → policy naming convention
Role name == policy name, 1:1. `vault/roles.yaml` carries one entry per
`vault/policies/*.hcl` file:
```yaml
roles:
- name: service-forgejo # Vault role
policy: service-forgejo # ACL policy attached to minted tokens
namespace: default # bound_claims.nomad_namespace
job_id: forgejo # bound_claims.nomad_job_id
```
The role name is what jobspecs reference via `vault { role = "..." }`
keep it identical to the policy basename so an S2.1↔S2.3 drift (new
policy without a role, or vice versa) shows up in one directory review,
not as a runtime "permission denied" at job placement.
`bound_claims.nomad_job_id` is the actual `job "..."` name in the
jobspec, which may differ from the policy name (e.g. policy
`service-forgejo` binds to job `forgejo`). Update it when each bot's or
runner's jobspec lands.
### Adding a new service
1. Write `vault/policies/<name>.hcl` using the naming-table family that
fits (`service-`, `bot-`, `runner-`, or standalone).
2. Add a matching entry to `vault/roles.yaml` with all four fields
(`name`, `policy`, `namespace`, `job_id`).
3. Apply both — either in one shot via `lib/init/nomad/vault-nomad-auth.sh`
(policies → roles → nomad SIGHUP), or granularly via
`tools/vault-apply-policies.sh` + `tools/vault-apply-roles.sh`.
4. Reference the role in the consuming jobspec's `vault { role = "<name>" }`.
### Token shape
All roles share the same token shape, hardcoded in
`tools/vault-apply-roles.sh`:
| Field | Value |
|---|---|
| `bound_audiences` | `["vault.io"]` — matches `default_identity.aud` in `nomad/server.hcl` |
| `token_type` | `service` — auto-revoked when the task exits |
| `token_ttl` | `1h` |
| `token_max_ttl` | `24h` |
Bumping any of these is a knowing, repo-wide change. Per-role overrides
would let one service's tokens outlive the others — add a field to
`vault/roles.yaml` and the applier at the same time if that ever
becomes necessary.
## Policy lifecycle ## Policy lifecycle
Adding a new policy is a three-step process: Adding a policy that an actual workload consumes is a three-step chain;
the CI pipeline guards each link.
1. **Add policy HCL** — Drop a file in `vault/policies/` matching one of the 1. **Add the policy HCL**`vault/policies/<name>.hcl`, formatted with
naming patterns. Run `vault policy fmt <file>` locally to ensure consistent `vault policy fmt`. Capabilities must be drawn from the Vault-recognized
formatting. set (`read`, `list`, `create`, `update`, `delete`, `patch`, `sudo`,
2. **Update roles.yaml** — Add a JWT auth role in `vault/roles.yaml` that `deny`); a typo fails CI step 5 (HCL written to an inline dev-mode Vault
references the new policy name (basename without `.hcl`). via `vault policy write` — a real parser, not a regex).
3. **Attach to Nomad job** — In S2.4, add the policy to a jobspec's 2. **Update `../roles.yaml`** — add a JWT-auth role entry whose `policy:`
`template { vault { policies = ["<policy-name>"] } }` stanza. field matches the new basename (without `.hcl`). CI step 6 re-checks
every role in this file against the policy set, so a drift between the
two directories fails the step.
3. **Reference from a Nomad jobspec** — add `vault { role = "<name>" }` in
`nomad/jobs/<service>.hcl` (owned by S2.4). Policies do not take effect
until a Nomad job asks for a token via that role.
CI enforces: See the "Adding a new service" walkthrough below for the applier-script
flow once steps 13 are committed.
- `vault policy fmt -check` — all `.hcl` files must be formatted ## CI enforcement (`.woodpecker/nomad-validate.yml`)
- `vault policy validate` — syntax + semantic check (no unknown stanzas,
valid capabilities) The pipeline triggers on any PR touching `vault/policies/**`,
- `roles.yaml` validator — each role must reference a policy that exists `vault/roles.yaml`, or `lib/init/nomad/vault-*.sh` and runs four
in `vault/policies/` vault-scoped checks (in addition to the nomad-scoped steps already in
- secret-scan gate — no literal secrets in policy files (rare but place):
dangerous copy-paste mistake)
| Step | Tool | What it catches |
|---|---|---|
| 4. `vault-policy-fmt` | `vault policy fmt` + `diff` | formatting drift — trailing whitespace, wrong indentation, missing newlines |
| 5. `vault-policy-validate` | `vault policy write` against inline dev Vault | HCL syntax errors, unknown stanzas, invalid capability names (e.g. `"frobnicate"`), malformed `path "..." {}` blocks |
| 6. `vault-roles-validate` | yamllint + PyYAML | roles.yaml syntax drift, missing required fields, role→policy references with no matching `.hcl` |
| P11 | `lib/secret-scan.sh` via `.woodpecker/secret-scan.yml` | literal secret leaked into a policy HCL (rare copy-paste mistake) — already covers `vault/**/*`, no duplicate step here |
All four steps are fail-closed — any error blocks merge. The pipeline
pins `hashicorp/vault:1.18.5` (matching `lib/init/nomad/install.sh`);
bumping the runtime version without bumping the CI image is a CI-caught
drift.
## Common failure modes ## Common failure modes
| Symptom | Cause | Fix | | Symptom in CI logs | Root cause | Fix |
|---|---|---| |---|---|---|
| `vault policy fmt -check` fails | HCL not formatted (wrong indentation, trailing spaces) | Run `vault policy fmt -write <file>` | | `vault-policy-fmt: … is not formatted — run 'vault policy fmt <file>'` | Trailing whitespace / mixed indent in an HCL file | `vault policy fmt <file>` locally and re-commit |
| `vault policy validate` fails | Unknown stanza, invalid capability, missing required field | Check Vault docs; valid capabilities: `read`, `list`, `create`, `update`, `delete`, `sudo` | | `vault-policy-validate: … failed validation` plus a `policy` error from Vault | Unknown capability (e.g. `"frobnicate"`), unknown stanza, malformed `path` block | Fix the HCL; valid capabilities are `read`, `list`, `create`, `update`, `delete`, `patch`, `sudo`, `deny` |
| `roles.yaml` validator fails | Policy name in role doesn't match any `.hcl` basename | Ensure policy name = filename without `.hcl` | | `vault-roles-validate: ERROR: role 'X' references policy 'Y' but vault/policies/Y.hcl does not exist` | A role's `policy:` field does not match any file basename in `vault/policies/` | Either add the missing policy HCL or fix the typo in `roles.yaml` |
| secret-scan fails | Literal secret value embedded (e.g., `token = "abc123..."`) | Use env var reference (`$TOKEN`) or sops/age-encrypted secret | | `vault-roles-validate: ERROR: role entry missing required field 'Z'` | A role in `roles.yaml` is missing one of `name`, `policy`, `namespace`, `job_id` | Add the field; all four are required |
| P11 `secret-scan: detected potential secret …` on a `.hcl` file | A literal token/password was pasted into a policy | Policies must name KV paths, not carry secret values — move the literal into KV (S2.2) and have the policy grant `read` on the path |
## What this directory does NOT own ## What this directory does NOT own
- **Attaching policies to Nomad jobs.** That's S2.4 (#882) via the - **Attaching policies to Nomad jobs.** That's S2.4 (#882) via the
jobspec `template { vault { policies = […] } }` stanza. jobspec `template { vault { policies = […] } }` stanza — the role
- **Enabling JWT auth + Nomad workload identity roles.** That's S2.3 name in `vault { role = "..." }` is what binds the policy.
(#881).
- **Writing the secret values themselves.** That's S2.2 (#880) via - **Writing the secret values themselves.** That's S2.2 (#880) via
`tools/vault-import.sh`. `tools/vault-import.sh`.

View file

@ -3,14 +3,14 @@
# Architect agent: reads its own bot KV namespace + the shared forge URL. # Architect agent: reads its own bot KV namespace + the shared forge URL.
# Attached to the architect-agent Nomad job via workload identity (S2.4). # Attached to the architect-agent Nomad job via workload identity (S2.4).
path "kv/data/disinto/bots/architect/*" { path "kv/data/disinto/bots/architect" {
capabilities = ["read"] capabilities = ["read"]
} }
path "kv/metadata/disinto/bots/architect/*" { path "kv/metadata/disinto/bots/architect" {
capabilities = ["list", "read"] capabilities = ["list", "read"]
} }
path "kv/data/disinto/shared/forge/*" { path "kv/data/disinto/shared/forge" {
capabilities = ["read"] capabilities = ["read"]
} }

View file

@ -5,14 +5,14 @@
# via workload identity (S2.4). KV path mirrors the bot basename: # via workload identity (S2.4). KV path mirrors the bot basename:
# kv/disinto/bots/dev-qwen/*. # kv/disinto/bots/dev-qwen/*.
path "kv/data/disinto/bots/dev-qwen/*" { path "kv/data/disinto/bots/dev-qwen" {
capabilities = ["read"] capabilities = ["read"]
} }
path "kv/metadata/disinto/bots/dev-qwen/*" { path "kv/metadata/disinto/bots/dev-qwen" {
capabilities = ["list", "read"] capabilities = ["list", "read"]
} }
path "kv/data/disinto/shared/forge/*" { path "kv/data/disinto/shared/forge" {
capabilities = ["read"] capabilities = ["read"]
} }

View file

@ -3,14 +3,14 @@
# Dev agent: reads its own bot KV namespace + the shared forge URL. # Dev agent: reads its own bot KV namespace + the shared forge URL.
# Attached to the dev-agent Nomad job via workload identity (S2.4). # Attached to the dev-agent Nomad job via workload identity (S2.4).
path "kv/data/disinto/bots/dev/*" { path "kv/data/disinto/bots/dev" {
capabilities = ["read"] capabilities = ["read"]
} }
path "kv/metadata/disinto/bots/dev/*" { path "kv/metadata/disinto/bots/dev" {
capabilities = ["list", "read"] capabilities = ["list", "read"]
} }
path "kv/data/disinto/shared/forge/*" { path "kv/data/disinto/shared/forge" {
capabilities = ["read"] capabilities = ["read"]
} }

View file

@ -3,14 +3,14 @@
# Gardener agent: reads its own bot KV namespace + the shared forge URL. # Gardener agent: reads its own bot KV namespace + the shared forge URL.
# Attached to the gardener-agent Nomad job via workload identity (S2.4). # Attached to the gardener-agent Nomad job via workload identity (S2.4).
path "kv/data/disinto/bots/gardener/*" { path "kv/data/disinto/bots/gardener" {
capabilities = ["read"] capabilities = ["read"]
} }
path "kv/metadata/disinto/bots/gardener/*" { path "kv/metadata/disinto/bots/gardener" {
capabilities = ["list", "read"] capabilities = ["list", "read"]
} }
path "kv/data/disinto/shared/forge/*" { path "kv/data/disinto/shared/forge" {
capabilities = ["read"] capabilities = ["read"]
} }

View file

@ -3,14 +3,14 @@
# Planner agent: reads its own bot KV namespace + the shared forge URL. # Planner agent: reads its own bot KV namespace + the shared forge URL.
# Attached to the planner-agent Nomad job via workload identity (S2.4). # Attached to the planner-agent Nomad job via workload identity (S2.4).
path "kv/data/disinto/bots/planner/*" { path "kv/data/disinto/bots/planner" {
capabilities = ["read"] capabilities = ["read"]
} }
path "kv/metadata/disinto/bots/planner/*" { path "kv/metadata/disinto/bots/planner" {
capabilities = ["list", "read"] capabilities = ["list", "read"]
} }
path "kv/data/disinto/shared/forge/*" { path "kv/data/disinto/shared/forge" {
capabilities = ["read"] capabilities = ["read"]
} }

View file

@ -3,14 +3,14 @@
# Predictor agent: reads its own bot KV namespace + the shared forge URL. # Predictor agent: reads its own bot KV namespace + the shared forge URL.
# Attached to the predictor-agent Nomad job via workload identity (S2.4). # Attached to the predictor-agent Nomad job via workload identity (S2.4).
path "kv/data/disinto/bots/predictor/*" { path "kv/data/disinto/bots/predictor" {
capabilities = ["read"] capabilities = ["read"]
} }
path "kv/metadata/disinto/bots/predictor/*" { path "kv/metadata/disinto/bots/predictor" {
capabilities = ["list", "read"] capabilities = ["list", "read"]
} }
path "kv/data/disinto/shared/forge/*" { path "kv/data/disinto/shared/forge" {
capabilities = ["read"] capabilities = ["read"]
} }

Some files were not shown because too many files have changed in this diff Show more