Compare commits


107 commits

Author SHA1 Message Date
eefbec601b Merge pull request 'fix: [nomad-step-0] S0.1-fix — bin/disinto swallows --backend=nomad as repo_url positional (#835)' (#839) from fix/issue-835 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 09:31:23 +00:00
Claude
72ed1f112d fix: [nomad-step-0] S0.1-fix — bin/disinto swallows --backend=nomad as repo_url positional (#835)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Why: disinto_init() consumed $1 as repo_url before the argparse loop ran,
so `disinto init --backend=nomad --empty` had --backend=nomad swallowed
into repo_url, backend stayed at its "docker" default, and the --empty
validation then produced the nonsense "--empty is only valid with
--backend=nomad" error — flagged during S0.1 end-to-end verification on
a fresh LXC. nomad backend takes no positional anyway; the LXC already
has the repo cloned by the operator.

Change: only consume $1 as repo_url if it doesn't start with "--", then
defer the "repo URL required" check to after argparse (so the docker
path still errors with a helpful message on a missing positional, not
"Unknown option: --backend=docker").

Verified acceptance criteria:
  1. init --backend=nomad --empty             → dispatches to nomad
  2. init --backend=nomad --empty --dry-run   → 9-step plan, exit 0
  3. init <repo-url>                          → docker path unchanged
  4. init                                     → "repo URL required"
  5. init --backend=docker                    → "repo URL required"
                                                (not "Unknown option")
  6. shellcheck clean

Tests: 4 new regression cases in tests/disinto-init-nomad.bats covering
flag-first nomad invocation (both --flag=value and --flag value forms),
no-args docker default, and --backend=docker missing-positional error
path. Full suite: 10/10 pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:19:36 +00:00
0850e83ec6 Merge pull request 'fix: disinto hire-an-agent + compose generator defects blocking multi-llama-dev parallel operation (#834)' (#838) from fix/issue-834 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 09:12:04 +00:00
Claude
43dc86d84c fix: disinto hire-an-agent + compose generator defects blocking multi-llama-dev parallel operation (#834)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Hiring a second llama-backed dev agent (e.g. `dev-qwen2`) alongside
`dev-qwen` tripped four defects that prevented safe parallel operation.

Gap 1 — hire-agent keyed per-agent token as FORGE_<ROLE>_TOKEN, so two
dev-role agents overwrote each other's token in .env. Re-key by agent
name via `tr 'a-z-' 'A-Z_'`: FORGE_TOKEN_<AGENT_UPPER>.
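The re-keying transform can be reproduced directly; the agent name below is an example:

```shell
agent="dev-qwen2"   # example agent name
# tr maps a-z to A-Z and '-' to '_', giving the per-agent env key:
key="FORGE_TOKEN_$(printf '%s' "$agent" | tr 'a-z-' 'A-Z_')"
echo "$key"   # → FORGE_TOKEN_DEV_QWEN2
```

Keying by agent name rather than role means `dev-qwen` and `dev-qwen2` get distinct `.env` entries instead of clobbering a shared FORGE_DEV_TOKEN.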

Gap 2 — hire-agent generated a random FORGE_PASS but never wrote it to
.env. The container's git credential helper needs both token and pass
to push over HTTPS (#361). Persist FORGE_PASS_<AGENT_UPPER> with the
same update-in-place idempotency as the token.

Gap 3 — _generate_local_model_services hardcoded FORGE_TOKEN_LLAMA for
every local-model service, forcing all hired llama agents to share one
Forgejo identity. Derive USER_UPPER from the TOML's `forge_user` field
and emit \${FORGE_TOKEN_<USER_UPPER>:-} per service.

Gap 4 — every local-model service mounted the shared `project-repos`
volume, so concurrent llama devs collided on /_factory worktree and
state/.dev-active. Switch to per-agent `project-repos-<service_name>`
and emit the matching top-level volume. Also escape embedded newlines
in `$all_vols` before the sed insertion so multi-agent volume lists
don't unterminate the substitute command.
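The escaping step can be sketched like this (a bash sketch; variable names and the placeholder are illustrative, and the real generator's sed script differs):

```shell
# Two per-agent volumes joined by a literal newline, as $all_vols
# might hold them:
all_vols=$'  project-repos-dev-qwen:\n  project-repos-dev-qwen2:'
# sed's s/// replacement must escape embedded newlines as
# backslash-newline, or the substitute command is unterminated:
escaped=${all_vols//$'\n'/\\$'\n'}
printf 'volumes:\nVOLS_PLACEHOLDER\n' | sed "s/VOLS_PLACEHOLDER/${escaped}/"
```

Without the escaping, the raw newline inside the replacement text ends the `s` command early and sed aborts with an "unterminated `s' command" error as soon as a second agent adds a second volume line.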

.env.example documents the new FORGE_TOKEN_<AGENT> / FORGE_PASS_<AGENT>
naming convention (and preserves the legacy FORGE_TOKEN_LLAMA path used
by the ENABLE_LLAMA_AGENT=1 singleton build).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:55:48 +00:00
311e1926bb Merge pull request 'chore: gardener housekeeping' (#837) from chore/gardener-20260416-0838 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 08:52:37 +00:00
3b90bd234d Merge pull request 'fix: issue_claim race — verify assignee after PATCH to prevent duplicate work (#830)' (#836) from fix/issue-830 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 08:46:39 +00:00
Claude
6533f322e3 fix: add last-reviewed watermark SHA to secret-scan safe patterns
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
2026-04-16 08:46:00 +00:00
Claude
e9c144a511 chore: gardener housekeeping 2026-04-16
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline failed
2026-04-16 08:38:31 +00:00
Claude
620515634a fix: issue_claim race — verify assignee after PATCH to prevent duplicate work (#830)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Forgejo's assignees PATCH is last-write-wins, so two dev agents polling
concurrently could both observe .assignee == null at the pre-check, both
PATCH, and the loser would silently "succeed" and proceed to implement
the same issue — colliding at the PR/branch stage.

Re-read the assignee after the PATCH and bail out if it isn't self.
Label writes are moved AFTER this verification so a losing claim leaves
no stray in-progress label to roll back.
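The verify-after-write step reduces to a small check; here is a sketch using canned JSON in place of the live Forgejo re-read (requires jq; the function name, SELF variable, and `.assignee.login` path are assumptions for illustration):

```shell
SELF="dev-qwen"   # this agent's Forgejo login (example value)

claim_held() {
  # $1: issue JSON as re-read AFTER the assignees PATCH
  local assignee
  assignee=$(printf '%s' "$1" | jq -r '.assignee.login // empty')
  # Lost race: another agent's PATCH landed last. Return nonzero
  # before any label writes, so there is nothing to roll back.
  [ "$assignee" = "$SELF" ]
}

claim_held '{"assignee":{"login":"dev-qwen"}}'  && echo "claim held"
claim_held '{"assignee":{"login":"dev-qwen2"}}' || echo "lost race"
```

Because the assignees PATCH is last-write-wins, only the post-PATCH re-read is authoritative; the pre-check merely avoids pointless PATCHes.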

Adds tests/lib-issue-claim.bats covering the three paths:
  - happy path (single agent, re-read confirms self)
  - lost race (re-read shows another agent — returns 1, no labels added)
  - pre-check skip (initial GET already shows another agent)

Prerequisite for the LLAMA_BOTS parametric refactor that will run N
dev containers against the same project.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:35:18 +00:00
2a7ae0b7ea Merge pull request 'fix: [nomad-step-0] S0.5 — Woodpecker CI validation for nomad/vault artifacts (#825)' (#833) from fix/issue-825 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
2026-04-16 08:18:46 +00:00
Claude
14c67f36e6 fix: add bats coverage for --backend <value> space-separated form (#825)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
The bin/disinto flag loop has separate cases for `--backend value`
(space-separated) and `--backend=value`; a regression in either would
silently route to the docker default path. Per the "stub-first dispatch"
lesson, silent misrouting during a migration is the worst failure mode —
covering both forms closes that gap.

Also triggers a retry of the smoke-init pipeline step, which hit a known
Forgejo branch-indexing flake on pipeline #913 (same flake cleared on
retry for PR #829 pipelines #906–#908); unrelated to the nomad-validate
changes, which went all-green in #913.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:06:51 +00:00
Claude
e5c41dd502 fix: tolerate vault operator diagnose exit 2 (advisory warnings) in CI (#825)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
Pipeline #911 on PR #833 failed because `vault operator diagnose -config=
nomad/vault.hcl -skip=storage -skip=listener` returns exit code 2 — not
on a hard failure, but because our factory dev-box vault.hcl deliberately
runs TLS-disabled on a localhost-only listener (documented in the file
header), which triggers an advisory "Check Listener TLS" warning.

The -skip flag disables runtime sub-checks (storage access, listener
bind) but does NOT suppress the advisory checks on the parsed config, so
a valid dev-box config with documented-and-intentional warnings still
exits non-zero under strict CI.

Fix: wrap the command in a case on exit code. Treat rc=0 (all green)
and rc=2 (advisory warnings only — config still parses) as success, and
fail hard on rc=1 (real HCL/schema/storage failure) or any other rc.
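The rc handling generalizes to a small wrapper (the wrapper name is an assumption; the vault invocation is the one quoted above):

```shell
# Treat rc=0 (all green) and rc=2 (advisory warnings only) as success;
# anything else, including rc=1, is a real failure.
tolerate_advisories() {
  "$@"
  case $? in
    0|2) return 0 ;;
    *)   return 1 ;;
  esac
}

# CI usage per the commit message:
#   tolerate_advisories vault operator diagnose -config=nomad/vault.hcl \
#     -skip=storage -skip=listener
```

This keeps strict CI for genuine HCL/schema/storage failures while accepting the documented-and-intentional TLS warning from the dev-box listener.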

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:59:28 +00:00
Claude
5150f8c486 fix: [nomad-step-0] S0.5 — Woodpecker CI validation for nomad/vault artifacts (#825)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline failed
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline failed
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
Locks in static validation for every Nomad+Vault artifact before it can
merge. Four fail-closed steps in .woodpecker/nomad-validate.yml, gated
to PRs touching nomad/, lib/init/nomad/, or bin/disinto:

  1. nomad config validate nomad/server.hcl nomad/client.hcl
  2. vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener
  3. shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto
  4. bats tests/disinto-init-nomad.bats — dispatcher smoke tests

bin/disinto picks up pre-existing SC2120 warnings on three passthrough
wrappers (generate_agent_docker, generate_caddyfile, generate_staging_index);
annotated with shellcheck disable=SC2120 so the new pipeline is clean
without narrowing the warning for future code.

Pinned image versions (hashicorp/nomad:1.9.5, hashicorp/vault:1.18.5)
match lib/init/nomad/install.sh — bump both or neither.

nomad/AGENTS.md documents the stack layout, how to add a jobspec in
Step 1, how CI validates it, and the two-place version pinning rule.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:54:06 +00:00
271ec9d8f5 Merge pull request 'fix: [nomad-step-0] S0.4 — disinto init --backend=nomad --empty orchestrator (cluster-up) (#824)' (#829) from fix/issue-824 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 07:42:47 +00:00
Claude
481175e043 fix: dedupe cluster-up.sh polling via poll_until_healthy helper (#824)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
CI duplicate-detection flagged the in-line vault + nomad polling loops
in cluster-up.sh as matching a 5-line window in vault-init.sh (the
`ready=1 / break / fi / sleep 1 / done` boilerplate).

Extracts the repeated pattern into three helpers at the top of the
file:

  - nomad_has_ready_node       wrapper so poll_until_healthy can take a
                               bare command name.
  - _die_with_service_status   shared "log + dump systemctl status +
                               die" path (factored out of the two
                               callsites + the timeout branch).
  - poll_until_healthy         ticks once per second up to TIMEOUT,
                               fail-fasts on systemd "failed" state,
                               and returns 0 on first successful check.

Step 7 (vault unseal) and Step 8 (nomad ready node) each collapse from
~15 lines of explicit for-loop bookkeeping to a one-line call. No
behavioural change: same tick cadence, same fail-fast, same status
dump on timeout. Local detect-duplicates.py run against main confirms
no new duplicates introduced.
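The shape of the helper can be condensed into a bash sketch (the real helper also fail-fasts on systemd "failed" state, which this sketch omits; the TIMEOUT default is an assumption):

```shell
poll_until_healthy() {
  # $1: bare command name, run once per second until it succeeds.
  local check="$1" timeout="${TIMEOUT:-60}" i
  for ((i = 0; i < timeout; i++)); do
    if "$check"; then
      return 0    # first successful check wins
    fi
    sleep 1
  done
  return 1        # timed out
}
```

Taking a bare command name is what motivates the `nomad_has_ready_node` wrapper: the helper can then poll anything that exposes its health as an exit status.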

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:26:54 +00:00
Claude
d2c6b33271 fix: [nomad-step-0] S0.4 — disinto init --backend=nomad --empty orchestrator (cluster-up) (#824)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline failed
Wires S0.1–S0.3 into a single idempotent bring-up script and replaces
the S0.1 stub in _disinto_init_nomad so `disinto init --backend=nomad
--empty` produces a running empty single-node cluster on a fresh box.

lib/init/nomad/cluster-up.sh (new):
  1. install.sh                (nomad + vault binaries)
  2. systemd-nomad.sh          (unit + enable, not started)
  3. systemd-vault.sh          (unit + vault.hcl + enable)
  4. host-volume dirs under /srv/disinto/* (matching nomad/client.hcl)
  5. /etc/nomad.d/{server,client}.hcl (content-compare before write)
  6. vault-init.sh             (first-run init + unseal + persist keys)
  7. systemctl start vault     (poll until unsealed; fail-fast on
                                is-failed)
  8. systemctl start nomad     (poll until ≥1 node ready)
  9. /etc/profile.d/disinto-nomad.sh (VAULT_ADDR + NOMAD_ADDR for
                                      interactive shells)
  Re-running on a healthy box is a no-op — each sub-step is itself
  idempotent and steps 7/8 fast-path when already active + healthy.
  `--dry-run` prints the full step list and exits 0.

bin/disinto:
  - _disinto_init_nomad: replaces the S0.1 stub. Invokes cluster-up.sh
    directly (as root) or via `sudo -n` otherwise. Both `--empty` and
    the default (no flag) call cluster-up.sh today; Step 1 will branch
    on $empty to gate job deployment. --dry-run forwards through.
  - disinto_init: adds `--empty` flag parsing; rejects `--empty`
    combined with `--backend=docker` explicitly instead of silently
    ignoring it.
  - usage: documents `--empty` and drops the "stub, S0.1" annotation
    from --backend.

Closes #824.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:22:15 +00:00
accd10ec67 Merge pull request 'fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823)' (#828) from fix/issue-823 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 07:04:57 +00:00
Claude
24cb8f83a2 fix: [nomad-step-0] S0.3 — install vault + systemd auto-unseal + vault-init.sh (dev-persisted seal) (#823)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Adds the Vault half of the factory-dev-box bringup, landed but not started
(per the install-but-don't-start pattern used for nomad in #822):

- lib/init/nomad/install.sh — now also installs vault from the shared
  HashiCorp apt repo. VAULT_VERSION pinned (1.18.5). Fast-path skips apt
  entirely when both binaries are at their pins; partial upgrades only
  touch the package that drifted.

- nomad/vault.hcl — single-node config: file storage backend at
  /var/lib/vault/data, localhost listener on :8200, ui on, mlock kept on.
  No TLS / HA / audit yet; those land in later steps.

- lib/init/nomad/systemd-vault.sh — writes /etc/systemd/system/vault.service
  (Type=notify, ExecStartPost auto-unseals from /etc/vault.d/unseal.key,
  CAP_IPC_LOCK granted for mlock), deploys nomad/vault.hcl to
  /etc/vault.d/, creates /var/lib/vault/data (0700 root), enables the
  unit without starting it. Idempotent via content-compare.

- lib/init/nomad/vault-init.sh — first-run init: spawns a temporary
  `vault server` if not already reachable, runs operator-init with
  key-shares=1/threshold=1, persists unseal.key + root.token (0400 root),
  unseals once in-process, shuts down the temp server. Re-run detects
  initialized + unseal.key present → no-op. Initialized but key missing
  is a hard failure (can't recover).
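The re-run decision logic reduces to three cases; in this sketch, stand-in arguments replace the live `vault status` and key-file checks, and the function name is illustrative:

```shell
vault_init_guard() {
  # $1: "true"/"false", already initialized (stand-in for vault status)
  # $2: path to the persisted unseal key, e.g. /etc/vault.d/unseal.key
  if [ "$1" = true ] && [ -f "$2" ]; then
    echo "initialized + key present: no-op"
  elif [ "$1" = true ]; then
    echo "initialized but unseal key missing: cannot recover" >&2
    return 1
  else
    echo "first run: operator-init, persist keys, unseal once"
  fi
}
```

The middle case is the hard failure: with key-shares=1/threshold=1 and the single key lost, no quorum of other shares exists to unseal with.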

lib/hvault.sh already defaults VAULT_TOKEN to /etc/vault.d/root.token
when the env var is absent, so no change needed there.

Seal model: the single unseal key lives on disk; seal-key theft equals
vault theft. Factory-dev-box-acceptable tradeoff — avoids running a
second Vault to auto-unseal the first.

Blocks S0.4 (#824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 06:53:27 +00:00
75bec43c4a Merge pull request 'fix: [nomad-step-0] S0.2 — install nomad + systemd unit + nomad/server.hcl/client.hcl (#822)' (#827) from fix/issue-822 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 06:15:32 +00:00
Claude
06ead3a19d fix: [nomad-step-0] S0.2 — install nomad + systemd unit + nomad/server.hcl/client.hcl (#822)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
Lands the Nomad install + baseline HCL config for the single-node factory
dev box. Nothing is wired into `disinto init` yet — S0.4 does that.

- lib/init/nomad/install.sh: idempotent apt install pinned to
  NOMAD_VERSION (default 1.9.5). Adds HashiCorp apt keyring and sources
  list only if absent; fast-paths when the pinned version is already
  installed.
- lib/init/nomad/systemd-nomad.sh: writes /etc/systemd/system/nomad.service
  (rewrites only when content differs), creates /etc/nomad.d and
  /var/lib/nomad, runs `systemctl enable nomad` WITHOUT starting.
- nomad/server.hcl: single-node combined server+client role. bootstrap_expect=1,
  localhost bind, default ports pinned explicitly, UI enabled. No TLS/ACL —
  factory dev box baseline.
- nomad/client.hcl: Docker task driver (allow_privileged=false, volumes
  enabled) and host_volume pre-wiring for forgejo-data, woodpecker-data,
  agent-data, project-repos, caddy-data, chat-history, ops-repo under
  /srv/disinto/*.

Verified: `nomad config validate nomad/*.hcl` reports "Configuration is
valid!" (with expected TLS/bootstrap warnings for a dev box). Shellcheck
clean across the repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 06:04:02 +00:00
74f49e1c2f Merge pull request 'fix: [nomad-step-0] S0.1 — add --backend=nomad flag + stub to bin/disinto init (#821)' (#826) from fix/issue-821 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 05:54:22 +00:00
Claude
de00400bc4 fix: [nomad-step-0] S0.1 — add --backend=nomad flag + stub to bin/disinto init (#821)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Lands the dispatch entry point for the Nomad+Vault migration. The docker
path remains the default and is byte-for-byte unchanged. The new
`--backend=nomad` value routes to a `_disinto_init_nomad` stub that fails
loud (exit 99) so no silent misrouting can happen while S0.2–S0.5 fill in
the real implementation. With `--dry-run --backend=nomad` the stub reports
status and exits 0 so dry-run callers (P7) don't see a hard failure.

- New `--backend <value>` flag (accepts `docker` | `nomad`); supports
  both `--backend nomad` and `--backend=nomad` forms.
- Invalid backend values are rejected with a clear error.
- `_disinto_init_nomad` lives next to `disinto_init` so future S0.x
  issues only need to fill in this function — flag parsing and dispatch
  stay frozen.
- `--help` lists the flag and both values.
- `shellcheck bin/disinto` introduces no new findings beyond the
  pre-existing baseline.
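The fail-loud stub shape can be sketched as follows (a sketch only: the DRY_RUN variable and messages are assumptions, not the actual function body):

```shell
_disinto_init_nomad_sketch() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "[dry-run] nomad backend: stub (S0.1), nothing to do yet"
    return 0   # dry-run callers (P7) must not see a hard failure
  fi
  echo "nomad backend not implemented yet (S0.2-S0.5 fill this in)" >&2
  return 99    # fail loud: no silent misrouting to the docker path
}
```

Exit 99 is deliberately outside the usual 0/1/2 range so any caller or test that trips the stub sees an unmistakable, greppable failure rather than a docker-path side effect.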

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 05:43:35 +00:00
32ab84a87c Merge pull request 'chore: gardener housekeeping 2026-04-16' (#819) from chore/gardener-20260416-0215 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 02:22:01 +00:00
Claude
c236350e00 chore: gardener housekeeping 2026-04-16
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
- Bump AGENTS.md watermarks to HEAD (c363ee0) across all 9 per-directory files
- supervisor/AGENTS.md: document dual-container trigger (agents + edge) and SUPERVISOR_INTERVAL env var added by P1/#801
- lib/AGENTS.md: document agents-llama-all compose service (all 7 roles) added to generators.sh by P1/#801
- pending-actions.json: comment #623 (all deps now closed, ready for planner decomposition), comment #758 (needs human Forgejo admin action to unblock ops repo writes)
2026-04-16 02:15:38 +00:00
c363ee0aea Merge pull request 'fix: [nomad-prep] P12 — dispatcher commits result.json via git push, not bind-mount (#803)' (#818) from fix/issue-803 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 01:05:57 +00:00
Claude
519742e5e7 fix: [nomad-prep] P12 — dispatcher commits result.json via git push, not bind-mount (#803)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Replace write_result's direct filesystem write with commit_result_via_git,
which clones the ops repo into a scratch directory, writes the result file,
commits as vault-bot, and pushes. This removes the requirement for a shared
bind-mount between the dispatcher container and the host ops-repo clone.

- Idempotent: skips if result.json already exists upstream
- Retry loop: handles push conflicts with rebase-and-push (up to 3 attempts)
- Scratch dir: cleaned up via RETURN trap regardless of outcome
- Works identically under docker and future nomad backends
2026-04-16 00:54:33 +00:00
131d0471f2 Merge pull request 'fix: [nomad-prep] P2 — dispatcher refactor: pluggable launcher + DISPATCHER_BACKEND flag (#802)' (#817) from fix/issue-802 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 00:45:01 +00:00
Claude
4487d1512c fix: restore write_result on pre-docker error paths in _launch_runner_docker
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Prevents infinite retry loops when secret resolution or mount alias
validation fails before the docker run is attempted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 00:33:55 +00:00
Claude
ef40433fff fix: [nomad-prep] P2 — dispatcher refactor: pluggable launcher + DISPATCHER_BACKEND flag (#802)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 00:22:10 +00:00
7513e93d6d Merge pull request 'fix: [nomad-prep] P1 — run all 7 bot roles on llama backend (gates migration) (#801)' (#816) from fix/issue-801 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-16 00:14:30 +00:00
Claude
0bfa31da49 chore: retrigger CI
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-15 23:58:20 +00:00
Claude
8e885bed02 fix: [nomad-prep] P1 — run all 7 bot roles on llama backend (gates migration) (#801)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline failed
- Add supervisor role to entrypoint.sh polling loop (SUPERVISOR_INTERVAL,
  default 20 min) and include it in default AGENT_ROLES
- Add agents-llama-all compose service (profile: agents-llama-all) with
  all 7 roles: review, dev, gardener, architect, planner, predictor, supervisor
- Add agents-llama-all to lib/generators.sh for disinto init generation
- Update docs/agents-llama.md with profile table and usage instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:52:04 +00:00
34447d31dc Merge pull request 'fix: [nomad-prep] P7 — make disinto init idempotent + add --dry-run (#800)' (#815) from fix/issue-800 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 23:43:28 +00:00
Claude
9d8f322005 fix: [nomad-prep] P7 — make disinto init idempotent + add --dry-run (#800)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Make `disinto init` safe to re-run on the same box:

- Store admin token as FORGE_ADMIN_TOKEN in .env; preserve on re-run
  (previously deleted and recreated every run, churning DB state)
- Fix human token creation: use admin_pass for basic-auth since
  human_user == admin_user (previously used a random password that
  never matched the actual user password, so HUMAN_TOKEN was never
  created successfully)
- Preserve HUMAN_TOKEN in .env on re-run (same pattern as bot tokens)
- Bot tokens were already idempotent (preserved unless --rotate-tokens)

Add --dry-run flag that reports every intended action (file writes,
API calls, docker commands) based on current state, then exits 0
without touching state. Useful for CI gating and cutover confidence.

Update smoke test:
- Add dry-run test (verifies exit 0 and no .env modification)
- Add idempotency state diff (verifies .env is unchanged on re-run)
- Verify FORGE_ADMIN_TOKEN and HUMAN_TOKEN are stored in .env

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 22:37:22 +00:00
55cce66468 Merge pull request 'fix: [nomad-prep] P4 — scaffold lib/hvault.sh (HashiCorp Vault helper module) (#799)' (#814) from fix/issue-799 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 22:08:48 +00:00
Claude
14458f1f17 fix: address review — jq-safe JSON construction in hvault.sh
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- _hvault_err: use jq instead of printf to produce valid JSON on all inputs
- hvault_kv_get: use jq --arg for key lookup to prevent filter injection
- hvault_kv_put: build payload entirely via jq to properly escape keys
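The pattern generalizes to any payload construction (requires jq; the helper name here is illustrative, not hvault.sh's actual signature):

```shell
kv_payload() {
  # Build {"data": {<key>: <value>}} with jq doing all escaping;
  # no shell interpolation reaches the JSON text.
  jq -cn --arg k "$1" --arg v "$2" '{data: {($k): $v}}'
}

kv_payload token 'va"lue'
# → {"data":{"token":"va\"lue"}}
```

Because `--arg` passes values out-of-band, keys or values containing quotes, backslashes, or newlines can neither break the JSON nor inject jq filter syntax.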

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:27:34 +00:00
Claude
fbb246c626 fix: [nomad-prep] P4 — scaffold lib/hvault.sh (HashiCorp Vault helper module) (#799)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:15:44 +00:00
faf6490877 Merge pull request 'fix: [nomad-prep] P11 — wire lib/secret-scan.sh into Woodpecker CI gate (#798)' (#813) from fix/issue-798 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 21:09:04 +00:00
Claude
88b377ecfb fix: add file package for binary detection, document shallow-clone tradeoff
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:03:05 +00:00
Claude
d020847772 fix: [nomad-prep] P11 — wire lib/secret-scan.sh into Woodpecker CI gate (#798)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:56:01 +00:00
98ec610645 Merge pull request 'fix: [nomad-prep] P10 — audit lib/ + compose for docker-backend-isms (#797)' (#812) from fix/issue-797 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 20:50:50 +00:00
Claude
f8c3ada077 fix: [nomad-prep] P10 — audit lib/ + compose for docker-backend-isms (#797)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Sites touched:
- lib/generators.sh: WOODPECKER_BACKEND_DOCKER_NETWORK now reads from
  ${WOODPECKER_CI_NETWORK:-disinto_disinto-net} so nomad jobspecs can
  override the compose-generated network name.
- lib/forge-setup.sh: bare-mode _forgejo_exec() and setup_forge() use
  ${FORGEJO_CONTAINER_NAME:-disinto-forgejo} instead of hardcoding the
  container name. Compose mode is unaffected (uses service name).

Documented exceptions (container_name directives in generators.sh
compose template output): these define names inside docker-compose.yml,
which is compose-specific output. Under nomad the generator is not used.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:39:47 +00:00
8315a4ecf5 Merge pull request 'fix: [nomad-prep] P8 — spot-check lib/mirrors.sh against empty Forgejo target (#796)' (#811) from fix/issue-796 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 20:35:38 +00:00
Claude
b6f2d83a28 fix: use FORGE_API_BASE for /repos/migrate endpoint, build payload with jq
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
- FORGE_API is repo-scoped; /repos/migrate needs the global FORGE_API_BASE
- Use jq -n --arg for safe JSON construction (no shell interpolation)
- Update docs to reference FORGE_API_BASE

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:29:27 +00:00
Claude
2465841b84 fix: [nomad-prep] P8 — spot-check lib/mirrors.sh against empty Forgejo target (#796)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:22:11 +00:00
5c40b59359 Merge pull request 'fix: [nomad-prep] P6 — externalize host paths in docker-compose via env vars (#795)' (#810) from fix/issue-795 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 20:17:43 +00:00
Claude
19f10e33e6 fix: [nomad-prep] P6 — externalize host paths in docker-compose via env vars (#795)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Replace hardcoded host-side bind-mount paths with env vars so Nomad
jobspecs can reuse the same variables at cutover:

- CLAUDE_BIN_DIR: path to claude CLI binary (resolved at init time)
- CLAUDE_CONFIG_FILE: path to .claude.json (default ${HOME}/.claude.json)
- CLAUDE_DIR: path to .claude directory (default ${HOME}/.claude)
- AGENT_SSH_DIR: path to SSH keys (default ${HOME}/.ssh)
- SOPS_AGE_DIR: path to SOPS age keys (default ${HOME}/.config/sops/age)

generators.sh now writes CLAUDE_BIN_DIR to .env instead of sed-replacing
CLAUDE_BIN_PLACEHOLDER in docker-compose.yml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:01:47 +00:00
6a4ca5c3a0 Merge pull request 'fix: [nomad-prep] P5 — add healthchecks to agents, edge, staging, woodpecker-agent (#794)' (#809) from fix/issue-794 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 19:55:25 +00:00
Claude
8799a8c676 fix: [nomad-prep] P5 — add healthchecks to agents, edge, staging, woodpecker-agent (#794)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Add Docker healthcheck blocks so Nomad check stanzas map 1:1 at migration:

- agents / agents-llama: pgrep -f entrypoint.sh (60s interval)
- woodpecker-agent: wget healthz on :3333 (30s interval)
- edge: curl Caddy admin API on :2019 (30s interval)
- staging: wget Caddy admin API on :2019 (30s interval)
- chat: add /health endpoint to server.py (no-auth 200 OK), fix
  Dockerfile HEALTHCHECK to use it, add compose-level healthcheck

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:39:35 +00:00
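The healthchecks above all rely on the same exit-code contract: Docker (and later Nomad) marks the task healthy iff the check command exits 0. A minimal sketch, with the edge/staging-style probe against Caddy's admin API on :2019 (URL mirrors the commit; nothing is listening in this sketch):

```shell
# HTTP-style healthcheck probe: exit code maps to healthy/unhealthy.
hc_http() {
  curl -fsS --max-time 2 "$1" >/dev/null 2>&1 && echo healthy || echo unhealthy
}

# With no Caddy admin API listening locally, the probe reports unhealthy.
status=$(hc_http http://127.0.0.1:2019/config/)
echo "$status"
```

The agents-side `pgrep -f entrypoint.sh` check follows the same contract with process liveness instead of an HTTP response.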
3b366ad96e Merge pull request 'fix: [nomad-prep] P3 — add load_secret() abstraction to lib/env.sh (#793)' (#808) from fix/issue-793 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 19:29:50 +00:00
Claude
aa298eb2ad fix: reorder test boilerplate to avoid duplicate-detection false positive
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:18:39 +00:00
Claude
9dbc43ab23 fix: [nomad-prep] P3 — add load_secret() abstraction to lib/env.sh (#793)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline failed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:15:50 +00:00
1d4e28843e Merge pull request 'fix: infra: _regen_file does not restore stash if generator fails — compose file lost at temp path (#784)' (#807) from fix/issue-784 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 19:06:36 +00:00
Claude
f90702f930 fix: infra: _regen_file does not restore stash if generator fails — compose file lost at temp path (#784)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:55:51 +00:00
defec3b255 Merge pull request 'fix: feat: consolidate secret stores — single granular secrets/*.enc, deprecate .env.vault.enc (#777)' (#806) from fix/issue-777 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 18:46:12 +00:00
Claude
88676e65ae fix: feat: consolidate secret stores — single granular secrets/*.enc, deprecate .env.vault.enc (#777)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:35:03 +00:00
a87dcdf40b Merge pull request 'chore: gardener housekeeping' (#805) from chore/gardener-20260415-1816 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 18:23:21 +00:00
b8cb8c5c32 Merge pull request 'fix: [nomad-prep] P0 — rename lib/vault.sh + vault/ to action-vault namespace (#792)' (#804) from fix/issue-792 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 18:22:49 +00:00
Claude
0937707fe5 chore: gardener housekeeping 2026-04-15
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-15 18:16:44 +00:00
Claude
e9a018db5c fix: [nomad-prep] P0 — rename lib/vault.sh + vault/ to action-vault namespace (#792)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:16:32 +00:00
18190874ca Merge pull request 'fix: infra: edge-control install.sh overwrites /etc/caddy/Caddyfile with no carve-out for apex/static sites — landing page lost on install (#788)' (#791) from fix/issue-788 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 16:48:46 +00:00
Claude
5a2a9e1c74 fix: infra: edge-control install.sh overwrites /etc/caddy/Caddyfile with no carve-out for apex/static sites — landing page lost on install (#788)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:42:30 +00:00
182c40b9fc Merge pull request 'fix: bug: edge-control add_route targets non-existent Caddy server edge — registration succeeds in registry but traffic never routes (#789)' (#790) from fix/issue-789 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 16:37:19 +00:00
Claude
241ce96046 fix: remove invalid servers { name edge } Caddyfile directive
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
`name` is not a valid subdirective of the global `servers` block in
Caddyfile syntax — Caddy would reject the config on startup. The
dynamic server discovery in `_discover_server_name()` already handles
routing to the correct server regardless of its auto-generated name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:31:09 +00:00
Claude
987413ab3a fix: bug: edge-control add_route targets non-existent Caddy server edge — registration succeeds in registry but traffic never routes (#789)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
- install.sh: use Caddy `servers { name edge }` global option so the
  emitted Caddyfile produces a predictably-named server
- lib/caddy.sh: add `_discover_server_name` that queries the admin API
  for the first server listening on :80/:443 — add_route and remove_route
  use dynamic discovery instead of hardcoding `/servers/edge/`
- lib/caddy.sh: add_route, remove_route, and reload_caddy now check HTTP
  status codes (≥400 → return 1 with error message) instead of only
  checking curl exit code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:24:24 +00:00
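The dynamic discovery described above can be sketched like this. The admin-API shape is Caddy's real layout (`GET /config/apps/http/servers` returns a JSON map of server name to config), but the sample JSON below stands in for the curl call, and the function body is illustrative rather than the repo's actual `_discover_server_name`:

```shell
# Find the first server listening on :80/:443 in a Caddy servers map.
_discover_server_name() {
  local servers_json='{"srv0":{"listen":[":80",":443"]},"metrics":{"listen":[":9100"]}}'
  # Real code would fetch the map with:
  #   curl -fsS http://localhost:2019/config/apps/http/servers
  printf '%s' "$servers_json" | jq -r '
    to_entries[]
    | select(.value.listen[]? | test(":(80|443)$"))
    | .key' | head -n1
}

name=$(_discover_server_name)
echo "$name"   # srv0, the auto-named server bound to :80/:443
```

Because Caddy auto-generates names like `srv0` when none is configured, discovering the name at call time is more robust than hardcoding `/servers/edge/` in route paths.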
02e86c3589 Merge pull request 'fix: planner: replace direct push with pr-lifecycle (mirror architect ops flow) (#765)' (#787) from fix/issue-765 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 14:40:14 +00:00
Claude
175716a847 fix: planner: replace direct push with pr-lifecycle (mirror architect ops flow) (#765)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Planner phase 5 pushed ops repo changes directly to main, which branch
protection blocks. Replace with the same PR-based flow architect uses:

- planner-run.sh: create branch planner/run-YYYY-MM-DD in ops repo before
  agent_run, then pr_create + pr_walk_to_merge after agent completes
- run-planner.toml: formula now pushes HEAD (the branch) instead of
  PRIMARY_BRANCH directly
- planner/AGENTS.md: update phase 5 description to reflect PR flow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:28:49 +00:00
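The branch-then-PR shape above can be outlined as follows; `pr_create` and `pr_walk_to_merge` are the repo's own helpers (signatures assumed here), so the flow is shown as comments around the one concrete piece, the branch naming:

```shell
# Planner phase 5, PR-based flow (sketch; helper signatures assumed):
branch="planner/run-$(date +%Y-%m-%d)"    # e.g. planner/run-2026-04-15
# git -C "$OPS_REPO" checkout -b "$branch"
# ... agent_run edits the ops repo on the branch ...
# git -C "$OPS_REPO" push origin HEAD      # formula pushes the branch, not main
# pr_create && pr_walk_to_merge            # same lifecycle architect uses
echo "$branch"
```

Pushing HEAD rather than PRIMARY_BRANCH is what lets branch protection stay enabled on main while the planner still lands its changes.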
d6c8fd8127 Merge pull request 'fix: feat: disinto secrets add — accept piped stdin for non-interactive imports (#776)' (#786) from fix/issue-776 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 14:19:47 +00:00
Claude
5dda6dc8e9 fix: feat: disinto secrets add — accept piped stdin for non-interactive imports (#776)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:08:28 +00:00
49cc870f54 Merge pull request 'fix: infra: deprecate tracked docker/Caddyfile — generate_caddyfile is the single source of truth (#771)' (#785) from fix/issue-771 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 11:40:44 +00:00
Claude
ec7bc8ff2c fix: infra: deprecate tracked docker/Caddyfile — generate_caddyfile is the single source of truth (#771)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- Add docker/Caddyfile to .gitignore (generated artifact, not tracked)
- Document generate_caddyfile as canonical source in lib/generators.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:29:56 +00:00
f27c66a7e0 Merge pull request 'fix: infra: disinto up should regenerate compose/Caddyfile from lib/generators.sh and reconcile orphans before docker compose up -d (#770)' (#783) from fix/issue-770 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 11:23:28 +00:00
Claude
53ce7ad475 fix: infra: disinto up should regenerate compose/Caddyfile from lib/generators.sh and reconcile orphans before docker compose up -d (#770)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
- Add `_regen_file` helper that idempotently regenerates a file: moves
  existing file aside, runs the generator, compares output byte-for-byte,
  and either restores the original (preserving mtime) or keeps the new
  version with a `.prev` backup.
- `disinto_up` now calls `generate_compose` and `generate_caddyfile`
  before bringing the stack up, ensuring generator changes are applied.
- Pass `--build --remove-orphans` to `docker compose up -d` so image
  rebuilds and orphan container cleanup happen automatically.
- Add `--no-regen` escape hatch that skips regeneration and prints a
  warning for operators debugging generators or testing hand-edits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:12:38 +00:00
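The `_regen_file` contract described above — stash, regenerate, compare, then either restore or keep with a `.prev` backup — can be sketched as a self-contained helper. The name comes from the commits; this body is illustrative, not the real implementation:

```shell
# Idempotently regenerate a file from a generator function.
_regen_file() {
  local target=$1 generator=$2
  local stash="${target}.stash"
  if [ -f "$target" ]; then mv "$target" "$stash"; fi
  if ! "$generator" > "$target"; then
    # Generator failed: restore the stash (the restore step missing in #784).
    if [ -f "$stash" ]; then mv "$stash" "$target"; fi
    return 1
  fi
  if [ -f "$stash" ]; then
    if cmp -s "$stash" "$target"; then
      mv "$stash" "$target"          # byte-identical: keep original (preserves mtime)
    else
      mv "$stash" "${target}.prev"   # changed: keep new file, back up the old
    fi
  fi
}

gen_demo() { echo "generated-content"; }
workdir=$(mktemp -d)
echo "old-content" > "$workdir/compose.yml"
_regen_file "$workdir/compose.yml" gen_demo
cat "$workdir/compose.yml"        # generated-content
cat "$workdir/compose.yml.prev"   # old-content
```

The byte-for-byte compare is what keeps `disinto up` idempotent: unchanged generator output leaves the file (and its mtime) untouched.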
c644660bda Merge pull request 'fix: infra: CI broken on main — missing WOODPECKER_PLUGINS_PRIVILEGED server env + misplaced .woodpecker/ops-filer.yml in project repo (#779)' (#782) from fix/issue-779 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 11:07:27 +00:00
91f36b2692 Merge pull request 'chore: gardener housekeeping' (#781) from chore/gardener-20260415-1007 into main
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/ops-filer Pipeline failed
2026-04-15 11:02:55 +00:00
Claude
a8d393f3bd fix: infra: CI broken on main — missing WOODPECKER_PLUGINS_PRIVILEGED server env + misplaced .woodpecker/ops-filer.yml in project repo (#779)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Part 1: Add WOODPECKER_PLUGINS_PRIVILEGED to woodpecker service environment
in lib/generators.sh, defaulting to plugins/docker, overridable via .env.
Document the new key in .env.example.

Part 2: Delete .woodpecker/ops-filer.yml from project repo — it belongs in
the ops repo and references secrets that don't exist here. Full ops-side
filer setup deferred until sprint PRs need it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:56:39 +00:00
d0c0ef724a Merge pull request 'fix: infra: agents-llama (local-Qwen dev agent) is hand-added to docker-compose.yml — move into lib/generators.sh as a flagged service (#769)' (#780) from fix/issue-769 into main
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/ops-filer Pipeline failed
2026-04-15 10:09:43 +00:00
Claude
539862679d chore: gardener housekeeping 2026-04-15
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-15 10:07:41 +00:00
250788952f Merge pull request 'fix: feat: publish versioned agent images — compose should use image: not build: (#429)' (#775) from fix/issue-429 into main
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/ops-filer Pipeline failed
2026-04-15 10:04:58 +00:00
Claude
0104ac06a8 fix: infra: agents-llama (local-Qwen dev agent) is hand-added to docker-compose.yml — move into lib/generators.sh as a flagged service (#769)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:58:44 +00:00
c71b6d4f95 ci: retrigger after WOODPECKER_PLUGINS_PRIVILEGED fix
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
2026-04-15 09:46:24 +00:00
Claude
92f19cb2b3 feat: publish versioned agent images — compose should use image: not build: (#429)
- Generated compose now uses `image: ghcr.io/disinto/{agents,edge}` instead
  of `build:` directives; `disinto init --build` restores local-build mode
- Add VOLUME declarations to agents, reproduce, and edge Dockerfiles
- Add CI pipeline (.woodpecker/publish-images.yml) to build and push images
  to ghcr.io/disinto on tag events
- Mount projects/, .env, and state/ into agents container for runtime config
- Skip pre-build binary download when compose uses registry images

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:24:05 +00:00
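The image-vs-build switch can be sketched like this. Only the image path `ghcr.io/disinto/agents` comes from the commit; the emit function, the `build:` context path, and the flag plumbing are assumptions:

```shell
# Emit the agents service stanza: registry image by default,
# local build: only when `disinto init --build` was used (sketch).
emit_agents_service() {
  local local_build=${1:-0}
  echo "  agents:"
  if [ "$local_build" = 1 ]; then
    echo "    build: docker/agents"           # context path assumed
  else
    echo "    image: ghcr.io/disinto/agents"
  fi
}

emit_agents_service 0
emit_agents_service 1
```

Using `image:` by default is also what makes the pre-build binary download skippable, since the registry image already ships it.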
be463c5b43 Merge pull request 'fix: infra: edge service missing restart: unless-stopped in lib/generators.sh (#768)' (#774) from fix/issue-768 into main 2026-04-15 09:12:48 +00:00
Claude
0baac1a7d8 fix: infra: edge service missing restart: unless-stopped in lib/generators.sh (#768)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:03:26 +00:00
0db4c84818 Merge pull request 'chore: gardener housekeeping' (#767) from chore/gardener-20260415-0806 into main 2026-04-15 08:57:11 +00:00
378da77adf Merge pull request 'fix: bug: architect pitch prompt guardrail is prose-only — model bypasses "NEVER call Forgejo API" via Bash tool; fix via permission scoping + PR-driven sub-issue filing (#764)' (#766) from fix/issue-764 into main 2026-04-15 08:57:07 +00:00
Claude
fd9ba028bc chore: gardener housekeeping 2026-04-15
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-15 08:06:14 +00:00
Claude
707aae287a fix: reuse forge_api_all from env.sh in sprint-filer.sh to avoid duplicate pagination code
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
The duplicate-detection CI step (baseline mode) flags new code blocks that
match existing patterns. filer_api_all reimplemented the same pagination
logic as forge_api_all in env.sh. Replace with a one-liner wrapper that
delegates to forge_api_all with FORGE_FILER_TOKEN.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:59:56 +00:00
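The delegation can be sketched as below. `forge_api_all`, `filer_api_all`, and `FORGE_FILER_TOKEN` are names from the commit; the stub body stands in for the real paginated fetch in lib/env.sh:

```shell
# Stub for the paginated Forgejo fetch in lib/env.sh (illustrative only).
forge_api_all() {
  echo "GET $1 (token=${FORGE_TOKEN})"
}

# filer_api_all reduces to a one-liner that swaps in the filer identity;
# the prefix assignment scopes FORGE_TOKEN to this single call.
filer_api_all() { FORGE_TOKEN="$FORGE_FILER_TOKEN" forge_api_all "$@"; }

FORGE_TOKEN=dev-bot-token
FORGE_FILER_TOKEN=filer-bot-token
filer_api_all /repos/proj/issues   # GET /repos/proj/issues (token=filer-bot-token)
```

One pagination implementation means one place to fix limits or retry logic, which is exactly what the duplicate-detection CI step is pushing toward.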
Claude
0be36dd502 fix: address review — update architect/AGENTS.md, fix pagination and section targeting in sprint-filer.sh
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline failed
- architect/AGENTS.md: update responsibilities, state transitions, vision
  lifecycle, and execution sections to reflect read-only role and filer-bot
  architecture (#764)
- lib/sprint-filer.sh: add filer_api_all() paginated fetch helper; fix
  subissue_exists() and check_and_close_completed_visions() to paginate
  instead of using fixed limits that miss issues on large trackers
- lib/sprint-filer.sh: fix extract_vision_issue() to look specifically in
  the "## Vision issues" section before falling back to first #N in file

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:57:20 +00:00
Claude
2c9b8e386f fix: rename awk variable in_body to inbody to avoid smoke test false positive
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/smoke-init Pipeline was successful
The agent-smoke.sh function resolution checker matches lowercase_underscore
identifiers as potential bash function calls. The awk variable `in_body`
inside sprint-filer.sh's heredoc triggered a false [undef] failure.
Also fixes SC2155 (declare and assign separately) in the same file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:43:49 +00:00
Claude
04ff8a6e85 fix: bug: architect pitch prompt guardrail is prose-only — model bypasses "NEVER call Forgejo API" via Bash tool; fix via permission scoping + PR-driven sub-issue filing (#764)
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline failed
Shift the guardrail from prose prompt constraints into Forgejo's permission
layer. architect-bot loses all write access on the project repo (now read-only
for context gathering). Sub-issues are produced by a new filer-bot identity
that runs only after a human merges a sprint PR on the ops repo.

Changes:
- architect-run.sh: remove all project-repo writes (add_inprogress_label,
  close_vision_issue, check_and_close_completed_visions); add ## Sub-issues
  block to pitch format with filer:begin/end markers
- formulas/run-architect.toml: add Sub-issues schema to pitch format; strip
  issue-creation API refs; document read-only constraint on project repo
- lib/formula-session.sh: remove Create issue curl template from
  build_prompt_footer (architect cannot create issues)
- lib/sprint-filer.sh (new): parser + idempotent filer using FORGE_FILER_TOKEN;
  parses filer:begin/end blocks, creates issues with decomposed-from markers,
  adds in-progress label, handles vision lifecycle closure
- .woodpecker/ops-filer.yml (new): CI pipeline on ops repo main-branch push
  that invokes sprint-filer.sh after sprint PR merge
- lib/env.sh, .env.example, docker-compose.yml: add FORGE_FILER_TOKEN for
  filer-bot identity; add filer-bot to FORGE_BOT_USERNAMES
- AGENTS.md: add Filer agent entry; update in-progress label docs
- .woodpecker/agent-smoke.sh: register sprint-filer.sh for smoke test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:41:16 +00:00
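The filer:begin/end block parsing can be sketched with a small awk filter. The marker names come from the commit; the HTML-comment wrapping and this awk body are assumptions, not the real lib/sprint-filer.sh parser:

```shell
# Print only the lines between filer:begin and filer:end markers.
extract_filer_block() {
  awk '/filer:end/{inblock=0} inblock{print} /filer:begin/{inblock=1}'
}

pitch='intro prose
<!-- filer:begin -->
- [ ] sub-issue one
<!-- filer:end -->
outro prose'

printf '%s\n' "$pitch" | extract_filer_block   # - [ ] sub-issue one
```

Keeping the machine-readable block inside explicit markers is what lets the filer stay idempotent: it re-parses the merged pitch rather than trusting anything the architect model did at generation time.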
10c7a88416 Merge pull request 'fix: bug: architect FORGE_TOKEN override nullified when env.sh re-sources .env — agent actions authored as dev-bot (#762)' (#763) from fix/issue-762 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 07:29:53 +00:00
Claude
66ba93a840 fix: add allowlist entry for standard lib source block in duplicate detection
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
The FORGE_TOKEN_OVERRIDE fix shifted line numbers in agent run scripts,
causing the shared source block (env.sh, formula-session.sh, worktree.sh,
guard.sh, agent-sdk.sh) to register as a new duplicate. This is
intentional boilerplate shared across all formula-driven agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:18:42 +00:00
Claude
aff9f0fcef fix: bug: architect FORGE_TOKEN override nullified when env.sh re-sources .env — agent actions authored as dev-bot (#762)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
Use FORGE_TOKEN_OVERRIDE (set before sourcing env.sh) instead of
post-source FORGE_TOKEN reassignment in all five agent run scripts.
The override mechanism in lib/env.sh:98-100 survives re-sourcing from
nested shells and claude -p tool invocations.

Affected scripts: architect-run.sh, planner-run.sh, gardener-run.sh,
predictor-run.sh, supervisor-run.sh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:15:28 +00:00
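The override mechanism can be sketched as below. The variable names come from the commit; `load_env` stands in for sourcing lib/env.sh (which re-reads .env), and its body is illustrative:

```shell
# Stand-in for sourcing lib/env.sh: .env sets FORGE_TOKEN, then the
# override hook (set BEFORE the first source) wins every time.
load_env() {
  FORGE_TOKEN=dev-bot-token   # what re-sourcing .env would restore
  if [ -n "${FORGE_TOKEN_OVERRIDE:-}" ]; then
    FORGE_TOKEN=$FORGE_TOKEN_OVERRIDE
  fi
}

FORGE_TOKEN_OVERRIDE=architect-bot-token
load_env   # first source
load_env   # a nested re-source would clobber a post-source reassignment,
           # but the override is re-applied inside load_env itself
echo "$FORGE_TOKEN"   # architect-bot-token
```

A plain `FORGE_TOKEN=...` after the first source has no such protection, which is why agent actions silently fell back to dev-bot.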
c7a1c444e9 Merge pull request 'fix: feat: collect-engagement formula + container script — SSH fetch + local parse + evidence commit (#745)' (#761) from fix/issue-745 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 07:04:15 +00:00
Claude
8a5537fefc fix: feat: collect-engagement formula + container script — SSH fetch + local parse + evidence commit (#745)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:01:37 +00:00
34fd7868e4 Merge pull request 'chore: gardener housekeeping' (#760) from chore/gardener-20260415-0408 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 06:53:12 +00:00
Claude
0b4905af3d chore: gardener housekeeping 2026-04-15
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-15 04:08:04 +00:00
cdb0408466 Merge pull request 'chore: gardener housekeeping' (#759) from chore/gardener-20260415-0300 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 03:03:27 +00:00
Claude
32420c619d chore: gardener housekeeping 2026-04-15
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-15 03:00:40 +00:00
3757d9d919 Merge pull request 'chore: gardener housekeeping' (#757) from chore/gardener-20260414-2254 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-15 02:02:49 +00:00
b95e2da645 Merge pull request 'fix: docs: rent-a-human instructions for Caddy host SSH key setup (#748)' (#756) from fix/issue-748 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-14 22:56:05 +00:00
Claude
5733a10858 chore: gardener housekeeping 2026-04-14
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
2026-04-14 22:54:30 +00:00
Claude
9b0ecc40dc fix: docs: rent-a-human instructions for Caddy host SSH key setup (#748)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 22:50:20 +00:00
ba3a11fa9d Merge pull request 'fix: bug: entrypoint.sh wait (no-args) serializes polling loop behind long-lived dev-agent/gardener — causes system-wide deadlock (#753)' (#755) from fix/issue-753 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-14 22:43:49 +00:00
Claude
6af8f002f5 fix: bug: entrypoint.sh wait (no-args) serializes polling loop behind long-lived dev-agent/gardener — causes system-wide deadlock (#753)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 22:37:24 +00:00
c5b0b1dc23 Merge pull request 'fix: investigation: CI exhaustion pattern on chat sub-issues #707 and #712 — 3+ failures each (#742)' (#754) from fix/issue-742 into main
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
2026-04-14 22:05:36 +00:00
83 changed files with 5521 additions and 924 deletions


@@ -1,8 +1,7 @@
-# Secrets — prevent .env files from being baked into the image
+# Secrets — prevent .env files and encrypted secrets from being baked into the image
 .env
 .env.enc
-.env.vault
-.env.vault.enc
+secrets/

 # Version control — .git is huge and not needed in image
 .git


@@ -25,8 +25,16 @@ FORGE_URL=http://localhost:3000 # [CONFIG] local Forgejo instance
 # - FORGE_TOKEN_<BOT> = API token for REST calls (user identity via /api/v1/user)
 # - FORGE_PASS_<BOT> = password for git HTTP push (#361, Forgejo 11.x limitation)
 #
-# Local-model agents (agents-llama) use FORGE_TOKEN_LLAMA / FORGE_PASS_LLAMA
-# with FORGE_BOT_USER_LLAMA=dev-qwen to ensure correct attribution (#563).
+# Local-model agents hired with `disinto hire-an-agent` are keyed by *agent
+# name* (not role), so multiple local-model dev agents can coexist without
+# colliding on credentials (#834). For an agent named `dev-qwen2` the vars are:
+# - FORGE_TOKEN_DEV_QWEN2
+# - FORGE_PASS_DEV_QWEN2
+# Name conversion: tr 'a-z-' 'A-Z_' (lowercase→UPPER, hyphens→underscores).
+# The compose generator looks these up via the agent's `forge_user` field in
+# the project TOML. The pre-existing `dev-qwen` llama agent uses
+# FORGE_TOKEN_LLAMA / FORGE_PASS_LLAMA (kept for backwards-compat with the
+# legacy `ENABLE_LLAMA_AGENT=1` single-agent path).
 FORGE_TOKEN= # [SECRET] dev-bot API token (default for all agents)
 FORGE_PASS= # [SECRET] dev-bot password for git HTTP push (#361)
 FORGE_TOKEN_LLAMA= # [SECRET] dev-qwen API token (for agents-llama)
@@ -45,7 +53,9 @@ FORGE_PREDICTOR_TOKEN= # [SECRET] predictor-bot API token
 FORGE_PREDICTOR_PASS= # [SECRET] predictor-bot password for git HTTP push
 FORGE_ARCHITECT_TOKEN= # [SECRET] architect-bot API token
 FORGE_ARCHITECT_PASS= # [SECRET] architect-bot password for git HTTP push
-FORGE_BOT_USERNAMES=dev-bot,review-bot,planner-bot,gardener-bot,vault-bot,supervisor-bot,predictor-bot,architect-bot
+FORGE_FILER_TOKEN= # [SECRET] filer-bot API token (issues:write on project repo only)
+FORGE_FILER_PASS= # [SECRET] filer-bot password for git HTTP push
+FORGE_BOT_USERNAMES=dev-bot,review-bot,planner-bot,gardener-bot,vault-bot,supervisor-bot,predictor-bot,architect-bot,filer-bot

 # ── Backwards compatibility ───────────────────────────────────────────────
 # If CODEBERG_TOKEN is set but FORGE_TOKEN is not, env.sh falls back to
@@ -61,6 +71,10 @@ FORGE_BOT_USERNAMES=dev-bot,review-bot,planner-bot,gardener-bot,vault-bot,superv
 WOODPECKER_TOKEN= # [SECRET] Woodpecker API token
 WOODPECKER_SERVER=http://localhost:8000 # [CONFIG] Woodpecker server URL
 WOODPECKER_AGENT_SECRET= # [SECRET] shared secret for server↔agent auth (auto-generated)
+
+# Woodpecker privileged-plugin allowlist — comma-separated image names
+# Add plugins/docker (and others) here to allow privileged execution
+WOODPECKER_PLUGINS_PRIVILEGED=plugins/docker
 # WOODPECKER_REPO_ID — now per-project, set in projects/*.toml [ci] section

 # Woodpecker Postgres (for direct DB queries)
@@ -77,24 +91,42 @@ FORWARD_AUTH_SECRET= # [SECRET] Shared secret for Caddy ↔
 # ── Vault-only secrets (DO NOT put these in .env) ────────────────────────
 # These tokens grant access to external systems (GitHub, ClawHub, deploy targets).
-# They live ONLY in .env.vault.enc and are injected into the ephemeral runner
-# container at fire time (#745). lib/env.sh explicitly unsets them so agents
-# can never hold them directly — all external actions go through vault dispatch.
+# They live ONLY in secrets/<NAME>.enc (age-encrypted, one file per key) and are
+# decrypted into the ephemeral runner container at fire time (#745, #777).
+# lib/env.sh explicitly unsets them so agents can never hold them directly —
+# all external actions go through vault dispatch.
 #
 # GITHUB_TOKEN — GitHub API access (publish, deploy, post)
 # CLAWHUB_TOKEN — ClawHub registry credentials (publish)
+# CADDY_SSH_KEY — SSH key for Caddy log collection
 # (deploy keys) — SSH keys for deployment targets
 #
-# To manage vault secrets: disinto secrets edit-vault
-# (vault redesign in progress: PR-based approval, see #73-#77)
+# To manage secrets: disinto secrets add/show/remove/list

 # ── Project-specific secrets ──────────────────────────────────────────────
 # Store all project secrets here so formulas reference env vars, never hardcode.
 BASE_RPC_URL= # [SECRET] on-chain RPC endpoint

+# ── Local Qwen dev agent (optional) ──────────────────────────────────────
+# Set ENABLE_LLAMA_AGENT=1 to emit agents-llama in docker-compose.yml.
+# Requires a running llama-server reachable at ANTHROPIC_BASE_URL.
+# See docs/agents-llama.md for details.
+ENABLE_LLAMA_AGENT=0 # [CONFIG] 1 = enable agents-llama service
+ANTHROPIC_BASE_URL= # [CONFIG] e.g. http://host.docker.internal:8081
+
 # ── Tuning ────────────────────────────────────────────────────────────────
 CLAUDE_TIMEOUT=7200 # [CONFIG] max seconds per Claude invocation

+# ── Host paths (Nomad-portable) ────────────────────────────────────────────
+# These env vars externalize host-side bind-mount paths from docker-compose.yml.
+# At cutover, Nomad jobspecs reference the same vars — no path translation.
+# Defaults point at current paths so an empty .env override still works.
+CLAUDE_BIN_DIR=/usr/local/bin/claude # [CONFIG] host path to claude CLI binary (resolved by `disinto init`)
+CLAUDE_CONFIG_FILE=${HOME}/.claude.json # [CONFIG] host path to claude config JSON file
+CLAUDE_DIR=${HOME}/.claude # [CONFIG] host path to .claude directory (reproduce/edge)
+AGENT_SSH_DIR=${HOME}/.ssh # [CONFIG] host path to SSH keys directory
+SOPS_AGE_DIR=${HOME}/.config/sops/age # [CONFIG] host path to SOPS age key directory
+
 # ── Claude Code shared OAuth state ─────────────────────────────────────────
 # Shared directory used by every factory container so Claude Code's internal
 # proper-lockfile-based OAuth refresh lock works across containers. Both
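The agent-name to env-var conversion documented in the diff above is runnable as-is:

```shell
# Convert an agent name to its credential env-var suffix:
# lowercase → UPPER, hyphens → underscores.
agent=dev-qwen2
suffix=$(printf '%s' "$agent" | tr 'a-z-' 'A-Z_')
echo "FORGE_TOKEN_${suffix}"   # FORGE_TOKEN_DEV_QWEN2
```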

.gitignore
View file

@@ -3,7 +3,6 @@
# Encrypted secrets — safe to commit (SOPS-encrypted with age)
!.env.enc
!.env.vault.enc
!.sops.yaml
# Per-box project config (generated by disinto init)
@@ -33,6 +32,9 @@ docker/agents/bin/
# Note: This file is now committed to track volume mount configuration
# docker-compose.yml
# Generated Caddyfile — single source of truth is generate_caddyfile in lib/generators.sh
docker/Caddyfile
# Python bytecode
__pycache__/
*.pyc


@@ -213,6 +213,7 @@ check_script lib/issue-lifecycle.sh lib/secret-scan.sh
# Still checked for function resolution against LIB_FUNS + own definitions.
check_script lib/ci-debug.sh
check_script lib/parse-deps.sh
check_script lib/sprint-filer.sh
# Agent scripts — list cross-sourced files where function scope flows across files.
check_script dev/dev-agent.sh


@@ -292,6 +292,8 @@ def main() -> int:
"21aec56a99d5252b23fb9a38b895e8e8": "Verification helper: check body for Decomposed from pattern",
"60ea98b3604557d539193b2a6624e232": "Verification helper: append sub-issue number",
"9f6ae8e7811575b964279d8820494eb0": "Verification helper: for loop done pattern",
# Standard lib source block shared across formula-driven agent run scripts
"330e5809a00b95ade1a5fce2d749b94b": "Standard lib source block (env.sh, formula-session.sh, worktree.sh, guard.sh, agent-sdk.sh)",
}
if not sh_files:


@@ -0,0 +1,102 @@
# =============================================================================
# .woodpecker/nomad-validate.yml — Static validation for Nomad+Vault artifacts
#
# Part of the Nomad+Vault migration (S0.5, issue #825). Locks in the
# "no-ad-hoc-steps" principle: every HCL/shell artifact under nomad/ or
# lib/init/nomad/, plus the `disinto init` dispatcher, gets checked
# before it can land.
#
# Triggers on PRs (and pushes) that touch any of:
# nomad/** — HCL configs (server, client, vault)
# lib/init/nomad/** — cluster-up / install / systemd / vault-init
# bin/disinto — `disinto init --backend=nomad` dispatcher
# tests/disinto-init-nomad.bats — the bats suite itself
# .woodpecker/nomad-validate.yml — the pipeline definition
#
# Steps (all fail-closed — any error blocks merge):
# 1. nomad-config-validate — `nomad config validate` on server + client HCL
# 2. vault-operator-diagnose — `vault operator diagnose` syntax check on vault.hcl
# 3. shellcheck-nomad — shellcheck the cluster-up + install scripts + disinto
# 4. bats-init-nomad — `disinto init --backend=nomad --dry-run` smoke tests
#
# Pinned image versions match lib/init/nomad/install.sh (nomad 1.9.5 /
# vault 1.18.5). Bump there AND here together — drift = CI passing on
# syntax the runtime would reject.
# =============================================================================
when:
  - event: [push, pull_request]
    path:
      - "nomad/**"
      - "lib/init/nomad/**"
      - "bin/disinto"
      - "tests/disinto-init-nomad.bats"
      - ".woodpecker/nomad-validate.yml"

# Authenticated clone — same pattern as .woodpecker/ci.yml. Forgejo is
# configured with REQUIRE_SIGN_IN, so anonymous git clones fail (exit 128).
# FORGE_TOKEN is injected globally via WOODPECKER_ENVIRONMENT.
clone:
  git:
    image: alpine/git
    commands:
      - AUTH_URL=$(printf '%s' "$CI_REPO_CLONE_URL" | sed "s|://|://token:$FORGE_TOKEN@|")
      - git clone --depth 1 "$AUTH_URL" .
      - git fetch --depth 1 origin "$CI_COMMIT_REF"
      - git checkout FETCH_HEAD

steps:
  # ── 1. Nomad HCL syntax check ────────────────────────────────────────────
  # `nomad config validate` parses server.hcl + client.hcl and fails on any
  # HCL/semantic error (unknown block, invalid port range, bad driver cfg).
  # vault.hcl is excluded — it's a Vault config, not Nomad, so it goes
  # through the vault-operator-diagnose step instead.
  - name: nomad-config-validate
    image: hashicorp/nomad:1.9.5
    commands:
      - nomad config validate nomad/server.hcl nomad/client.hcl

  # ── 2. Vault HCL syntax check ────────────────────────────────────────────
  # `vault operator diagnose` loads the config and runs a suite of checks.
  # Exit codes:
  #   0 — all checks green
  #   1 — at least one hard failure (bad HCL, bad schema, unreachable storage)
  #   2 — advisory warnings only (no hard failure)
  # Our factory dev-box vault.hcl deliberately runs TLS-disabled on a
  # localhost-only listener (documented in nomad/vault.hcl), which triggers
  # an advisory "Check Listener TLS" warning → exit 2. The config still
  # parses, so we tolerate exit 2 and fail only on exit 1 or crashes.
  # -skip=storage/-skip=listener disables the runtime-only checks (vault's
  # container has /vault/file so storage is fine, but explicit skip is cheap
  # insurance against future container-image drift).
  - name: vault-operator-diagnose
    image: hashicorp/vault:1.18.5
    commands:
      - |
        rc=0
        vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener || rc=$?
        case "$rc" in
          0) echo "vault config: all checks green" ;;
          2) echo "vault config: parse OK (rc=2 — advisory warnings only; TLS-disabled on localhost listener is by design)" ;;
          *) echo "vault config: hard failure (rc=$rc)" >&2; exit "$rc" ;;
        esac

  # ── 3. Shellcheck ────────────────────────────────────────────────────────
  # Covers the new lib/init/nomad/*.sh scripts plus bin/disinto (which owns
  # the backend dispatcher). bin/disinto has no .sh extension so the
  # repo-wide shellcheck in .woodpecker/ci.yml skips it — this step is the
  # one place it gets checked.
  - name: shellcheck-nomad
    image: koalaman/shellcheck-alpine:stable
    commands:
      - shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto

  # ── 4. bats: `disinto init --backend=nomad --dry-run` ────────────────────
  # Smoke-tests the CLI dispatcher: both --backend=nomad variants exit 0
  # with the expected step list, and --backend=docker stays on the docker
  # path (regression guard). Pure dry-run — no sudo, no network.
  - name: bats-init-nomad
    image: alpine:3.19
    commands:
      - apk add --no-cache bash bats
      - bats tests/disinto-init-nomad.bats


@@ -0,0 +1,64 @@
# .woodpecker/publish-images.yml — Build and push versioned container images
# Triggered on tag pushes (e.g. v1.2.3). Builds and pushes:
# - ghcr.io/disinto/agents:<tag>
# - ghcr.io/disinto/reproduce:<tag>
# - ghcr.io/disinto/edge:<tag>
#
# Requires GHCR_TOKEN secret configured in Woodpecker with push access
# to ghcr.io/disinto.
when:
  event: tag
  ref: refs/tags/v*

clone:
  git:
    image: alpine/git
    commands:
      - AUTH_URL=$(printf '%s' "$CI_REPO_CLONE_URL" | sed "s|://|://token:$FORGE_TOKEN@|")
      - git clone --depth 1 "$AUTH_URL" .
      - git fetch --depth 1 origin "$CI_COMMIT_REF"
      - git checkout FETCH_HEAD

steps:
  - name: build-and-push-agents
    image: plugins/docker
    settings:
      repo: ghcr.io/disinto/agents
      registry: ghcr.io
      dockerfile: docker/agents/Dockerfile
      context: .
      tags:
        - ${CI_COMMIT_TAG}
        - latest
      username: disinto
      password:
        from_secret: GHCR_TOKEN

  - name: build-and-push-reproduce
    image: plugins/docker
    settings:
      repo: ghcr.io/disinto/reproduce
      registry: ghcr.io
      dockerfile: docker/reproduce/Dockerfile
      context: .
      tags:
        - ${CI_COMMIT_TAG}
        - latest
      username: disinto
      password:
        from_secret: GHCR_TOKEN

  - name: build-and-push-edge
    image: plugins/docker
    settings:
      repo: ghcr.io/disinto/edge
      registry: ghcr.io
      dockerfile: docker/edge/Dockerfile
      context: docker/edge
      tags:
        - ${CI_COMMIT_TAG}
        - latest
      username: disinto
      password:
        from_secret: GHCR_TOKEN


@@ -0,0 +1,68 @@
#!/usr/bin/env bash
set -euo pipefail
# run-secret-scan.sh — CI wrapper for lib/secret-scan.sh
#
# Scans files changed in this PR for plaintext secrets.
# Exits non-zero if any secret is detected.
# shellcheck source=../lib/secret-scan.sh
source lib/secret-scan.sh
# Path patterns considered secret-adjacent
SECRET_PATH_PATTERNS=(
  '\.env'
  'tools/vault-.*\.sh'
  'nomad/'
  'vault/'
  'action-vault/'
  'lib/hvault\.sh'
  'lib/action-vault\.sh'
)
# Build a single regex from patterns
path_regex=$(printf '%s|' "${SECRET_PATH_PATTERNS[@]}")
path_regex="${path_regex%|}"
# Get files changed in this PR vs target branch.
# Note: shallow clone (depth 50) may lack the merge base for very large PRs,
# causing git diff to fail — || true means the gate skips rather than blocks.
changed_files=$(git diff --name-only --diff-filter=ACMR "origin/${CI_COMMIT_TARGET_BRANCH}...HEAD" || true)
if [ -z "$changed_files" ]; then
  echo "secret-scan: no changed files found, skipping"
  exit 0
fi
# Filter to secret-adjacent paths only
target_files=$(printf '%s\n' "$changed_files" | grep -E "$path_regex" || true)
if [ -z "$target_files" ]; then
  echo "secret-scan: no secret-adjacent files changed, skipping"
  exit 0
fi
echo "secret-scan: scanning $(printf '%s\n' "$target_files" | wc -l) file(s):"
printf ' %s\n' "$target_files"
failures=0
while IFS= read -r file; do
  # Skip deleted files / non-existent
  [ -f "$file" ] || continue
  # Skip binary files
  file -b --mime-encoding "$file" 2>/dev/null | grep -q binary && continue
  content=$(cat "$file")
  if ! scan_for_secrets "$content"; then
    echo "FAIL: secret detected in $file"
    failures=$((failures + 1))
  fi
done <<< "$target_files"
if [ "$failures" -gt 0 ]; then
  echo ""
  echo "secret-scan: $failures file(s) contain potential secrets — merge blocked"
  echo "If these are false positives, verify patterns in lib/secret-scan.sh"
  exit 1
fi
echo "secret-scan: all files clean"
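The pattern-to-regex join used by run-secret-scan.sh (printf with a trailing `|`, then strip it with `${var%|}`) is a generic shell technique. A standalone sketch with sample patterns:

```shell
#!/usr/bin/env bash
set -euo pipefail

patterns=('\.env' 'nomad/' 'vault/')

# printf repeats the format once per argument, yielding "\.env|nomad/|vault/|".
joined=$(printf '%s|' "${patterns[@]}")
# Strip the trailing "|" so the alternation regex is valid.
joined="${joined%|}"

echo "$joined"   # \.env|nomad/|vault/

# Only paths matching one of the patterns survive the filter.
printf '%s\n' "nomad/server.hcl" "README.md" | grep -E "$joined"
```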


@@ -0,0 +1,32 @@
# .woodpecker/secret-scan.yml — Block PRs that leak plaintext secrets
#
# Triggers on pull requests touching secret-adjacent paths.
# Sources lib/secret-scan.sh and scans each changed file's content.
# Exits non-zero if any potential secret is detected.
when:
  - event: pull_request
    path:
      - ".env*"
      - "tools/vault-*.sh"
      - "nomad/**/*"
      - "vault/**/*"
      - "action-vault/**/*"
      - "lib/hvault.sh"
      - "lib/action-vault.sh"

clone:
  git:
    image: alpine/git
    commands:
      - AUTH_URL=$(printf '%s' "$CI_REPO_CLONE_URL" | sed "s|://|://token:$FORGE_TOKEN@|")
      - git clone --depth 50 "$AUTH_URL" .
      - git fetch --depth 50 origin "$CI_COMMIT_REF" "$CI_COMMIT_TARGET_BRANCH"
      - git checkout FETCH_HEAD

steps:
  - name: secret-scan
    image: alpine:3
    commands:
      - apk add --no-cache bash git grep file
      - bash .woodpecker/run-secret-scan.sh


@@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 -->
# Disinto — Agent Instructions
## What this repo is
@@ -31,19 +31,21 @@ disinto/ (code repo)
├── supervisor/ supervisor-run.sh — formula-driven health monitoring (polling-loop executor)
│ preflight.sh — pre-flight data collection for supervisor formula
├── architect/ architect-run.sh — strategic decomposition of vision into sprints
├── action-vault/ vault-env.sh — shared env setup (vault redesign in progress, see #73-#77)
│ SCHEMA.md — vault item schema documentation
│ validate.sh — vault item validator
│ examples/ — example vault action TOMLs (promote, publish, release, webhook-call)
├── lib/ env.sh, agent-sdk.sh, ci-helpers.sh, ci-debug.sh, load-project.sh, parse-deps.sh, guard.sh, mirrors.sh, pr-lifecycle.sh, issue-lifecycle.sh, worktree.sh, formula-session.sh, stack-lock.sh, forge-setup.sh, forge-push.sh, ops-setup.sh, ci-setup.sh, generators.sh, hire-agent.sh, release.sh, build-graph.py, branch-protection.sh, secret-scan.sh, tea-helpers.sh, action-vault.sh, ci-log-reader.py, git-creds.sh, sprint-filer.sh, hvault.sh
│ hooks/ — Claude Code session hooks (on-compact-reinject, on-idle-stop, on-phase-change, on-pretooluse-guard, on-session-end, on-stop-failure)
│ init/nomad/ — cluster-up.sh, install.sh, vault-init.sh, lib-systemd.sh (Nomad+Vault Step 0 installers, #821-#825)
├── nomad/ server.hcl, client.hcl, vault.hcl — HCL configs deployed to /etc/nomad.d/ and /etc/vault.d/ by lib/init/nomad/cluster-up.sh
├── projects/ *.toml.example — templates; *.toml — local per-box config (gitignored)
├── formulas/ Issue templates (TOML specs for multi-step agent tasks)
├── docker/ Dockerfiles and entrypoints: reproduce, triage, edge dispatcher, chat (server.py, entrypoint-chat.sh, Dockerfile, ui/)
├── tools/ Operational tools: edge-control/ (register.sh, install.sh, verify-chat-sandbox.sh)
├── docs/ Protocol docs (PHASE-PROTOCOL.md, EVIDENCE-ARCHITECTURE.md)
├── site/ disinto.ai website content
├── tests/ Test files (mock-forgejo.py, smoke-init.sh, lib-hvault.bats, disinto-init-nomad.bats)
├── templates/ Issue templates
├── bin/ The `disinto` CLI script
├── disinto-factory/ Setup documentation and skill
@@ -86,7 +88,7 @@ Each agent has a `.profile` repository on Forgejo storing `knowledge/lessons-learned`
- All scripts start with `#!/usr/bin/env bash` and `set -euo pipefail`
- Source shared environment: `source "$(dirname "$0")/../lib/env.sh"`
- Log to `$LOGFILE` using the `log()` function from env.sh or defined locally
- Never hardcode secrets — agent secrets come from `.env.enc`, vault secrets from `secrets/<NAME>.enc` (age-encrypted, one file per key)
- Never embed secrets in issue bodies, PR descriptions, or comments — use env var references (e.g. `$BASE_RPC_URL`)
- ShellCheck must pass (CI runs `shellcheck` on all `.sh` files)
- Avoid duplicate code — shared helpers go in `lib/`
@@ -113,10 +115,13 @@ bash dev/phase-test.sh
| Supervisor | `supervisor/` | Health monitoring | [supervisor/AGENTS.md](supervisor/AGENTS.md) |
| Planner | `planner/` | Strategic planning | [planner/AGENTS.md](planner/AGENTS.md) |
| Predictor | `predictor/` | Infrastructure pattern detection | [predictor/AGENTS.md](predictor/AGENTS.md) |
| Architect | `architect/` | Strategic decomposition (read-only on project repo) | [architect/AGENTS.md](architect/AGENTS.md) |
| Filer | `lib/sprint-filer.sh` | Sub-issue filing from merged sprint PRs | ops repo pipeline (deferred, see #779) |
| Reproduce | `docker/reproduce/` | Bug reproduction using Playwright MCP | `formulas/reproduce.toml` |
| Triage | `docker/reproduce/` | Deep root cause analysis | `formulas/triage.toml` |
| Edge dispatcher | `docker/edge/` | Polls ops repo for vault actions, executes via Claude sessions | `docker/edge/dispatcher.sh` |
| agents-llama | `docker/agents/` (same image) | Local-Qwen dev agent (`AGENT_ROLES=dev`), gated on `ENABLE_LLAMA_AGENT=1` | [docs/agents-llama.md](docs/agents-llama.md) |
| agents-llama-all | `docker/agents/` (same image) | Local-Qwen all-roles agent (all 7 roles), profile `agents-llama-all` | [docs/agents-llama.md](docs/agents-llama.md) |
> **Vault:** Being redesigned as a PR-based approval workflow (issues #73-#77).
> See [docs/VAULT.md](docs/VAULT.md) for the vault PR workflow details.
@@ -135,7 +140,7 @@ Issues flow: `backlog` → `in-progress` → PR → CI → review → merge
|---|---|---|
| `backlog` | Issue is queued for implementation. Dev-poll picks the first ready one. | Planner, gardener, humans |
| `priority` | Queue tier above plain backlog. Issues with both `priority` and `backlog` are picked before plain `backlog` issues. FIFO within each tier. | Planner, humans |
| `in-progress` | Dev-agent is actively working on this issue. Only one issue per project is in-progress at a time. Also set on vision issues by filer-bot when sub-issues are filed (#764). | dev-agent.sh (claims issue), filer-bot (vision issues) |
| `blocked` | Issue is stuck — agent session failed, crashed, timed out, or CI exhausted. Diagnostic comment on the issue has details. Also used for unmet dependencies. | dev-agent.sh, dev-poll.sh (on failure) |
| `tech-debt` | Pre-existing issue flagged by AI reviewer, not introduced by a PR. | review-pr.sh (auto-created follow-ups) |
| `underspecified` | Dev-agent refused the issue as too large or vague. | dev-poll.sh (on preflight `too_large`), dev-agent.sh (on mid-run `too_large` refusal) |
@@ -177,24 +182,19 @@ Humans write these. Agents read and enforce them.
| AD-002 | **Concurrency is bounded per LLM backend, not per project.** One concurrent Claude session per OAuth credential pool; one concurrent session per llama-server instance. Containers with disjoint backends may run in parallel. | The single-thread invariant is about *backends*, not pipelines. **(a) Anthropic OAuth credentials race on token refresh** — each container uses a per-session `CLAUDE_CONFIG_DIR`, so Claude Code's native lockfile-based OAuth refresh handles contention automatically without external serialization. (Legacy: set `CLAUDE_EXTERNAL_LOCK=1` to re-enable the old `flock session.lock` wrapper for rollback.) **(b) llama-server has finite VRAM and one KV cache** — parallel inference thrashes the cache and risks OOM. All llama-backed agents serialize on the same lock. **(c) Disjoint backends are free to parallelize.** Today `disinto-agents` (Anthropic OAuth, runs `review,gardener`) runs concurrently with `disinto-agents-llama` (llama, runs `dev`) on the same project — they share neither OAuth state nor llama VRAM. **(d) Per-project work-conflict safety** (no duplicate dev work, no merge conflicts on the same branch) is enforced by `issue_claim` (assignee + `in-progress` label) and per-issue worktrees — that's a separate guard that does NOT depend on this AD. |
| AD-003 | The runtime creates and destroys, the formula preserves. | Runtime manages worktrees/sessions/temp. Formulas commit knowledge to git before signaling done. |
| AD-004 | Event-driven > polling > fixed delays. | Never `waitForTimeout` or hardcoded sleep. Use phase files, webhooks, or poll loops with backoff. |
| AD-005 | Secrets via env var indirection, never in issue bodies. | Issue bodies become code. Agent secrets go in `.env.enc` (SOPS-encrypted), vault secrets in `secrets/<NAME>.enc` (age-encrypted, one file per key). Referenced as `$VAR_NAME`. Runner gets only vault secrets; agents get only agent secrets. |
| AD-006 | External actions go through vault dispatch, never direct. | Agents build addressables; only the vault exercises them (publishes, deploys, posts). Tokens for external systems (`GITHUB_TOKEN`, `CLAWHUB_TOKEN`, deploy keys) live only in `secrets/<NAME>.enc` and are decrypted into the ephemeral runner container. `lib/env.sh` unsets them so agents never hold them. PRs with direct external actions without vault dispatch get REQUEST_CHANGES. (Vault redesign in progress: PR-based approval on ops repo, see #73-#77) |
**Who enforces what:**
- **Gardener** checks open backlog issues against ADs during grooming; closes violations with a comment. **Planner** plans within the architecture; does not create issues that violate ADs.
- **Dev-agent** reads AGENTS.md before implementing; refuses work that violates ADs.
- **AD-002 is a runtime invariant; nothing for the gardener to check at issue-groom time.** OAuth concurrency is handled by per-session `CLAUDE_CONFIG_DIR` isolation (with `CLAUDE_EXTERNAL_LOCK` as a rollback flag). Per-issue work is enforced by `issue_claim`. A violation manifests as a 401 or VRAM OOM in agent logs, not as a malformed issue.
## Phase-Signaling Protocol
When running as a persistent tmux session, Claude must signal the orchestrator
at each phase boundary by writing to a phase file (e.g.
`/tmp/dev-session-{project}-{issue}.phase`).
Key phases: `PHASE:awaiting_ci` → `PHASE:awaiting_review` → `PHASE:done`. Also: `PHASE:escalate` (needs human input), `PHASE:failed`.
See [docs/PHASE-PROTOCOL.md](docs/PHASE-PROTOCOL.md) for the complete spec, orchestrator reaction matrix, sequence diagram, and crash recovery.
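A minimal sketch of both sides of this protocol, under assumed paths and phase names taken from the text above (the authoritative spec is docs/PHASE-PROTOCOL.md):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical phase file for project "demo", issue 42.
phase_file="/tmp/dev-session-demo-42.phase"

# Agent side: signal a phase boundary by overwriting the phase file.
printf 'PHASE:awaiting_ci\n' > "$phase_file"

# Orchestrator side: read the current phase and react to it.
case "$(cat "$phase_file")" in
  PHASE:awaiting_ci)           echo "watching CI" ;;
  PHASE:awaiting_review)       echo "pinging reviewer" ;;
  PHASE:done)                  echo "tearing down session" ;;
  PHASE:escalate|PHASE:failed) echo "alerting human" ;;
esac

rm -f "$phase_file"
```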


@@ -50,7 +50,7 @@ blast_radius = "low" # optional: overrides policy.toml tier ("low"|"medium")
## Secret Names
Secret names must have a corresponding `secrets/<NAME>.enc` file (age-encrypted). The vault validates that requested secrets exist in the allowlist before execution.
Common secret names:
- `CLAWHUB_TOKEN` - Token for ClawHub skill publishing


@@ -28,7 +28,7 @@ fi
# VAULT ACTION VALIDATION
# =============================================================================
# Allowed secret names - must match files in secrets/<NAME>.enc
VAULT_ALLOWED_SECRETS="CLAWHUB_TOKEN GITHUB_TOKEN CODEBERG_TOKEN DEPLOY_KEY NPM_TOKEN DOCKER_HUB_TOKEN"
# Allowed mount aliases — well-known file-based credential directories
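A membership check against a space-separated allowlist like `VAULT_ALLOWED_SECRETS` might look like this. The `secret_allowed` helper is hypothetical, not the repo's actual validator:

```shell
#!/usr/bin/env bash
set -euo pipefail

VAULT_ALLOWED_SECRETS="CLAWHUB_TOKEN GITHUB_TOKEN CODEBERG_TOKEN DEPLOY_KEY NPM_TOKEN DOCKER_HUB_TOKEN"

# Return 0 if the requested secret name is in the allowlist, 1 otherwise.
# Word splitting on the unquoted variable iterates the space-separated names.
secret_allowed() {
  local name=$1 candidate
  for candidate in $VAULT_ALLOWED_SECRETS; do
    [ "$candidate" = "$name" ] && return 0
  done
  return 1
}

secret_allowed GITHUB_TOKEN && echo "GITHUB_TOKEN: ok"
secret_allowed AWS_KEY || echo "AWS_KEY: rejected"
```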


@@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 -->
# Architect — Agent Instructions
## What this agent is
@@ -10,9 +10,9 @@ converses with humans through PR comments.
## Role
- **Input**: Vision issues from VISION.md, prerequisite tree from ops repo
- **Output**: Sprint proposals as PRs on the ops repo (with embedded `## Sub-issues` blocks)
- **Mechanism**: Bash-driven orchestration in `architect-run.sh`, pitching formula via `formulas/run-architect.toml`
- **Identity**: `architect-bot` on Forgejo (READ-ONLY on project repo, write on ops repo only — #764)
@@ -24,16 +24,17 @@ converses with humans through PR comments.
acceptance criteria and dependencies
4. **Human conversation**: Respond to PR comments, refine sprint proposals based
on human feedback
5. **Sub-issue definition**: Define concrete sub-issues in the `## Sub-issues`
block of the sprint spec. Filing is handled by `filer-bot` after sprint PR
merge (#764)
## Formula
The architect pitching is driven by `formulas/run-architect.toml`. This formula defines
the steps for:
- Research: analyzing vision items and prerequisite tree
- Pitch: creating structured sprint PRs with embedded `## Sub-issues` blocks
- Design Q&A: refining the sprint via PR comments after human ACCEPT
## Bash-driven orchestration
@@ -57,22 +58,31 @@ APPROVED review → start design questions (model posts Q1:, adds Design forks s
Answers received → continue Q&A (model processes answers, posts follow-ups)
All forks resolved → finalize ## Sub-issues section in sprint spec
Sprint PR merged → filer-bot files sub-issues on project repo (#764)
REJECT review → close PR + journal (model processes rejection, bash merges PR)
```
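
The transitions above can be sketched as a bash dispatch; a minimal illustration only — the state labels and `dispatch_pr_state` helper are hypothetical, not the actual architect-run.sh internals:

```shell
# Illustrative dispatch over an architect PR's lifecycle state.
# The state names are hypothetical labels for the transitions above.
dispatch_pr_state() {
  case "$1" in
    approved)         echo "start design questions" ;;
    answers-received) echo "continue Q&A" ;;
    forks-resolved)   echo "finalize ## Sub-issues section" ;;
    merged)           echo "filer-bot files sub-issues" ;;
    rejected)         echo "close PR + journal" ;;
    *)                echo "no action" ;;
  esac
}
```

The real script derives the state from Forgejo API responses rather than a label argument; the sketch only shows the shape of the bash-side branching.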
### Vision issue lifecycle

Vision issues decompose into sprint sub-issues. Sub-issues are defined in the
`## Sub-issues` block of the sprint spec (between `<!-- filer:begin -->` and
`<!-- filer:end -->` markers) and filed by `filer-bot` after the sprint PR merges
on the ops repo (#764).

Each filer-created sub-issue carries a `<!-- decomposed-from: #<vision>, sprint: <slug>, id: <id> -->`
marker in its body for idempotency and traceability.

The filer-bot (via `lib/sprint-filer.sh`) handles vision lifecycle:

1. After filing sub-issues, adds `in-progress` label to the vision issue
2. On each run, checks if all sub-issues for a vision are closed
3. If all closed, posts a summary comment and closes the vision issue

The architect no longer writes to the project repo — it is read-only (#764).
All project-repo writes (issue filing, label management, vision closure) are
handled by filer-bot with its narrowly-scoped `FORGE_FILER_TOKEN`.
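
The marker-delimited block can be pulled out with standard tools; a minimal sketch, assuming the sprint spec is a plain markdown file — `extract_filer_block` is a hypothetical helper, not the real lib/sprint-filer.sh interface:

```shell
# Print the lines between the filer markers in a sprint spec file.
# Illustrative only — the real parser lives in lib/sprint-filer.sh.
extract_filer_block() {
  sed -n '/<!-- filer:begin -->/,/<!-- filer:end -->/p' "$1" \
    | sed '1d;$d'   # drop the marker lines themselves
}
```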
### Session management

@@ -86,6 +96,7 @@ Run via `architect/architect-run.sh`, which:
- Acquires a poll-loop lock (via `acquire_lock`) and checks available memory
- Cleans up per-issue scratch files from previous runs (`/tmp/architect-{project}-scratch-*.md`)
- Sources shared libraries (env.sh, formula-session.sh)
- Exports `FORGE_TOKEN_OVERRIDE="${FORGE_ARCHITECT_TOKEN}"` BEFORE sourcing env.sh, ensuring architect-bot identity survives re-sourcing (#762)
- Uses FORGE_ARCHITECT_TOKEN for authentication
- Processes existing architect PRs via bash-driven design phase
- Loads the formula and builds context from VISION.md, AGENTS.md, and ops repo

@@ -95,7 +106,9 @@ Run via `architect/architect-run.sh`, which:
- Selects up to `pitch_budget` (3 - open architect PRs) remaining vision issues
- For each selected issue, invokes stateless `claude -p` with issue body + context
- Creates PRs directly from pitch content (no scratch files)
- Agent is invoked for stateless pitch generation and response processing (ACCEPT/REJECT handling)
- NOTE: architect-bot is read-only on the project repo (#764) — sub-issue filing
  and in-progress label management are handled by filer-bot after sprint PR merge
**Multi-sprint pitching**: The architect pitches up to 3 sprints per run. Bash handles all state management:
- Fetches Forgejo API data (vision issues, open PRs, merged PRs)

@@ -120,4 +133,5 @@ empty file not created, just document it).
- #100: Architect formula — research + design fork identification
- #101: Architect formula — sprint PR creation with questions
- #102: Architect formula — answer parsing + sub-issue filing
- #764: Permission scoping — architect read-only on project repo, filer-bot files sub-issues
- #491: Refactor — bash-driven design phase with stateful session resumption
@@ -34,10 +34,11 @@ FACTORY_ROOT="$(dirname "$SCRIPT_DIR")"

# Accept project config from argument; default to disinto
export PROJECT_TOML="${1:-$FACTORY_ROOT/projects/disinto.toml}"

# Set override BEFORE sourcing env.sh so it survives any later re-source of
# env.sh from nested shells / claude -p tools (#762, #747)
export FORGE_TOKEN_OVERRIDE="${FORGE_ARCHITECT_TOKEN:-}"

# shellcheck source=../lib/env.sh
source "$FACTORY_ROOT/lib/env.sh"
# shellcheck source=../lib/formula-session.sh
source "$FACTORY_ROOT/lib/formula-session.sh"
# shellcheck source=../lib/worktree.sh

@@ -116,8 +117,8 @@ build_architect_prompt() {
You are the architect agent for ${FORGE_REPO}. Work through the formula below.
Your role: strategic decomposition of vision issues into development sprints.
Propose sprints via PRs on the ops repo, converse with humans through PR comments.
You are READ-ONLY on the project repo — sub-issues are filed by filer-bot after sprint PR merge (#764).

## Project context
${CONTEXT_BLOCK}
@@ -144,8 +145,8 @@ build_architect_prompt_for_mode() {
You are the architect agent for ${FORGE_REPO}. Work through the formula below.
Your role: strategic decomposition of vision issues into development sprints.
Propose sprints via PRs on the ops repo, converse with humans through PR comments.
You are READ-ONLY on the project repo — sub-issues are filed by filer-bot after sprint PR merge (#764).

## CURRENT STATE: Approved PR awaiting initial design questions

@@ -156,10 +157,10 @@ design conversation has not yet started. Your task is to:
2. Identify the key design decisions that need human input
3. Post initial design questions (Q1:, Q2:, etc.) as comments on the PR
4. Add a `## Design forks` section to the PR body documenting the design decisions
5. Update the ## Sub-issues section in the sprint spec if design decisions affect decomposition

This is NOT a pitch phase — the pitch is already approved. This is the START
of the design Q&A phase. Sub-issues are filed by filer-bot after sprint PR merge (#764).

## Project context
${CONTEXT_BLOCK}
@@ -178,8 +179,8 @@ _PROMPT_EOF_
You are the architect agent for ${FORGE_REPO}. Work through the formula below.
Your role: strategic decomposition of vision issues into development sprints.
Propose sprints via PRs on the ops repo, converse with humans through PR comments.
You are READ-ONLY on the project repo — sub-issues are filed by filer-bot after sprint PR merge (#764).

## CURRENT STATE: Design Q&A in progress

@@ -193,7 +194,7 @@ Your task is to:
2. Read human answers from PR comments
3. Parse the answers and determine next steps
4. Post follow-up questions if needed (Q3:, Q4:, etc.)
5. If all design forks are resolved, finalize the ## Sub-issues section in the sprint spec
6. Update the `## Design forks` section as you progress

## Project context

@@ -417,243 +418,10 @@ fetch_vision_issues() {
"${FORGE_API}/issues?labels=vision&state=open&limit=100" 2>/dev/null || echo '[]'
}
# NOTE: get_vision_subissues, all_subissues_closed, close_vision_issue,
# check_and_close_completed_visions removed (#764) — architect-bot is read-only
# on the project repo. Vision lifecycle (closing completed visions, adding
# in-progress labels) is now handled by filer-bot via lib/sprint-filer.sh.
# Returns: newline-separated list of sub-issue numbers (empty if none)
# Args: vision_issue_number
get_vision_subissues() {
local vision_issue="$1"
local subissues=()
# Method 1: Find issues with "Decomposed from #N" in body
local issues_json
issues_json=$(curl -sf -H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_API}/issues?limit=100" 2>/dev/null) || true
if [ -n "$issues_json" ] && [ "$issues_json" != "null" ]; then
while IFS= read -r subissue_num; do
[ -z "$subissue_num" ] && continue
subissues+=("$subissue_num")
done <<< "$(printf '%s' "$issues_json" | jq -r --arg vid "$vision_issue" \
'[.[] | select(.number != ($vid | tonumber)) | select(.body // "" | contains("Decomposed from #" + $vid))] | .[].number' 2>/dev/null)"
fi
# Method 2: Find issues referenced in merged sprint PR bodies
# Only consider PRs whose title or body references this specific vision issue
local prs_json
prs_json=$(curl -sf -H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_API_BASE}/repos/${FORGE_OPS_REPO}/pulls?state=closed&limit=100" 2>/dev/null) || true
if [ -n "$prs_json" ] && [ "$prs_json" != "null" ]; then
while IFS= read -r pr_num; do
[ -z "$pr_num" ] && continue
local pr_details pr_body pr_title
pr_details=$(curl -sf -H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_API_BASE}/repos/${FORGE_OPS_REPO}/pulls/${pr_num}" 2>/dev/null) || continue
local is_merged
is_merged=$(printf '%s' "$pr_details" | jq -r '.merged // false') || continue
if [ "$is_merged" != "true" ]; then
continue
fi
pr_title=$(printf '%s' "$pr_details" | jq -r '.title // ""') || continue
pr_body=$(printf '%s' "$pr_details" | jq -r '.body // ""') || continue
# Only process PRs that reference this specific vision issue
if ! printf '%s\n%s' "$pr_title" "$pr_body" | grep -qE "#${vision_issue}([^0-9]|$)"; then
continue
fi
# Extract issue numbers from PR body, excluding the vision issue itself
while IFS= read -r ref_issue; do
[ -z "$ref_issue" ] && continue
# Skip the vision issue itself
[ "$ref_issue" = "$vision_issue" ] && continue
# Skip if already in list
local found=false
for existing in "${subissues[@]+"${subissues[@]}"}"; do
[ "$existing" = "$ref_issue" ] && found=true && break
done
if [ "$found" = false ]; then
subissues+=("$ref_issue")
fi
done <<< "$(printf '%s' "$pr_body" | grep -oE '#[0-9]+' | tr -d '#' | sort -u)"
done <<< "$(printf '%s' "$prs_json" | jq -r '.[] | select(.title | contains("architect:")) | .number')"
fi
# Output unique sub-issues
printf '%s\n' "${subissues[@]}" | sort -u | grep -v '^$' || true
}
# ── Helper: Check if all sub-issues of a vision issue are closed ───────────
# Returns: 0 if all sub-issues are closed, 1 if any are still open
# Args: vision_issue_number
all_subissues_closed() {
local vision_issue="$1"
local subissues
subissues=$(get_vision_subissues "$vision_issue")
# If no sub-issues found, parent cannot be considered complete
if [ -z "$subissues" ]; then
return 1
fi
# Check each sub-issue state
while IFS= read -r subissue_num; do
[ -z "$subissue_num" ] && continue
local sub_state
sub_state=$(curl -sf -H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_API}/issues/${subissue_num}" 2>/dev/null | jq -r '.state // "unknown"') || true
if [ "$sub_state" != "closed" ]; then
log "Sub-issue #${subissue_num} is ${sub_state} — vision issue #${vision_issue} not ready to close"
return 1
fi
done <<< "$subissues"
return 0
}
# ── Helper: Close vision issue with summary comment ────────────────────────
# Posts a comment listing all completed sub-issues before closing.
# Returns: 0 on success, 1 on failure
# Args: vision_issue_number
close_vision_issue() {
local vision_issue="$1"
# Idempotency guard: check if a completion comment already exists
local existing_comments
existing_comments=$(curl -sf -H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_API}/issues/${vision_issue}/comments" 2>/dev/null) || existing_comments="[]"
if printf '%s' "$existing_comments" | jq -e '[.[] | select(.body | contains("Vision Issue Completed"))] | length > 0' >/dev/null 2>&1; then
# Comment exists — verify the issue is actually closed before skipping
local issue_state
issue_state=$(curl -sf -H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_API}/issues/${vision_issue}" 2>/dev/null | jq -r '.state // "open"') || issue_state="open"
if [ "$issue_state" = "closed" ]; then
log "Vision issue #${vision_issue} already has a completion comment and is closed — skipping"
return 0
fi
log "Vision issue #${vision_issue} has a completion comment but state=${issue_state} — retrying close"
else
# No completion comment yet — build and post one
local subissues
subissues=$(get_vision_subissues "$vision_issue")
# Build summary comment
local summary=""
local count=0
while IFS= read -r subissue_num; do
[ -z "$subissue_num" ] && continue
local sub_title
sub_title=$(curl -sf -H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_API}/issues/${subissue_num}" 2>/dev/null | jq -r '.title // "Untitled"') || sub_title="Untitled"
summary+="- #${subissue_num}: ${sub_title}"$'\n'
count=$((count + 1))
done <<< "$subissues"
local comment
comment=$(cat <<EOF
## Vision Issue Completed
All sub-issues have been implemented and merged. This vision issue is now closed.
### Completed sub-issues (${count}):
${summary}
---
*Automated closure by architect · $(date -u '+%Y-%m-%d %H:%M UTC')*
EOF
)
# Post comment before closing
local tmpfile tmpjson
tmpfile=$(mktemp /tmp/vision-close-XXXXXX.md)
tmpjson="${tmpfile}.json"
printf '%s' "$comment" > "$tmpfile"
jq -Rs '{body:.}' < "$tmpfile" > "$tmpjson"
if ! curl -sf -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API}/issues/${vision_issue}/comments" \
--data-binary @"$tmpjson" >/dev/null 2>&1; then
log "WARNING: failed to post closure comment on vision issue #${vision_issue}"
rm -f "$tmpfile" "$tmpjson"
return 1
fi
rm -f "$tmpfile" "$tmpjson"
fi
# Clear assignee (best-effort) and close the issue
curl -sf -X PATCH \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API}/issues/${vision_issue}" \
-d '{"assignees":[]}' >/dev/null 2>&1 || true
local close_response
close_response=$(curl -sf -X PATCH \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API}/issues/${vision_issue}" \
-d '{"state":"closed"}' 2>/dev/null) || {
log "ERROR: state=closed PATCH failed for vision issue #${vision_issue}"
return 1
}
local result_state
result_state=$(printf '%s' "$close_response" | jq -r '.state // "unknown"') || result_state="unknown"
if [ "$result_state" != "closed" ]; then
log "ERROR: vision issue #${vision_issue} state is '${result_state}' after close PATCH — expected 'closed'"
return 1
fi
log "Closed vision issue #${vision_issue}${count:+ — all ${count} sub-issue(s) complete}"
return 0
}
# ── Lifecycle check: Close vision issues with all sub-issues complete ──────
# Runs before picking new vision issues for decomposition.
# Checks each open vision issue and closes it if all sub-issues are closed.
check_and_close_completed_visions() {
log "Checking for vision issues with all sub-issues complete..."
local vision_issues_json
vision_issues_json=$(fetch_vision_issues)
if [ -z "$vision_issues_json" ] || [ "$vision_issues_json" = "null" ]; then
log "No open vision issues found"
return 0
fi
# Get all vision issue numbers
local vision_issue_nums
vision_issue_nums=$(printf '%s' "$vision_issues_json" | jq -r '.[].number' 2>/dev/null) || vision_issue_nums=""
local closed_count=0
while IFS= read -r vision_issue; do
[ -z "$vision_issue" ] && continue
if all_subissues_closed "$vision_issue"; then
if close_vision_issue "$vision_issue"; then
closed_count=$((closed_count + 1))
fi
fi
done <<< "$vision_issue_nums"
if [ "$closed_count" -gt 0 ]; then
log "Closed ${closed_count} vision issue(s) with all sub-issues complete"
else
log "No vision issues ready for closure"
fi
}
# ── Helper: Fetch open architect PRs from ops repo Forgejo API ───────────
# Returns: JSON array of architect PR objects

@@ -745,7 +513,23 @@ Instructions:
## Recommendation
<architect's assessment: worth it / defer / alternative approach>

## Sub-issues
<!-- filer:begin -->
- id: <kebab-case-id>
title: \"vision(#${issue_num}): <concise sub-issue title>\"
labels: [backlog]
depends_on: []
body: |
## Goal
<what this sub-issue accomplishes>
## Acceptance criteria
- [ ] <criterion>
<!-- filer:end -->
IMPORTANT: Do NOT include design forks or questions. This is a go/no-go pitch.
The ## Sub-issues block is parsed by the filer-bot pipeline after sprint PR merge.
Each sub-issue between filer:begin/end markers becomes a Forgejo issue.

---

@@ -854,37 +638,8 @@ post_pr_footer() {
fi
}

# NOTE: add_inprogress_label removed (#764) — architect-bot is read-only on
# project repo. in-progress label is now added by filer-bot via sprint-filer.sh.
add_inprogress_label() {
local issue_num="$1"
# Get label ID for 'in-progress'
local labels_json
labels_json=$(curl -sf -H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_API}/labels" 2>/dev/null) || return 1
local inprogress_label_id
inprogress_label_id=$(printf '%s' "$labels_json" | jq -r --arg label "in-progress" '.[] | select(.name == $label) | .id' 2>/dev/null) || true
if [ -z "$inprogress_label_id" ]; then
log "WARNING: in-progress label not found"
return 1
fi
# Add label to issue
if curl -sf -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API}/issues/${issue_num}/labels" \
-d "{\"labels\": [${inprogress_label_id}]}" >/dev/null 2>&1; then
log "Added in-progress label to vision issue #${issue_num}"
return 0
else
log "WARNING: failed to add in-progress label to vision issue #${issue_num}"
return 1
fi
}
# ── Precondition checks in bash before invoking the model ─────────────────

@@ -934,9 +689,7 @@ if [ "${open_arch_prs:-0}" -ge 3 ]; then
log "3 open architect PRs found but responses detected — processing"
fi

# NOTE: Vision lifecycle check (close completed visions) moved to filer-bot (#764)
# ── Bash-driven state management: Select vision issues for pitching ───────
# This logic is also documented in formulas/run-architect.toml preflight step

@@ -1072,8 +825,7 @@ for vision_issue in "${ARCHITECT_TARGET_ISSUES[@]}"; do
# Post footer comment
post_pr_footer "$pr_number"
# NOTE: in-progress label is added by filer-bot after sprint PR merge (#764)
pitch_count=$((pitch_count + 1))
log "Completed pitch for vision issue #${vision_issue} — PR #${pr_number}"
@@ -81,9 +81,13 @@ Init options:
  --repo-root <path>   Local clone path (default: ~/name)
  --ci-id <n>          Woodpecker CI repo ID (default: 0 = no CI)
  --forge-url <url>    Forge base URL (default: http://localhost:3000)
  --backend <value>    Orchestration backend: docker (default) | nomad
  --empty              (nomad) Bring up cluster only, no jobs (S0.4)
  --bare               Skip compose generation (bare-metal setup)
  --build              Use local docker build instead of registry images (dev mode)
  --yes                Skip confirmation prompts
  --rotate-tokens      Force regeneration of all bot tokens/passwords (idempotent by default)
  --dry-run            Print every intended action without executing

Hire an agent options:
  --formula <path>     Path to role formula TOML (default: formulas/<role>.toml)

@@ -203,18 +207,21 @@ generate_compose() {
# Generate docker/agents/ files if they don't already exist.
# (Implementation in lib/generators.sh)
# shellcheck disable=SC2120 # passthrough wrapper; forwards any future args to impl
generate_agent_docker() {
_generate_agent_docker_impl "$@"
}

# Generate docker/Caddyfile template for edge proxy.
# (Implementation in lib/generators.sh)
# shellcheck disable=SC2120 # passthrough wrapper; forwards any future args to impl
generate_caddyfile() {
_generate_caddyfile_impl "$@"
}

# Generate docker/index.html default page.
# (Implementation in lib/generators.sh)
# shellcheck disable=SC2120 # passthrough wrapper; forwards any future args to impl
generate_staging_index() {
_generate_staging_index_impl "$@"
}

@@ -642,30 +649,131 @@ prompt_admin_password() {

# ── init command ─────────────────────────────────────────────────────────────
# Nomad backend init — dispatcher (Nomad+Vault migration, S0.4, issue #824).
#
# Today `--empty` and the default (no flag) both bring up an empty
# single-node Nomad+Vault cluster via lib/init/nomad/cluster-up.sh. Step 1
# will extend the default path to also deploy jobs; `--empty` will remain
# the "cluster only, no workloads" escape hatch.
#
# Uses `sudo -n` when not already root — cluster-up.sh mutates /etc/,
# /srv/, and systemd state, so it has to run as root. The `-n` keeps the
# failure mode legible (no hanging TTY-prompted sudo inside a factory
# init run); operators running without sudo-NOPASSWD should invoke
# `sudo disinto init ...` directly.
_disinto_init_nomad() {
local dry_run="${1:-false}" empty="${2:-false}"
local cluster_up="${FACTORY_ROOT}/lib/init/nomad/cluster-up.sh"
if [ ! -x "$cluster_up" ]; then
echo "Error: ${cluster_up} not found or not executable" >&2
exit 1
fi
# --empty and default both invoke cluster-up today. Log the requested
# mode so the dispatch is visible in factory bootstrap logs — Step 1
# will branch on $empty to gate the job-deployment path.
if [ "$empty" = "true" ]; then
echo "nomad backend: --empty (cluster-up only, no jobs)"
else
echo "nomad backend: default (cluster-up; jobs deferred to Step 1)"
fi
# Dry-run forwards straight through; cluster-up.sh prints its own step
# list and exits 0 without touching the box.
local -a cmd=("$cluster_up")
if [ "$dry_run" = "true" ]; then
cmd+=("--dry-run")
"${cmd[@]}"
exit $?
fi
# Real run — needs root. Invoke via sudo if we're not already root so
# the command's exit code propagates directly. We don't distinguish
# "sudo denied" from "cluster-up.sh failed" here; both surface as a
# non-zero exit, and cluster-up.sh's own error messages cover the
# latter case.
local rc=0
if [ "$(id -u)" -eq 0 ]; then
"${cmd[@]}" || rc=$?
else
if ! command -v sudo >/dev/null 2>&1; then
echo "Error: cluster-up.sh must run as root and sudo is not installed" >&2
exit 1
fi
sudo -n -- "${cmd[@]}" || rc=$?
fi
exit "$rc"
}
disinto_init() {
# Only consume $1 as repo_url if it looks like a positional arg (not a
# flag). The nomad backend (#835) takes no positional — the LXC already
# has the repo cloned by the operator, and repo_url is a docker-backend
# concept. Eagerly consuming `--backend=nomad` as repo_url produced the
# nonsense "--empty is only valid with --backend=nomad" error seen in
# S0.1 end-to-end testing on a fresh LXC. Defer the "repo URL required"
# check to after argparse, where we know the backend.
local repo_url=""
if [ $# -gt 0 ] && [[ "$1" != --* ]]; then
repo_url="$1"
shift
fi
# Parse flags
local branch="" repo_root="" ci_id="0" auto_yes=false forge_url_flag="" bare=false rotate_tokens=false use_build=false dry_run=false backend="docker" empty=false
while [ $# -gt 0 ]; do
case "$1" in
--branch) branch="$2"; shift 2 ;;
--repo-root) repo_root="$2"; shift 2 ;;
--ci-id) ci_id="$2"; shift 2 ;;
--forge-url) forge_url_flag="$2"; shift 2 ;;
--backend) backend="$2"; shift 2 ;;
--backend=*) backend="${1#--backend=}"; shift ;;
--bare) bare=true; shift ;;
--build) use_build=true; shift ;;
--empty) empty=true; shift ;;
--yes) auto_yes=true; shift ;;
--rotate-tokens) rotate_tokens=true; shift ;;
--dry-run) dry_run=true; shift ;;
*) echo "Unknown option: $1" >&2; exit 1 ;;
esac
done
# Validate backend
case "$backend" in
docker|nomad) ;;
*) echo "Error: invalid --backend value '${backend}' (expected: docker|nomad)" >&2; exit 1 ;;
esac
# Docker backend requires a repo_url positional; nomad doesn't use one.
# This check must run *after* argparse so `--backend=docker` (with no
# positional) errors with a helpful message instead of the misleading
# "Unknown option: --backend=docker".
if [ "$backend" = "docker" ] && [ -z "$repo_url" ]; then
echo "Error: repo URL required" >&2
echo "Usage: disinto init <repo-url> [options]" >&2
exit 1
fi
# --empty is nomad-only today (the docker path has no concept of an
# "empty cluster"). Reject explicitly rather than letting it silently
# do nothing on --backend=docker.
if [ "$empty" = true ] && [ "$backend" != "nomad" ]; then
echo "Error: --empty is only valid with --backend=nomad" >&2
exit 1
fi
# Dispatch on backend — the nomad path runs lib/init/nomad/cluster-up.sh
# (S0.4). The default and --empty variants are identical today; Step 1
# will branch on $empty to add job deployment to the default path.
if [ "$backend" = "nomad" ]; then
_disinto_init_nomad "$dry_run" "$empty"
# shellcheck disable=SC2317 # _disinto_init_nomad always exits today;
# `return` is defensive against future refactors.
return
fi
# Export bare-metal flag for setup_forge
export DISINTO_BARE="$bare"

@@ -738,12 +846,92 @@ p.write_text(text)
fi
fi
# ── Dry-run mode: report intended actions and exit ─────────────────────────
if [ "$dry_run" = true ]; then
echo ""
echo "── Dry-run: intended actions ────────────────────────────"
local env_file="${FACTORY_ROOT}/.env"
local rr="${repo_root:-/home/${USER}/${project_name}}"
if [ "$bare" = false ]; then
[ -f "${FACTORY_ROOT}/docker-compose.yml" ] \
&& echo "[skip] docker-compose.yml (exists)" \
|| echo "[create] docker-compose.yml"
fi
[ -f "$env_file" ] \
&& echo "[exists] .env" \
|| echo "[create] .env"
# Report token state from .env
if [ -f "$env_file" ]; then
local _var
for _var in FORGE_ADMIN_TOKEN HUMAN_TOKEN FORGE_TOKEN FORGE_REVIEW_TOKEN \
FORGE_PLANNER_TOKEN FORGE_GARDENER_TOKEN FORGE_VAULT_TOKEN \
FORGE_SUPERVISOR_TOKEN FORGE_PREDICTOR_TOKEN FORGE_ARCHITECT_TOKEN; do
if grep -q "^${_var}=" "$env_file" 2>/dev/null; then
echo "[keep] ${_var} (preserved)"
else
echo "[create] ${_var}"
fi
done
else
echo "[create] all tokens and passwords"
fi
echo ""
echo "[ensure] Forgejo admin user 'disinto-admin'"
echo "[ensure] 8 bot users: dev-bot, review-bot, planner-bot, gardener-bot, vault-bot, supervisor-bot, predictor-bot, architect-bot"
echo "[ensure] 2 llama bot users: dev-qwen, dev-qwen-nightly"
echo "[ensure] .profile repos for all bots"
echo "[ensure] repo ${forge_repo} on Forgejo with collaborators"
echo "[run] preflight checks"
[ -d "${rr}/.git" ] \
&& echo "[skip] clone ${rr} (exists)" \
|| echo "[clone] ${repo_url} -> ${rr}"
echo "[push] to local Forgejo"
echo "[ensure] ops repo disinto-admin/${project_name}-ops"
echo "[ensure] branch protection on ${forge_repo}"
[ "$toml_exists" = true ] \
&& echo "[skip] ${toml_path} (exists)" \
|| echo "[create] ${toml_path}"
if [ "$bare" = false ]; then
echo "[ensure] Woodpecker OAuth2 app"
echo "[ensure] Chat OAuth2 app"
echo "[ensure] WOODPECKER_AGENT_SECRET in .env"
fi
echo "[ensure] labels on ${forge_repo}"
[ -f "${rr}/VISION.md" ] \
&& echo "[skip] VISION.md (exists)" \
|| echo "[create] VISION.md"
echo "[copy] issue templates"
echo "[ensure] scheduling (cron or compose polling)"
if [ "$bare" = false ]; then
echo "[start] docker compose stack"
echo "[ensure] Woodpecker token + repo activation"
fi
echo "[ensure] CLAUDE_CONFIG_DIR"
echo "[ensure] state files (.dev-active, .reviewer-active, .gardener-active)"
echo ""
echo "Dry run complete — no changes made."
exit 0
fi
# Generate compose files (unless --bare)
if [ "$bare" = false ]; then
local forge_port
forge_port=$(printf '%s' "$forge_url" | sed -E 's|.*:([0-9]+)/?$|\1|')
forge_port="${forge_port:-3000}"
generate_compose "$forge_port" "$use_build"
generate_agent_docker
generate_caddyfile
generate_staging_index
@@ -890,6 +1078,19 @@ p.write_text(text)
echo "Config: CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 saved to .env"
fi
# Write local-Qwen dev agent env keys with safe defaults (#769)
if ! grep -q '^ENABLE_LLAMA_AGENT=' "$env_file" 2>/dev/null; then
cat >> "$env_file" <<'LLAMAENVEOF'
# Local Qwen dev agent (optional) — set to 1 to enable
ENABLE_LLAMA_AGENT=0
FORGE_TOKEN_LLAMA=
FORGE_PASS_LLAMA=
ANTHROPIC_BASE_URL=
LLAMAENVEOF
echo "Config: ENABLE_LLAMA_AGENT keys written to .env (disabled by default)"
fi
# Create labels on remote
create_labels "$forge_repo" "$forge_url"
@@ -1118,8 +1319,6 @@ disinto_secrets() {
local subcmd="${1:-}"
local enc_file="${FACTORY_ROOT}/.env.enc"
local env_file="${FACTORY_ROOT}/.env"
# Shared helper: ensure sops+age and .sops.yaml exist
_secrets_ensure_sops() {
@@ -1165,25 +1364,42 @@ disinto_secrets() {
case "$subcmd" in
add)
# Parse flags
local force=false
shift # consume 'add'
while [ $# -gt 0 ]; do
case "$1" in
-f|--force) force=true; shift ;;
-*) echo "Unknown flag: $1" >&2; exit 1 ;;
*) break ;;
esac
done
local name="${1:-}"
if [ -z "$name" ]; then
echo "Usage: disinto secrets add [-f|--force] <NAME>" >&2
exit 1
fi
_secrets_ensure_age_key
mkdir -p "$secrets_dir"
local value
if [ -t 0 ]; then
# Interactive TTY — prompt with hidden input (original behavior)
printf 'Enter value for %s: ' "$name" >&2
IFS= read -rs value
echo >&2
else
# Piped/redirected stdin — read raw bytes verbatim
IFS= read -r -d '' value || true
fi
if [ -z "$value" ]; then
echo "Error: empty value" >&2
exit 1
fi
local enc_path="${secrets_dir}/${name}.enc"
if [ -f "$enc_path" ] && [ "$force" = false ]; then
if [ -t 0 ]; then
printf 'Secret %s already exists. Overwrite? [y/N] ' "$name" >&2
local confirm
read -r confirm
@@ -1191,6 +1407,10 @@ disinto_secrets() {
echo "Aborted." >&2
exit 1
fi
else
echo "Error: secret ${name} already exists (use -f to overwrite)" >&2
exit 1
fi
fi
if ! printf '%s' "$value" | age -r "$AGE_PUBLIC_KEY" -o "$enc_path"; then
echo "Error: encryption failed" >&2
@@ -1221,6 +1441,37 @@ disinto_secrets() {
sops -d "$enc_file"
fi
;;
remove)
local name="${2:-}"
if [ -z "$name" ]; then
echo "Usage: disinto secrets remove <NAME>" >&2
exit 1
fi
local enc_path="${secrets_dir}/${name}.enc"
if [ ! -f "$enc_path" ]; then
echo "Error: ${enc_path} not found" >&2
exit 1
fi
rm -f "$enc_path"
echo "Removed: ${enc_path}"
;;
list)
if [ ! -d "$secrets_dir" ]; then
echo "No secrets directory found." >&2
exit 0
fi
local found=false
for enc_file_path in "${secrets_dir}"/*.enc; do
[ -f "$enc_file_path" ] || continue
found=true
local secret_name
secret_name=$(basename "$enc_file_path" .enc)
echo "$secret_name"
done
if [ "$found" = false ]; then
echo "No secrets stored." >&2
fi
;;
edit)
if [ ! -f "$enc_file" ]; then
echo "Error: ${enc_file} not found. Run 'disinto secrets migrate' first." >&2
@@ -1244,54 +1495,100 @@ disinto_secrets() {
rm -f "$env_file"
echo "Migrated: .env -> .env.enc (plaintext removed)"
;;
migrate-from-vault)
# One-shot migration: split .env.vault.enc into secrets/<KEY>.enc files (#777)
local vault_enc_file="${FACTORY_ROOT}/.env.vault.enc"
local vault_env_file="${FACTORY_ROOT}/.env.vault"
local source_file=""
if [ -f "$vault_enc_file" ] && command -v sops &>/dev/null; then
source_file="$vault_enc_file"
elif [ -f "$vault_env_file" ]; then
source_file="$vault_env_file"
else
echo "Error: neither .env.vault.enc nor .env.vault found — nothing to migrate." >&2
exit 1
fi
_secrets_ensure_age_key
mkdir -p "$secrets_dir"
# Decrypt vault to temp dotenv
local tmp_dotenv
tmp_dotenv=$(mktemp /tmp/disinto-vault-migrate-XXXXXX)
trap 'rm -f "$tmp_dotenv"' RETURN
if [ "$source_file" = "$vault_enc_file" ]; then
if ! sops -d --output-type dotenv "$vault_enc_file" > "$tmp_dotenv" 2>/dev/null; then
rm -f "$tmp_dotenv"
echo "Error: failed to decrypt .env.vault.enc" >&2
exit 1
fi
else
cp "$vault_env_file" "$tmp_dotenv"
fi
# Parse each KEY=VALUE and encrypt into secrets/<KEY>.enc
local count=0
local failed=0
while IFS='=' read -r key value; do
# Skip empty lines and comments
[[ -z "$key" || "$key" =~ ^[[:space:]]*# ]] && continue
# Trim whitespace from key
key=$(echo "$key" | xargs)
[ -z "$key" ] && continue
local enc_path="${secrets_dir}/${key}.enc"
if printf '%s' "$value" | age -r "$AGE_PUBLIC_KEY" -o "$enc_path" 2>/dev/null; then
# Verify round-trip
local check
check=$(age -d -i "$age_key_file" "$enc_path" 2>/dev/null) || { failed=$((failed + 1)); echo "  FAIL (verify): ${key}" >&2; continue; }
if [ "$check" = "$value" ]; then
echo "  OK: ${key} -> secrets/${key}.enc"
count=$((count + 1))
else
echo "  FAIL (mismatch): ${key}" >&2
failed=$((failed + 1))
fi
else
echo "  FAIL (encrypt): ${key}" >&2
failed=$((failed + 1))
fi
done < "$tmp_dotenv"
rm -f "$tmp_dotenv"
if [ "$failed" -gt 0 ]; then
echo "Error: ${failed} secret(s) failed migration. Vault files NOT removed." >&2
exit 1
fi
if [ "$count" -eq 0 ]; then
echo "Warning: no secrets found in vault file." >&2
else
echo "Migrated ${count} secret(s) to secrets/*.enc"
# Remove old vault files on success
rm -f "$vault_enc_file" "$vault_env_file"
echo "Removed: .env.vault.enc / .env.vault"
fi
;;
*)
cat <<EOF >&2
Usage: disinto secrets <subcommand>
Secrets (secrets/<NAME>.enc — age-encrypted, one file per key):
  add <NAME>          Prompt for value, encrypt, store in secrets/<NAME>.enc
  show <NAME>         Decrypt and print a secret
  remove <NAME>       Remove a secret
  list                List all stored secrets
Agent secrets (.env.enc — sops-encrypted dotenv):
  edit                Edit agent secrets (FORGE_TOKEN, CLAUDE_API_KEY, etc.)
  show                Show decrypted agent secrets (no argument)
  migrate             Encrypt .env -> .env.enc
Migration:
  migrate-from-vault  Split .env.vault.enc into secrets/<KEY>.enc (one-shot)
EOF
exit 1
;;
@@ -1303,7 +1600,8 @@ EOF
disinto_run() {
local action_id="${1:?Usage: disinto run <action-id>}"
local compose_file="${FACTORY_ROOT}/docker-compose.yml"
local secrets_dir="${FACTORY_ROOT}/secrets"
local age_key_file="${HOME}/.config/sops/age/keys.txt"
if [ ! -f "$compose_file" ]; then
echo "Error: docker-compose.yml not found" >&2
@@ -1311,29 +1609,42 @@ disinto_run() {
exit 1
fi
if [ ! -d "$secrets_dir" ]; then
echo "Error: secrets/ directory not found — create secrets first" >&2
echo "  Run 'disinto secrets add <NAME>' to add secrets" >&2
exit 1
fi
if ! command -v age &>/dev/null; then
echo "Error: age not found — required to decrypt secrets" >&2
exit 1
fi
if [ ! -f "$age_key_file" ]; then
echo "Error: age key not found at ${age_key_file}" >&2
exit 1
fi
# Decrypt all secrets/*.enc into a temp env file for the runner
local tmp_env
tmp_env=$(mktemp /tmp/disinto-secrets-XXXXXX)
trap 'rm -f "$tmp_env"' EXIT
local count=0
for enc_path in "${secrets_dir}"/*.enc; do
[ -f "$enc_path" ] || continue
local key
key=$(basename "$enc_path" .enc)
local val
val=$(age -d -i "$age_key_file" "$enc_path" 2>/dev/null) || {
echo "Warning: failed to decrypt ${enc_path}" >&2
continue
}
printf '%s=%s\n' "$key" "$val" >> "$tmp_env"
count=$((count + 1))
done
echo "Decrypted ${count} secret(s) to tmpfile"
# Run action in ephemeral runner container
local rc=0
@@ -1404,21 +1715,96 @@ download_agent_binaries() {
# ── up command ────────────────────────────────────────────────────────────────
# Regenerate a file idempotently: run the generator, compare output, backup if changed.
# Usage: _regen_file <target_file> <generator_fn> [args...]
_regen_file() {
local target="$1"; shift
local generator="$1"; shift
local basename
basename=$(basename "$target")
# Move existing file aside so the generator (which skips if file exists)
# produces a fresh copy.
local stashed=""
if [ -f "$target" ]; then
stashed=$(mktemp "${target}.stash.XXXXXX")
mv "$target" "$stashed"
fi
# Run the generator — it writes $target from scratch.
# If the generator fails, restore the stashed original so it is not stranded.
if ! "$generator" "$@"; then
if [ -n "$stashed" ]; then
mv "$stashed" "$target"
fi
return 1
fi
if [ -z "$stashed" ]; then
# No previous file — first generation
echo "regenerated: ${basename} (new)"
return
fi
if cmp -s "$stashed" "$target"; then
# Content unchanged — restore original to preserve mtime
mv "$stashed" "$target"
echo "unchanged: ${basename}"
else
# Content changed — keep new, save old as .prev
mv "$stashed" "${target}.prev"
echo "regenerated: ${basename} (previous saved as ${basename}.prev)"
fi
}
disinto_up() {
local compose_file="${FACTORY_ROOT}/docker-compose.yml"
local caddyfile="${FACTORY_ROOT}/docker/Caddyfile"
if [ ! -f "$compose_file" ]; then
echo "Error: docker-compose.yml not found" >&2
echo "  Run 'disinto init <repo-url>' first (without --bare)" >&2
exit 1
fi
# Parse --no-regen flag; remaining args pass through to docker compose
local no_regen=false
local -a compose_args=()
for arg in "$@"; do
case "$arg" in
--no-regen) no_regen=true ;;
*) compose_args+=("$arg") ;;
esac
done
# ── Regenerate compose & Caddyfile from generators ──────────────────────
if [ "$no_regen" = true ]; then
echo "Warning: running with unmanaged compose — hand-edits will drift" >&2
else
# Determine forge_port from FORGE_URL (same logic as init)
local forge_url="${FORGE_URL:-http://localhost:3000}"
local forge_port
forge_port=$(printf '%s' "$forge_url" | sed -E 's|.*:([0-9]+)/?$|\1|')
forge_port="${forge_port:-3000}"
# Detect build mode from existing compose
local use_build=false
if grep -q '^\s*build:' "$compose_file"; then
use_build=true
fi
_regen_file "$compose_file" generate_compose "$forge_port" "$use_build"
_regen_file "$caddyfile" generate_caddyfile
fi
# Pre-build: download binaries only when compose uses local build
if grep -q '^\s*build:' "$compose_file"; then
echo "── Pre-build: downloading agent binaries ────────────────────────"
if ! download_agent_binaries; then
echo "Error: failed to download agent binaries" >&2
exit 1
fi
echo ""
fi
# Decrypt secrets to temp .env if SOPS available and .env.enc exists
local tmp_env=""
@@ -1431,7 +1817,7 @@ disinto_up() {
echo "Decrypted secrets for compose"
fi
docker compose -f "$compose_file" up -d --build --remove-orphans ${compose_args[@]+"${compose_args[@]}"}
echo "Stack is up"
# Clean up temp .env (also handled by EXIT trap if compose fails)


@@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 -->
# Dev Agent
**Role**: Implement issues autonomously — write code, push branches, address
@@ -55,6 +55,12 @@ PRs owned by other bot users (#374).
**Crash recovery**: on `PHASE:crashed` or non-zero exit, the worktree is **preserved** (not destroyed) for debugging. Location logged. Supervisor housekeeping removes stale crashed worktrees older than 24h.
**Polling loop isolation (#753)**: `docker/agents/entrypoint.sh` now tracks fast-poll PIDs
(`FAST_PIDS`) and calls `wait "${FAST_PIDS[@]}"` instead of `wait` (no-args). This means
long-running dev-agent sessions no longer block the loop from launching the next iteration's
fast polls — the loop only waits for review-poll and dev-poll (the fast agents), never for
the dev-agent subprocess itself.
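
The difference is easy to see in isolation. A minimal, self-contained sketch of the pattern (the `sleep` jobs are hypothetical stand-ins for the poll scripts and a long-lived dev-agent session; this is not the entrypoint itself):

```shell
#!/usr/bin/env bash
# Collect only the fast-poll PIDs; a bare `wait` would also block on the
# long-running background job until it exits.
set -euo pipefail

sleep 10 &                    # stand-in for a long-running dev-agent session
LONG_PID=$!

FAST_PIDS=()
sleep 1 & FAST_PIDS+=($!)     # stand-in for review-poll
sleep 1 & FAST_PIDS+=($!)     # stand-in for dev-poll

start=$SECONDS
wait "${FAST_PIDS[@]}"        # returns once the fast polls finish
elapsed=$((SECONDS - start))

if [ "$elapsed" -lt 5 ]; then
  echo "fast polls done; dev-agent still running"
fi
kill "$LONG_PID" 2>/dev/null || true
```

With a bare `wait` on the marked line, the loop body would stall for the full 10 seconds; waiting on the collected PIDs returns after roughly one second.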
**Lifecycle**: dev-poll.sh (invoked by polling loop, `check_active dev`) → dev-agent.sh →
tmux session → phase file drives CI/review loop → merge + `mirror_push()` → close issue.
On respawn after `PHASE:escalate`, the stale phase file is cleared first so the session


@@ -14,10 +14,10 @@ services:
- agent-data:/home/agent/data
- project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro
environment:
- FORGE_URL=http://forgejo:3000
@@ -30,6 +30,7 @@ services:
- FORGE_SUPERVISOR_TOKEN=${FORGE_SUPERVISOR_TOKEN:-}
- FORGE_PREDICTOR_TOKEN=${FORGE_PREDICTOR_TOKEN:-}
- FORGE_ARCHITECT_TOKEN=${FORGE_ARCHITECT_TOKEN:-}
- FORGE_FILER_TOKEN=${FORGE_FILER_TOKEN:-}
- FORGE_BOT_USERNAMES=${FORGE_BOT_USERNAMES:-}
- WOODPECKER_TOKEN=${WOODPECKER_TOKEN:-}
- CLAUDE_TIMEOUT=${CLAUDE_TIMEOUT:-7200}
@@ -48,6 +49,13 @@ services:
- GARDENER_INTERVAL=${GARDENER_INTERVAL:-21600}
- ARCHITECT_INTERVAL=${ARCHITECT_INTERVAL:-21600}
- PLANNER_INTERVAL=${PLANNER_INTERVAL:-43200}
- SUPERVISOR_INTERVAL=${SUPERVISOR_INTERVAL:-1200}
healthcheck:
test: ["CMD", "pgrep", "-f", "entrypoint.sh"]
interval: 60s
timeout: 5s
retries: 3
start_period: 30s
depends_on:
forgejo:
condition: service_healthy
@@ -69,10 +77,10 @@ services:
- agent-data:/home/agent/data
- project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro
environment:
- FORGE_URL=http://forgejo:3000
@@ -102,6 +110,80 @@ services:
- CLAUDE_CONFIG_DIR=${CLAUDE_CONFIG_DIR:-/var/lib/disinto/claude-shared/config}
- POLL_INTERVAL=${POLL_INTERVAL:-300}
- AGENT_ROLES=dev
healthcheck:
test: ["CMD", "pgrep", "-f", "entrypoint.sh"]
interval: 60s
timeout: 5s
retries: 3
start_period: 30s
depends_on:
forgejo:
condition: service_healthy
woodpecker:
condition: service_started
networks:
- disinto-net
agents-llama-all:
build:
context: .
dockerfile: docker/agents/Dockerfile
image: disinto/agents-llama:latest
container_name: disinto-agents-llama-all
restart: unless-stopped
profiles: ["agents-llama-all"]
security_opt:
- apparmor=unconfined
volumes:
- agent-data:/home/agent/data
- project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro
environment:
- FORGE_URL=http://forgejo:3000
- FORGE_REPO=${FORGE_REPO:-disinto-admin/disinto}
- FORGE_TOKEN=${FORGE_TOKEN_LLAMA:-}
- FORGE_PASS=${FORGE_PASS_LLAMA:-}
- FORGE_REVIEW_TOKEN=${FORGE_REVIEW_TOKEN:-}
- FORGE_PLANNER_TOKEN=${FORGE_PLANNER_TOKEN:-}
- FORGE_GARDENER_TOKEN=${FORGE_GARDENER_TOKEN:-}
- FORGE_VAULT_TOKEN=${FORGE_VAULT_TOKEN:-}
- FORGE_SUPERVISOR_TOKEN=${FORGE_SUPERVISOR_TOKEN:-}
- FORGE_PREDICTOR_TOKEN=${FORGE_PREDICTOR_TOKEN:-}
- FORGE_ARCHITECT_TOKEN=${FORGE_ARCHITECT_TOKEN:-}
- FORGE_FILER_TOKEN=${FORGE_FILER_TOKEN:-}
- FORGE_BOT_USERNAMES=${FORGE_BOT_USERNAMES:-}
- WOODPECKER_TOKEN=${WOODPECKER_TOKEN:-}
- CLAUDE_TIMEOUT=${CLAUDE_TIMEOUT:-7200}
- CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=${CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC:-1}
- CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=60
- CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
- ANTHROPIC_BASE_URL=${ANTHROPIC_BASE_URL:-}
- FORGE_ADMIN_PASS=${FORGE_ADMIN_PASS:-}
- DISINTO_CONTAINER=1
- PROJECT_TOML=projects/disinto.toml
- PROJECT_NAME=${PROJECT_NAME:-project}
- PROJECT_REPO_ROOT=/home/agent/repos/${PROJECT_NAME:-project}
- WOODPECKER_DATA_DIR=/woodpecker-data
- WOODPECKER_REPO_ID=${WOODPECKER_REPO_ID:-}
- CLAUDE_CONFIG_DIR=${CLAUDE_CONFIG_DIR:-/var/lib/disinto/claude-shared/config}
- POLL_INTERVAL=${POLL_INTERVAL:-300}
- GARDENER_INTERVAL=${GARDENER_INTERVAL:-21600}
- ARCHITECT_INTERVAL=${ARCHITECT_INTERVAL:-21600}
- PLANNER_INTERVAL=${PLANNER_INTERVAL:-43200}
- SUPERVISOR_INTERVAL=${SUPERVISOR_INTERVAL:-1200}
- AGENT_ROLES=review,dev,gardener,architect,planner,predictor,supervisor
healthcheck:
test: ["CMD", "pgrep", "-f", "entrypoint.sh"]
interval: 60s
timeout: 5s
retries: 3
start_period: 30s
depends_on:
forgejo:
condition: service_healthy
@@ -121,9 +203,9 @@ services:
- /var/run/docker.sock:/var/run/docker.sock
- agent-data:/home/agent/data
- project-repos:/home/agent/repos
- ${CLAUDE_DIR:-${HOME}/.claude}:/home/agent/.claude
- ${CLAUDE_BIN_DIR:-/usr/local/bin/claude}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
env_file:
- .env
@@ -137,9 +219,9 @@ services:
- apparmor=unconfined
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- ${CLAUDE_BIN_DIR:-/usr/local/bin/claude}:/usr/local/bin/claude:ro
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/root/.claude.json:ro
- ${CLAUDE_DIR:-${HOME}/.claude}:/root/.claude:ro
- disinto-logs:/opt/disinto-logs
environment:
- FORGE_SUPERVISOR_TOKEN=${FORGE_SUPERVISOR_TOKEN:-}
@@ -155,6 +237,12 @@ services:
ports:
- "80:80"
- "443:443"
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost:2019/config/"]
interval: 30s
timeout: 5s
retries: 3
start_period: 15s
depends_on:
- forgejo
networks:


@@ -28,6 +28,9 @@ RUN chmod +x /entrypoint.sh
# Entrypoint runs polling loop directly, dropping to agent user via gosu.
# All scripts execute as the agent user (UID 1000) while preserving env vars.
VOLUME /home/agent/data
VOLUME /home/agent/repos
WORKDIR /home/agent/disinto
ENTRYPOINT ["/entrypoint.sh"]


@@ -7,14 +7,15 @@ set -euo pipefail
# poll scripts. All Docker Compose env vars are inherited (PATH, FORGE_TOKEN,
# ANTHROPIC_API_KEY, etc.).
#
# AGENT_ROLES env var controls which scripts run: "review,dev,gardener,architect,planner,predictor,supervisor"
# (default: all seven). Uses while-true loop with staggered intervals:
# - review-poll: every 5 minutes (offset by 0s)
# - dev-poll: every 5 minutes (offset by 2 minutes)
# - gardener: every GARDENER_INTERVAL seconds (default: 21600 = 6 hours)
# - architect: every ARCHITECT_INTERVAL seconds (default: 21600 = 6 hours)
# - planner: every PLANNER_INTERVAL seconds (default: 43200 = 12 hours)
# - predictor: every 24 hours (288 iterations * 5 min)
# - supervisor: every SUPERVISOR_INTERVAL seconds (default: 1200 = 20 min)
DISINTO_BAKED="/home/agent/disinto"
DISINTO_LIVE="/home/agent/repos/_factory"
@@ -328,7 +329,7 @@ init_state_dir
# Parse AGENT_ROLES env var (default: all agents)
# Expected format: comma-separated list like "review,dev,gardener"
AGENT_ROLES="${AGENT_ROLES:-review,dev,gardener,architect,planner,predictor,supervisor}"
log "Agent roles configured: ${AGENT_ROLES}"
# Poll interval in seconds (5 minutes default)
@@ -338,9 +339,10 @@ POLL_INTERVAL="${POLL_INTERVAL:-300}"
GARDENER_INTERVAL="${GARDENER_INTERVAL:-21600}"
ARCHITECT_INTERVAL="${ARCHITECT_INTERVAL:-21600}"
PLANNER_INTERVAL="${PLANNER_INTERVAL:-43200}"
SUPERVISOR_INTERVAL="${SUPERVISOR_INTERVAL:-1200}"
log "Entering polling loop (interval: ${POLL_INTERVAL}s, roles: ${AGENT_ROLES})"
log "Gardener interval: ${GARDENER_INTERVAL}s, Architect interval: ${ARCHITECT_INTERVAL}s, Planner interval: ${PLANNER_INTERVAL}s, Supervisor interval: ${SUPERVISOR_INTERVAL}s"
# Main polling loop using iteration counter for gardener scheduling
iteration=0
@@ -385,11 +387,13 @@ print(cfg.get('primary_branch', 'main'))
log "Processing project TOML: ${toml}"
# --- Fast agents: run in background, wait before slow agents ---
FAST_PIDS=()
# Review poll (every iteration)
if [[ ",${AGENT_ROLES}," == *",review,"* ]]; then
log "Running review-poll (iteration ${iteration}) for ${toml}"
gosu agent bash -c "cd ${DISINTO_DIR} && bash review/review-poll.sh \"${toml}\"" >> "${DISINTO_LOG_DIR}/review-poll.log" 2>&1 &
FAST_PIDS+=($!)
fi
sleep 2 # stagger fast polls
@@ -398,10 +402,14 @@ print(cfg.get('primary_branch', 'main'))
if [[ ",${AGENT_ROLES}," == *",dev,"* ]]; then
log "Running dev-poll (iteration ${iteration}) for ${toml}"
gosu agent bash -c "cd ${DISINTO_DIR} && bash dev/dev-poll.sh \"${toml}\"" >> "${DISINTO_LOG_DIR}/dev-poll.log" 2>&1 &
FAST_PIDS+=($!)
fi
# Wait only for THIS iteration's fast polls — long-running gardener/dev-agent
# from prior iterations must not block us.
if [ ${#FAST_PIDS[@]} -gt 0 ]; then
wait "${FAST_PIDS[@]}"
fi
# --- Slow agents: run in background with pgrep guard ---
@@ -457,6 +465,19 @@ print(cfg.get('primary_branch', 'main'))
fi
fi
fi
# Supervisor (interval configurable via SUPERVISOR_INTERVAL env var, default 20 min)
if [[ ",${AGENT_ROLES}," == *",supervisor,"* ]]; then
supervisor_iteration=$((iteration * POLL_INTERVAL))
if [ $((supervisor_iteration % SUPERVISOR_INTERVAL)) -eq 0 ] && [ "$now" -ge "$supervisor_iteration" ]; then
if ! pgrep -f "supervisor-run.sh" >/dev/null; then
log "Running supervisor (iteration ${iteration}, ${SUPERVISOR_INTERVAL}s interval) for ${toml}"
gosu agent bash -c "cd ${DISINTO_DIR} && bash supervisor/supervisor-run.sh \"${toml}\"" >> "${DISINTO_LOG_DIR}/supervisor.log" 2>&1 &
else
log "Skipping supervisor — already running"
fi
fi
fi
done
sleep "${POLL_INTERVAL}"


@@ -30,6 +30,6 @@ WORKDIR /var/chat
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')" || exit 1
ENTRYPOINT ["/entrypoint-chat.sh"]


@@ -481,6 +481,14 @@ class ChatHandler(BaseHTTPRequestHandler):
parsed = urlparse(self.path)
path = parsed.path
# Health endpoint (no auth required) — used by Docker healthcheck
if path == "/health":
self.send_response(200)
self.send_header("Content-Type", "text/plain")
self.end_headers()
self.wfile.write(b"ok\n")
return
# Verify endpoint for Caddy forward_auth (#709)
if path == "/chat/auth/verify":
self.handle_auth_verify()


@@ -1,4 +1,7 @@
FROM caddy:latest
RUN apk add --no-cache bash jq curl git docker-cli python3 openssh-client autossh
COPY entrypoint-edge.sh /usr/local/bin/entrypoint-edge.sh
VOLUME /data
ENTRYPOINT ["bash", "/usr/local/bin/entrypoint-edge.sh"]


@@ -8,8 +8,8 @@
# 2. Scan vault/actions/ for TOML files without .result.json
# 3. Verify TOML arrived via merged PR with admin merger (Forgejo API)
# 4. Validate TOML using vault-env.sh validator
# 5. Decrypt declared secrets via load_secret (lib/env.sh)
# 6. Launch: delegate to _launch_runner_{docker,nomad} backend
# 7. Write <action-id>.result.json with exit code, timestamp, logs summary
#
# Part of #76.
@@ -19,7 +19,7 @@ set -euo pipefail
# Resolve script root (parent of lib/)
SCRIPT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# Source shared environment (provides load_secret, log helpers, etc.)
source "${SCRIPT_ROOT}/../lib/env.sh"
# Project TOML location: prefer mounted path, fall back to cloned path
@@ -27,26 +27,18 @@ source "${SCRIPT_ROOT}/../lib/env.sh"
# the shallow clone only has .toml.example files.
PROJECTS_DIR="${PROJECTS_DIR:-${FACTORY_ROOT:-/opt/disinto}-projects}"
# -----------------------------------------------------------------------------
# Backend selection: DISPATCHER_BACKEND={docker,nomad}
# Default: docker. nomad lands as a pure addition during migration Step 5.
# -----------------------------------------------------------------------------
DISPATCHER_BACKEND="${DISPATCHER_BACKEND:-docker}"
# Ops repo location (vault/actions directory)
OPS_REPO_ROOT="${OPS_REPO_ROOT:-/home/debian/disinto-ops}"
VAULT_ACTIONS_DIR="${OPS_REPO_ROOT}/vault/actions"
# Vault action validation
VAULT_ENV="${SCRIPT_ROOT}/../action-vault/vault-env.sh"
# Admin users who can merge vault PRs (from issue #77)
# Comma-separated list of Forgejo usernames with admin role
@@ -350,73 +342,113 @@ get_dispatch_mode() {
fi
}
# Commit result.json to the ops repo via git push (portable, no bind-mount).
#
# Clones the ops repo into a scratch directory, writes the result file,
# commits as vault-bot, and pushes to the primary branch.
# Idempotent: skips if result.json already exists upstream.
# Retries on push conflict with rebase-and-push (handles concurrent merges).
#
# Usage: commit_result_via_git <action_id> <exit_code> <logs>
commit_result_via_git() {
local action_id="$1"
local exit_code="$2"
local logs="$3"
local result_relpath="vault/actions/${action_id}.result.json"
local ops_clone_url="${FORGE_URL}/${FORGE_OPS_REPO}.git"
local branch="${PRIMARY_BRANCH:-main}"
local scratch_dir
scratch_dir=$(mktemp -d /tmp/dispatcher-result-XXXXXX)
# shellcheck disable=SC2064
trap "rm -rf '${scratch_dir}'" RETURN
# Shallow clone of the ops repo — only the primary branch
if ! git clone --depth 1 --branch "$branch" \
"$ops_clone_url" "$scratch_dir" 2>/dev/null; then
log "ERROR: Failed to clone ops repo for result commit (action ${action_id})"
return 1
fi
# Idempotency: skip if result.json already exists upstream
if [ -f "${scratch_dir}/${result_relpath}" ]; then
log "Result already exists upstream for ${action_id} — skipping commit"
return 0
fi
# Configure git identity as vault-bot
git -C "$scratch_dir" config user.name "vault-bot"
git -C "$scratch_dir" config user.email "vault-bot@disinto.local"
# Truncate logs if too long (keep last 1000 chars)
if [ ${#logs} -gt 1000 ]; then
logs="${logs: -1000}"
fi
# Write result JSON via jq (never string-interpolate into JSON)
mkdir -p "$(dirname "${scratch_dir}/${result_relpath}")"
jq -n \
--arg id "$action_id" \
--argjson exit_code "$exit_code" \
--arg timestamp "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
--arg logs "$logs" \
'{id: $id, exit_code: $exit_code, timestamp: $timestamp, logs: $logs}' \
> "${scratch_dir}/${result_relpath}"
git -C "$scratch_dir" add "$result_relpath"
git -C "$scratch_dir" commit -q -m "vault: result for ${action_id}"
# Push with retry on conflict (rebase-and-push pattern).
# Common case: admin merges another action PR between our clone and push.
local attempt
for attempt in 1 2 3; do
if git -C "$scratch_dir" push origin "$branch" 2>/dev/null; then
log "Result committed and pushed for ${action_id} (attempt ${attempt})"
return 0
fi
log "Push conflict for ${action_id} (attempt ${attempt}/3) — rebasing"
if ! git -C "$scratch_dir" pull --rebase origin "$branch" 2>/dev/null; then
# Rebase conflict — check if result was pushed by another process
git -C "$scratch_dir" rebase --abort 2>/dev/null || true
if git -C "$scratch_dir" fetch origin "$branch" 2>/dev/null && \
git -C "$scratch_dir" show "origin/${branch}:${result_relpath}" >/dev/null 2>&1; then
log "Result already exists upstream for ${action_id} (pushed by another process)"
return 0
fi
fi
done
log "ERROR: Failed to push result for ${action_id} after 3 attempts"
return 1
}
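The rebase-and-push retry can be exercised end to end with throwaway local repos; a self-contained demo (no network, repo names are illustrative) where writer "a" lands a commit between "b"'s clone and push, so b's first push is rejected and succeeds only after `git pull --rebase`:

```shell
# Two clones of one bare repo; "a" creates the push conflict for "b".
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/a" 2>/dev/null
(cd "$work/a" && echo one > f1 && git add f1 \
  && git -c user.name=a -c user.email=a@x commit -q -m init \
  && git push -q origin HEAD:main)
git clone -q -b main "$work/origin.git" "$work/b"
(cd "$work/a" && echo two > f2 && git add f2 \
  && git -c user.name=a -c user.email=a@x commit -q -m second \
  && git push -q origin HEAD:main)
(cd "$work/b" && echo done > result.json && git add result.json \
  && git -c user.name=b -c user.email=b@x commit -q -m result)
pushed=no
for attempt in 1 2 3; do
  if git -C "$work/b" push -q origin main 2>/dev/null; then pushed=yes; break; fi
  # Rejected (non-fast-forward): rebase our commit onto the new tip and retry
  git -C "$work/b" -c user.name=b -c user.email=b@x pull -q --rebase origin main
done
echo "pushed=$pushed"
rm -rf "$work"
```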
# Write result file for an action via git push to the ops repo.
# Usage: write_result <action_id> <exit_code> <logs>
write_result() {
local action_id="$1"
local exit_code="$2"
local logs="$3"
commit_result_via_git "$action_id" "$exit_code" "$logs"
}
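The "never string-interpolate into JSON" rule is easy to check: `jq -n --arg` escapes shell-hostile characters, so a value round-trips intact. A quick sketch with an illustrative id:

```shell
# Quotes and shell metacharacters survive jq's --arg escaping unchanged.
logs='say "hi" & exit 1; rm -rf /'
json=$(jq -n --arg id demo --argjson exit_code 0 --arg logs "$logs" \
  '{id: $id, exit_code: $exit_code, logs: $logs}')
roundtrip=$(echo "$json" | jq -r '.logs')
echo "$roundtrip"
```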
# -----------------------------------------------------------------------------
# Pluggable launcher backends
# -----------------------------------------------------------------------------
# _launch_runner_docker ACTION_ID SECRETS_CSV MOUNTS_CSV
#
# Builds and executes a `docker run` command for the vault runner.
# Secrets are resolved via load_secret (lib/env.sh).
# Returns: exit code of the docker run. Stdout/stderr are captured to a temp
# log file whose path is printed to stdout (caller reads it).
_launch_runner_docker() {
local action_id="$1"
local secrets_csv="$2"
local mounts_csv="$3"
# Build docker run command (self-contained, no compose context needed).
# The edge container has the Docker socket but not the host's compose project,
# so docker compose run would fail with exit 125. docker run is self-contained:
# the dispatcher knows the image, network, env vars, and entrypoint.
local -a cmd=(docker run --rm
--name "vault-runner-${action_id}"
--network host
@@ -451,29 +483,27 @@ launch_runner() {
cmd+=(-v "${runtime_home}/.claude.json:/home/agent/.claude.json:ro")
fi
# Add environment variables for secrets (resolved via load_secret)
if [ -n "$secrets_csv" ]; then
local secret
for secret in $(echo "$secrets_csv" | tr ',' ' '); do
secret=$(echo "$secret" | xargs)
[ -n "$secret" ] || continue
local secret_val
secret_val=$(load_secret "$secret") || true
if [ -z "$secret_val" ]; then
log "ERROR: Secret '${secret}' could not be resolved for action ${action_id}"
write_result "$action_id" 1 "Secret not found: ${secret}"
return 1
fi
cmd+=(-e "${secret}=${secret_val}")
done
fi
# Add volume mounts for file-based credentials
if [ -n "$mounts_csv" ]; then
local mount_alias
for mount_alias in $(echo "$mounts_csv" | tr ',' ' '); do
mount_alias=$(echo "$mount_alias" | xargs)
[ -n "$mount_alias" ] || continue
case "$mount_alias" in
@@ -501,7 +531,7 @@ launch_runner() {
# Image and entrypoint arguments: runner entrypoint + action-id
cmd+=(disinto/agents:latest /home/agent/disinto/docker/runner/entrypoint-runner.sh "$action_id")
log "Running: docker run --rm vault-runner-${action_id} (secrets: ${secrets_csv:-none}, mounts: ${mounts_csv:-none})"
# Create temp file for logs
local log_file
@@ -509,7 +539,6 @@ launch_runner() {
trap 'rm -f "$log_file"' RETURN
# Execute with array expansion (safe from shell injection)
"${cmd[@]}" > "$log_file" 2>&1
local exit_code=$?
@@ -529,6 +558,137 @@ launch_runner() {
return $exit_code
}
# _launch_runner_nomad ACTION_ID SECRETS_CSV MOUNTS_CSV
#
# Nomad backend stub — will be implemented in migration Step 5.
_launch_runner_nomad() {
echo "nomad backend not yet implemented" >&2
return 1
}
# Launch runner for the given action (backend-agnostic orchestrator)
# Usage: launch_runner <toml_file>
launch_runner() {
local toml_file="$1"
local action_id
action_id=$(basename "$toml_file" .toml)
log "Launching runner for action: ${action_id}"
# Validate TOML
if ! validate_action "$toml_file"; then
log "ERROR: Action validation failed for ${action_id}"
write_result "$action_id" 1 "Validation failed: see logs above"
return 1
fi
# Check dispatch mode to determine if admin verification is needed
local dispatch_mode
dispatch_mode=$(get_dispatch_mode "$toml_file")
if [ "$dispatch_mode" = "direct" ]; then
log "Action ${action_id}: tier=${VAULT_TIER:-unknown}, dispatch_mode=${dispatch_mode} — skipping admin merge verification (direct commit)"
else
# Verify admin merge for PR-based actions
log "Action ${action_id}: tier=${VAULT_TIER:-unknown}, dispatch_mode=${dispatch_mode} — verifying admin merge"
if ! verify_admin_merged "$toml_file"; then
log "ERROR: Admin merge verification failed for ${action_id}"
write_result "$action_id" 1 "Admin merge verification failed: see logs above"
return 1
fi
log "Action ${action_id}: admin merge verified"
fi
# Build CSV lists from validated action metadata
local secrets_csv=""
if [ -n "${VAULT_ACTION_SECRETS:-}" ]; then
# Convert space-separated to comma-separated
secrets_csv=$(echo "${VAULT_ACTION_SECRETS}" | xargs | tr ' ' ',')
fi
local mounts_csv=""
if [ -n "${VAULT_ACTION_MOUNTS:-}" ]; then
mounts_csv=$(echo "${VAULT_ACTION_MOUNTS}" | xargs | tr ' ' ',')
fi
# Delegate to the selected backend
"_launch_runner_${DISPATCHER_BACKEND}" "$action_id" "$secrets_csv" "$mounts_csv"
}
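The delegation line above splices the backend name into a function name, so adding a backend means defining one more function. A minimal sketch of the pattern, with illustrative function names (`_launch_docker`/`_launch_nomad` are not the real dispatcher functions):

```shell
# Backend indirection: "_launch_${BACKEND}" resolves to a function name.
_launch_docker() { echo "docker: $1"; }
_launch_nomad()  { echo "nomad backend not yet implemented" >&2; return 1; }
BACKEND="${BACKEND:-docker}"
out=$("_launch_${BACKEND}" action-42)
echo "$out"
```

Misspelling the backend simply fails the function lookup, which is why the real dispatcher also validates `DISPATCHER_BACKEND` at startup.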
# -----------------------------------------------------------------------------
# Pluggable sidecar launcher (reproduce / triage / verify)
# -----------------------------------------------------------------------------
# _dispatch_sidecar_docker CONTAINER_NAME ISSUE_NUM PROJECT_TOML IMAGE [FORMULA]
#
# Launches a sidecar container via docker run (background, pid-tracked).
# Prints the background PID to stdout.
_dispatch_sidecar_docker() {
local container_name="$1"
local issue_number="$2"
local project_toml="$3"
local image="$4"
local formula="${5:-}"
local -a cmd=(docker run --rm
--name "${container_name}"
--network host
--security-opt apparmor=unconfined
-v /var/run/docker.sock:/var/run/docker.sock
-v agent-data:/home/agent/data
-v project-repos:/home/agent/repos
-e "FORGE_URL=${FORGE_URL}"
-e "FORGE_TOKEN=${FORGE_TOKEN}"
-e "FORGE_REPO=${FORGE_REPO}"
-e "PRIMARY_BRANCH=${PRIMARY_BRANCH:-main}"
-e DISINTO_CONTAINER=1
)
# Set formula if provided
if [ -n "$formula" ]; then
cmd+=(-e "DISINTO_FORMULA=${formula}")
fi
# Pass through ANTHROPIC_API_KEY if set
if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
cmd+=(-e "ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}")
fi
# Mount shared Claude config dir and ~/.ssh from the runtime user's home
local runtime_home="${HOME:-/home/debian}"
if [ -d "${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}" ]; then
cmd+=(-v "${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}")
cmd+=(-e "CLAUDE_CONFIG_DIR=${CLAUDE_CONFIG_DIR:-/var/lib/disinto/claude-shared/config}")
fi
if [ -f "${runtime_home}/.claude.json" ]; then
cmd+=(-v "${runtime_home}/.claude.json:/home/agent/.claude.json:ro")
fi
if [ -d "${runtime_home}/.ssh" ]; then
cmd+=(-v "${runtime_home}/.ssh:/home/agent/.ssh:ro")
fi
if [ -f /usr/local/bin/claude ]; then
cmd+=(-v /usr/local/bin/claude:/usr/local/bin/claude:ro)
fi
# Mount the project TOML into the container at a stable path
local container_toml="/home/agent/project.toml"
cmd+=(-v "${project_toml}:${container_toml}:ro")
cmd+=("${image}" "$container_toml" "$issue_number")
# Launch in background
"${cmd[@]}" &
echo $!
}
# _dispatch_sidecar_nomad CONTAINER_NAME ISSUE_NUM PROJECT_TOML IMAGE [FORMULA]
#
# Nomad sidecar backend stub — will be implemented in migration Step 5.
_dispatch_sidecar_nomad() {
echo "nomad backend not yet implemented" >&2
return 1
}
# -----------------------------------------------------------------------------
# Reproduce dispatch — launch sidecar for bug-report issues
# -----------------------------------------------------------------------------
@@ -607,52 +767,13 @@ dispatch_reproduce() {
log "Dispatching reproduce-agent for issue #${issue_number} (project: ${project_toml})"
local bg_pid
bg_pid=$("_dispatch_sidecar_${DISPATCHER_BACKEND}" \
"disinto-reproduce-${issue_number}" \
"$issue_number" \
"$project_toml" \
"disinto-reproduce:latest")
echo "$bg_pid" > "$(_reproduce_lockfile "$issue_number")"
log "Reproduce container launched (pid ${bg_pid}) for issue #${issue_number}"
}
@@ -732,53 +853,14 @@ dispatch_triage() {
log "Dispatching triage-agent for issue #${issue_number} (project: ${project_toml})"
local bg_pid
bg_pid=$("_dispatch_sidecar_${DISPATCHER_BACKEND}" \
"disinto-triage-${issue_number}" \
"$issue_number" \
"$project_toml" \
"disinto-reproduce:latest" \
"triage")
echo "$bg_pid" > "$(_triage_lockfile "$issue_number")"
log "Triage container launched (pid ${bg_pid}) for issue #${issue_number}"
}
@@ -934,53 +1016,14 @@ dispatch_verify() {
log "Dispatching verification-agent for issue #${issue_number} (project: ${project_toml})"
local bg_pid
bg_pid=$("_dispatch_sidecar_${DISPATCHER_BACKEND}" \
"disinto-verify-${issue_number}" \
"$issue_number" \
"$project_toml" \
"disinto-reproduce:latest" \
"verify")
echo "$bg_pid" > "$(_verify_lockfile "$issue_number")"
log "Verification container launched (pid ${bg_pid}) for issue #${issue_number}"
}
@@ -1002,10 +1045,25 @@ ensure_ops_repo() {
# Main dispatcher loop
main() {
log "Starting dispatcher (backend=${DISPATCHER_BACKEND})..."
log "Polling ops repo: ${VAULT_ACTIONS_DIR}"
log "Admin users: ${ADMIN_USERS}"
# Validate backend selection at startup
case "$DISPATCHER_BACKEND" in
docker) ;;
nomad)
log "ERROR: nomad backend not yet implemented"
echo "nomad backend not yet implemented" >&2
exit 1
;;
*)
log "ERROR: unknown DISPATCHER_BACKEND=${DISPATCHER_BACKEND}"
echo "unknown DISPATCHER_BACKEND=${DISPATCHER_BACKEND} (expected: docker, nomad)" >&2
exit 1
;;
esac
while true; do
# Refresh ops repo at the start of each poll cycle
ensure_ops_repo


@@ -173,6 +173,67 @@ PROJECT_TOML="${PROJECT_TOML:-projects/disinto.toml}"
sleep 1200 # 20 minutes
done) &
# ── Load required secrets from secrets/*.enc (#777) ────────────────────
# Edge container declares its required secrets; missing ones cause a hard fail.
_AGE_KEY_FILE="${HOME}/.config/sops/age/keys.txt"
_SECRETS_DIR="/opt/disinto/secrets"
EDGE_REQUIRED_SECRETS="CADDY_SSH_KEY CADDY_SSH_HOST CADDY_SSH_USER CADDY_ACCESS_LOG"
_edge_decrypt_secret() {
local enc_path="${_SECRETS_DIR}/${1}.enc"
[ -f "$enc_path" ] || return 1
age -d -i "$_AGE_KEY_FILE" "$enc_path" 2>/dev/null
}
if [ -f "$_AGE_KEY_FILE" ] && [ -d "$_SECRETS_DIR" ]; then
_missing=""
for _secret_name in $EDGE_REQUIRED_SECRETS; do
_val=$(_edge_decrypt_secret "$_secret_name") || { _missing="${_missing} ${_secret_name}"; continue; }
export "$_secret_name=$_val"
done
if [ -n "$_missing" ]; then
echo "FATAL: required secrets missing from secrets/*.enc:${_missing}" >&2
echo " Run 'disinto secrets add <NAME>' for each missing secret." >&2
echo " If migrating from .env.vault.enc, run 'disinto secrets migrate-from-vault' first." >&2
exit 1
fi
echo "edge: loaded required secrets: ${EDGE_REQUIRED_SECRETS}" >&2
else
echo "FATAL: age key (${_AGE_KEY_FILE}) or secrets dir (${_SECRETS_DIR}) not found — cannot load required secrets" >&2
echo " Ensure age is installed and secrets/*.enc files are present." >&2
exit 1
fi
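The collect-then-fail-once pattern above (gather every missing secret before aborting, instead of failing on the first) can be run anywhere by swapping `age -d` for a plain file read; a simplified stand-in, where secret names mirror the edge list but decryption is faked:

```shell
# "_decrypt" reads a plain file so the loop logic runs without age;
# the real entrypoint decrypts secrets/<NAME>.enc with an age key.
SECRETS_DIR=$(mktemp -d)
printf 'k1\n' > "$SECRETS_DIR/CADDY_SSH_HOST.enc"   # CADDY_SSH_KEY deliberately absent
REQUIRED="CADDY_SSH_KEY CADDY_SSH_HOST"
_decrypt() { cat "$SECRETS_DIR/$1.enc" 2>/dev/null; }
missing=""
for name in $REQUIRED; do
  val=$(_decrypt "$name") || { missing="${missing} ${name}"; continue; }
  export "$name=$val"
done
echo "missing:${missing}"
rm -rf "$SECRETS_DIR"
```

Reporting all missing names at once means the operator fixes the secret store in one pass rather than one restart per secret.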
# Start daily engagement collection cron loop in background (#745)
# Runs collect-engagement.sh daily at ~23:50 UTC via a sleep loop that
# calculates seconds until the next 23:50 window. SSH key from secrets/*.enc (#777).
(while true; do
# Calculate seconds until next 23:50 UTC
_now=$(date -u +%s)
_target=$(date -u -d "today 23:50" +%s 2>/dev/null || date -u -d "23:50" +%s 2>/dev/null || echo 0)
if [ "$_target" -le "$_now" ]; then
_target=$(( _target + 86400 ))
fi
_sleep_secs=$(( _target - _now ))
echo "edge: collect-engagement scheduled in ${_sleep_secs}s (next 23:50 UTC)" >&2
sleep "$_sleep_secs"
_fetch_log="/tmp/caddy-access-log-fetch.log"
_ssh_key_file=$(mktemp)
printf '%s\n' "$CADDY_SSH_KEY" > "$_ssh_key_file"
chmod 0600 "$_ssh_key_file"
scp -i "$_ssh_key_file" -o StrictHostKeyChecking=accept-new -o ConnectTimeout=10 -o BatchMode=yes \
"${CADDY_SSH_USER}@${CADDY_SSH_HOST}:${CADDY_ACCESS_LOG}" \
"$_fetch_log" 2>&1 | tee -a /opt/disinto-logs/collect-engagement.log || true
rm -f "$_ssh_key_file"
if [ -s "$_fetch_log" ]; then
CADDY_ACCESS_LOG="$_fetch_log" bash /opt/disinto/site/collect-engagement.sh 2>&1 \
| tee -a /opt/disinto-logs/collect-engagement.log || true
else
echo "edge: collect-engagement: fetched log is empty, skipping parse" >&2
fi
rm -f "$_fetch_log"
done) &
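The "seconds until next 23:50 UTC" arithmetic at the top of that loop stands on its own (GNU `date` assumed, as in the entrypoint):

```shell
# If today's 23:50 UTC is already past, roll the target forward one day.
now=$(date -u +%s)
target=$(date -u -d "today 23:50" +%s)
if [ "$target" -le "$now" ]; then
  target=$((target + 86400))
fi
sleep_secs=$((target - now))
echo "next 23:50 UTC in ${sleep_secs}s"
```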
# Caddy as main process — run in foreground via wait so background jobs survive
# (exec replaces the shell, which can orphan backgrounded subshells)
caddy run --config /etc/caddy/Caddyfile --adapter caddyfile &


@@ -7,5 +7,8 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
RUN useradd -m -u 1000 -s /bin/bash agent
COPY docker/reproduce/entrypoint-reproduce.sh /entrypoint-reproduce.sh
RUN chmod +x /entrypoint-reproduce.sh
VOLUME /home/agent/data
VOLUME /home/agent/repos
WORKDIR /home/agent
ENTRYPOINT ["/entrypoint-reproduce.sh"]


@@ -26,8 +26,8 @@ The `main` branch on the ops repo (`johba/disinto-ops`) is protected via Forgejo
## Vault PR Lifecycle
1. **Request** — Agent calls `lib/action-vault.sh:vault_request()` with action TOML content
2. **Validation** — TOML is validated against the schema in `action-vault/vault-env.sh`
3. **PR Creation** — A PR is created on `disinto-ops` with:
- Branch: `vault/<action-id>`
- Title: `vault: <action-id>`
@@ -90,12 +90,12 @@ To verify the protection is working:
- #73 — Vault redesign proposal
- #74 — Vault action TOML schema
- #75 — Vault PR creation helper (`lib/action-vault.sh`)
- #76 — Dispatcher rewrite (poll for merged vault PRs)
- #77 — Branch protection on ops repo (this issue)
## See Also
- [`lib/action-vault.sh`](../lib/action-vault.sh) — Vault PR creation helper
- [`action-vault/vault-env.sh`](../action-vault/vault-env.sh) — TOML validation
- [`lib/branch-protection.sh`](../lib/branch-protection.sh) — Branch protection helper

docs/agents-llama.md (new file)

@@ -0,0 +1,59 @@
# agents-llama — Local-Qwen Agents
The `agents-llama` service is an optional compose service that runs agents
backed by a local llama-server instance (e.g. Qwen) instead of the Anthropic
API. It uses the same Docker image as the main `agents` service but connects to
a local inference endpoint via `ANTHROPIC_BASE_URL`.
Two profiles are available:
| Profile | Service | Roles | Use case |
|---------|---------|-------|----------|
| _(default)_ | `agents-llama` | `dev` only | Conservative: single-role soak test |
| `agents-llama-all` | `agents-llama-all` | all 7 (review, dev, gardener, architect, planner, predictor, supervisor) | Pre-migration: validate every role on llama before Nomad cutover |
## Enabling
Set `ENABLE_LLAMA_AGENT=1` in `.env` (or `.env.enc`) and provide the required
credentials:
```env
ENABLE_LLAMA_AGENT=1
FORGE_TOKEN_LLAMA=<dev-qwen API token>
FORGE_PASS_LLAMA=<dev-qwen password>
ANTHROPIC_BASE_URL=http://host.docker.internal:8081 # llama-server endpoint
```
Then regenerate the compose file (`disinto init ...`) and bring the stack up.
### Running all 7 roles (agents-llama-all)
```bash
docker compose --profile agents-llama-all up -d
```
This starts the `agents-llama-all` container with all 7 bot roles against the
local llama endpoint. The per-role forge tokens (`FORGE_REVIEW_TOKEN`,
`FORGE_GARDENER_TOKEN`, etc.) must be set in `.env` — they are the same tokens
used by the Claude-backed `agents` container.
## Prerequisites
- **llama-server** (or compatible OpenAI-API endpoint) running on the host,
reachable from inside Docker at the URL set in `ANTHROPIC_BASE_URL`.
- A Forgejo bot user (e.g. `dev-qwen`) with its own API token and password,
stored as `FORGE_TOKEN_LLAMA` / `FORGE_PASS_LLAMA`.
## Behaviour
- `agents-llama`: `AGENT_ROLES=dev` — only picks up dev work.
- `agents-llama-all`: `AGENT_ROLES=review,dev,gardener,architect,planner,predictor,supervisor` — runs all 7 roles.
- `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=60` — more aggressive compaction for smaller
context windows.
- Serialises on the llama-server's single KV cache (AD-002).
## Disabling
Set `ENABLE_LLAMA_AGENT=0` (or leave it unset) and regenerate. The service
block is omitted entirely from `docker-compose.yml`; the stack starts cleanly
without it.
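The enable/disable gate can be sketched in a few lines of shell — `emit_llama_block` is a hypothetical stand-in for the generator logic, keyed off the `ENABLE_LLAMA_AGENT` variable this doc describes:

```bash
#!/bin/sh
# Hypothetical stand-in for the generator's gate: emit the service block
# only when ENABLE_LLAMA_AGENT=1, mirroring the behaviour described above.
emit_llama_block() {
  if [ "${ENABLE_LLAMA_AGENT:-0}" = "1" ]; then
    echo "agents-llama: enabled (AGENT_ROLES=dev)"
  else
    echo "agents-llama: omitted"
  fi
}

ENABLE_LLAMA_AGENT=1
emit_llama_block   # -> agents-llama: enabled (AGENT_ROLES=dev)
```

Leaving the variable unset takes the `omitted` branch, matching the "service block is omitted entirely" behaviour above.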

docs/mirror-bootstrap.md Normal file

@ -0,0 +1,59 @@
# Mirror Bootstrap — Pull-Mirror Cutover Path
How to populate an empty Forgejo repo from an external source using
`lib/mirrors.sh`'s `mirror_pull_register()`.
## Prerequisites
| Variable | Example | Purpose |
|---|---|---|
| `FORGE_URL` | `http://forgejo:3000` | Forgejo instance base URL |
| `FORGE_API_BASE` | `${FORGE_URL}/api/v1` | Global API base (set by `lib/env.sh`) |
| `FORGE_TOKEN` | (admin or org-owner token) | Must have `repo:create` scope |
The target org/user must already exist on the Forgejo instance.
## Command
```bash
source lib/env.sh
source lib/mirrors.sh
# Register a pull mirror — creates the repo and starts the first sync.
# Args: source URL, target owner, target repo name,
#       sync interval (optional, default 8h).
mirror_pull_register \
  "https://codeberg.org/johba/disinto.git" \
  "disinto-admin" \
  "disinto" \
  "8h0m0s"
```
The function calls `POST /api/v1/repos/migrate` with `mirror: true`.
Forgejo creates the repo and immediately queues the first sync.
## Verifying the sync
```bash
# Check mirror status via API
forge_api GET "/repos/disinto-admin/disinto" | jq '.mirror, .mirror_interval'
# Confirm content arrived — should list branches
forge_api GET "/repos/disinto-admin/disinto/branches" | jq '.[].name'
```
The first sync typically completes within a few seconds for small-to-medium
repos. For large repos, poll the branches endpoint until content appears.
## Cutover scenario (Nomad migration)
At cutover to the Nomad box:
1. Stand up fresh Forgejo on the Nomad cluster (empty instance).
2. Create the `disinto-admin` org via `disinto init` or API.
3. Run `mirror_pull_register` pointing at the Codeberg source.
4. Wait for sync to complete (check branches endpoint).
5. Once content is confirmed, proceed with `disinto init` against the
now-populated repo — all subsequent `mirror_push` calls will push
to any additional mirrors configured in `projects/*.toml`.
No manual `git clone` + `git push` step is needed. The Forgejo pull-mirror
handles the entire transfer.
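The request body behind `mirror_pull_register` can be sketched with printf alone; field names follow Forgejo's `POST /repos/migrate` schema, and `build_migrate_payload` is an illustrative helper, not part of `lib/mirrors.sh`:

```bash
# Assemble the migrate payload described above: mirror=true plus an
# optional sync interval (default 8h0m0s, as in the Command section).
build_migrate_payload() {
  src=$1 owner=$2 name=$3 interval=${4:-8h0m0s}
  printf '{"clone_addr":"%s","repo_owner":"%s","repo_name":"%s","mirror":true,"mirror_interval":"%s"}\n' \
    "$src" "$owner" "$name" "$interval"
}

build_migrate_payload "https://codeberg.org/johba/disinto.git" "disinto-admin" "disinto"
```

The actual helper presumably POSTs this (or an equivalent) to `${FORGE_API_BASE}/repos/migrate` with the `FORGE_TOKEN` header.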


@ -0,0 +1,172 @@
# formulas/collect-engagement.toml — Collect website engagement data
#
# Daily formula: SSH into Caddy host, fetch access log, parse locally,
# commit evidence JSON to ops repo via Forgejo API.
#
# Triggered by cron in the edge container entrypoint (daily at 23:50 UTC).
# Design choices from #426: Q1=A (fetch raw log, process locally),
# Q2=A (direct cron in edge container), Q3=B (dedicated purpose-limited SSH key).
#
# Steps: fetch-log → parse-engagement → commit-evidence
name = "collect-engagement"
description = "SSH-fetch Caddy access log, parse engagement metrics, commit evidence"
version = 1
[context]
files = ["AGENTS.md"]
[vars.caddy_host]
description = "SSH host for the Caddy server"
required = false
default = "${CADDY_SSH_HOST:-disinto.ai}"
[vars.caddy_user]
description = "SSH user on the Caddy host"
required = false
default = "${CADDY_SSH_USER:-debian}"
[vars.caddy_log_path]
description = "Path to Caddy access log on the remote host"
required = false
default = "${CADDY_ACCESS_LOG:-/var/log/caddy/access.log}"
[vars.local_log_path]
description = "Local path to store fetched access log"
required = false
default = "/tmp/caddy-access-log-fetch.log"
[vars.evidence_dir]
description = "Evidence output directory in the ops repo"
required = false
default = "evidence/engagement"
# ── Step 1: SSH fetch ────────────────────────────────────────────────
[[steps]]
id = "fetch-log"
title = "Fetch Caddy access log from remote host via SSH"
description = """
Fetch today's Caddy access log segment from the remote host using SCP.
The SSH key is read from the environment (CADDY_SSH_KEY), which is
decrypted from secrets/CADDY_SSH_KEY.enc by the edge entrypoint. It is NEVER hardcoded.
1. Write the SSH key to a temporary file with restricted permissions:
_ssh_key_file=$(mktemp)
trap 'rm -f "$_ssh_key_file"' EXIT
printf '%s\n' "$CADDY_SSH_KEY" > "$_ssh_key_file"
chmod 0600 "$_ssh_key_file"
2. Verify connectivity:
ssh -i "$_ssh_key_file" -o StrictHostKeyChecking=accept-new \
-o ConnectTimeout=10 -o BatchMode=yes \
{{caddy_user}}@{{caddy_host}} 'echo ok'
3. Fetch the access log via scp:
scp -i "$_ssh_key_file" -o StrictHostKeyChecking=accept-new \
-o ConnectTimeout=10 -o BatchMode=yes \
"{{caddy_user}}@{{caddy_host}}:{{caddy_log_path}}" \
"{{local_log_path}}"
4. Verify the fetched file is non-empty:
if [ ! -s "{{local_log_path}}" ]; then
echo "WARNING: fetched access log is empty — site may have no traffic"
else
echo "Fetched $(wc -l < "{{local_log_path}}") lines from {{caddy_host}}"
fi
5. Clean up the temporary key file:
rm -f "$_ssh_key_file"
"""
# ── Step 2: Parse engagement ─────────────────────────────────────────
[[steps]]
id = "parse-engagement"
title = "Run collect-engagement.sh against the local log copy"
description = """
Run the engagement parser against the locally fetched access log.
1. Set CADDY_ACCESS_LOG to point at the local copy so collect-engagement.sh
reads from it instead of the default path:
export CADDY_ACCESS_LOG="{{local_log_path}}"
2. Run the parser:
bash "$FACTORY_ROOT/site/collect-engagement.sh"
3. Verify the evidence JSON was written:
REPORT_DATE=$(date -u +%Y-%m-%d)
EVIDENCE_FILE="${OPS_REPO_ROOT}/{{evidence_dir}}/${REPORT_DATE}.json"
if [ -f "$EVIDENCE_FILE" ]; then
echo "Evidence written: $EVIDENCE_FILE"
jq . "$EVIDENCE_FILE"
else
echo "ERROR: evidence file not found at $EVIDENCE_FILE"
exit 1
fi
4. Clean up the fetched log:
rm -f "{{local_log_path}}"
"""
needs = ["fetch-log"]
# ── Step 3: Commit evidence ──────────────────────────────────────────
[[steps]]
id = "commit-evidence"
title = "Commit evidence JSON to ops repo via Forgejo API"
description = """
Commit the dated evidence JSON to the ops repo so the planner can
consume it during gap analysis.
1. Read the evidence file:
REPORT_DATE=$(date -u +%Y-%m-%d)
EVIDENCE_FILE="${OPS_REPO_ROOT}/{{evidence_dir}}/${REPORT_DATE}.json"
CONTENT=$(base64 -w0 < "$EVIDENCE_FILE")  # -w0: no line wrapping, keeps the JSON payload valid
2. Check if the file already exists in the ops repo (update vs create):
OPS_OWNER="${OPS_FORGE_OWNER:-${FORGE_REPO%%/*}}"
OPS_REPO="${OPS_FORGE_REPO:-${PROJECT_NAME:-disinto}-ops}"
FILE_PATH="{{evidence_dir}}/${REPORT_DATE}.json"
EXISTING=$(curl -sf \
-H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_URL}/api/v1/repos/${OPS_OWNER}/${OPS_REPO}/contents/${FILE_PATH}" \
2>/dev/null || echo "")
3. Create or update the file via Forgejo API:
if [ -n "$EXISTING" ] && printf '%s' "$EXISTING" | jq -e '.sha' >/dev/null 2>&1; then
# Update existing file
SHA=$(printf '%s' "$EXISTING" | jq -r '.sha')
curl -sf -X PUT \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_URL}/api/v1/repos/${OPS_OWNER}/${OPS_REPO}/contents/${FILE_PATH}" \
-d "$(jq -nc --arg content "$CONTENT" --arg sha "$SHA" --arg msg "evidence: engagement ${REPORT_DATE}" \
'{message: $msg, content: $content, sha: $sha}')"
echo "Updated existing evidence file in ops repo"
else
# Create new file
curl -sf -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_URL}/api/v1/repos/${OPS_OWNER}/${OPS_REPO}/contents/${FILE_PATH}" \
-d "$(jq -nc --arg content "$CONTENT" --arg msg "evidence: engagement ${REPORT_DATE}" \
'{message: $msg, content: $content}')"
echo "Created evidence file in ops repo"
fi
4. Verify the commit landed:
VERIFY=$(curl -sf \
-H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_URL}/api/v1/repos/${OPS_OWNER}/${OPS_REPO}/contents/${FILE_PATH}" \
| jq -r '.name // empty')
if [ "$VERIFY" = "${REPORT_DATE}.json" ]; then
echo "Evidence committed: ${FILE_PATH}"
else
echo "ERROR: could not verify evidence commit"
exit 1
fi
"""
needs = ["parse-engagement"]


@ -0,0 +1,161 @@
# formulas/rent-a-human-caddy-ssh.toml — Provision SSH key for Caddy log collection
#
# "Rent a Human" — walk the operator through provisioning a purpose-limited
# SSH keypair so collect-engagement.sh can fetch Caddy access logs remotely.
#
# The key uses a `command=` restriction so it can ONLY cat the access log.
# No interactive shell, no port forwarding, no agent forwarding.
#
# Parent vision issue: #426
# Sprint: website-observability-wire-up (ops PR #10)
# Consumed by: site/collect-engagement.sh (issue #745)
name = "rent-a-human-caddy-ssh"
description = "Provision a purpose-limited SSH keypair for remote Caddy log collection"
version = 1
# ── Step 1: Generate keypair ─────────────────────────────────────────────────
[[steps]]
id = "generate-keypair"
title = "Generate a dedicated ed25519 keypair"
description = """
Generate a purpose-limited SSH keypair for Caddy log collection.
Run on your local machine (NOT the Caddy host):
```
ssh-keygen -t ed25519 -f caddy-collect -N '' -C 'disinto-collect-engagement'
```
This produces two files:
- caddy-collect (private key goes into the vault)
- caddy-collect.pub (public key goes onto the Caddy host)
Do NOT set a passphrase (-N '') — the factory runs unattended.
"""
# ── Step 2: Install public key on Caddy host ─────────────────────────────────
[[steps]]
id = "install-public-key"
title = "Install the public key on the Caddy host with command= restriction"
needs = ["generate-keypair"]
description = """
Install the public key on the Caddy host with a strict command= restriction
so this key can ONLY read the access log.
1. SSH into the Caddy host as the user who owns /var/log/caddy/access.log.
2. Open (or create) ~/.ssh/authorized_keys:
mkdir -p ~/.ssh && chmod 700 ~/.ssh
nano ~/.ssh/authorized_keys
3. Add this line (all on ONE line — do not wrap):
command="cat /var/log/caddy/access.log",no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-ed25519 AAAA... disinto-collect-engagement
Replace "AAAA..." with the contents of caddy-collect.pub.
To build the line automatically:
echo "command=\"cat /var/log/caddy/access.log\",no-port-forwarding,no-X11-forwarding,no-agent-forwarding $(cat caddy-collect.pub)"
4. Set permissions:
chmod 600 ~/.ssh/authorized_keys
What the restrictions do:
- command="cat /var/log/caddy/access.log"
Forces this key to only execute `cat /var/log/caddy/access.log`,
regardless of what the client requests.
- no-port-forwarding — blocks SSH tunnels
- no-X11-forwarding — blocks X11
- no-agent-forwarding — blocks agent forwarding
If the access log is at a different path, update the command= restriction
AND set CADDY_ACCESS_LOG in the factory environment to match.
"""
# ── Step 3: Add private key to vault secrets ─────────────────────────────────
[[steps]]
id = "store-private-key"
title = "Add the private key as CADDY_SSH_KEY secret"
needs = ["generate-keypair"]
description = """
Store the private key in the factory's encrypted secrets store.
1. Add the private key using `disinto secrets add`:
cat caddy-collect | disinto secrets add CADDY_SSH_KEY
This encrypts the key with age and stores it as secrets/CADDY_SSH_KEY.enc.
2. IMPORTANT: After storing, securely delete the local private key file:
shred -u caddy-collect 2>/dev/null || rm -f caddy-collect
rm -f caddy-collect.pub
The public key is already installed on the Caddy host; the private key
now lives only in secrets/CADDY_SSH_KEY.enc.
Never commit the private key to any git repository.
"""
# ── Step 4: Configure Caddy host address ─────────────────────────────────────
[[steps]]
id = "store-caddy-host"
title = "Add the Caddy host details as secrets"
needs = ["install-public-key"]
description = """
Store the Caddy connection details so collect-engagement.sh knows
where to SSH.
1. Add each value using `disinto secrets add`:
echo 'disinto.ai' | disinto secrets add CADDY_SSH_HOST
echo 'debian' | disinto secrets add CADDY_SSH_USER
echo '/var/log/caddy/access.log' | disinto secrets add CADDY_ACCESS_LOG
Replace values with the actual SSH host, user, and log path for your setup.
"""
# ── Step 5: Test the connection ──────────────────────────────────────────────
[[steps]]
id = "test-connection"
title = "Verify the SSH key works and returns the access log"
needs = ["install-public-key", "store-private-key", "store-caddy-host"]
description = """
Test the end-to-end connection before the factory tries to use it.
1. From the factory host (or anywhere with the private key), run:
ssh -i caddy-collect -o StrictHostKeyChecking=accept-new user@caddy-host
Expected behavior:
- Outputs the contents of /var/log/caddy/access.log
- Disconnects immediately (command= restriction forces this)
If you already shredded the local key, decode it from the vault:
echo "$CADDY_SSH_KEY" | base64 -d > /tmp/caddy-collect-test
chmod 600 /tmp/caddy-collect-test
ssh -i /tmp/caddy-collect-test -o StrictHostKeyChecking=accept-new user@caddy-host
rm -f /tmp/caddy-collect-test
2. Verify the output is Caddy structured JSON (one JSON object per line):
ssh -i /tmp/caddy-collect-test user@caddy-host | head -1 | jq .
You should see fields like: ts, request, status, duration.
3. If the connection fails:
- Permission denied — check authorized_keys format (must be one line)
- Connection refused — check sshd is running on the Caddy host
- Empty output — check /var/log/caddy/access.log exists and is readable
by the SSH user
- "jq: error" — Caddy may be using Combined Log Format instead of
structured JSON; check Caddy's log configuration
4. Once verified, the factory's collect-engagement.sh can use this key
to fetch logs remotely via:
ssh -i <decoded-key-path> "$CADDY_SSH_USER@$CADDY_SSH_HOST"
"""


@ -213,7 +213,7 @@ should file a vault item instead of executing directly.
**Exceptions** (do NOT flag these):
- Code inside `vault/` — the vault system itself is allowed to handle secrets
- References in comments or documentation explaining the architecture
- `bin/disinto` setup commands that manage `secrets/*.enc` and the `run` subcommand
- Local operations (git push to forge, forge API calls with `FORGE_TOKEN`)
## 6. Re-review (if previous review is provided)


@ -16,7 +16,14 @@
# - Bash creates the ops PR with pitch content
# - Bash posts the ACCEPT/REJECT footer comment
# Step 3: Sprint PR creation with questions (issue #101) (one PR per pitch)
# Step 4: Post-merge sub-issue filing via filer-bot (#764)
#
# Permission model (#764):
# architect-bot: READ-ONLY on project repo (GET issues/PRs/labels for context).
# Cannot POST/PUT/PATCH/DELETE any project-repo resource.
# Write access ONLY on ops repo (branches, PRs, comments).
# filer-bot: issues:write on project repo. Files sub-issues from merged sprint
# PRs via ops-filer pipeline. Adds in-progress label to vision issues.
#
# Architecture:
# - Bash script (architect-run.sh) handles ALL state management
@ -146,15 +153,32 @@ For each issue in ARCHITECT_TARGET_ISSUES, bash performs:
## Recommendation
<architect's assessment: worth it / defer / alternative approach>
## Sub-issues
<!-- filer:begin -->
- id: <kebab-case-id>
title: "vision(#N): <concise sub-issue title>"
labels: [backlog]
depends_on: []
body: |
## Goal
<what this sub-issue accomplishes>
## Acceptance criteria
- [ ] <criterion>
<!-- filer:end -->
IMPORTANT: Do NOT include design forks or questions yet. The pitch is a go/no-go
decision for the human. Questions come only after acceptance.
The ## Sub-issues block is parsed by the filer-bot pipeline after sprint PR merge.
Each sub-issue between filer:begin/end markers becomes a Forgejo issue on the
project repo. The filer appends a decomposed-from marker to each body automatically.
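The filer pipeline itself is not shown in this diff, but extracting the manifest between the markers could be as small as a sed range; `extract_filer_block` is a sketch, not the actual filer code:

```bash
# Print everything between the filer markers, exclusive of the markers.
extract_filer_block() {
  sed -n '/<!-- filer:begin -->/,/<!-- filer:end -->/p' | sed '1d;$d'
}

pitch='## Sub-issues
<!-- filer:begin -->
- id: example-sub-issue
  title: "vision(#1): illustrative title"
<!-- filer:end -->
trailing prose'

printf '%s\n' "$pitch" | extract_filer_block
```

Each entry in the extracted block would then be turned into one Forgejo issue on the project repo.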
4. Bash creates PR:
- Create branch: architect/sprint-{pitch-number}
- Write sprint spec to sprints/{sprint-slug}.md
- Create PR with pitch content as body
- Post footer comment: "Reply ACCEPT to proceed with design questions, or REJECT: <reason> to decline."
- NOTE: in-progress label is added by filer-bot after sprint PR merge (#764)
Output:
- One PR per vision issue (up to 3 per run)
@ -185,6 +209,9 @@ This ensures approved PRs don't sit indefinitely without design conversation.
Architecture:
- Bash creates PRs during stateless pitch generation (step 2)
- Model has no role in PR creation — no Forgejo API access
- architect-bot is READ-ONLY on the project repo (#764) — all project-repo
writes (sub-issue filing, in-progress label) are handled by filer-bot
via the ops-filer pipeline after sprint PR merge
- This step describes the PR format for reference
PR Format (created by bash):
@ -201,64 +228,29 @@ PR Format (created by bash):
- Head: architect/sprint-{pitch-number}
- Footer comment: "Reply ACCEPT to proceed with design questions, or REJECT: <reason> to decline."
After creating all PRs, signal PHASE:done.
NOTE: in-progress label on the vision issue is added by filer-bot after sprint PR merge (#764).
## Forgejo API Reference (ops repo only)
All operations use the ops repo Forgejo API with `Authorization: token ${FORGE_TOKEN}` header.
architect-bot is READ-ONLY on the project repo — cannot POST/PUT/PATCH/DELETE project-repo resources (#764).
### Create branch (ops repo)
```
POST /repos/{owner}/{repo-ops}/branches
Body: {"new_branch_name": "architect/<sprint-slug>", "old_branch_name": "main"}
```
### Create/update file (ops repo)
```
PUT /repos/{owner}/{repo-ops}/contents/<path>
Body: {"message": "sprint: add <sprint-slug>.md", "content": "<base64-encoded-content>", "branch": "architect/<sprint-slug>"}
```
### Create PR (ops repo)
```
POST /repos/{owner}/{repo-ops}/pulls
Body: {"title": "architect: <sprint summary>", "body": "<markdown-text>", "head": "architect/<sprint-slug>", "base": "main"}
```
**Important: PR body format**
- The body field must contain plain markdown text (the raw content from the model)
- Do NOT JSON-encode or escape the body pass it as a JSON string value
- Newlines and markdown formatting (headings, lists, etc.) must be preserved as-is
### Add label to issue
```
POST /repos/{owner}/{repo}/issues/{index}/labels
Body: {"labels": [<label-id>]}
```
## Forgejo API Reference
All operations use the Forgejo API with `Authorization: token ${FORGE_TOKEN}` header.
### Create branch
```
POST /repos/{owner}/{repo}/branches
Body: {"new_branch_name": "architect/<sprint-slug>", "old_branch_name": "main"}
```
### Create/update file
```
PUT /repos/{owner}/{repo}/contents/<path>
Body: {"message": "sprint: add <sprint-slug>.md", "content": "<base64-encoded-content>", "branch": "architect/<sprint-slug>"}
```
### Create PR
```
POST /repos/{owner}/{repo}/pulls
Body: {"title": "architect: <sprint summary>", "body": "<markdown-text>", "head": "architect/<sprint-slug>", "base": "main"} Body: {"title": "architect: <sprint summary>", "body": "<markdown-text>", "head": "architect/<sprint-slug>", "base": "main"}
``` ```
@ -267,30 +259,22 @@ Body: {"title": "architect: <sprint summary>", "body": "<markdown-text>", "head"
- Do NOT JSON-encode or escape the body — pass it as a JSON string value
- Newlines and markdown formatting (headings, lists, etc.) must be preserved as-is
### Close PR (ops repo)
```
PATCH /repos/{owner}/{repo-ops}/pulls/{index}
Body: {"state": "closed"}
```
### Delete branch (ops repo)
```
DELETE /repos/{owner}/{repo-ops}/git/branches/<branch-name>
```
### Read-only on project repo (context gathering)
```
GET /repos/{owner}/{repo}/issues — list issues
GET /repos/{owner}/{repo}/issues/{number} — read issue details
GET /repos/{owner}/{repo}/labels — list labels
GET /repos/{owner}/{repo}/pulls — list PRs
```
""" """


@ -177,7 +177,7 @@ DUST (trivial — single-line edit, rename, comment, style, whitespace):
VAULT (needs human decision or external resource):
File a vault procurement item using vault_request():
source "$(dirname "$0")/../lib/action-vault.sh"
TOML_CONTENT="# Vault action: <action_id>
context = \"<description of what decision/resource is needed>\"
unblocks = [\"#NNN\"]


@ -243,7 +243,7 @@ needs = ["preflight"]
[[steps]]
id = "commit-ops-changes"
title = "Write tree, memory, and journal; commit and push branch"
description = """
### 1. Write prerequisite tree
Write to: $OPS_REPO_ROOT/prerequisites.md
@ -256,14 +256,16 @@ If (count - N) >= 5 or planner-memory.md missing, write to:
Include: run counter marker, date, constraint focus, patterns, direction.
Keep under 100 lines. Replace entire file.
### 3. Commit ops repo changes to the planner branch
Commit the ops repo changes (prerequisites, memory, vault items) and push the
branch. Do NOT push directly to $PRIMARY_BRANCH — planner-run.sh will create a
PR and walk it to merge via review-bot.
cd "$OPS_REPO_ROOT"
git add prerequisites.md knowledge/planner-memory.md vault/pending/
git add -u
if ! git diff --cached --quiet; then
git commit -m "chore: planner run $(date -u +%Y-%m-%d)"
git push origin HEAD
fi
cd "$PROJECT_REPO_ROOT"


@ -125,8 +125,8 @@ For each weakness you identify, choose one:
The prediction explains the theory. The vault PR triggers the proof
after human approval. When the planner runs next, evidence is already there.
Vault dispatch (requires lib/action-vault.sh):
source "$PROJECT_REPO_ROOT/lib/action-vault.sh"
TOML_CONTENT="id = \"predict-<prediction_number>-<formula>\"
context = \"Test prediction #<prediction_number>: <theory summary> — focus: <specific test>\"
@ -154,7 +154,7 @@ tea is pre-configured with login "$TEA_LOGIN" and repo "$FORGE_REPO".
--title "<title>" --body "<body>" --labels "prediction/unreviewed"
2. Dispatch formula via vault (if exploiting):
source "$PROJECT_REPO_ROOT/lib/action-vault.sh"
PR_NUM=$(vault_request "predict-NNN-<formula>" "$TOML_CONTENT")
# See EXPLOIT section above for TOML_CONTENT format


@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 -->
# Gardener Agent # Gardener Agent
**Role**: Backlog grooming — detect duplicate issues, missing acceptance
@ -32,7 +32,7 @@ the gardener runs as part of the polling loop alongside the planner, predictor,
PR, reviewed alongside AGENTS.md changes, executed by gardener-run.sh after merge.
**Environment variables consumed**:
- `FORGE_TOKEN`, `FORGE_GARDENER_TOKEN` (falls back to FORGE_TOKEN), `FORGE_REPO`, `FORGE_API`, `PROJECT_NAME`, `PROJECT_REPO_ROOT`. `FORGE_TOKEN_OVERRIDE` is set to `$FORGE_GARDENER_TOKEN` and exported before sourcing env.sh so the gardener-bot identity survives re-sourcing (#762).
- `PRIMARY_BRANCH`, `CLAUDE_MODEL` (set to sonnet by gardener-run.sh)
**Lifecycle**: gardener-run.sh (invoked by polling loop every 6h, `check_active gardener`) →


@ -26,10 +26,11 @@ FACTORY_ROOT="$(dirname "$SCRIPT_DIR")"
# Accept project config from argument; default to disinto
export PROJECT_TOML="${1:-$FACTORY_ROOT/projects/disinto.toml}"
# Set override BEFORE sourcing env.sh so it survives any later re-source of
# env.sh from nested shells / claude -p tools (#762, #747)
export FORGE_TOKEN_OVERRIDE="${FORGE_GARDENER_TOKEN:-}"
# shellcheck source=../lib/env.sh
source "$FACTORY_ROOT/lib/env.sh"
# shellcheck source=../lib/formula-session.sh
source "$FACTORY_ROOT/lib/formula-session.sh"
# shellcheck source=../lib/worktree.sh
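Why the export must precede the source can be reproduced with a stand-in env.sh; the real `lib/env.sh` is assumed (it is not shown in this diff) to apply the override like this:

```bash
# Stand-in env.sh that honours FORGE_TOKEN_OVERRIDE, as the comment above
# assumes the real lib/env.sh does (#762).
fake_env=$(mktemp)
printf 'FORGE_TOKEN="${FORGE_TOKEN_OVERRIDE:-default-token}"\n' > "$fake_env"

# Export first, then source: the gardener-bot identity survives.
export FORGE_TOKEN_OVERRIDE="gardener-token"
. "$fake_env"
echo "$FORGE_TOKEN"   # -> gardener-token

rm -f "$fake_env"
```

Setting the override after the source (the old #747 approach) would be undone by any later re-source of env.sh from nested shells.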


@ -1,27 +1,7 @@
[
{
"action": "edit_body",
"issue": 835,
"body": "Bugfix for S0.1 (#821). 
Discovered during Step 0 end-to-end verification on a fresh LXC.\n\n## Symptom\n\n```\n$ ./bin/disinto init --backend=nomad --empty\nError: --empty is only valid with --backend=nomad\n```\n\nThe error is nonsensical — `--backend=nomad` is right there.\n\n## Root cause\n\n`bin/disinto` → `disinto_init` (around line 710) consumes the first positional arg as `repo_url` **before** the argparse `while` loop runs:\n\n```bash\ndisinto_init() {\n local repo_url=\"${1:-}\"\n if [ -z \"$repo_url\" ]; then\n echo \"Error: repo URL required\" >&2\n ...\n fi\n shift\n # ... then while-loop parses flags ...\n}\n```\n\nSo `disinto init --backend=nomad --empty` becomes:\n- `repo_url = \"--backend=nomad\"` (swallowed)\n- `--empty` seen by loop → `empty=true`\n- `backend` stays at default `\"docker\"`\n- Validation at line 747: `empty=true && backend != \"nomad\"` → error\n\n## Why repo_url is wrong for nomad\n\nFor `--backend=nomad`, the cluster-up flow doesn't clone anything — the LXC already has the repo cloned by the operator. `repo_url` is a docker-backend concept.\n\n## Fix\n\nIn `disinto_init`, move backend detection to **before** the `repo_url` consumption, and make `repo_url` conditional on `backend=docker`:\n\n```bash\ndisinto_init() {\n # Pre-scan for --backend to know whether repo_url is required\n local backend=\"docker\"\n for arg in \"$@\"; do\n case \"$arg\" in\n --backend) ;; # handled below\n --backend=*) backend=\"${arg#--backend=}\" ;;\n esac\n done\n # Also handle space-separated form\n local i=1\n while [ $i -le $# ]; do\n if [ \"${!i}\" = \"--backend\" ]; then\n i=$((i+1))\n backend=\"${!i}\"\n fi\n i=$((i+1))\n done\n\n local repo_url=\"\"\n if [ \"$backend\" = \"docker\" ]; then\n repo_url=\"${1:-}\"\n if [ -z \"$repo_url\" ] || [[ \"$repo_url\" == --* ]]; then\n echo \"Error: repo URL required for docker backend\" >&2\n echo \"Usage: disinto init <repo-url> [options]\" >&2\n exit 1\n fi\n shift\n fi\n # ... 
rest of argparse unchanged, it re-reads --backend cleanly\n```\n\nSimpler alternative: if first arg starts with `--`, assume no positional and skip repo_url consumption entirely (covers nomad + any future `--help`-style invocation).\n\nEither shape is fine; pick the cleaner one.\n\n## Acceptance criteria\n\n- [ ] `./bin/disinto init --backend=nomad --empty` runs `lib/init/nomad/cluster-up.sh` without error on a clean LXC.\n- [ ] `./bin/disinto init --backend=nomad --empty --dry-run` prints the 9-step plan and exits 0.\n- [ ] `./bin/disinto init <repo-url>` (docker path) behaves identically to today — existing smoke path passes.\n- [ ] `./bin/disinto init` (no args, docker implied) still errors with the \"repo URL required\" message.\n- [ ] `./bin/disinto init --backend=docker` (no repo) errors helpfully — not \"Unknown option: --backend=docker\".\n- [ ] shellcheck clean.\n\n## Verified regression case from Step 0 testing\n\nOn a fresh Ubuntu 24.04 LXC, after `./lib/init/nomad/cluster-up.sh` was invoked directly (workaround), the cluster came up healthy end-to-end:\n\n- Nomad node status: 1 node ready\n- Vault status: Sealed=false, Initialized=true\n- Re-run of cluster-up.sh was fully idempotent\n\nSo the bug is isolated to `bin/disinto` argparse; the rest of the Step 0 code path is solid. This fix unblocks the formal Step 0 acceptance test.\n\n## Labels / meta\n\n- `[nomad-step-0] S0.1-fix` — no dependencies; gates Step 1.\n\n## Affected files\n\n- `bin/disinto` — `disinto_init()` function, around line 710: pre-scan for `--backend` before consuming `repo_url` positional argument\n"
},
{
"action": "edit_body",
"issue": 707,
"body": "## Goal\n\nGive `disinto-chat` its own Claude identity mount so its OAuth refresh races cannot corrupt the factory agents' shared `~/.claude` credentials. Default to a separate `~/.claude-chat/` on the host; support `ANTHROPIC_API_KEY` as a fallback that skips OAuth entirely.\n\n## Why\n\n- #623 root-caused this: Claude Code's internal refresh lock in `~/.claude.lock` operates outside bind-mounted directories, so two containers sharing `~/.claude` can race during token refresh and invalidate each other. The factory has already had OAuth expiry incidents traced to multiple agents sharing credentials.\n- Scoping chat to its own identity dir means chat can be logged in as a different Anthropic account, or pinned to an API key, without touching agent credentials.\n\n## Scope\n\n### Files to touch\n\n- `lib/generators.sh` chat service block (from #705):\n - Replace the throwaway named volume with `${CHAT_CLAUDE_DIR:-${HOME}/.claude-chat}:/home/chat/.claude-chat`.\n - Env: `CLAUDE_CONFIG_DIR=/home/chat/.claude-chat/config`, `CLAUDE_CREDENTIALS_DIR=/home/chat/.claude-chat/config/credentials`.\n - Conditional: if `ANTHROPIC_API_KEY` is set in `.env`, pass it through and **do not** mount `~/.claude-chat` at all (no credentials on disk in that mode).\n- `bin/disinto disinto_init()` — after #620's admin password prompt, add an optional prompt: `Use separate Anthropic identity for chat? (y/N)`. On yes, create `~/.claude-chat/` and invoke `claude login` in a subshell with `CLAUDE_CONFIG_DIR=~/.claude-chat/config`.\n- `lib/claude-config.sh` — factor out the existing `~/.claude` setup logic so a non-default `CLAUDE_CONFIG_DIR` is a first-class parameter. 
If it is already parameterised, just document it; if not, extract a helper `setup_claude_dir <dir>` and have the existing path call it with the default dir.\n- `docker/chat/Dockerfile` — declare `VOLUME /home/chat/.claude-chat`, set owner to the non-root chat user introduced in #706.\n\n### Out of scope\n\n- Cross-session lock coherence for multiple concurrent chat containers (single-chat-container assumption is fine for MVP).\n- Anthropic team / workspace support — single identity is enough.\n\n## Acceptance\n\n- [ ] Fresh `disinto init` with \"use separate chat identity\" answered yes creates `~/.claude-chat/` and logs in successfully.\n- [ ] With `ANTHROPIC_API_KEY=sk-ant-...` set in `.env`, chat starts without any `~/.claude-chat` mount (verified via `docker inspect disinto-chat`) and successfully completes a test prompt.\n- [ ] Running the factory agents AND chat simultaneously for 24h does not produce any OAuth refresh failures on either side (manual soak test — document result in PR).\n- [ ] `CLAUDE_CONFIG_DIR` and `CLAUDE_CREDENTIALS_DIR` inside the chat container resolve to `/home/chat/.claude-chat/config*`, not the shared factory path.\n\n## Depends on\n\n- #705 (chat scaffold).\n- #742 (CI smoke test fix — #707 fails CI until agent-smoke.sh lib sourcing is stabilised)\n- #620 (admin password prompt — same init flow this adds a step to).\n\n## Notes\n\n- The factory's existing shared mount is `/var/lib/disinto/claude-shared` (see `lib/generators.sh:113,327,381,426`). Chat must NOT use this path.\n- `flock(\"${HOME}/.claude/session.lock\")` logic mentioned in #623 is load-bearing, not redundant — do not \"simplify\" it.\n- Prefer the API-key path for anyone running the factory on shared hardware; call this out in README updates.\n\n## Boundaries for dev-agent\n\n- Do not try to make chat share `~/.claude` with the agents \"just for convenience\". The whole point of this chunk is the opposite.\n- Do not add a third claude config dir. 
One for agents, one for chat, done.\n- Do not refactor `lib/claude-config.sh` beyond extracting a parameterised helper if needed.\n- Parent vision: #623."
}
]
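The "simpler alternative" named in the #835 body (skip the positional entirely when the first argument starts with `--`, then defer the "repo URL required" check until after the flag loop) can be sketched as a small bash function. This is illustrative only: the option handling mirrors the issue text, but the function body is not the actual `bin/disinto` code.

```shell
#!/usr/bin/env bash
# Sketch of the guarded positional consumption from #835: only treat $1 as
# repo_url when it does not look like a flag, and validate repo_url only
# after --backend has been resolved by the flag loop.
disinto_init() {
  local repo_url="" backend="docker" empty=false
  # Consume $1 as repo_url only if it is present and not a --flag.
  if [ $# -gt 0 ] && [[ "$1" != --* ]]; then
    repo_url="$1"
    shift
  fi
  while [ $# -gt 0 ]; do
    case "$1" in
      --backend=*) backend="${1#--backend=}" ;;
      --empty)     empty=true ;;
      *) echo "Unknown option: $1" >&2; return 2 ;;
    esac
    shift
  done
  # Deferred check: repo_url is a docker-backend concept only.
  if [ "$backend" = "docker" ] && [ -z "$repo_url" ]; then
    echo "Error: repo URL required" >&2
    return 1
  fi
  echo "backend=$backend empty=$empty repo_url=$repo_url"
}
```

With this shape, `disinto_init --backend=nomad --empty` no longer swallows the flag as `repo_url`, while a bare `disinto_init` still errors helpfully on the docker path, matching the acceptance criteria above.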


@@ -1,4 +1,4 @@
<!-- last-reviewed: 4e53f508d9b36c60bd68ed5fc497fc8775fec79f -->
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 -->
# Shared Helpers (`lib/`)
All agents source `lib/env.sh` as their first action. Additional helpers are
@@ -6,7 +6,7 @@ sourced as needed.
| File | What it provides | Sourced by |
|---|---|---|
| `lib/env.sh` | Loads `.env`, sets `FACTORY_ROOT`, exports project config (`FORGE_REPO`, `PROJECT_NAME`, etc.), defines `log()`, `forge_api()`, `forge_api_all()` (paginates all pages; accepts optional second TOKEN parameter, defaults to `$FORGE_TOKEN`; handles invalid/empty JSON responses gracefully — returns empty on parse error instead of crashing), `woodpecker_api()`, `wpdb()`, `memory_guard()` (skips agent if RAM < threshold). Auto-loads project TOML if `PROJECT_TOML` is set. Exports per-agent tokens (`FORGE_PLANNER_TOKEN`, `FORGE_GARDENER_TOKEN`, `FORGE_VAULT_TOKEN`, `FORGE_SUPERVISOR_TOKEN`, `FORGE_PREDICTOR_TOKEN`) — each falls back to `$FORGE_TOKEN` if not set. **Vault-only token guard (AD-006)**: `unset GITHUB_TOKEN CLAWHUB_TOKEN` so agents never hold external-action tokens — only the runner container receives them. **Container note**: when `DISINTO_CONTAINER=1`, `.env` is NOT re-sourced — compose already injects env vars (including `FORGE_URL=http://forgejo:3000`) and re-sourcing would clobber them. **Save/restore scope (#364)**: only `FORGE_URL` is preserved across `.env` re-sourcing (compose injects `http://forgejo:3000`, `.env` has `http://localhost:3000`). `FORGE_TOKEN` is NOT preserved so refreshed tokens in `.env` take effect immediately. **Required env var**: `FORGE_PASS` — bot password for git HTTP push (Forgejo 11.x rejects API tokens for `git push`, #361). **Hard preconditions (#674)**: `USER` and `HOME` must be exported by the entrypoint before sourcing. When `PROJECT_TOML` is set, `PROJECT_REPO_ROOT`, `PRIMARY_BRANCH`, and `OPS_REPO_ROOT` must also be set (by entrypoint or TOML). | Every agent |
| `lib/env.sh` | Loads `.env`, sets `FACTORY_ROOT`, exports project config (`FORGE_REPO`, `PROJECT_NAME`, etc.), defines `log()`, `forge_api()`, `forge_api_all()` (paginates all pages; accepts optional second TOKEN parameter, defaults to `$FORGE_TOKEN`; handles invalid/empty JSON responses gracefully — returns empty on parse error instead of crashing), `woodpecker_api()`, `wpdb()`, `memory_guard()` (skips agent if RAM < threshold), `load_secret()` (secret-source abstraction — see below). Auto-loads project TOML if `PROJECT_TOML` is set. Exports per-agent tokens (`FORGE_PLANNER_TOKEN`, `FORGE_GARDENER_TOKEN`, `FORGE_VAULT_TOKEN`, `FORGE_SUPERVISOR_TOKEN`, `FORGE_PREDICTOR_TOKEN`) — each falls back to `$FORGE_TOKEN` if not set. **Vault-only token guard (AD-006)**: `unset GITHUB_TOKEN CLAWHUB_TOKEN` so agents never hold external-action tokens — only the runner container receives them. **Container note**: when `DISINTO_CONTAINER=1`, `.env` is NOT re-sourced — compose already injects env vars (including `FORGE_URL=http://forgejo:3000`) and re-sourcing would clobber them. **Save/restore scope (#364)**: only `FORGE_URL` is preserved across `.env` re-sourcing (compose injects `http://forgejo:3000`, `.env` has `http://localhost:3000`). `FORGE_TOKEN` is NOT preserved so refreshed tokens in `.env` take effect immediately. **Per-agent token override (#762)**: agent run scripts export `FORGE_TOKEN_OVERRIDE=<agent-specific-token>` BEFORE sourcing `env.sh`; `env.sh` applies this override at lines 98-100, ensuring the correct identity survives any re-sourcing of `env.sh` by nested shells or `claude -p` invocations. **Required env var**: `FORGE_PASS` — bot password for git HTTP push (Forgejo 11.x rejects API tokens for `git push`, #361). **Hard preconditions (#674)**: `USER` and `HOME` must be exported by the entrypoint before sourcing. When `PROJECT_TOML` is set, `PROJECT_REPO_ROOT`, `PRIMARY_BRANCH`, and `OPS_REPO_ROOT` must also be set (by entrypoint or TOML). **`load_secret NAME [DEFAULT]` (#793)**: backend-agnostic secret resolution. Precedence: (1) `/secrets/<NAME>.env` — Nomad-rendered template, (2) current environment — already set by `.env.enc` / compose, (3) `secrets/<NAME>.enc` — age-encrypted per-key file (decrypted on demand, cached in process env), (4) DEFAULT or empty. Consumers call `$(load_secret GITHUB_TOKEN)` instead of `${GITHUB_TOKEN}` — identical behavior whether secrets come from Docker compose injection or Nomad Vault templates. | Every agent |
| `lib/ci-helpers.sh` | `ci_passed()` — returns 0 if CI state is "success" (or no CI configured). `ci_required_for_pr()` — returns 0 if PR has code files (CI required), 1 if non-code only (CI not required). `is_infra_step()` — returns 0 if a single CI step failure matches infra heuristics (clone/git exit 128, any exit 137, log timeout patterns). `classify_pipeline_failure()` — returns "infra \<reason>" if any failed Woodpecker step matches infra heuristics via `is_infra_step()`, else "code". `ensure_priority_label()` — looks up (or creates) the `priority` label and returns its ID; caches in `_PRIORITY_LABEL_ID`. `ci_commit_status <sha>` — queries Woodpecker directly for CI state, falls back to forge commit status API. `ci_pipeline_number <sha>` — returns the Woodpecker pipeline number for a commit, falls back to parsing forge status `target_url`. `ci_promote <repo_id> <pipeline_num> <environment>` — promotes a pipeline to a named Woodpecker environment (vault-gated deployment: vault approves, vault-fire calls this — vault redesign in progress, see #73-#77). `ci_get_logs <pipeline_number> [--step <name>]` — reads CI logs from Woodpecker SQLite database via `lib/ci-log-reader.py`; outputs last 200 lines to stdout. Requires mounted woodpecker-data volume at /woodpecker-data. | dev-poll, review-poll, review-pr |
| `lib/ci-debug.sh` | CLI tool for Woodpecker CI: `list`, `status`, `logs`, `failures` subcommands. Not sourced — run directly. | Humans / dev-agent (tool access) |
| `lib/ci-log-reader.py` | Python tool: reads CI logs from Woodpecker SQLite database. `<pipeline_number> [--step <name>]` — returns last 200 lines from failed steps (or specified step). Used by `ci_get_logs()` in ci-helpers.sh. Requires `WOODPECKER_DATA_DIR` (default: /woodpecker-data). | ci-helpers.sh |
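The four-step precedence described for `load_secret` (#793) can be sketched as follows. The file layouts (`/secrets/<NAME>.env`, `secrets/<NAME>.enc`) are taken from the table above; the age-decryption step is left as a plain `age --decrypt` call and everything else is an illustrative stand-in, not the actual `lib/env.sh` implementation.

```shell
#!/usr/bin/env bash
# Sketch of load_secret's precedence chain (#793). Assumes the file
# layouts named in the table; the decryption branch is a simplification.
load_secret() {
  local name="$1" default="${2:-}"
  # (1) Nomad-rendered template file (KEY=value line; strip the key prefix)
  if [ -f "/secrets/${name}.env" ]; then
    sed -n "s/^${name}=//p" "/secrets/${name}.env"
    return
  fi
  # (2) already set in the environment (compose injection / .env.enc)
  if [ -n "${!name:-}" ]; then
    printf '%s\n' "${!name}"
    return
  fi
  # (3) age-encrypted per-key file, decrypted on demand
  if [ -f "secrets/${name}.enc" ]; then
    age --decrypt "secrets/${name}.enc" 2>/dev/null && return
  fi
  # (4) fall back to DEFAULT (possibly empty)
  printf '%s\n' "$default"
}
```

Callers use it exactly as the table says: `token="$(load_secret GITHUB_TOKEN)"` behaves the same whether the value arrived via compose or a Nomad Vault template.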
@@ -14,7 +14,7 @@ sourced as needed.
| `lib/parse-deps.sh` | Extracts dependency issue numbers from an issue body (stdin → stdout, one number per line). Matches `## Dependencies` / `## Depends on` / `## Blocked by` sections and inline `depends on #N` / `blocked by #N` patterns. Inline scan skips fenced code blocks to prevent false positives from code examples in issue bodies. Not sourced — executed via `bash lib/parse-deps.sh`. | dev-poll |
| `lib/formula-session.sh` | `acquire_run_lock()`, `load_formula()`, `load_formula_or_profile()`, `build_context_block()`, `ensure_ops_repo()`, `ops_commit_and_push()`, `build_prompt_footer()`, `build_sdk_prompt_footer()`, `formula_worktree_setup()`, `formula_prepare_profile_context()`, `formula_lessons_block()`, `profile_write_journal()`, `profile_load_lessons()`, `ensure_profile_repo()`, `_profile_has_repo()`, `_count_undigested_journals()`, `_profile_digest_journals()`, `_profile_restore_lessons()`, `_profile_commit_and_push()`, `resolve_agent_identity()`, `build_graph_section()`, `build_scratch_instruction()`, `read_scratch_context()`, `cleanup_stale_crashed_worktrees()` — shared helpers for formula-driven polling-loop agents (lock, .profile repo management, prompt assembly, worktree setup). Memory guard is provided by `memory_guard()` in `lib/env.sh` (not duplicated here). `resolve_agent_identity()` — sets `FORGE_TOKEN`, `AGENT_IDENTITY`, `FORGE_REMOTE` from per-agent token env vars and FORGE_URL remote detection. `build_graph_section()` generates the structural-analysis section (runs `lib/build-graph.py`, formats JSON output) — previously duplicated in planner-run.sh and predictor-run.sh, now shared here. `cleanup_stale_crashed_worktrees()` — thin wrapper around `worktree_cleanup_stale()` from `lib/worktree.sh` (kept for backwards compatibility). **Journal digestion guards (#702)**: `_profile_digest_journals()` respects `PROFILE_DIGEST_TIMEOUT` (default 300s) and `PROFILE_DIGEST_MAX_BATCH` (default 5 journals per run); `_profile_restore_lessons()` restores the previous lessons-learned.md on digest failure. | planner-run.sh, predictor-run.sh, gardener-run.sh, supervisor-run.sh, dev-agent.sh |
| `lib/guard.sh` | `check_active(agent_name)` — reads `$FACTORY_ROOT/state/.{agent_name}-active`; exits 0 (skip) if the file is absent. Factory is off by default — state files must be created to enable each agent. **Logs a message to stderr** when skipping (`[check_active] SKIP: state file not found`), so agent dropout is visible in loop logs. Sourced by dev-poll.sh, review-poll.sh, predictor-run.sh, supervisor-run.sh. | polling-loop entry points |
| `lib/mirrors.sh` | `mirror_push()` — pushes `$PRIMARY_BRANCH` + tags to all configured mirror remotes (fire-and-forget background pushes). Reads `MIRROR_NAMES` and `MIRROR_*` vars exported by `load-project.sh` from the `[mirrors]` TOML section. Failures are logged but never block the pipeline. Sourced by dev-poll.sh — called after every successful merge. | dev-poll.sh |
| `lib/mirrors.sh` | `mirror_push()` — pushes `$PRIMARY_BRANCH` + tags to all configured mirror remotes (fire-and-forget background pushes). Reads `MIRROR_NAMES` and `MIRROR_*` vars exported by `load-project.sh` from the `[mirrors]` TOML section. Failures are logged but never block the pipeline. `mirror_pull_register(clone_url, owner, repo_name, [interval])` — registers a Forgejo pull mirror via `POST /repos/migrate` with `mirror: true`. Creates the target repo and queues the first sync automatically. Works against empty Forgejo instances — no pre-existing content required. Used for Nomad migration cutover: point at Codeberg source, wait for sync, then proceed with `disinto init`. See [docs/mirror-bootstrap.md](../docs/mirror-bootstrap.md) for the full cutover path. Sourced by dev-poll.sh — called after every successful merge. | dev-poll.sh |
| `lib/build-graph.py` | Python tool: parses VISION.md, prerequisites.md (from ops repo), AGENTS.md, formulas/*.toml, evidence/ (from ops repo), and forge issues/labels into a NetworkX DiGraph. Runs structural analyses (orphaned objectives, stale prerequisites, thin evidence, circular deps) and outputs a JSON report. Used by `review-pr.sh` (per-PR changed-file analysis) and `predictor-run.sh` (full-project analysis) to provide structural context to Claude. | review-pr.sh, predictor-run.sh |
| `lib/secret-scan.sh` | `scan_for_secrets()` — detects potential secrets (API keys, bearer tokens, private keys, URLs with embedded credentials) in text; returns 1 if secrets found. `redact_secrets()` — replaces detected secret patterns with `[REDACTED]`. | issue-lifecycle.sh |
| `lib/stack-lock.sh` | File-based lock protocol for singleton project stack access. `stack_lock_acquire(holder, project)` — polls until free, breaks stale heartbeats (>10 min old), claims lock. `stack_lock_release(project)` — deletes lock file. `stack_lock_check(project)` — inspect current lock state. `stack_lock_heartbeat(project)` — update heartbeat timestamp (callers must call every 2 min while holding). Lock files at `~/data/locks/<project>-stack.lock`. | docker/edge/dispatcher.sh, reproduce formula |
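The `mirror_pull_register` helper described in the `lib/mirrors.sh` row can be sketched as a thin wrapper around Forgejo's repo-migrate API. This is a sketch under assumptions: `FORGE_URL` and `FORGE_TOKEN` are assumed exported as elsewhere in the factory, the `CURL` override exists only so the sketch is testable, and the payload fields follow the Gitea/Forgejo `POST /api/v1/repos/migrate` schema (`clone_addr`, `repo_owner`, `repo_name`, `mirror`, `mirror_interval`).

```shell
#!/usr/bin/env bash
# Sketch of mirror_pull_register: register a Forgejo pull mirror via the
# migrate endpoint. Forgejo creates the target repo and queues the first
# sync itself. CURL is overridable purely for testing this sketch.
mirror_pull_register() {
  local clone_url="$1" owner="$2" repo_name="$3" interval="${4:-8h0m0s}"
  local payload
  payload=$(printf '{"clone_addr":"%s","repo_owner":"%s","repo_name":"%s","mirror":true,"mirror_interval":"%s"}' \
    "$clone_url" "$owner" "$repo_name" "$interval")
  ${CURL:-curl} -fsS -X POST \
    -H "Authorization: token ${FORGE_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$payload" \
    "${FORGE_URL}/api/v1/repos/migrate"
}
```

For the cutover path referenced above: point `clone_url` at the Codeberg source, wait for the first sync to complete, then proceed with `disinto init`.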
@@ -22,7 +22,7 @@ sourced as needed.
| `lib/worktree.sh` | Reusable git worktree management: `worktree_create(path, branch, [base_ref])` — create worktree, checkout base, fetch submodules. `worktree_recover(path, branch, [remote])` — detect existing worktree, reuse if on correct branch (sets `_WORKTREE_REUSED`), otherwise clean and recreate. `worktree_cleanup(path)` — `git worktree remove --force`, clear Claude Code project cache (`~/.claude/projects/` matching path). `worktree_cleanup_stale([max_age_hours])` — scan `/tmp` for orphaned worktrees older than threshold, skip preserved and active tmux worktrees, prune. `worktree_preserve(path, reason)` — mark worktree as preserved for debugging (writes `.worktree-preserved` marker, skipped by stale cleanup). | dev-agent.sh, supervisor-run.sh, planner-run.sh, predictor-run.sh, gardener-run.sh |
| `lib/pr-lifecycle.sh` | Reusable PR lifecycle library: `pr_create()`, `pr_find_by_branch()`, `pr_poll_ci()`, `pr_poll_review()`, `pr_merge()`, `pr_is_merged()`, `pr_walk_to_merge()`, `build_phase_protocol_prompt()`. Requires `lib/ci-helpers.sh`. | dev-agent.sh (future) |
| `lib/issue-lifecycle.sh` | Reusable issue lifecycle library: `issue_claim()` (add in-progress, remove backlog), `issue_release()` (remove in-progress, add backlog), `issue_block()` (post diagnostic comment with secret redaction, add blocked label), `issue_close()`, `issue_check_deps()` (parse deps, check transitive closure; sets `_ISSUE_BLOCKED_BY`, `_ISSUE_SUGGESTION`), `issue_suggest_next()` (find next unblocked backlog issue; sets `_ISSUE_NEXT`), `issue_post_refusal()` (structured refusal comment with dedup). Label IDs cached in globals on first lookup. Sources `lib/secret-scan.sh`. | dev-agent.sh (future) |
| `lib/vault.sh` | **Vault PR helper** — create vault action PRs on ops repo via Forgejo API (works from containers without SSH). `vault_request <action_id> <toml_content>` validates TOML (using `validate_vault_action` from `vault/vault-env.sh`), creates branch `vault/<action-id>`, writes `vault/actions/<action-id>.toml`, creates PR targeting `main` with title `vault: <action-id>` and body from context field, returns PR number. Idempotent: if PR exists, returns existing number. **Low-tier bypass**: if the action's `blast_radius` classifies as `low` (via `vault/classify.sh`), `vault_request` calls `_vault_commit_direct()` which commits directly to ops `main` using `FORGE_ADMIN_TOKEN` — no PR, no approval wait. Returns `0` (not a PR number) for direct commits. Requires `FORGE_TOKEN`, `FORGE_ADMIN_TOKEN` (low-tier only), `FORGE_URL`, `FORGE_REPO`, `FORGE_OPS_REPO`. Uses the calling agent's own token (saves/restores `FORGE_TOKEN` around sourcing `vault-env.sh`), so approval workflow respects individual agent identities. | dev-agent (vault actions), future vault dispatcher |
| `lib/action-vault.sh` | **Vault PR helper** — create vault action PRs on ops repo via Forgejo API (works from containers without SSH). `vault_request <action_id> <toml_content>` validates TOML (using `validate_vault_action` from `action-vault/vault-env.sh`), creates branch `vault/<action-id>`, writes `vault/actions/<action-id>.toml`, creates PR targeting `main` with title `vault: <action-id>` and body from context field, returns PR number. Idempotent: if PR exists, returns existing number. **Low-tier bypass**: if the action's `blast_radius` classifies as `low` (via `action-vault/classify.sh`), `vault_request` calls `_vault_commit_direct()` which commits directly to ops `main` using `FORGE_ADMIN_TOKEN` — no PR, no approval wait. Returns `0` (not a PR number) for direct commits. Requires `FORGE_TOKEN`, `FORGE_ADMIN_TOKEN` (low-tier only), `FORGE_URL`, `FORGE_REPO`, `FORGE_OPS_REPO`. Uses the calling agent's own token (saves/restores `FORGE_TOKEN` around sourcing `vault-env.sh`), so approval workflow respects individual agent identities. | dev-agent (vault actions), future vault dispatcher |
| `lib/branch-protection.sh` | Branch protection helpers for Forgejo repos. `setup_vault_branch_protection()` — configures admin-only merge protection on main (require 1 approval, restrict merge to admin role, block direct pushes). `setup_profile_branch_protection()` — same protection for `.profile` repos. `verify_branch_protection()` — checks protection is correctly configured. `remove_branch_protection()` — removes protection (cleanup/testing). Handles race condition after initial push: retries with backoff if Forgejo hasn't processed the branch yet. Requires `FORGE_TOKEN`, `FORGE_URL`, `FORGE_OPS_REPO`. | bin/disinto (hire-an-agent) |
| `lib/agent-sdk.sh` | `agent_run([--resume SESSION_ID] [--worktree DIR] PROMPT)` — one-shot `claude -p` invocation with session persistence. Saves session ID to `SID_FILE`, reads it back on resume. `agent_recover_session()` — restore previous session ID from `SID_FILE` on startup. **Nudge guard**: skips nudge injection if the worktree is clean and no push is expected, preventing spurious re-invocations. Callers must define `SID_FILE`, `LOGFILE`, and `log()` before sourcing. **Concurrency**: external `flock` on `session.lock` is gated behind `CLAUDE_EXTERNAL_LOCK=1` (default off). When unset, each container's per-session `CLAUDE_CONFIG_DIR` isolation lets Claude Code's native lockfile handle OAuth refresh — no external serialization needed. Set `CLAUDE_EXTERNAL_LOCK=1` to re-enable the old flock wrapper as a rollback mechanism. See [`docs/CLAUDE-AUTH-CONCURRENCY.md`](../docs/CLAUDE-AUTH-CONCURRENCY.md) and AD-002 (#647). | formula-driven agents (dev-agent, planner-run, predictor-run, gardener-run) |
| `lib/forge-setup.sh` | `setup_forge()` — Forgejo instance provisioning: creates admin user, bot accounts, org, repos (code + ops), configures webhooks, sets repo topics. Extracted from `bin/disinto`. Requires `FORGE_URL`, `FORGE_TOKEN`, `FACTORY_ROOT`. **Password storage (#361)**: after creating each bot account, stores its password in `.env` as `FORGE_<BOT>_PASS` (e.g. `FORGE_PASS`, `FORGE_REVIEW_PASS`, etc.) for use by `forge-push.sh`. | bin/disinto (init) |
@@ -30,6 +30,9 @@ sourced as needed.
| `lib/git-creds.sh` | Shared git credential helper configuration. `configure_git_creds([HOME_DIR] [RUN_AS_CMD])` — writes a static credential helper script and configures git globally to use password-based HTTP auth (Forgejo 11.x rejects API tokens for `git push`, #361). **Retry on cold boot (#741)**: resolves bot username from `FORGE_TOKEN` with 5 retries (exponential backoff 1-5s); fails loudly and returns 1 if Forgejo is unreachable — never falls back to a wrong hardcoded default (exports `BOT_USER` on success). `repair_baked_cred_urls([--as RUN_AS_CMD] DIR ...)` — rewrites any git remote URLs that have credentials baked in to use clean URLs instead; uses `safe.directory` bypass for root-owned repos (#671). Requires `FORGE_PASS`, `FORGE_URL`, `FORGE_TOKEN`. | entrypoints (agents, edge) |
| `lib/ops-setup.sh` | `setup_ops_repo()` — creates ops repo on Forgejo if it doesn't exist, configures bot collaborators, clones/initializes ops repo locally, seeds directory structure (vault, knowledge, evidence, sprints). Evidence subdirectories seeded: engagement/, red-team/, holdout/, evolution/, user-test/. Also seeds sprints/ for architect output. Exports `_ACTUAL_OPS_SLUG`. `migrate_ops_repo(ops_root, [primary_branch])` — idempotent migration helper that seeds missing directories and .gitkeep files on existing ops repos (pre-#407 deployments). | bin/disinto (init) |
| `lib/ci-setup.sh` | `_install_cron_impl()` — installs crontab entries for bare-metal deployments (compose mode uses polling loop instead). `_create_forgejo_oauth_app()` — generic helper to create an OAuth2 app on Forgejo (shared by Woodpecker and chat). `_create_woodpecker_oauth_impl()` — creates Woodpecker OAuth2 app (thin wrapper). `_create_chat_oauth_impl()` — creates disinto-chat OAuth2 app, writes `CHAT_OAUTH_CLIENT_ID`/`CHAT_OAUTH_CLIENT_SECRET` to `.env` (#708). `_generate_woodpecker_token_impl()` — auto-generates WOODPECKER_TOKEN via OAuth2 flow. `_activate_woodpecker_repo_impl()` — activates repo in Woodpecker. All gated by `_load_ci_context()` which validates required env vars. | bin/disinto (init) |
-| `lib/generators.sh` | Template generation for `disinto init`: `generate_compose()` — docker-compose.yml (uses `codeberg.org/forgejo/forgejo:11.0` tag; adds `security_opt: [apparmor:unconfined]` to all services for rootless container compatibility; Forgejo includes a healthcheck so dependent services use `condition: service_healthy` — fixes cold-start races, #665; adds `chat` service block with isolated `chat-config` named volume and `CHAT_HISTORY_DIR` bind-mount for per-user NDJSON history persistence (#710); injects `FORWARD_AUTH_SECRET` for Caddy↔chat defense-in-depth auth (#709); cost-cap env vars `CHAT_MAX_REQUESTS_PER_HOUR`, `CHAT_MAX_REQUESTS_PER_DAY`, `CHAT_MAX_TOKENS_PER_DAY` (#711); subdomain fallback comment for `EDGE_TUNNEL_FQDN_*` vars (#713); all `depends_on` now use `condition: service_healthy/started` instead of bare service names), `generate_caddyfile()` — Caddyfile (routes: `/forge/*` → forgejo:3000, `/woodpecker/*` → woodpecker:8000, `/staging/*` → staging:80; `/chat/login` and `/chat/oauth/callback` bypass `forward_auth` so unauthenticated users can reach the OAuth flow; `/chat/*` gated by `forward_auth` on `chat:8080/chat/auth/verify` which stamps `X-Forwarded-User` (#709); root `/` redirects to `/forge/`), `generate_staging_index()` — staging index, `generate_deploy_pipelines()` — Woodpecker deployment pipeline configs. Requires `FACTORY_ROOT`, `PROJECT_NAME`, `PRIMARY_BRANCH`. | bin/disinto (init) |
+| `lib/generators.sh` | Template generation for `disinto init`: `generate_compose()` — docker-compose.yml (uses `codeberg.org/forgejo/forgejo:11.0` tag; adds `security_opt: [apparmor:unconfined]` to all services for rootless container compatibility; Forgejo includes a healthcheck so dependent services use `condition: service_healthy` — fixes cold-start races, #665; adds `chat` service block with isolated `chat-config` named volume and `CHAT_HISTORY_DIR` bind-mount for per-user NDJSON history persistence (#710); injects `FORWARD_AUTH_SECRET` for Caddy↔chat defense-in-depth auth (#709); cost-cap env vars `CHAT_MAX_REQUESTS_PER_HOUR`, `CHAT_MAX_REQUESTS_PER_DAY`, `CHAT_MAX_TOKENS_PER_DAY` (#711); subdomain fallback comment for `EDGE_TUNNEL_FQDN_*` vars (#713); all `depends_on` now use `condition: service_healthy/started` instead of bare service names; all services now include `restart: unless-stopped` including the edge service — #768; agents service now uses `image: ghcr.io/disinto/agents:${DISINTO_IMAGE_TAG:-latest}` instead of `build:` (#429); `WOODPECKER_PLUGINS_PRIVILEGED` env var added to woodpecker service (#779); agents-llama conditional block gated on `ENABLE_LLAMA_AGENT=1` (#769); `agents-llama-all` compose service (profile `agents-llama-all`, all 7 roles: review,dev,gardener,architect,planner,predictor,supervisor) added by #801; agents service gains volume mounts for `./projects`, `./.env`, `./state`), `generate_caddyfile()` — Caddyfile (routes: `/forge/*` → forgejo:3000, `/woodpecker/*` → woodpecker:8000, `/staging/*` → staging:80; `/chat/login` and `/chat/oauth/callback` bypass `forward_auth` so unauthenticated users can reach the OAuth flow; `/chat/*` gated by `forward_auth` on `chat:8080/chat/auth/verify` which stamps `X-Forwarded-User` (#709); root `/` redirects to `/forge/`), `generate_staging_index()` — staging index, `generate_deploy_pipelines()` — Woodpecker deployment pipeline configs. Requires `FACTORY_ROOT`, `PROJECT_NAME`, `PRIMARY_BRANCH`. | bin/disinto (init) |
| `lib/sprint-filer.sh` | Post-merge sub-issue filer for sprint PRs. Invoked by the `.woodpecker/ops-filer.yml` pipeline after a sprint PR merges to ops repo `main`. Parses `<!-- filer:begin --> ... <!-- filer:end -->` blocks from sprint PR bodies to extract sub-issue definitions, creates them on the project repo using `FORGE_FILER_TOKEN` (narrow-scope `filer-bot` identity with `issues:write` only), adds `in-progress` label to the parent vision issue, and handles vision lifecycle closure when all sub-issues are closed. Uses `filer_api_all()` for paginated fetches. Idempotent: uses `<!-- decomposed-from: #<vision>, sprint: <slug>, id: <id> -->` markers to skip already-filed issues. Requires `FORGE_FILER_TOKEN`, `FORGE_API`, `FORGE_API_BASE`, `FORGE_OPS_REPO`. | `.woodpecker/ops-filer.yml` (CI pipeline on ops repo) |
| `lib/hire-agent.sh` | `disinto_hire_an_agent()` — user creation, `.profile` repo setup, formula copying, branch protection, and state marker creation for hiring a new agent. Requires `FORGE_URL`, `FORGE_TOKEN`, `FACTORY_ROOT`, `PROJECT_NAME`. Extracted from `bin/disinto`. | bin/disinto (hire) |
| `lib/release.sh` | `disinto_release()` — vault TOML creation, branch setup on ops repo, PR creation, and auto-merge request for a versioned release. `_assert_release_globals()` validates required env vars. Requires `FORGE_URL`, `FORGE_TOKEN`, `FORGE_OPS_REPO`, `FACTORY_ROOT`, `PRIMARY_BRANCH`. Extracted from `bin/disinto`. | bin/disinto (release) |
| `lib/hvault.sh` | HashiCorp Vault helper module. `hvault_kv_get(PATH, [KEY])` — read KV v2 secret, optionally extract one key. `hvault_kv_put(PATH, KEY=VAL ...)` — write KV v2 secret. `hvault_kv_list(PATH)` — list keys at a KV path. `hvault_policy_apply(NAME, FILE)` — idempotent policy upsert. `hvault_jwt_login(ROLE, JWT)` — exchange JWT for short-lived token. `hvault_token_lookup()` — returns TTL/policies/accessor for current token. All functions use `VAULT_ADDR` + `VAULT_TOKEN` from env (fallback: `/etc/vault.d/root.token`), emit structured JSON errors to stderr on failure. Tests: `tests/lib-hvault.bats` (requires `vault server -dev`). | Not sourced at runtime yet — pure scaffolding for Nomad+Vault migration (#799) |
| `lib/init/nomad/` | Nomad+Vault Step 0 installer scripts. `cluster-up.sh` — idempotent orchestrator that runs all steps in order (installs packages, writes HCL, enables systemd units, unseals Vault); uses `poll_until_healthy()` helper for deduped readiness polling. `install.sh` — installs pinned Nomad+Vault apt packages. `vault-init.sh` — initializes Vault (unseal keys → `/etc/vault.d/`), creates dev-persisted unseal unit. `lib-systemd.sh` — shared systemd unit helpers. `systemd-nomad.sh`, `systemd-vault.sh` — write and enable service units. Idempotent: each step checks current state before acting. Sourced and called by `cluster-up.sh`; not sourced by agents. | `bin/disinto init --backend=nomad` |
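The low-tier bypass described for `lib/action-vault.sh` reduces to a small routing decision on the classifier's output. A minimal sketch of that decision, not the real implementation (`route_vault_action` is a hypothetical name; the safe default to the PR path on an empty or unknown tier mirrors the "classification failed, default to high tier" behavior noted above):

```shell
#!/usr/bin/env bash
# Illustrative routing only: in the real code, action-vault/classify.sh
# produces the tier, and vault_request then either calls
# _vault_commit_direct() (returning 0) or opens a vault/<action-id> PR.
route_vault_action() {
  local tier="$1"  # classifier output, e.g. "low" or "high"
  case "$tier" in
    low) echo "direct-commit" ;;  # commit straight to ops main, no approval wait
    *)   echo "pr" ;;             # anything else (incl. failure) goes through a PR
  esac
}

route_vault_action low    # → direct-commit
route_vault_action high   # → pr
route_vault_action ""     # classification failed: fall back to the PR path
```

The key design point carried over from the table: an unclassifiable action is treated as high tier, so a broken classifier can never silently widen the bypass.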

View file

@@ -1,9 +1,9 @@
 #!/usr/bin/env bash
-# vault.sh — Helper for agents to create vault PRs on ops repo
+# action-vault.sh — Helper for agents to create vault PRs on ops repo
 #
 # Source after lib/env.sh:
 #   source "$(dirname "$0")/../lib/env.sh"
-#   source "$(dirname "$0")/lib/vault.sh"
+#   source "$(dirname "$0")/lib/action-vault.sh"
 #
 # Required globals: FORGE_TOKEN, FORGE_URL, FORGE_REPO, FORGE_OPS_REPO
 # Optional: OPS_REPO_ROOT (local path for ops repo)
@@ -12,7 +12,7 @@
 # vault_request <action_id> <toml_content> — Create vault PR, return PR number
 #
 # The function:
-#   1. Validates TOML content using validate_vault_action() from vault/vault-env.sh
+#   1. Validates TOML content using validate_vault_action() from action-vault/vault-env.sh
 #   2. Creates a branch on the ops repo: vault/<action-id>
 #   3. Writes TOML to vault/actions/<action-id>.toml on that branch
 #   4. Creates PR targeting main with title "vault: <action-id>"
@ -133,7 +133,7 @@ vault_request() {
printf '%s' "$toml_content" > "$tmp_toml" printf '%s' "$toml_content" > "$tmp_toml"
# Source vault-env.sh for validate_vault_action # Source vault-env.sh for validate_vault_action
local vault_env="${FACTORY_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)}/vault/vault-env.sh" local vault_env="${FACTORY_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)}/action-vault/vault-env.sh"
if [ ! -f "$vault_env" ]; then if [ ! -f "$vault_env" ]; then
echo "ERROR: vault-env.sh not found at $vault_env" >&2 echo "ERROR: vault-env.sh not found at $vault_env" >&2
return 1 return 1
@@ -161,7 +161,7 @@ vault_request() {
   ops_api="$(_vault_ops_api)"

   # Classify the action to determine if PR bypass is allowed
-  local classify_script="${FACTORY_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)}/vault/classify.sh"
+  local classify_script="${FACTORY_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)}/action-vault/classify.sh"
   local vault_tier
   vault_tier=$("$classify_script" "${VAULT_ACTION_FORMULA:-}" "${VAULT_BLAST_RADIUS_OVERRIDE:-}") || {
     # Classification failed, default to high tier (require PR)

View file

@@ -121,9 +121,10 @@ export FORGE_VAULT_TOKEN="${FORGE_VAULT_TOKEN:-${FORGE_TOKEN}}"
 export FORGE_SUPERVISOR_TOKEN="${FORGE_SUPERVISOR_TOKEN:-${FORGE_TOKEN}}"
 export FORGE_PREDICTOR_TOKEN="${FORGE_PREDICTOR_TOKEN:-${FORGE_TOKEN}}"
 export FORGE_ARCHITECT_TOKEN="${FORGE_ARCHITECT_TOKEN:-${FORGE_TOKEN}}"
+export FORGE_FILER_TOKEN="${FORGE_FILER_TOKEN:-${FORGE_TOKEN}}"

 # Bot usernames filter
-export FORGE_BOT_USERNAMES="${FORGE_BOT_USERNAMES:-dev-bot,review-bot,planner-bot,gardener-bot,vault-bot,supervisor-bot,predictor-bot,architect-bot}"
+export FORGE_BOT_USERNAMES="${FORGE_BOT_USERNAMES:-dev-bot,review-bot,planner-bot,gardener-bot,vault-bot,supervisor-bot,predictor-bot,architect-bot,filer-bot}"

 # Project config
 export FORGE_REPO="${FORGE_REPO:-}"
@@ -157,8 +158,8 @@ export WOODPECKER_SERVER="${WOODPECKER_SERVER:-http://localhost:8000}"
 export CLAUDE_TIMEOUT="${CLAUDE_TIMEOUT:-7200}"

 # Vault-only token guard (#745): external-action tokens (GITHUB_TOKEN, CLAWHUB_TOKEN)
-# must NEVER be available to agents. They live in .env.vault.enc and are injected
-# only into the ephemeral runner container at fire time. Unset them here so
+# must NEVER be available to agents. They live in secrets/*.enc and are decrypted
+# only into the ephemeral runner container at fire time (#777). Unset them here so
 # even an accidental .env inclusion cannot leak them into agent sessions.
 unset GITHUB_TOKEN 2>/dev/null || true
 unset CLAWHUB_TOKEN 2>/dev/null || true
@@ -312,6 +313,68 @@ memory_guard() {
   fi
 }
# =============================================================================
# SECRET LOADING ABSTRACTION
# =============================================================================
# load_secret NAME [DEFAULT]
#
# Resolves a secret value using the following precedence:
# 1. /secrets/<NAME>.env — Nomad-rendered template (future)
# 2. Current environment — already set by .env.enc, compose, etc.
# 3. secrets/<NAME>.enc — age-encrypted per-key file (decrypted on demand)
# 4. DEFAULT (or empty)
#
# Prints the resolved value to stdout. Caches age-decrypted values in the
# process environment so subsequent calls are free.
# =============================================================================
load_secret() {
  local name="$1"
  local default="${2:-}"

  # 1. Nomad-rendered template (future: Nomad writes /secrets/<NAME>.env)
  local nomad_path="/secrets/${name}.env"
  if [ -f "$nomad_path" ]; then
    # Source into a subshell to extract just the value
    local _nomad_val
    _nomad_val=$(
      set -a
      # shellcheck source=/dev/null
      source "$nomad_path"
      set +a
      printf '%s' "${!name:-}"
    )
    if [ -n "$_nomad_val" ]; then
      export "$name=$_nomad_val"
      printf '%s' "$_nomad_val"
      return 0
    fi
  fi

  # 2. Already in environment (set by .env.enc, compose injection, etc.)
  if [ -n "${!name:-}" ]; then
    printf '%s' "${!name}"
    return 0
  fi

  # 3. Age-encrypted per-key file: secrets/<NAME>.enc (#777)
  local _age_key="${HOME}/.config/sops/age/keys.txt"
  local _enc_path="${FACTORY_ROOT}/secrets/${name}.enc"
  if [ -f "$_enc_path" ] && [ -f "$_age_key" ] && command -v age &>/dev/null; then
    local _dec_val
    if _dec_val=$(age -d -i "$_age_key" "$_enc_path" 2>/dev/null) && [ -n "$_dec_val" ]; then
      export "$name=$_dec_val"
      printf '%s' "$_dec_val"
      return 0
    fi
  fi

  # 4. Default (or empty)
  if [ -n "$default" ]; then
    printf '%s' "$default"
  fi
  return 0
}
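The precedence chain means callers never need to know where a secret actually lives. A usage sketch, using a trimmed re-statement of `load_secret` that covers only steps 2 (environment) and 4 (default) so it runs standalone; the full version above additionally checks `/secrets/<NAME>.env` and `secrets/<NAME>.enc`:

```shell
#!/usr/bin/env bash
# Trimmed illustration of the load_secret call pattern (steps 2 and 4 only).
load_secret() {
  local name="$1" default="${2:-}"
  # Step 2: value already in the environment wins
  if [ -n "${!name:-}" ]; then printf '%s' "${!name}"; return 0; fi
  # Step 4: fall back to the caller-supplied default (or empty)
  [ -n "$default" ] && printf '%s' "$default"
  return 0
}

export DB_PASSWORD="s3cret"          # simulates injection via .env.enc/compose
load_secret DB_PASSWORD              # → s3cret
load_secret MISSING_KEY "fallback"   # no source found → fallback
```

Callers always read the function's stdout, never the file paths, so swapping in the Nomad-rendered template later requires no call-site changes.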
# Source tea helpers (available when tea binary is installed)
if command -v tea &>/dev/null; then
  # shellcheck source=tea-helpers.sh

View file

@@ -31,8 +31,9 @@ _load_init_context() {
 # Execute a command in the Forgejo container (for admin operations)
 _forgejo_exec() {
   local use_bare="${DISINTO_BARE:-false}"
+  local cname="${FORGEJO_CONTAINER_NAME:-disinto-forgejo}"
   if [ "$use_bare" = true ]; then
-    docker exec -u git disinto-forgejo "$@"
+    docker exec -u git "$cname" "$@"
   else
     docker compose -f "${FACTORY_ROOT}/docker-compose.yml" exec -T -u git forgejo "$@"
   fi
@@ -94,11 +95,12 @@ setup_forge() {
   # Bare-metal mode: standalone docker run
   mkdir -p "${FORGEJO_DATA_DIR}"
-  if docker ps -a --format '{{.Names}}' | grep -q '^disinto-forgejo$'; then
-    docker start disinto-forgejo >/dev/null 2>&1 || true
+  local cname="${FORGEJO_CONTAINER_NAME:-disinto-forgejo}"
+  if docker ps -a --format '{{.Names}}' | grep -q "^${cname}$"; then
+    docker start "$cname" >/dev/null 2>&1 || true
   else
     docker run -d \
-      --name disinto-forgejo \
+      --name "$cname" \
       --restart unless-stopped \
       -p "${forge_port}:3000" \
       -p 2222:22 \
@@ -210,8 +212,8 @@ setup_forge() {
   # Create human user (disinto-admin) as site admin if it doesn't exist
   local human_user="disinto-admin"
-  local human_pass
-  human_pass="admin-$(head -c 16 /dev/urandom | base64 | tr -dc 'a-zA-Z0-9' | head -c 20)"
+  # human_user == admin_user; reuse admin_pass for basic-auth operations
+  local human_pass="$admin_pass"
   if ! curl -sf --max-time 5 -H "Authorization: token ${FORGE_TOKEN:-}" "${forge_url}/api/v1/users/${human_user}" >/dev/null 2>&1; then
     echo "Creating human user: ${human_user}"
@@ -243,6 +245,14 @@ setup_forge() {
     echo "Human user: ${human_user} (already exists)"
   fi

+  # Preserve admin token if already stored in .env (idempotent re-run)
+  local admin_token=""
+  if _token_exists_in_env "FORGE_ADMIN_TOKEN" "$env_file" && [ "$rotate_tokens" = false ]; then
+    admin_token=$(grep '^FORGE_ADMIN_TOKEN=' "$env_file" | head -1 | cut -d= -f2-)
+    [ -n "$admin_token" ] && echo "Admin token: preserved (use --rotate-tokens to force)"
+  fi
+  if [ -z "$admin_token" ]; then
   # Delete existing admin token if present (token sha1 is only returned at creation time)
   local existing_token_id
   existing_token_id=$(curl -sf \
@@ -256,7 +266,6 @@ setup_forge() {
   fi

   # Create admin token (fresh, so sha1 is returned)
-  local admin_token
   admin_token=$(curl -sf -X POST \
     -u "${admin_user}:${admin_pass}" \
     -H "Content-Type: application/json" \
@@ -269,23 +278,41 @@ setup_forge() {
     exit 1
   fi

-  # Get or create human user token
+  # Store admin token for idempotent re-runs
+  if grep -q '^FORGE_ADMIN_TOKEN=' "$env_file" 2>/dev/null; then
+    sed -i "s|^FORGE_ADMIN_TOKEN=.*|FORGE_ADMIN_TOKEN=${admin_token}|" "$env_file"
+  else
+    printf 'FORGE_ADMIN_TOKEN=%s\n' "$admin_token" >> "$env_file"
+  fi
+  echo "Admin token: generated and saved (FORGE_ADMIN_TOKEN)"
+  fi
+
+  # Get or create human user token (human_user == admin_user; use admin_pass)
   local human_token=""
+  if _token_exists_in_env "HUMAN_TOKEN" "$env_file" && [ "$rotate_tokens" = false ]; then
+    human_token=$(grep '^HUMAN_TOKEN=' "$env_file" | head -1 | cut -d= -f2-)
+    if [ -n "$human_token" ]; then
+      export HUMAN_TOKEN="$human_token"
+      echo " Human token preserved (use --rotate-tokens to force)"
+    fi
+  fi
+  if [ -z "$human_token" ]; then
   # Delete existing human token if present (token sha1 is only returned at creation time)
   local existing_human_token_id
   existing_human_token_id=$(curl -sf \
-    -u "${human_user}:${human_pass}" \
+    -u "${admin_user}:${admin_pass}" \
     "${forge_url}/api/v1/users/${human_user}/tokens" 2>/dev/null \
     | jq -r '.[] | select(.name == "disinto-human-token") | .id') || existing_human_token_id=""
   if [ -n "$existing_human_token_id" ]; then
     curl -sf -X DELETE \
-      -u "${human_user}:${human_pass}" \
+      -u "${admin_user}:${admin_pass}" \
       "${forge_url}/api/v1/users/${human_user}/tokens/${existing_human_token_id}" >/dev/null 2>&1 || true
   fi
-  # Create human token (fresh, so sha1 is returned)
+  # Create human token (use admin_pass since human_user == admin_user)
   human_token=$(curl -sf -X POST \
-    -u "${human_user}:${human_pass}" \
+    -u "${admin_user}:${admin_pass}" \
     -H "Content-Type: application/json" \
     "${forge_url}/api/v1/users/${human_user}/tokens" \
     -d '{"name":"disinto-human-token","scopes":["all"]}' 2>/dev/null \
@@ -299,7 +326,8 @@ setup_forge() {
     printf 'HUMAN_TOKEN=%s\n' "$human_token" >> "$env_file"
   fi
   export HUMAN_TOKEN="$human_token"
-  echo " Human token saved (HUMAN_TOKEN)"
+  echo " Human token generated and saved (HUMAN_TOKEN)"
+  fi
   fi
# Create bot users and tokens # Create bot users and tokens
@@ -719,7 +747,7 @@ setup_forge() {
   fi

   # Add all bot users as collaborators with appropriate permissions
-  # dev-bot: write (PR creation via lib/vault.sh)
+  # dev-bot: write (PR creation via lib/action-vault.sh)
   # review-bot: read (PR review)
   # planner-bot: write (prerequisites.md, memory)
   # gardener-bot: write (backlog grooming)

View file

@@ -819,8 +819,7 @@ build_prompt_footer() {
 Base URL: ${FORGE_API}
 Auth header: -H \"Authorization: token \${FORGE_TOKEN}\"
 Read issue: curl -sf -H \"Authorization: token \${FORGE_TOKEN}\" '${FORGE_API}/issues/{number}' | jq '.body'
-Create issue: curl -sf -X POST -H \"Authorization: token \${FORGE_TOKEN}\" -H 'Content-Type: application/json' '${FORGE_API}/issues' -d '{\"title\":\"...\",\"body\":\"...\",\"labels\":[LABEL_ID]}'${extra_api}
-List labels: curl -sf -H \"Authorization: token \${FORGE_TOKEN}\" '${FORGE_API}/labels'
+List labels: curl -sf -H \"Authorization: token \${FORGE_TOKEN}\" '${FORGE_API}/labels'${extra_api}
 NEVER echo or include the actual token value in output — always reference \${FORGE_TOKEN}.

 ## Environment

View file

@@ -97,29 +97,34 @@ _generate_local_model_services() {
       POLL_INTERVAL) poll_interval_val="$value" ;;
       ---)
         if [ -n "$service_name" ] && [ -n "$base_url" ]; then
+          # Per-agent FORGE_TOKEN / FORGE_PASS lookup (#834 Gap 3).
+          # Two hired llama agents must not share the same Forgejo identity,
+          # so we key the env-var lookup by forge_user (which hire-agent.sh
+          # writes as the Forgejo username). Apply the same tr 'a-z-' 'A-Z_'
+          # convention as hire-agent.sh Gap 1 so the names match.
+          local user_upper
+          user_upper=$(echo "$forge_user" | tr 'a-z-' 'A-Z_')
           cat >> "$temp_file" <<EOF
   agents-${service_name}:
-    build:
-      context: .
-      dockerfile: docker/agents/Dockerfile
+    image: ghcr.io/disinto/agents:\${DISINTO_IMAGE_TAG:-latest}
     container_name: disinto-agents-${service_name}
     restart: unless-stopped
     security_opt:
       - apparmor=unconfined
     volumes:
       - agents-${service_name}-data:/home/agent/data
-      - project-repos:/home/agent/repos
+      - project-repos-${service_name}:/home/agent/repos
       - \${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:\${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
-      - \${HOME}/.claude.json:/home/agent/.claude.json:ro
-      - CLAUDE_BIN_PLACEHOLDER:/usr/local/bin/claude:ro
-      - \${HOME}/.ssh:/home/agent/.ssh:ro
+      - \${CLAUDE_CONFIG_FILE:-\${HOME}/.claude.json}:/home/agent/.claude.json:ro
+      - \${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
+      - \${AGENT_SSH_DIR:-\${HOME}/.ssh}:/home/agent/.ssh:ro
     environment:
       FORGE_URL: http://forgejo:3000
       FORGE_REPO: ${FORGE_REPO:-disinto-admin/disinto}
-      # Use llama-specific credentials if available, otherwise fall back to main FORGE_TOKEN
-      FORGE_TOKEN: \${FORGE_TOKEN_LLAMA:-\${FORGE_TOKEN:-}}
-      FORGE_PASS: \${FORGE_PASS_LLAMA:-\${FORGE_PASS:-}}
+      # Per-agent credentials keyed by forge_user (#834 Gap 3).
+      FORGE_TOKEN: \${FORGE_TOKEN_${user_upper}:-}
+      FORGE_PASS: \${FORGE_PASS_${user_upper}:-}
       FORGE_REVIEW_TOKEN: \${FORGE_REVIEW_TOKEN:-}
       FORGE_BOT_USERNAMES: \${FORGE_BOT_USERNAMES:-}
       AGENT_ROLES: "${roles}"
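The `forge_user` to env-var-suffix mapping in the hunk above relies on `tr` doing a single set-to-set translation: set 1 is `a-z` plus `-`, set 2 is `A-Z` plus `_`, so lowercase letters are uppercased and hyphens become underscores in one pass. It can be checked in isolation (the agent name `llama-dev` is just an example):

```shell
# tr 'a-z-' 'A-Z_': both sets are 27 characters long (a..z plus '-',
# A..Z plus '_'), so each input character maps positionally.
user_upper=$(echo "llama-dev" | tr 'a-z-' 'A-Z_')
echo "$user_upper"                 # → LLAMA_DEV
echo "FORGE_TOKEN_${user_upper}"   # → FORGE_TOKEN_LLAMA_DEV
```

Because hire-agent.sh applies the identical translation when it writes credentials to `.env`, the compose lookup `\${FORGE_TOKEN_${user_upper}:-}` resolves to the matching per-agent variable with no shared fallback.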
@@ -142,6 +147,7 @@ _generate_local_model_services() {
       GARDENER_INTERVAL: "${GARDENER_INTERVAL:-21600}"
       ARCHITECT_INTERVAL: "${ARCHITECT_INTERVAL:-21600}"
       PLANNER_INTERVAL: "${PLANNER_INTERVAL:-43200}"
+      SUPERVISOR_INTERVAL: "${SUPERVISOR_INTERVAL:-1200}"
     depends_on:
       forgejo:
         condition: service_healthy
@@ -154,13 +160,18 @@ _generate_local_model_services() {
EOF
has_services=true
fi
# Collect per-agent volume names for later (#834 Gap 4: project-repos
# must be per-agent so concurrent llama devs don't race on
# /home/agent/repos/_factory or state/.dev-active).
local vol_data=" agents-${service_name}-data:"
local vol_repos=" project-repos-${service_name}:"
if [ -n "$all_vols" ]; then
all_vols="${all_vols}
${vol_data}
${vol_repos}"
else
all_vols="${vol_data}
${vol_repos}"
fi
service_name="" base_url="" model="" roles="" api_key="" forge_user="" compact_pct="" poll_interval_val=""
;;
@@ -217,8 +228,14 @@ for name, config in agents.items():
# Add local-model volumes to the volumes section
if [ -n "$all_vols" ]; then
# Escape embedded newlines as literal \n so sed's s/// replacement
# tolerates multi-line $all_vols (needed once >1 local-model agent is
# configured — without this, the second agent's volume entry would
# unterminate the sed expression).
local all_vols_escaped
all_vols_escaped=$(printf '%s' "$all_vols" | sed ':a;N;$!ba;s/\n/\\n/g')
# Find the volumes section and add the new volumes
sed -i "/^volumes:/{n;:a;n;/^[a-z]/!{s/$/\n$all_vols_escaped/;b};ba}" "$temp_compose"
fi
mv "$temp_compose" "$compose_file"
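The newline-escaping step above can be exercised standalone. A minimal sketch — the `all_vols` value here is illustrative, not taken from a real config:

```shell
# Two accumulated volume entries, separated by a real newline,
# as _generate_local_model_services would build them.
all_vols='  agents-llama-data:
  project-repos-llama:'

# Join all lines, then turn each embedded newline into a literal \n
# so the value is safe on the right-hand side of a sed s/// expression.
all_vols_escaped=$(printf '%s' "$all_vols" | sed ':a;N;$!ba;s/\n/\\n/g')
printf '%s\n' "$all_vols_escaped"
```

The `:a;N;$!ba` loop is the classic GNU sed idiom for slurping the whole input into the pattern space before substituting.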
@@ -233,6 +250,7 @@ for name, config in agents.items():
# to materialize a working stack on a fresh checkout.
_generate_compose_impl() {
local forge_port="${1:-3000}"
local use_build="${2:-false}"
local compose_file="${FACTORY_ROOT}/docker-compose.yml"
# Check if compose file already exists
@@ -296,6 +314,7 @@ services:
WOODPECKER_AGENT_SECRET: ${WOODPECKER_AGENT_SECRET:-}
WOODPECKER_DATABASE_DRIVER: sqlite3
WOODPECKER_DATABASE_DATASOURCE: /var/lib/woodpecker/woodpecker.sqlite
WOODPECKER_PLUGINS_PRIVILEGED: ${WOODPECKER_PLUGINS_PRIVILEGED:-plugins/docker}
WOODPECKER_ENVIRONMENT: "FORGE_TOKEN:${FORGE_TOKEN}"
depends_on:
forgejo:
@@ -318,15 +337,19 @@ services:
WOODPECKER_AGENT_SECRET: ${WOODPECKER_AGENT_SECRET:-}
WOODPECKER_GRPC_SECURE: "false"
WOODPECKER_HEALTHCHECK_ADDR: ":3333"
WOODPECKER_BACKEND_DOCKER_NETWORK: ${WOODPECKER_CI_NETWORK:-disinto_disinto-net}
WOODPECKER_MAX_WORKFLOWS: 1
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://localhost:3333/healthz"]
interval: 30s
timeout: 5s
retries: 3
start_period: 15s
depends_on:
- woodpecker
agents:
image: ghcr.io/disinto/agents:${DISINTO_IMAGE_TAG:-latest}
container_name: disinto-agents
restart: unless-stopped
security_opt:
@@ -335,11 +358,14 @@ services:
- agent-data:/home/agent/data
- project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro
- ./projects:/home/agent/disinto/projects:ro
- ./.env:/home/agent/disinto/.env:ro
- ./state:/home/agent/disinto/state
environment:
FORGE_URL: http://forgejo:3000
FORGE_REPO: ${FORGE_REPO:-disinto-admin/disinto}
@@ -371,8 +397,14 @@ services:
PLANNER_INTERVAL: ${PLANNER_INTERVAL:-43200}
# IMPORTANT: agents get explicit environment variables (forge tokens, CI tokens, config).
# Vault-only secrets (GITHUB_TOKEN, CLAWHUB_TOKEN, deploy keys) live in
# secrets/*.enc and are NEVER injected here — only the runner
# container receives them at fire time (AD-006, #745, #777).
healthcheck:
test: ["CMD", "pgrep", "-f", "entrypoint.sh"]
interval: 60s
timeout: 5s
retries: 3
start_period: 30s
depends_on:
forgejo:
condition: service_healthy
@@ -381,10 +413,137 @@ services:
networks:
- disinto-net
COMPOSEEOF
# ── Conditional agents-llama block (ENABLE_LLAMA_AGENT=1) ──────────────
# Local-Qwen dev agent — gated on ENABLE_LLAMA_AGENT so factories without
# a local llama endpoint don't try to start it. See docs/agents-llama.md.
if [ "${ENABLE_LLAMA_AGENT:-0}" = "1" ]; then
cat >> "$compose_file" <<'LLAMAEOF'
agents-llama:
build:
context: .
dockerfile: docker/agents/Dockerfile
container_name: disinto-agents-llama
restart: unless-stopped
security_opt:
- apparmor=unconfined
volumes:
- agent-data:/home/agent/data
- project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro
environment:
FORGE_URL: http://forgejo:3000
FORGE_REPO: ${FORGE_REPO:-disinto-admin/disinto}
FORGE_TOKEN: ${FORGE_TOKEN_LLAMA:-}
FORGE_PASS: ${FORGE_PASS_LLAMA:-}
FORGE_BOT_USERNAMES: ${FORGE_BOT_USERNAMES:-}
WOODPECKER_TOKEN: ${WOODPECKER_TOKEN:-}
CLAUDE_TIMEOUT: ${CLAUDE_TIMEOUT:-7200}
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC: ${CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC:-1}
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: "60"
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
ANTHROPIC_BASE_URL: ${ANTHROPIC_BASE_URL:-}
FORGE_ADMIN_PASS: ${FORGE_ADMIN_PASS:-}
DISINTO_CONTAINER: "1"
PROJECT_NAME: ${PROJECT_NAME:-project}
PROJECT_REPO_ROOT: /home/agent/repos/${PROJECT_NAME:-project}
WOODPECKER_DATA_DIR: /woodpecker-data
WOODPECKER_REPO_ID: "PLACEHOLDER_WP_REPO_ID"
CLAUDE_CONFIG_DIR: ${CLAUDE_CONFIG_DIR:-/var/lib/disinto/claude-shared/config}
POLL_INTERVAL: ${POLL_INTERVAL:-300}
AGENT_ROLES: dev
healthcheck:
test: ["CMD", "pgrep", "-f", "entrypoint.sh"]
interval: 60s
timeout: 5s
retries: 3
start_period: 30s
depends_on:
forgejo:
condition: service_healthy
networks:
- disinto-net
agents-llama-all:
build:
context: .
dockerfile: docker/agents/Dockerfile
container_name: disinto-agents-llama-all
restart: unless-stopped
profiles: ["agents-llama-all"]
security_opt:
- apparmor=unconfined
volumes:
- agent-data:/home/agent/data
- project-repos:/home/agent/repos
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
- ${AGENT_SSH_DIR:-${HOME}/.ssh}:/home/agent/.ssh:ro
- ${SOPS_AGE_DIR:-${HOME}/.config/sops/age}:/home/agent/.config/sops/age:ro
- woodpecker-data:/woodpecker-data:ro
environment:
FORGE_URL: http://forgejo:3000
FORGE_REPO: ${FORGE_REPO:-disinto-admin/disinto}
FORGE_TOKEN: ${FORGE_TOKEN_LLAMA:-}
FORGE_PASS: ${FORGE_PASS_LLAMA:-}
FORGE_REVIEW_TOKEN: ${FORGE_REVIEW_TOKEN:-}
FORGE_PLANNER_TOKEN: ${FORGE_PLANNER_TOKEN:-}
FORGE_GARDENER_TOKEN: ${FORGE_GARDENER_TOKEN:-}
FORGE_VAULT_TOKEN: ${FORGE_VAULT_TOKEN:-}
FORGE_SUPERVISOR_TOKEN: ${FORGE_SUPERVISOR_TOKEN:-}
FORGE_PREDICTOR_TOKEN: ${FORGE_PREDICTOR_TOKEN:-}
FORGE_ARCHITECT_TOKEN: ${FORGE_ARCHITECT_TOKEN:-}
FORGE_FILER_TOKEN: ${FORGE_FILER_TOKEN:-}
FORGE_BOT_USERNAMES: ${FORGE_BOT_USERNAMES:-}
WOODPECKER_TOKEN: ${WOODPECKER_TOKEN:-}
CLAUDE_TIMEOUT: ${CLAUDE_TIMEOUT:-7200}
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC: ${CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC:-1}
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: "60"
CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
ANTHROPIC_BASE_URL: ${ANTHROPIC_BASE_URL:-}
FORGE_ADMIN_PASS: ${FORGE_ADMIN_PASS:-}
DISINTO_CONTAINER: "1"
PROJECT_NAME: ${PROJECT_NAME:-project}
PROJECT_REPO_ROOT: /home/agent/repos/${PROJECT_NAME:-project}
WOODPECKER_DATA_DIR: /woodpecker-data
WOODPECKER_REPO_ID: "PLACEHOLDER_WP_REPO_ID"
CLAUDE_CONFIG_DIR: ${CLAUDE_CONFIG_DIR:-/var/lib/disinto/claude-shared/config}
POLL_INTERVAL: ${POLL_INTERVAL:-300}
GARDENER_INTERVAL: ${GARDENER_INTERVAL:-21600}
ARCHITECT_INTERVAL: ${ARCHITECT_INTERVAL:-21600}
PLANNER_INTERVAL: ${PLANNER_INTERVAL:-43200}
SUPERVISOR_INTERVAL: ${SUPERVISOR_INTERVAL:-1200}
AGENT_ROLES: review,dev,gardener,architect,planner,predictor,supervisor
healthcheck:
test: ["CMD", "pgrep", "-f", "entrypoint.sh"]
interval: 60s
timeout: 5s
retries: 3
start_period: 30s
depends_on:
forgejo:
condition: service_healthy
woodpecker:
condition: service_started
networks:
- disinto-net
LLAMAEOF
fi
# Resume the rest of the compose file (runner onward)
cat >> "$compose_file" <<'COMPOSEEOF'
runner:
image: ghcr.io/disinto/agents:${DISINTO_IMAGE_TAG:-latest}
profiles: ["vault"]
security_opt:
- apparmor=unconfined
@@ -405,8 +564,9 @@ services:
# Edge proxy — reverse proxy to Forgejo, Woodpecker, and staging
# Serves on ports 80/443, routes based on path
edge:
image: ghcr.io/disinto/edge:${DISINTO_IMAGE_TAG:-latest}
container_name: disinto-edge
restart: unless-stopped
security_opt:
- apparmor=unconfined
ports:
@@ -441,7 +601,13 @@ services:
- /var/run/docker.sock:/var/run/docker.sock
- ./secrets/tunnel_key:/run/secrets/tunnel_key:ro
- ${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}:${CLAUDE_SHARED_DIR:-/var/lib/disinto/claude-shared}
- ${CLAUDE_CONFIG_FILE:-${HOME}/.claude.json}:/home/agent/.claude.json:ro
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost:2019/config/"]
interval: 30s
timeout: 5s
retries: 3
start_period: 15s
depends_on:
forgejo:
condition: service_healthy
@@ -459,6 +625,12 @@ services:
command: ["caddy", "file-server", "--root", "/srv/site"]
security_opt:
- apparmor=unconfined
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://localhost:2019/config/"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
volumes:
- ./docker:/srv/site:ro
networks:
@@ -499,7 +671,7 @@ services:
memswap_limit: 512m
volumes:
# Mount claude binary from host (same as agents)
- ${CLAUDE_BIN_DIR}:/usr/local/bin/claude:ro
# Throwaway named volume for chat config (isolated from host ~/.claude)
- chat-config:/var/chat/config
# Chat history persistence: per-user NDJSON files on bind-mounted host volume
@@ -518,6 +690,12 @@ services:
CHAT_MAX_REQUESTS_PER_HOUR: ${CHAT_MAX_REQUESTS_PER_HOUR:-60}
CHAT_MAX_REQUESTS_PER_DAY: ${CHAT_MAX_REQUESTS_PER_DAY:-500}
CHAT_MAX_TOKENS_PER_DAY: ${CHAT_MAX_TOKENS_PER_DAY:-1000000}
healthcheck:
test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
networks:
- disinto-net
@@ -556,20 +734,35 @@ COMPOSEEOF
fi
# Append local-model agent services if any are configured
_generate_local_model_services "$compose_file"
# Resolve the Claude CLI binary path and persist as CLAUDE_BIN_DIR in .env.
# docker-compose.yml references ${CLAUDE_BIN_DIR} so the value must be set.
local claude_bin
claude_bin="$(command -v claude 2>/dev/null || true)"
if [ -n "$claude_bin" ]; then
claude_bin="$(readlink -f "$claude_bin")"
else
echo "Warning: claude CLI not found in PATH — set CLAUDE_BIN_DIR in .env manually" >&2
claude_bin="/usr/local/bin/claude"
fi
# Persist CLAUDE_BIN_DIR into .env so docker-compose can resolve it.
local env_file="${FACTORY_ROOT}/.env"
if [ -f "$env_file" ]; then
if grep -q "^CLAUDE_BIN_DIR=" "$env_file" 2>/dev/null; then
sed -i "s|^CLAUDE_BIN_DIR=.*|CLAUDE_BIN_DIR=${claude_bin}|" "$env_file"
else
printf 'CLAUDE_BIN_DIR=%s\n' "$claude_bin" >> "$env_file"
fi
else
printf 'CLAUDE_BIN_DIR=%s\n' "$claude_bin" > "$env_file"
fi
# In build mode, replace image: with build: for locally-built images
if [ "$use_build" = true ]; then
sed -i 's|^\( agents:\)|\1|' "$compose_file"
sed -i '/^ image: ghcr\.io\/disinto\/agents:/{s|image: ghcr\.io/disinto/agents:.*|build:\n context: .\n dockerfile: docker/agents/Dockerfile|}' "$compose_file"
sed -i '/^ image: ghcr\.io\/disinto\/edge:/{s|image: ghcr\.io/disinto/edge:.*|build: ./docker/edge|}' "$compose_file"
fi
echo "Created: ${compose_file}"
@@ -588,7 +781,11 @@ _generate_agent_docker_impl() {
fi
}
# Generate docker/Caddyfile for the edge proxy.
# **CANONICAL SOURCE**: This generator is the single source of truth for the Caddyfile.
# Output path: ${FACTORY_ROOT}/docker/Caddyfile (gitignored — generated artifact).
# The edge compose service mounts this path as /etc/caddy/Caddyfile.
# On a fresh clone, `disinto init` calls generate_caddyfile before first `disinto up`.
_generate_caddyfile_impl() {
local docker_dir="${FACTORY_ROOT}/docker"
local caddyfile="${docker_dir}/Caddyfile"


@@ -167,10 +167,14 @@ disinto_hire_an_agent() {
echo ""
echo "Step 1.5: Generating Forge token for '${agent_name}'..."
# Key per-agent credentials by *agent name*, not role (#834 Gap 1).
# Two agents with the same role (e.g. two `dev` agents) must not collide on
# FORGE_<ROLE>_TOKEN — the compose generator looks up FORGE_TOKEN_<USER_UPPER>
# where USER_UPPER = tr 'a-z-' 'A-Z_' of the agent's forge_user.
local agent_upper
agent_upper=$(echo "$agent_name" | tr 'a-z-' 'A-Z_')
local token_var="FORGE_TOKEN_${agent_upper}"
local pass_var="FORGE_PASS_${agent_upper}"
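The name-to-variable mapping can be checked in isolation — hyphens become underscores so the result is a valid shell identifier. The agent names below are illustrative:

```shell
# Mirror the tr mapping used to derive FORGE_TOKEN_* / FORGE_PASS_* names:
# lowercase letters map to uppercase, and '-' maps to '_'.
to_var_suffix() {
  echo "$1" | tr 'a-z-' 'A-Z_'
}

echo "FORGE_TOKEN_$(to_var_suffix llama)"
echo "FORGE_TOKEN_$(to_var_suffix qwen-coder)"
```

Note the `-` sits at the end of both `tr` sets, so it is taken literally rather than as a range separator.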
# Generate token using the user's password (basic auth)
local agent_token=""
@@ -194,7 +198,7 @@ disinto_hire_an_agent() {
if [ -z "$agent_token" ]; then
echo " Warning: failed to create API token for '${agent_name}'" >&2
else
# Store token in .env under the per-agent variable name
if grep -q "^${token_var}=" "$env_file" 2>/dev/null; then
# Use sed with alternative delimiter and proper escaping for special chars in token
local escaped_token
@@ -208,6 +212,23 @@ disinto_hire_an_agent() {
export "${token_var}=${agent_token}"
fi
# Persist FORGE_PASS_<AGENT_UPPER> to .env (#834 Gap 2).
# The container's git credential helper (docker/agents/entrypoint.sh) needs
# both FORGE_TOKEN_* and FORGE_PASS_* to pass HTTPS auth for git push
# (Forgejo 11.x rejects API tokens for git push, #361).
if [ -n "${user_pass:-}" ]; then
local escaped_pass
escaped_pass=$(printf '%s\n' "$user_pass" | sed 's/[&/\]/\\&/g')
if grep -q "^${pass_var}=" "$env_file" 2>/dev/null; then
sed -i "s|^${pass_var}=.*|${pass_var}=${escaped_pass}|" "$env_file"
echo " ${agent_name} password updated (${pass_var})"
else
printf '%s=%s\n' "$pass_var" "$user_pass" >> "$env_file"
echo " ${agent_name} password saved (${pass_var})"
fi
export "${pass_var}=${user_pass}"
fi
# Step 2: Create .profile repo on Forgejo
echo ""
echo "Step 2: Creating '${agent_name}/.profile' repo (if not exists)..."

lib/hvault.sh Normal file

@@ -0,0 +1,279 @@
#!/usr/bin/env bash
# hvault.sh — HashiCorp Vault helper module
#
# Typed, audited helpers for Vault KV v2 access so no script re-implements
# `curl -H "X-Vault-Token: ..."` ad-hoc.
#
# Usage: source this file, then call any hvault_* function.
#
# Environment:
# VAULT_ADDR — Vault server address (required, no default)
# VAULT_TOKEN — auth token (precedence: env > /etc/vault.d/root.token)
#
# All functions emit structured JSON errors to stderr on failure.
set -euo pipefail
# ── Internal helpers ─────────────────────────────────────────────────────────
# _hvault_err — emit structured JSON error to stderr
# Args: func_name, message, [detail]
_hvault_err() {
local func="$1" msg="$2" detail="${3:-}"
jq -n --arg func "$func" --arg msg "$msg" --arg detail "$detail" \
'{error:true,function:$func,message:$msg,detail:$detail}' >&2
}
# _hvault_resolve_token — resolve VAULT_TOKEN from env or token file
_hvault_resolve_token() {
if [ -n "${VAULT_TOKEN:-}" ]; then
return 0
fi
local token_file="/etc/vault.d/root.token"
if [ -f "$token_file" ]; then
VAULT_TOKEN="$(cat "$token_file")"
export VAULT_TOKEN
return 0
fi
return 1
}
# _hvault_check_prereqs — validate VAULT_ADDR and VAULT_TOKEN are set
# Args: caller function name
_hvault_check_prereqs() {
local caller="$1"
if [ -z "${VAULT_ADDR:-}" ]; then
_hvault_err "$caller" "VAULT_ADDR is not set" "export VAULT_ADDR before calling $caller"
return 1
fi
if ! _hvault_resolve_token; then
_hvault_err "$caller" "VAULT_TOKEN is not set and /etc/vault.d/root.token not found" \
"export VAULT_TOKEN or write token to /etc/vault.d/root.token"
return 1
fi
}
# _hvault_request — execute a Vault API request
# Args: method, path, [data]
# Outputs: response body to stdout
# Returns: 0 on 2xx, 1 otherwise (error JSON to stderr)
_hvault_request() {
local method="$1" path="$2" data="${3:-}"
local url="${VAULT_ADDR}/v1/${path}"
local http_code body
local tmpfile
tmpfile="$(mktemp)"
local curl_args=(
-s
-w '%{http_code}'
-H "X-Vault-Token: ${VAULT_TOKEN}"
-H "Content-Type: application/json"
-X "$method"
-o "$tmpfile"
)
if [ -n "$data" ]; then
curl_args+=(-d "$data")
fi
http_code="$(curl "${curl_args[@]}" "$url")" || {
_hvault_err "_hvault_request" "curl failed" "url=$url"
rm -f "$tmpfile"
return 1
}
body="$(cat "$tmpfile")"
rm -f "$tmpfile"
# Check HTTP status — 2xx is success
case "$http_code" in
2[0-9][0-9])
printf '%s' "$body"
return 0
;;
*)
_hvault_err "_hvault_request" "HTTP $http_code" "$body"
return 1
;;
esac
}
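The `2[0-9][0-9]` case pattern above accepts any 2xx status as success. A quick standalone check of that classification — the function name here is illustrative:

```shell
# Classify an HTTP status code the way _hvault_request's case statement does:
# any three-digit code starting with 2 is success, everything else is failure.
is_2xx() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

is_2xx 200 && echo "200 ok"
is_2xx 204 && echo "204 ok"
is_2xx 404 || echo "404 rejected"
```

Because the pattern requires exactly three characters, malformed codes like an empty string or `2` also fall through to the failure branch.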
# ── Public API ───────────────────────────────────────────────────────────────
# hvault_kv_get PATH [KEY]
# Read a KV v2 secret at PATH, optionally extract a single KEY.
# Outputs: JSON value (full data object, or single key value)
hvault_kv_get() {
local path="${1:-}"
local key="${2:-}"
if [ -z "$path" ]; then
_hvault_err "hvault_kv_get" "PATH is required" "usage: hvault_kv_get PATH [KEY]"
return 1
fi
_hvault_check_prereqs "hvault_kv_get" || return 1
local response
response="$(_hvault_request GET "secret/data/${path}")" || return 1
if [ -n "$key" ]; then
printf '%s' "$response" | jq -e -r --arg key "$key" '.data.data[$key]' 2>/dev/null || {
_hvault_err "hvault_kv_get" "key not found" "key=$key path=$path"
return 1
}
else
printf '%s' "$response" | jq -e '.data.data' 2>/dev/null || {
_hvault_err "hvault_kv_get" "failed to parse response" "path=$path"
return 1
}
fi
}
# hvault_kv_put PATH KEY=VAL [KEY=VAL ...]
# Write a KV v2 secret at PATH. Accepts one or more KEY=VAL pairs.
hvault_kv_put() {
local path="${1:-}"
shift || true
if [ -z "$path" ] || [ $# -eq 0 ]; then
_hvault_err "hvault_kv_put" "PATH and at least one KEY=VAL required" \
"usage: hvault_kv_put PATH KEY=VAL [KEY=VAL ...]"
return 1
fi
_hvault_check_prereqs "hvault_kv_put" || return 1
# Build JSON payload from KEY=VAL pairs entirely via jq
local payload='{"data":{}}'
for kv in "$@"; do
local k="${kv%%=*}"
local v="${kv#*=}"
if [ "$k" = "$kv" ]; then
_hvault_err "hvault_kv_put" "invalid KEY=VAL pair" "got: $kv"
return 1
fi
payload="$(printf '%s' "$payload" | jq --arg k "$k" --arg v "$v" '.data[$k] = $v')"
done
_hvault_request POST "secret/data/${path}" "$payload" >/dev/null
}
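The KEY=VAL-to-JSON payload construction in `hvault_kv_put` can be exercised without a Vault server (it needs only `jq`; the key names here are illustrative). Note that only the first `=` splits key from value, so values may themselves contain `=`:

```shell
# Build the KV v2 write payload the same way hvault_kv_put does.
payload='{"data":{}}'
for kv in "user=alice" "pass=s3cr=et"; do
  k="${kv%%=*}"   # everything before the first '='
  v="${kv#*=}"    # everything after the first '='
  # -c keeps the JSON compact; --arg quotes k/v safely as strings.
  payload="$(printf '%s' "$payload" | jq -c --arg k "$k" --arg v "$v" '.data[$k] = $v')"
done
printf '%s\n' "$payload"
```

Routing the values through `jq --arg` is what makes the helper safe for secrets containing quotes, backslashes, or other JSON metacharacters.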
# hvault_kv_list PATH
# List keys at a KV v2 path.
# Outputs: JSON array of key names
hvault_kv_list() {
local path="${1:-}"
if [ -z "$path" ]; then
_hvault_err "hvault_kv_list" "PATH is required" "usage: hvault_kv_list PATH"
return 1
fi
_hvault_check_prereqs "hvault_kv_list" || return 1
local response
response="$(_hvault_request LIST "secret/metadata/${path}")" || return 1
printf '%s' "$response" | jq -e '.data.keys' 2>/dev/null || {
_hvault_err "hvault_kv_list" "failed to parse response" "path=$path"
return 1
}
}
# hvault_policy_apply NAME FILE
# Idempotent policy upsert — create or update a Vault policy.
hvault_policy_apply() {
local name="${1:-}"
local file="${2:-}"
if [ -z "$name" ] || [ -z "$file" ]; then
_hvault_err "hvault_policy_apply" "NAME and FILE are required" \
"usage: hvault_policy_apply NAME FILE"
return 1
fi
if [ ! -f "$file" ]; then
_hvault_err "hvault_policy_apply" "policy file not found" "file=$file"
return 1
fi
_hvault_check_prereqs "hvault_policy_apply" || return 1
local policy_content
policy_content="$(cat "$file")"
local payload
payload="$(jq -n --arg policy "$policy_content" '{"policy": $policy}')"
_hvault_request PUT "sys/policies/acl/${name}" "$payload" >/dev/null
}
# hvault_jwt_login ROLE JWT
# Exchange a JWT for a short-lived Vault token.
# Outputs: client token string
hvault_jwt_login() {
local role="${1:-}"
local jwt="${2:-}"
if [ -z "$role" ] || [ -z "$jwt" ]; then
_hvault_err "hvault_jwt_login" "ROLE and JWT are required" \
"usage: hvault_jwt_login ROLE JWT"
return 1
fi
# Only need VAULT_ADDR, not VAULT_TOKEN (we're obtaining a token)
if [ -z "${VAULT_ADDR:-}" ]; then
_hvault_err "hvault_jwt_login" "VAULT_ADDR is not set"
return 1
fi
local payload
payload="$(jq -n --arg role "$role" --arg jwt "$jwt" \
'{"role": $role, "jwt": $jwt}')"
local response
# JWT login does not require an existing token — use curl directly
local tmpfile http_code
tmpfile="$(mktemp)"
http_code="$(curl -s -w '%{http_code}' \
-H "Content-Type: application/json" \
-X POST \
-d "$payload" \
-o "$tmpfile" \
"${VAULT_ADDR}/v1/auth/jwt/login")" || {
_hvault_err "hvault_jwt_login" "curl failed"
rm -f "$tmpfile"
return 1
}
local body
body="$(cat "$tmpfile")"
rm -f "$tmpfile"
case "$http_code" in
2[0-9][0-9])
printf '%s' "$body" | jq -e -r '.auth.client_token' 2>/dev/null || {
_hvault_err "hvault_jwt_login" "failed to extract client_token" "$body"
return 1
}
;;
*)
_hvault_err "hvault_jwt_login" "HTTP $http_code" "$body"
return 1
;;
esac
}
# hvault_token_lookup
# Returns TTL, policies, and accessor for the current token.
# Outputs: JSON object with ttl, policies, accessor fields
hvault_token_lookup() {
_hvault_check_prereqs "hvault_token_lookup" || return 1
local response
response="$(_hvault_request GET "auth/token/lookup-self")" || return 1
printf '%s' "$response" | jq -e '{
ttl: .data.ttl,
policies: .data.policies,
accessor: .data.accessor,
display_name: .data.display_name
}' 2>/dev/null || {
_hvault_err "hvault_token_lookup" "failed to parse token info"
return 1
}
}

lib/init/nomad/cluster-up.sh Executable file

@@ -0,0 +1,338 @@
#!/usr/bin/env bash
# =============================================================================
# lib/init/nomad/cluster-up.sh — Empty Nomad+Vault cluster orchestrator (S0.4)
#
# Wires together the S0.1–S0.3 building blocks into one idempotent
# "bring up a single-node Nomad+Vault cluster" script:
#
# 1. install.sh (nomad + vault binaries)
# 2. systemd-nomad.sh (nomad.service — unit + enable, not started)
# 3. systemd-vault.sh (vault.service — unit + vault.hcl + enable)
# 4. Host-volume dirs (/srv/disinto/* matching nomad/client.hcl)
# 5. /etc/nomad.d/*.hcl (server.hcl + client.hcl from repo)
# 6. vault-init.sh (first-run init + unseal + persist keys)
# 7. systemctl start vault (auto-unseal via ExecStartPost; poll)
# 8. systemctl start nomad (poll until ≥1 ready node)
# 9. /etc/profile.d/disinto-nomad.sh (VAULT_ADDR + NOMAD_ADDR for shells)
#
# This is the "empty cluster" orchestrator — no jobs deployed. Subsequent
# Step-1 issues layer job deployment on top of this checkpoint.
#
# Idempotency contract:
# Running twice back-to-back on a healthy box is a no-op. Each sub-step
# is itself idempotent — see install.sh / systemd-*.sh / vault-init.sh
# headers for the per-step contract. Fast-paths in steps 7 and 8 skip
# the systemctl start when the service is already active + healthy.
#
# Usage:
# sudo lib/init/nomad/cluster-up.sh # bring cluster up
# sudo lib/init/nomad/cluster-up.sh --dry-run # print step list, exit 0
#
# Environment (override polling for slow boxes):
# VAULT_POLL_SECS max seconds to wait for vault to unseal (default: 30)
# NOMAD_POLL_SECS max seconds to wait for nomad node=ready (default: 60)
#
# Exit codes:
# 0 success (cluster up, or already up)
# 1 precondition or step failure
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
# Sub-scripts (siblings in this directory).
INSTALL_SH="${SCRIPT_DIR}/install.sh"
SYSTEMD_NOMAD_SH="${SCRIPT_DIR}/systemd-nomad.sh"
SYSTEMD_VAULT_SH="${SCRIPT_DIR}/systemd-vault.sh"
VAULT_INIT_SH="${SCRIPT_DIR}/vault-init.sh"
# In-repo Nomad configs copied to /etc/nomad.d/.
NOMAD_CONFIG_DIR="/etc/nomad.d"
NOMAD_SERVER_HCL_SRC="${REPO_ROOT}/nomad/server.hcl"
NOMAD_CLIENT_HCL_SRC="${REPO_ROOT}/nomad/client.hcl"
# /etc/profile.d entry — makes VAULT_ADDR + NOMAD_ADDR available to
# interactive shells without requiring the operator to source anything.
PROFILE_D_FILE="/etc/profile.d/disinto-nomad.sh"
# Host-volume paths — MUST match the `host_volume "..."` declarations
# in nomad/client.hcl. Adding a host_volume block there requires adding
# its path here so the dir exists before nomad starts (otherwise client
# fingerprinting fails and the node stays in "initializing").
HOST_VOLUME_DIRS=(
"/srv/disinto/forgejo-data"
"/srv/disinto/woodpecker-data"
"/srv/disinto/agent-data"
"/srv/disinto/project-repos"
"/srv/disinto/caddy-data"
"/srv/disinto/chat-history"
"/srv/disinto/ops-repo"
)
# Default API addresses — matches the listener bindings in
# nomad/server.hcl and nomad/vault.hcl. If either file ever moves
# off 127.0.0.1 / default port, update both places together.
VAULT_ADDR_DEFAULT="http://127.0.0.1:8200"
NOMAD_ADDR_DEFAULT="http://127.0.0.1:4646"
VAULT_POLL_SECS="${VAULT_POLL_SECS:-30}"
NOMAD_POLL_SECS="${NOMAD_POLL_SECS:-60}"
log() { printf '[cluster-up] %s\n' "$*"; }
die() { printf '[cluster-up] ERROR: %s\n' "$*" >&2; exit 1; }
# ── Flag parsing ─────────────────────────────────────────────────────────────
dry_run=false
while [ $# -gt 0 ]; do
case "$1" in
--dry-run) dry_run=true; shift ;;
-h|--help)
cat <<EOF
Usage: sudo $(basename "$0") [--dry-run]
Brings up an empty single-node Nomad+Vault cluster (idempotent).
--dry-run Print the step list without performing any action.
EOF
exit 0
;;
*) die "unknown flag: $1" ;;
esac
done
# ── Dry-run: print step list + exit ──────────────────────────────────────────
if [ "$dry_run" = true ]; then
cat <<EOF
[dry-run] Step 1/9: install nomad + vault binaries
→ sudo ${INSTALL_SH}
[dry-run] Step 2/9: write + enable nomad.service (NOT started)
→ sudo ${SYSTEMD_NOMAD_SH}
[dry-run] Step 3/9: write + enable vault.service + vault.hcl (NOT started)
→ sudo ${SYSTEMD_VAULT_SH}
[dry-run] Step 4/9: create host-volume dirs under /srv/disinto/
EOF
for d in "${HOST_VOLUME_DIRS[@]}"; do
printf ' → install -d -m 0755 %s\n' "$d"
done
cat <<EOF
[dry-run] Step 5/9: install /etc/nomad.d/server.hcl + client.hcl from repo
${NOMAD_SERVER_HCL_SRC} → ${NOMAD_CONFIG_DIR}/server.hcl
${NOMAD_CLIENT_HCL_SRC} → ${NOMAD_CONFIG_DIR}/client.hcl
[dry-run] Step 6/9: first-run vault init + persist unseal.key + root.token
→ sudo ${VAULT_INIT_SH}
[dry-run] Step 7/9: systemctl start vault + poll until unsealed (${VAULT_POLL_SECS}s)
[dry-run] Step 8/9: systemctl start nomad + poll until ≥1 node ready (${NOMAD_POLL_SECS}s)
[dry-run] Step 9/9: write ${PROFILE_D_FILE}
export VAULT_ADDR=${VAULT_ADDR_DEFAULT}
export NOMAD_ADDR=${NOMAD_ADDR_DEFAULT}
Dry run complete — no changes made.
EOF
exit 0
fi
# ── Preconditions ────────────────────────────────────────────────────────────
if [ "$(id -u)" -ne 0 ]; then
die "must run as root (spawns install/systemd/vault-init sub-scripts)"
fi
command -v systemctl >/dev/null 2>&1 \
|| die "systemctl not found (systemd required)"
for f in "$INSTALL_SH" "$SYSTEMD_NOMAD_SH" "$SYSTEMD_VAULT_SH" "$VAULT_INIT_SH"; do
[ -x "$f" ] || die "sub-script missing or non-executable: ${f}"
done
[ -f "$NOMAD_SERVER_HCL_SRC" ] \
|| die "source config not found: ${NOMAD_SERVER_HCL_SRC}"
[ -f "$NOMAD_CLIENT_HCL_SRC" ] \
|| die "source config not found: ${NOMAD_CLIENT_HCL_SRC}"
# ── Helpers ──────────────────────────────────────────────────────────────────
# install_file_if_differs SRC DST MODE
# Copy SRC to DST (root:root with MODE) iff on-disk content differs.
# No-op + log otherwise — preserves mtime, avoids spurious reloads.
install_file_if_differs() {
local src="$1" dst="$2" mode="$3"
if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
log "unchanged: ${dst}"
return 0
fi
log "writing: ${dst}"
install -m "$mode" -o root -g root "$src" "$dst"
}
# vault_status_json — echo `vault status -format=json`, or '' on unreachable.
# vault status exit codes: 0 = unsealed, 2 = sealed/uninit, 1 = unreachable.
# We treat all of 0/2 as "reachable with state"; 1 yields empty output.
# Wrapped in `|| true` so set -e doesn't abort on exit 2 (the expected
# sealed-state case during first-boot polling).
vault_status_json() {
VAULT_ADDR="$VAULT_ADDR_DEFAULT" vault status -format=json 2>/dev/null || true
}
# vault_is_unsealed — true iff vault reachable AND initialized AND unsealed.
vault_is_unsealed() {
local out init sealed
out="$(vault_status_json)"
[ -n "$out" ] || return 1
init="$(printf '%s' "$out" | jq -r '.initialized' 2>/dev/null)" || init=""
sealed="$(printf '%s' "$out" | jq -r '.sealed' 2>/dev/null)" || sealed=""
[ "$init" = "true" ] && [ "$sealed" = "false" ]
}
# nomad_ready_count — echo the number of ready nodes, or 0 on error.
# `nomad node status -json` returns a JSON array of nodes, each with a
# .Status field ("initializing" | "ready" | "down" | "disconnected").
nomad_ready_count() {
local out
out="$(NOMAD_ADDR="$NOMAD_ADDR_DEFAULT" nomad node status -json 2>/dev/null || true)"
if [ -z "$out" ]; then
printf '0'
return 0
fi
printf '%s' "$out" \
| jq '[.[] | select(.Status == "ready")] | length' 2>/dev/null \
|| printf '0'
}
# nomad_has_ready_node — true iff nomad_ready_count ≥ 1. Wrapper exists
# so poll_until_healthy can call it as a single-arg command name.
nomad_has_ready_node() { [ "$(nomad_ready_count)" -ge 1 ]; }
# _die_with_service_status SVC REASON
# Log + dump `systemctl status SVC` to stderr + die with REASON. Factored
# out so the poll helper doesn't carry three copies of the same dump.
_die_with_service_status() {
local svc="$1" reason="$2"
log "${svc}.service ${reason} — systemctl status follows:"
systemctl --no-pager --full status "$svc" >&2 || true
die "${svc}.service ${reason}"
}
# poll_until_healthy SVC CHECK_CMD TIMEOUT
# Tick once per second for up to TIMEOUT seconds, invoking CHECK_CMD as a
# command name (no arguments). Returns 0 on the first successful check.
# Fails fast via _die_with_service_status if SVC enters systemd "failed"
# state, and dies with a status dump if TIMEOUT elapses before CHECK_CMD
# succeeds. Replaces the two in-line ready=1/break/sleep poll loops that
# would otherwise each duplicate the same pattern already in vault-init.sh.
poll_until_healthy() {
local svc="$1" check="$2" timeout="$3"
local waited=0
until [ "$waited" -ge "$timeout" ]; do
systemctl is-failed --quiet "$svc" \
&& _die_with_service_status "$svc" "entered failed state during startup"
if "$check"; then
log "${svc} healthy after ${waited}s"
return 0
fi
waited=$((waited + 1))
sleep 1
done
_die_with_service_status "$svc" "not healthy within ${timeout}s"
}
# ── Step 1/9: install.sh (nomad + vault binaries) ────────────────────────────
log "── Step 1/9: install nomad + vault binaries ──"
"$INSTALL_SH"
# ── Step 2/9: systemd-nomad.sh (unit + enable, not started) ──────────────────
log "── Step 2/9: install nomad.service (enable, not start) ──"
"$SYSTEMD_NOMAD_SH"
# ── Step 3/9: systemd-vault.sh (unit + vault.hcl + enable) ───────────────────
log "── Step 3/9: install vault.service + vault.hcl (enable, not start) ──"
"$SYSTEMD_VAULT_SH"
# ── Step 4/9: host-volume dirs matching nomad/client.hcl ─────────────────────
log "── Step 4/9: host-volume dirs under /srv/disinto/ ──"
# Parent /srv/disinto/ first (install -d handles missing parents, but being
# explicit makes the log output read naturally as a top-down creation).
install -d -m 0755 -o root -g root "/srv/disinto"
for d in "${HOST_VOLUME_DIRS[@]}"; do
if [ -d "$d" ]; then
log "unchanged: ${d}"
else
log "creating: ${d}"
install -d -m 0755 -o root -g root "$d"
fi
done
# ── Step 5/9: /etc/nomad.d/server.hcl + client.hcl ───────────────────────────
log "── Step 5/9: install /etc/nomad.d/{server,client}.hcl ──"
# systemd-nomad.sh already created /etc/nomad.d/. Re-assert for clarity +
# in case someone runs cluster-up.sh with an exotic step ordering later.
install -d -m 0755 -o root -g root "$NOMAD_CONFIG_DIR"
install_file_if_differs "$NOMAD_SERVER_HCL_SRC" "${NOMAD_CONFIG_DIR}/server.hcl" 0644
install_file_if_differs "$NOMAD_CLIENT_HCL_SRC" "${NOMAD_CONFIG_DIR}/client.hcl" 0644
# ── Step 6/9: vault-init (first-run init + unseal + persist keys) ────────────
log "── Step 6/9: vault-init (no-op after first run) ──"
# vault-init.sh spawns a temporary vault server if systemd isn't managing
# one, runs `operator init`, writes unseal.key + root.token, unseals once,
# then stops the temp server (EXIT trap). After it returns, port 8200 is
# free for systemctl-managed vault to take in step 7.
"$VAULT_INIT_SH"
# ── Step 7/9: systemctl start vault + poll until unsealed ────────────────────
log "── Step 7/9: start vault + poll until unsealed ──"
# Fast-path when vault.service is already active and Vault reports
# initialized=true,sealed=false — re-runs are a no-op.
if systemctl is-active --quiet vault && vault_is_unsealed; then
log "vault already active + unsealed — skip start"
else
systemctl start vault
poll_until_healthy vault vault_is_unsealed "$VAULT_POLL_SECS"
fi
# ── Step 8/9: systemctl start nomad + poll until ≥1 node ready ───────────────
log "── Step 8/9: start nomad + poll until ≥1 node ready ──"
if systemctl is-active --quiet nomad && nomad_has_ready_node; then
log "nomad already active + ≥1 node ready — skip start"
else
systemctl start nomad
poll_until_healthy nomad nomad_has_ready_node "$NOMAD_POLL_SECS"
fi
# ── Step 9/9: /etc/profile.d/disinto-nomad.sh ────────────────────────────────
log "── Step 9/9: write ${PROFILE_D_FILE} ──"
# Shell rc fragments in /etc/profile.d/ are sourced by /etc/profile for
# every interactive login shell. Setting VAULT_ADDR + NOMAD_ADDR here means
# the operator can run `vault status` / `nomad node status` straight after
# `ssh factory-box` without fumbling env vars.
desired_profile="# /etc/profile.d/disinto-nomad.sh — written by lib/init/nomad/cluster-up.sh
# Interactive-shell defaults for Vault + Nomad clients on this box.
export VAULT_ADDR=${VAULT_ADDR_DEFAULT}
export NOMAD_ADDR=${NOMAD_ADDR_DEFAULT}
"
if [ -f "$PROFILE_D_FILE" ] \
&& printf '%s' "$desired_profile" | cmp -s - "$PROFILE_D_FILE"; then
log "unchanged: ${PROFILE_D_FILE}"
else
log "writing: ${PROFILE_D_FILE}"
# Subshell + EXIT trap: guarantees the tempfile is cleaned up on both
# success AND set-e-induced failure of `install`. A function-scoped
# RETURN trap does NOT fire on errexit-abort in bash — the subshell is
# the reliable cleanup boundary here.
(
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
printf '%s' "$desired_profile" > "$tmp"
install -m 0644 -o root -g root "$tmp" "$PROFILE_D_FILE"
)
fi
log "── done: empty nomad+vault cluster is up ──"
log " Vault: ${VAULT_ADDR_DEFAULT} (Sealed=false Initialized=true)"
log " Nomad: ${NOMAD_ADDR_DEFAULT} (≥1 node ready)"
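The subshell-plus-EXIT-trap cleanup pattern that step 9 relies on can be exercised outside the script; in this minimal sketch `false` stands in for an `install` that fails under `set -e`, and the temp file is still removed by the subshell's EXIT trap:

```shell
# Side channel so the outer shell can learn the temp path chosen inside
# the subshell, then verify it was cleaned up.
side="$(mktemp)"
(
  set -e
  tmp="$(mktemp)"
  printf '%s' "$tmp" > "$side"
  trap 'rm -f "$tmp"' EXIT   # fires even on errexit-abort of the subshell
  printf 'payload\n' > "$tmp"
  false                      # simulated `install` failure under set -e
) || true
leaked_path="$(cat "$side")"
rm -f "$side"
# leaked_path no longer exists — the trap removed it despite the failure.
```

A function-scoped RETURN trap would not have fired here: bash runs RETURN only on a normal function return, which is exactly why the script uses a subshell as its cleanup boundary.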

lib/init/nomad/install.sh Executable file

@@ -0,0 +1,143 @@
#!/usr/bin/env bash
# =============================================================================
# lib/init/nomad/install.sh — Idempotent apt install of HashiCorp Nomad + Vault
#
# Part of the Nomad+Vault migration. Installs both the `nomad` binary (S0.2,
# issue #822) and the `vault` binary (S0.3, issue #823) from the same
# HashiCorp apt repository. Does NOT configure, start, or enable any systemd
# unit — lib/init/nomad/systemd-nomad.sh and lib/init/nomad/systemd-vault.sh
# own that. Does NOT wire this script into `disinto init` — S0.4 owns that.
#
# Idempotency contract:
# - Running twice back-to-back is a no-op once both target versions are
# installed and the apt source is in place.
# - Adds the HashiCorp apt keyring only if it is absent.
# - Adds the HashiCorp apt sources list only if it is absent.
# - Skips `apt-get install` for any package whose installed version already
# matches the pin. If both are at pin, exits before touching apt.
#
# Configuration:
# NOMAD_VERSION — pinned Nomad version (default: see below). Apt package
# name is versioned as "nomad=<version>-1".
# VAULT_VERSION — pinned Vault version (default: see below). Apt package
# name is versioned as "vault=<version>-1".
#
# Usage:
# sudo lib/init/nomad/install.sh
# sudo NOMAD_VERSION=1.9.5 VAULT_VERSION=1.18.5 lib/init/nomad/install.sh
#
# Exit codes:
# 0 success (installed or already present)
# 1 precondition failure (not Debian/Ubuntu, missing tools, not root)
# =============================================================================
set -euo pipefail
# Pin to specific 1.x releases. Bump here, not at call sites.
NOMAD_VERSION="${NOMAD_VERSION:-1.9.5}"
VAULT_VERSION="${VAULT_VERSION:-1.18.5}"
HASHICORP_KEYRING="/usr/share/keyrings/hashicorp-archive-keyring.gpg"
HASHICORP_SOURCES="/etc/apt/sources.list.d/hashicorp.list"
HASHICORP_GPG_URL="https://apt.releases.hashicorp.com/gpg"
HASHICORP_REPO_URL="https://apt.releases.hashicorp.com"
log() { printf '[install] %s\n' "$*"; }
die() { printf '[install] ERROR: %s\n' "$*" >&2; exit 1; }
# _installed_version BINARY
# Echoes the installed semver for `nomad` or `vault` (e.g. "1.9.5").
# Both tools print their version on the first line of `<bin> version` as
# "<Name> v<semver>..." — the shared awk extracts $2 with the leading "v"
# stripped. Empty string when the binary is absent or output is unexpected.
_installed_version() {
local bin="$1"
command -v "$bin" >/dev/null 2>&1 || { printf ''; return 0; }
"$bin" version 2>/dev/null \
| awk 'NR==1 {sub(/^v/, "", $2); print $2; exit}'
}
# ── Preconditions ────────────────────────────────────────────────────────────
if [ "$(id -u)" -ne 0 ]; then
die "must run as root (needs apt-get + /usr/share/keyrings write access)"
fi
for bin in apt-get gpg curl lsb_release; do
command -v "$bin" >/dev/null 2>&1 \
|| die "required binary not found: ${bin}"
done
CODENAME="$(lsb_release -cs)"
[ -n "$CODENAME" ] || die "lsb_release returned empty codename"
# ── Fast-path: are both already at desired versions? ─────────────────────────
nomad_installed="$(_installed_version nomad)"
vault_installed="$(_installed_version vault)"
need_pkgs=()
if [ "$nomad_installed" = "$NOMAD_VERSION" ]; then
log "nomad ${NOMAD_VERSION} already installed"
else
need_pkgs+=("nomad=${NOMAD_VERSION}-1")
fi
if [ "$vault_installed" = "$VAULT_VERSION" ]; then
log "vault ${VAULT_VERSION} already installed"
else
need_pkgs+=("vault=${VAULT_VERSION}-1")
fi
if [ "${#need_pkgs[@]}" -eq 0 ]; then
log "nothing to do"
exit 0
fi
# ── Ensure HashiCorp apt keyring ─────────────────────────────────────────────
if [ ! -f "$HASHICORP_KEYRING" ]; then
log "adding HashiCorp apt keyring → ${HASHICORP_KEYRING}"
tmpkey="$(mktemp)"
trap 'rm -f "$tmpkey"' EXIT
curl -fsSL "$HASHICORP_GPG_URL" -o "$tmpkey" \
|| die "failed to fetch HashiCorp GPG key from ${HASHICORP_GPG_URL}"
gpg --dearmor -o "$HASHICORP_KEYRING" < "$tmpkey" \
|| die "failed to dearmor HashiCorp GPG key"
chmod 0644 "$HASHICORP_KEYRING"
rm -f "$tmpkey"
trap - EXIT
else
log "HashiCorp apt keyring already present"
fi
# ── Ensure HashiCorp apt sources list ────────────────────────────────────────
desired_source="deb [signed-by=${HASHICORP_KEYRING}] ${HASHICORP_REPO_URL} ${CODENAME} main"
if [ ! -f "$HASHICORP_SOURCES" ] \
|| ! grep -qxF "$desired_source" "$HASHICORP_SOURCES"; then
log "writing HashiCorp apt sources list → ${HASHICORP_SOURCES}"
printf '%s\n' "$desired_source" > "$HASHICORP_SOURCES"
apt_update_needed=1
else
log "HashiCorp apt sources list already present"
apt_update_needed=0
fi
# ── Install the pinned versions ──────────────────────────────────────────────
if [ "$apt_update_needed" -eq 1 ]; then
log "running apt-get update"
DEBIAN_FRONTEND=noninteractive apt-get update -qq \
|| die "apt-get update failed"
fi
log "installing ${need_pkgs[*]}"
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
"${need_pkgs[@]}" \
|| die "apt-get install ${need_pkgs[*]} failed"
# ── Verify ───────────────────────────────────────────────────────────────────
final_nomad="$(_installed_version nomad)"
if [ "$final_nomad" != "$NOMAD_VERSION" ]; then
die "post-install check: expected nomad ${NOMAD_VERSION}, got '${final_nomad}'"
fi
final_vault="$(_installed_version vault)"
if [ "$final_vault" != "$VAULT_VERSION" ]; then
die "post-install check: expected vault ${VAULT_VERSION}, got '${final_vault}'"
fi
log "nomad ${NOMAD_VERSION} + vault ${VAULT_VERSION} installed successfully"
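The shared awk extraction in `_installed_version` can be checked against a canned first line — the `Nomad v1.9.5` sample below is illustrative of the `<Name> v<semver>` shape the header comment describes, not output captured from a real box:

```shell
# Same awk program install.sh uses: take field 2 of line 1, strip the
# leading "v", print, and stop reading.
parse_version() {
  awk 'NR==1 {sub(/^v/, "", $2); print $2; exit}'
}
got="$(printf 'Nomad v1.9.5\nRevision abc123\n' | parse_version)"
# got is now "1.9.5"
```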

lib/init/nomad/lib-systemd.sh

@@ -0,0 +1,77 @@
#!/usr/bin/env bash
# =============================================================================
# lib/init/nomad/lib-systemd.sh — Shared idempotent systemd-unit installer
#
# Sourced by lib/init/nomad/systemd-nomad.sh and lib/init/nomad/systemd-vault.sh
# (and any future sibling) to collapse the "write unit if content differs,
# daemon-reload, enable (never start)" boilerplate.
#
# Install-but-don't-start is the invariant this helper enforces — mid-migration
# installers land files and enable units; the orchestrator (S0.4) starts them.
#
# Public API (sourced into caller scope):
#
# systemd_require_preconditions UNIT_PATH
# Asserts the caller is uid 0 and `systemctl` is on $PATH. Calls the
# caller's die() with a UNIT_PATH-scoped message on failure.
#
# systemd_install_unit UNIT_PATH UNIT_NAME UNIT_CONTENT
# Writes UNIT_CONTENT to UNIT_PATH (0644 root:root) only if on-disk
# content differs. If written, runs `systemctl daemon-reload`. Then
# enables UNIT_NAME (no-op if already enabled). Never starts the unit.
#
# Caller contract:
# - Callers MUST define `log()` and `die()` before sourcing this file (we
# call log() for status chatter and rely on the caller's error-handling
# stance; `set -e` propagates install/cmp/systemctl failures).
# =============================================================================
# systemd_require_preconditions UNIT_PATH
systemd_require_preconditions() {
local unit_path="$1"
if [ "$(id -u)" -ne 0 ]; then
die "must run as root (needs write access to ${unit_path})"
fi
command -v systemctl >/dev/null 2>&1 \
|| die "systemctl not found (systemd is required)"
}
# systemd_install_unit UNIT_PATH UNIT_NAME UNIT_CONTENT
systemd_install_unit() {
local unit_path="$1"
local unit_name="$2"
local unit_content="$3"
local needs_reload=0
if [ ! -f "$unit_path" ] \
|| ! printf '%s\n' "$unit_content" | cmp -s - "$unit_path"; then
log "writing unit → ${unit_path}"
# Subshell-scoped EXIT trap guarantees the temp file is removed on
# both success AND set-e-induced failure of `install`. A function-
# scoped RETURN trap does NOT fire on errexit-abort (bash only runs
# RETURN on normal function exit), so the subshell is the reliable
# cleanup boundary. It's also isolated from the caller's EXIT trap.
(
local tmp
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
printf '%s\n' "$unit_content" > "$tmp"
install -m 0644 -o root -g root "$tmp" "$unit_path"
)
needs_reload=1
else
log "unit file already up to date"
fi
if [ "$needs_reload" -eq 1 ]; then
log "systemctl daemon-reload"
systemctl daemon-reload
fi
if systemctl is-enabled --quiet "$unit_name" 2>/dev/null; then
log "${unit_name} already enabled"
else
log "systemctl enable ${unit_name}"
systemctl enable "$unit_name" >/dev/null
fi
}
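The compare-then-write shape that `systemd_install_unit` uses (and that cluster-up.sh's `install_file_if_differs` shares) can be sketched standalone, minus the root-owned `install` step — this is a simplified illustration, not the library function itself:

```shell
# Write PATH from CONTENT only when the on-disk bytes differ; identical
# content is a pure no-op, so the file's mtime is never bumped.
write_if_differs() {
  local path="$1" content="$2"
  if [ -f "$path" ] && printf '%s\n' "$content" | cmp -s - "$path"; then
    return 0                       # unchanged — skip the write entirely
  fi
  printf '%s\n' "$content" > "$path"
}

f="$(mktemp)"
write_if_differs "$f" "hello"      # first call: content differs, writes
before="$(stat -c %Y "$f" 2>/dev/null || stat -f %m "$f")"
sleep 1
write_if_differs "$f" "hello"      # second call: identical, no write
after="$(stat -c %Y "$f" 2>/dev/null || stat -f %m "$f")"
```

The unchanged mtime is what spares systemd a spurious `daemon-reload` on re-runs.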

lib/init/nomad/systemd-nomad.sh Executable file

@@ -0,0 +1,102 @@
#!/usr/bin/env bash
# =============================================================================
# lib/init/nomad/systemd-nomad.sh — Idempotent systemd unit installer for Nomad
#
# Part of the Nomad+Vault migration (S0.2, issue #822). Writes
# /etc/systemd/system/nomad.service pointing at /etc/nomad.d/ and runs
# `systemctl enable nomad` WITHOUT starting the service — we don't launch
# the cluster until S0.4 wires everything together.
#
# Idempotency contract:
# - Existing unit file is NOT rewritten when on-disk content already
# matches the desired content (avoids spurious `daemon-reload`).
# - `systemctl enable` on an already-enabled unit is a no-op.
# - This script is safe to run unconditionally before every factory boot.
#
# Preconditions:
# - nomad binary installed (see lib/init/nomad/install.sh)
# - /etc/nomad.d/ will hold server.hcl / client.hcl (placed by S0.4)
#
# Usage:
# sudo lib/init/nomad/systemd-nomad.sh
#
# Exit codes:
# 0 success (unit installed + enabled, or already so)
# 1 precondition failure (not root, no systemctl, no nomad binary)
# =============================================================================
set -euo pipefail
UNIT_PATH="/etc/systemd/system/nomad.service"
NOMAD_CONFIG_DIR="/etc/nomad.d"
NOMAD_DATA_DIR="/var/lib/nomad"
log() { printf '[systemd-nomad] %s\n' "$*"; }
die() { printf '[systemd-nomad] ERROR: %s\n' "$*" >&2; exit 1; }
# shellcheck source=lib-systemd.sh
. "$(dirname "${BASH_SOURCE[0]}")/lib-systemd.sh"
# ── Preconditions ────────────────────────────────────────────────────────────
systemd_require_preconditions "$UNIT_PATH"
NOMAD_BIN="$(command -v nomad 2>/dev/null || true)"
[ -n "$NOMAD_BIN" ] \
|| die "nomad binary not found — run lib/init/nomad/install.sh first"
# ── Desired unit content ─────────────────────────────────────────────────────
# Upstream-recommended baseline (https://developer.hashicorp.com/nomad/docs/install/production/deployment-guide)
# trimmed for a single-node combined server+client dev box.
# - Wants=/After= network-online: nomad must have networking up.
# - User/Group=root: the Docker driver needs root to talk to dockerd.
# - LimitNOFILE/LimitNPROC=infinity: avoid Nomad's startup warning.
# - KillSignal=SIGINT: triggers Nomad's graceful shutdown path.
# - Restart=on-failure with a bounded burst to avoid crash-loops eating the
# journal when /etc/nomad.d/ is mis-configured.
read -r -d '' DESIRED_UNIT <<EOF || true
[Unit]
Description=Nomad
Documentation=https://developer.hashicorp.com/nomad/docs
Wants=network-online.target
After=network-online.target
# When Docker is present, ensure dockerd is up before nomad starts — the
# Docker task driver needs the daemon socket available at startup.
Wants=docker.service
After=docker.service
[Service]
Type=notify
User=root
Group=root
ExecReload=/bin/kill -HUP \$MAINPID
ExecStart=${NOMAD_BIN} agent -config=${NOMAD_CONFIG_DIR}
KillMode=process
KillSignal=SIGINT
LimitNOFILE=infinity
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
StartLimitBurst=3
StartLimitIntervalSec=10
TasksMax=infinity
OOMScoreAdjust=-1000
[Install]
WantedBy=multi-user.target
EOF
# ── Ensure config + data dirs exist ──────────────────────────────────────────
# We do not populate /etc/nomad.d/ here (that's S0.4). We do create the
# directory so `nomad agent -config=/etc/nomad.d` doesn't error if the unit
# is started before hcl files are dropped in.
for d in "$NOMAD_CONFIG_DIR" "$NOMAD_DATA_DIR"; do
if [ ! -d "$d" ]; then
log "creating ${d}"
install -d -m 0755 "$d"
fi
done
# ── Install + reload + enable (shared with systemd-vault.sh via lib-systemd) ─
systemd_install_unit "$UNIT_PATH" "nomad.service" "$DESIRED_UNIT"
log "done — unit installed and enabled (NOT started; S0.4 brings the cluster up)"
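The `read -r -d '' DESIRED_UNIT <<EOF || true` line above is worth unpacking: `read -d ''` consumes the whole heredoc and then returns non-zero at EOF, so without the `|| true` the assignment would abort the script under `set -e`. A minimal bash sketch of the idiom (the `[Unit]` body here is an illustrative stub, not the real unit):

```shell
set -e
# Without `|| true`, read's non-zero EOF status would kill the script here.
read -r -d '' UNIT <<EOF || true
[Unit]
Description=demo
EOF
# UNIT now holds the heredoc body; the scripts add the final newline back
# with printf '%s\n' when writing it to disk.
```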

lib/init/nomad/systemd-vault.sh Executable file

@@ -0,0 +1,151 @@
#!/usr/bin/env bash
# =============================================================================
# lib/init/nomad/systemd-vault.sh — Idempotent systemd unit installer for Vault
#
# Part of the Nomad+Vault migration (S0.3, issue #823). Lands four things:
# 1. /etc/vault.d/ (0755 root:root)
# 2. /etc/vault.d/vault.hcl (copy of nomad/vault.hcl, 0644 root:root)
# 3. /var/lib/vault/data/ (0700 root:root, Vault file-storage backend)
# 4. /etc/systemd/system/vault.service (0644 root:root)
#
# Then `systemctl enable vault` WITHOUT starting the service. Bootstrap
# order is:
# lib/init/nomad/install.sh (nomad + vault binaries)
# lib/init/nomad/systemd-vault.sh (this script — unit + config + dirs)
# lib/init/nomad/vault-init.sh (init + write unseal.key + unseal once)
# systemctl start vault (ExecStartPost auto-unseals from file)
#
# The systemd unit's ExecStartPost reads /etc/vault.d/unseal.key and calls
# `vault operator unseal`. That file is written by vault-init.sh on first
# run; until it exists, `systemctl start vault` will leave Vault sealed
# (ExecStartPost fails, unit goes into failed state — intentional, visible).
#
# Seal model:
# The single unseal key lives at /etc/vault.d/unseal.key (0400 root).
# Seal-key theft == vault theft. Factory-dev-box-acceptable tradeoff —
# we avoid running a second Vault to auto-unseal the first.
#
# Idempotency contract:
# - Unit file NOT rewritten when on-disk content already matches desired.
# - vault.hcl NOT rewritten when on-disk content matches the repo copy.
# - `systemctl enable` on an already-enabled unit is a no-op.
# - Safe to run unconditionally before every factory boot.
#
# Preconditions:
# - vault binary installed (lib/init/nomad/install.sh)
# - nomad/vault.hcl present in the repo (relative to this script)
#
# Usage:
# sudo lib/init/nomad/systemd-vault.sh
#
# Exit codes:
# 0 success (unit+config installed + enabled, or already so)
# 1 precondition failure (not root, no systemctl, no vault binary,
# missing source config)
# =============================================================================
set -euo pipefail
UNIT_PATH="/etc/systemd/system/vault.service"
VAULT_CONFIG_DIR="/etc/vault.d"
VAULT_CONFIG_FILE="${VAULT_CONFIG_DIR}/vault.hcl"
VAULT_DATA_DIR="/var/lib/vault/data"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
VAULT_HCL_SRC="${REPO_ROOT}/nomad/vault.hcl"
log() { printf '[systemd-vault] %s\n' "$*"; }
die() { printf '[systemd-vault] ERROR: %s\n' "$*" >&2; exit 1; }
# shellcheck source=lib-systemd.sh
. "${SCRIPT_DIR}/lib-systemd.sh"
# ── Preconditions ────────────────────────────────────────────────────────────
systemd_require_preconditions "$UNIT_PATH"
VAULT_BIN="$(command -v vault 2>/dev/null || true)"
[ -n "$VAULT_BIN" ] \
|| die "vault binary not found — run lib/init/nomad/install.sh first"
[ -f "$VAULT_HCL_SRC" ] \
|| die "source config not found: ${VAULT_HCL_SRC}"
# ── Desired unit content ─────────────────────────────────────────────────────
# Adapted from HashiCorp's recommended vault.service template
# (https://developer.hashicorp.com/vault/tutorials/getting-started-deploy/deploy)
# for a single-node factory dev box:
# - User=root keeps the seal-key read path simple (unseal.key is 0400 root).
# - CAP_IPC_LOCK lets mlock() succeed so disable_mlock=false is honoured.
# Harmless when running as root; required if this is ever flipped to a
# dedicated `vault` user.
# - ExecStartPost auto-unseals on every boot using the persisted key.
# This is the dev-persisted-seal tradeoff — seal-key theft == vault
# theft, but no second Vault to babysit.
# - ConditionFileNotEmpty guards against starting without config — makes
# a missing vault.hcl visible in systemctl status, not a crash loop.
# - Type=notify so systemd waits for Vault's listener-ready notification
# before running ExecStartPost (ExecStartPost also has `sleep 2` as a
# belt-and-braces guard against Type=notify edge cases).
# - \$MAINPID is escaped so bash doesn't expand it inside this heredoc.
# - \$(cat ...) is escaped so the subshell runs at unit-execution time
# (inside bash -c), not at heredoc-expansion time here.
read -r -d '' DESIRED_UNIT <<EOF || true
[Unit]
Description=HashiCorp Vault
Documentation=https://developer.hashicorp.com/vault/docs
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=${VAULT_CONFIG_FILE}
StartLimitIntervalSec=60
StartLimitBurst=3
[Service]
Type=notify
User=root
Group=root
Environment=VAULT_ADDR=http://127.0.0.1:8200
SecureBits=keep-caps
CapabilityBoundingSet=CAP_IPC_LOCK
AmbientCapabilities=CAP_IPC_LOCK
ExecStart=${VAULT_BIN} server -config=${VAULT_CONFIG_FILE}
ExecStartPost=/bin/bash -c 'sleep 2 && ${VAULT_BIN} operator unseal \$(cat ${VAULT_CONFIG_DIR}/unseal.key)'
ExecReload=/bin/kill --signal HUP \$MAINPID
KillMode=process
KillSignal=SIGINT
Restart=on-failure
RestartSec=5
TimeoutStopSec=30
LimitNOFILE=65536
LimitMEMLOCK=infinity
[Install]
WantedBy=multi-user.target
EOF
# ── Ensure config + data dirs exist ──────────────────────────────────────────
# /etc/vault.d is 0755 — vault.hcl is world-readable (no secrets in it);
# the real secrets (unseal.key, root.token) get their own 0400 mode.
# /var/lib/vault/data is 0700 — vault's on-disk state (encrypted-at-rest
# by Vault itself, but an extra layer of "don't rely on that").
if [ ! -d "$VAULT_CONFIG_DIR" ]; then
log "creating ${VAULT_CONFIG_DIR}"
install -d -m 0755 -o root -g root "$VAULT_CONFIG_DIR"
fi
if [ ! -d "$VAULT_DATA_DIR" ]; then
log "creating ${VAULT_DATA_DIR}"
install -d -m 0700 -o root -g root "$VAULT_DATA_DIR"
fi
# ── Install vault.hcl only if content differs ────────────────────────────────
if [ ! -f "$VAULT_CONFIG_FILE" ] \
|| ! cmp -s "$VAULT_HCL_SRC" "$VAULT_CONFIG_FILE"; then
log "writing config → ${VAULT_CONFIG_FILE}"
install -m 0644 -o root -g root "$VAULT_HCL_SRC" "$VAULT_CONFIG_FILE"
else
log "config already up to date"
fi
# ── Install + reload + enable (shared with systemd-nomad.sh via lib-systemd) ─
systemd_install_unit "$UNIT_PATH" "vault.service" "$DESIRED_UNIT"
log "done — unit+config installed and enabled (NOT started; vault-init.sh next)"
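The escaping rules called out in the unit-content comments can be demonstrated in isolation — an unquoted `<<EOF` heredoc expands `${VAULT_BIN}` at write time, while the escaped `\$MAINPID` survives as a literal for systemd to substitute at runtime. A bash sketch (the `/usr/bin/vault` path is a hypothetical stand-in):

```shell
VAULT_BIN="/usr/bin/vault"   # hypothetical install path for illustration
read -r -d '' SNIPPET <<EOF || true
ExecStart=${VAULT_BIN} server
ExecReload=/bin/kill --signal HUP \$MAINPID
EOF
# SNIPPET contains the expanded binary path but a literal $MAINPID.
```

The same rule keeps `\$(cat ${VAULT_CONFIG_DIR}/unseal.key)` from running during heredoc expansion: it reaches the unit file verbatim and only executes inside `bash -c` at ExecStartPost time.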

lib/init/nomad/vault-init.sh Executable file

@@ -0,0 +1,206 @@
#!/usr/bin/env bash
# =============================================================================
# lib/init/nomad/vault-init.sh — Idempotent Vault first-run initializer
#
# Part of the Nomad+Vault migration (S0.3, issue #823). Initializes Vault
# in dev-persisted-seal mode (single unseal key on disk) and unseals once.
# On re-run, becomes a no-op — never re-initializes or rotates the key.
#
# What it does (first run):
# 1. Ensures Vault is reachable at ${VAULT_ADDR} — spawns a temporary
# `vault server -config=/etc/vault.d/vault.hcl` if not already up.
# 2. Runs `vault operator init -key-shares=1 -key-threshold=1` and
# captures the resulting unseal key + root token.
# 3. Writes /etc/vault.d/unseal.key (0400 root, no trailing newline).
# 4. Writes /etc/vault.d/root.token (0400 root, no trailing newline).
# 5. Unseals Vault once in the current process.
# 6. Shuts down the temporary server if we started one (so a subsequent
# `systemctl start vault` doesn't conflict on port 8200).
#
# Idempotency contract:
# - /etc/vault.d/unseal.key exists AND `vault status` reports
# initialized=true → exit 0, no mutation, no re-init.
# - Initialized-but-unseal.key-missing is a hard failure (can't recover
# the key without the existing storage; user must restore from backup).
#
# Bootstrap order:
# lib/init/nomad/install.sh (installs vault binary)
# lib/init/nomad/systemd-vault.sh (lands unit + config + dirs; enables)
# lib/init/nomad/vault-init.sh (this script — init + unseal once)
# systemctl start vault (ExecStartPost auto-unseals henceforth)
#
# Seal model:
# Single unseal key persisted on disk at /etc/vault.d/unseal.key. Seal-key
# theft == vault theft. Factory-dev-box-acceptable tradeoff — we avoid
# running a second Vault to auto-unseal the first.
#
# Environment:
# VAULT_ADDR — Vault API address (default: http://127.0.0.1:8200).
#
# Usage:
# sudo lib/init/nomad/vault-init.sh
#
# Exit codes:
# 0 success (initialized + unsealed + keys persisted; or already done)
# 1 precondition / operational failure
# =============================================================================
set -euo pipefail
VAULT_CONFIG_FILE="/etc/vault.d/vault.hcl"
UNSEAL_KEY_FILE="/etc/vault.d/unseal.key"
ROOT_TOKEN_FILE="/etc/vault.d/root.token"
VAULT_ADDR="${VAULT_ADDR:-http://127.0.0.1:8200}"
export VAULT_ADDR
# Track whether we spawned a temporary vault (for cleanup).
spawned_pid=""
spawned_log=""
log() { printf '[vault-init] %s\n' "$*"; }
die() { printf '[vault-init] ERROR: %s\n' "$*" >&2; exit 1; }
# ── Cleanup: stop the temporary server (if we started one) on any exit ───────
# EXIT trap fires on success AND failure AND signals — so we never leak a
# background vault process holding port 8200 after this script returns.
cleanup() {
if [ -n "$spawned_pid" ] && kill -0 "$spawned_pid" 2>/dev/null; then
log "stopping temporary vault (pid=${spawned_pid})"
kill "$spawned_pid" 2>/dev/null || true
wait "$spawned_pid" 2>/dev/null || true
fi
if [ -n "$spawned_log" ] && [ -f "$spawned_log" ]; then
rm -f "$spawned_log"
fi
}
trap cleanup EXIT
# ── Preconditions ────────────────────────────────────────────────────────────
if [ "$(id -u)" -ne 0 ]; then
die "must run as root (needs to write 0400 files under /etc/vault.d)"
fi
for bin in vault jq; do
command -v "$bin" >/dev/null 2>&1 \
|| die "required binary not found: ${bin}"
done
[ -f "$VAULT_CONFIG_FILE" ] \
|| die "config not found: ${VAULT_CONFIG_FILE} — run systemd-vault.sh first"
# ── Helpers ──────────────────────────────────────────────────────────────────
# vault_reachable — true iff `vault status` can reach the server.
# Exit codes from `vault status`:
# 0 = reachable, initialized, unsealed
# 2 = reachable, sealed (or uninitialized)
# 1 = unreachable / other error
# We treat 0 and 2 as "reachable". `|| status=$?` avoids set -e tripping
# on the expected sealed-is-also-fine case.
vault_reachable() {
local status=0
vault status -format=json >/dev/null 2>&1 || status=$?
[ "$status" -eq 0 ] || [ "$status" -eq 2 ]
}
# vault_initialized — echoes "true" / "false" / "" (empty on parse failure
# or unreachable vault). Always returns 0 so that `x="$(vault_initialized)"`
# is safe under `set -euo pipefail`.
#
# Key subtlety: `vault status` exits 2 when Vault is sealed OR uninitialized
# — the exact state we need to *observe* on first run. Without the
# `|| true` guard, pipefail + set -e inside a standalone assignment would
# propagate that exit 2 to the outer script and abort before we ever call
# `vault operator init`. We capture `vault status`'s output to a variable
# first (pipefail-safe), then feed it to jq separately.
vault_initialized() {
local out=""
out="$(vault status -format=json 2>/dev/null || true)"
[ -n "$out" ] || { printf ''; return 0; }
printf '%s' "$out" | jq -r '.initialized' 2>/dev/null || printf ''
}
# write_secret_file PATH CONTENT
# Write CONTENT to PATH atomically with 0400 root:root and no trailing
# newline. mktemp+install keeps perms tight for the whole lifetime of
# the file on disk — no 0644-then-chmod window.
write_secret_file() {
local path="$1" content="$2"
local tmp
tmp="$(mktemp)"
printf '%s' "$content" > "$tmp"
install -m 0400 -o root -g root "$tmp" "$path"
rm -f "$tmp"
}
# ── Ensure vault is reachable ────────────────────────────────────────────────
if ! vault_reachable; then
log "vault not reachable at ${VAULT_ADDR} — starting temporary server"
spawned_log="$(mktemp)"
vault server -config="$VAULT_CONFIG_FILE" >"$spawned_log" 2>&1 &
spawned_pid=$!
# Poll for readiness. Vault's API listener comes up before notify-ready
# in Type=notify mode, but well inside a few seconds even on cold boots.
ready=0
for _ in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
if vault_reachable; then
ready=1
break
fi
sleep 1
done
if [ "$ready" -ne 1 ]; then
log "vault did not become reachable within 15s — server log follows:"
if [ -f "$spawned_log" ]; then
sed 's/^/[vault-server] /' "$spawned_log" >&2 || true
fi
die "failed to start temporary vault server"
fi
log "temporary vault ready (pid=${spawned_pid})"
fi
# ── Idempotency gate ─────────────────────────────────────────────────────────
initialized="$(vault_initialized)"
if [ "$initialized" = "true" ] && [ -f "$UNSEAL_KEY_FILE" ]; then
log "vault already initialized and unseal.key present — no-op"
exit 0
fi
if [ "$initialized" = "true" ] && [ ! -f "$UNSEAL_KEY_FILE" ]; then
die "vault is initialized but ${UNSEAL_KEY_FILE} is missing — cannot recover the unseal key; restore from backup or wipe ${VAULT_CONFIG_FILE%/*}/data and re-run"
fi
if [ "$initialized" != "false" ]; then
die "unexpected initialized state: '${initialized}' (expected 'true' or 'false')"
fi
# ── Initialize ───────────────────────────────────────────────────────────────
log "initializing vault (key-shares=1, key-threshold=1)"
init_json="$(vault operator init \
-key-shares=1 \
-key-threshold=1 \
-format=json)" \
|| die "vault operator init failed"
unseal_key="$(printf '%s' "$init_json" | jq -er '.unseal_keys_b64[0]')" \
|| die "failed to extract unseal key from init response"
root_token="$(printf '%s' "$init_json" | jq -er '.root_token')" \
|| die "failed to extract root token from init response"
# Best-effort scrub of init_json from the env (the captured key+token still
# sit in the local vars above — there's no clean way to wipe bash memory).
unset init_json
# ── Persist keys ─────────────────────────────────────────────────────────────
log "writing ${UNSEAL_KEY_FILE} (0400 root)"
write_secret_file "$UNSEAL_KEY_FILE" "$unseal_key"
log "writing ${ROOT_TOKEN_FILE} (0400 root)"
write_secret_file "$ROOT_TOKEN_FILE" "$root_token"
# ── Unseal in the current process ────────────────────────────────────────────
log "unsealing vault"
vault operator unseal "$unseal_key" >/dev/null \
|| die "vault operator unseal failed"
log "done — vault initialized + unsealed + keys persisted"


@@ -132,6 +132,21 @@ issue_claim() {
"${FORGE_API}/issues/${issue}" \
-d "{\"assignees\":[\"${me}\"]}" >/dev/null 2>&1 || return 1
# Verify the PATCH stuck. Forgejo's assignees PATCH is last-write-wins, so
# under concurrent claims from multiple dev agents two invocations can both
# see .assignee == null at the pre-check, both PATCH, and the loser's write
# gets silently overwritten (issue #830). Re-reading the assignee closes
# that TOCTOU window: only the actual winner observes its own login.
# Labels are intentionally applied AFTER this check so the losing claim
# leaves no stray "in-progress" label to roll back.
local actual
actual=$(curl -sf -H "Authorization: token ${FORGE_TOKEN}" \
"${FORGE_API}/issues/${issue}" | jq -r '.assignee.login // ""') || return 1
if [ "$actual" != "$me" ]; then
_ilc_log "issue #${issue} claim lost to ${actual:-<none>} — skipping"
return 1
fi
local ip_id bl_id
ip_id=$(_ilc_in_progress_id)
bl_id=$(_ilc_backlog_id)


@@ -1,8 +1,10 @@
#!/usr/bin/env bash
# mirrors.sh — Mirror helpers: push to remotes + register pull mirrors via API.
#
# Usage: source lib/mirrors.sh; mirror_push
#        source lib/mirrors.sh; mirror_pull_register <clone_url> <owner> <repo_name> [interval]
# Requires: PROJECT_REPO_ROOT, PRIMARY_BRANCH, MIRROR_* vars from load-project.sh
#           FORGE_API_BASE, FORGE_TOKEN for pull-mirror registration
# shellcheck disable=SC2154  # globals set by load-project.sh / calling script
@@ -37,3 +39,73 @@ mirror_push() {
log "mirror: pushed to ${name} (pid $!)"
done
}
# ---------------------------------------------------------------------------
# mirror_pull_register — register a Forgejo pull mirror via the /repos/migrate API.
#
# Creates a new repo as a pull mirror of an external source. Works against
# empty target repos (the repo is created by the API call itself).
#
# Usage:
# mirror_pull_register <clone_url> <owner> <repo_name> [interval]
#
# Args:
# clone_url — HTTPS URL of the source repo (e.g. https://codeberg.org/johba/disinto.git)
# owner — Forgejo org or user that will own the mirror repo
# repo_name — name of the new mirror repo on Forgejo
# interval — sync interval (default: "8h0m0s"; Forgejo duration format)
#
# Requires:
# FORGE_API_BASE, FORGE_TOKEN (from env.sh)
#
# Returns 0 on success, 1 on failure. Prints the new repo JSON to stdout.
# ---------------------------------------------------------------------------
mirror_pull_register() {
local clone_url="$1"
local owner="$2"
local repo_name="$3"
local interval="${4:-8h0m0s}"
if [ -z "${FORGE_API_BASE:-}" ] || [ -z "${FORGE_TOKEN:-}" ]; then
echo "ERROR: FORGE_API_BASE and FORGE_TOKEN must be set" >&2
return 1
fi
if [ -z "$clone_url" ] || [ -z "$owner" ] || [ -z "$repo_name" ]; then
echo "Usage: mirror_pull_register <clone_url> <owner> <repo_name> [interval]" >&2
return 1
fi
local payload
payload=$(jq -n \
--arg clone_addr "$clone_url" \
--arg repo_name "$repo_name" \
--arg repo_owner "$owner" \
--arg interval "$interval" \
'{
clone_addr: $clone_addr,
repo_name: $repo_name,
repo_owner: $repo_owner,
mirror: true,
mirror_interval: $interval,
service: "git"
}')
local http_code body
body=$(curl -s -w "\n%{http_code}" -X POST \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API_BASE}/repos/migrate" \
-d "$payload")
http_code=$(printf '%s' "$body" | tail -n1)
body=$(printf '%s' "$body" | sed '$d')
if [ "$http_code" -ge 200 ] && [ "$http_code" -lt 300 ]; then
printf '%s\n' "$body"
return 0
else
echo "ERROR: mirror_pull_register failed (HTTP ${http_code}): ${body}" >&2
return 1
fi
}


@@ -18,8 +18,8 @@
# =============================================================================
set -euo pipefail
# Source action-vault.sh for _vault_log helper
source "${FACTORY_ROOT}/lib/action-vault.sh"
# Assert required globals are set before using this module.
_assert_release_globals() {


@@ -30,9 +30,10 @@ _SECRET_PATTERNS=(
_SAFE_PATTERNS=(
# Shell variable references: $VAR, ${VAR}, ${VAR:-default}
'\$\{?[A-Z_]+\}?'
# Git SHAs in typical git contexts (commit refs, watermarks, not standalone secrets)
'commit [0-9a-f]{40}'
'Merge [0-9a-f]{40}'
'last-reviewed: [0-9a-f]{40}'
# Forge/GitHub URLs with short hex (PR refs, commit links)
'codeberg\.org/[^[:space:]]+'
'localhost:3000/[^[:space:]]+'

lib/sprint-filer.sh Executable file

@@ -0,0 +1,585 @@
#!/usr/bin/env bash
# =============================================================================
# sprint-filer.sh — Parse merged sprint PRs and file sub-issues via filer-bot
#
# Invoked by the ops-filer Woodpecker pipeline after a sprint PR merges on the
# ops repo main branch. Parses each sprints/*.md file for a structured
# ## Sub-issues block (filer:begin/end markers), then creates idempotent
# Forgejo issues on the project repo using FORGE_FILER_TOKEN.
#
# Permission model (#764):
# filer-bot has issues:write on the project repo.
# architect-bot is read-only on the project repo.
#
# Usage:
# sprint-filer.sh <sprint-file.md> — file sub-issues from one sprint
# sprint-filer.sh --all <sprints-dir> — scan all sprint files in dir
#
# Environment:
# FORGE_FILER_TOKEN — filer-bot API token (issues:write on project repo)
# FORGE_API — project repo API base (e.g. http://forgejo:3000/api/v1/repos/org/repo)
# FORGE_API_BASE — API base URL (e.g. http://forgejo:3000/api/v1)
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# Source env.sh only if not already loaded (allows standalone + sourced use)
if [ -z "${FACTORY_ROOT:-}" ]; then
FACTORY_ROOT="$(dirname "$SCRIPT_DIR")"
# shellcheck source=env.sh
source "$SCRIPT_DIR/env.sh"
fi
# ── Logging ──────────────────────────────────────────────────────────────
LOG_AGENT="${LOG_AGENT:-filer}"
filer_log() {
printf '[%s] %s: %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$LOG_AGENT" "$*" >&2
}
# ── Validate required environment ────────────────────────────────────────
: "${FORGE_FILER_TOKEN:?sprint-filer.sh requires FORGE_FILER_TOKEN}"
: "${FORGE_API:?sprint-filer.sh requires FORGE_API}"
# ── Paginated Forgejo API fetch ──────────────────────────────────────────
# Reuses forge_api_all from lib/env.sh with FORGE_FILER_TOKEN.
# Args: api_path (e.g. /issues?state=all&type=issues)
# Output: merged JSON array to stdout
filer_api_all() { forge_api_all "$1" "$FORGE_FILER_TOKEN"; }
# ── Parse sub-issues block from a sprint markdown file ───────────────────
# Extracts the YAML-in-markdown between <!-- filer:begin --> and <!-- filer:end -->
# Args: sprint_file_path
# Output: the raw sub-issues block (YAML lines) to stdout
# Returns: 0 if block found, 1 if not found or malformed
parse_subissues_block() {
local sprint_file="$1"
if [ ! -f "$sprint_file" ]; then
filer_log "ERROR: sprint file not found: ${sprint_file}"
return 1
fi
local in_block=false
local block=""
local found=false
while IFS= read -r line; do
if [[ "$line" == *"<!-- filer:begin -->"* ]]; then
in_block=true
found=true
continue
fi
if [[ "$line" == *"<!-- filer:end -->"* ]]; then
in_block=false
continue
fi
if [ "$in_block" = true ]; then
block+="${line}"$'\n'
fi
done < "$sprint_file"
if [ "$found" = false ]; then
filer_log "No filer:begin/end block found in ${sprint_file}"
return 1
fi
if [ "$in_block" = true ]; then
filer_log "ERROR: malformed sub-issues block in ${sprint_file} — filer:begin without filer:end"
return 1
fi
if [ -z "$block" ]; then
filer_log "WARNING: empty sub-issues block in ${sprint_file}"
return 1
fi
printf '%s' "$block"
}
# ── Extract vision issue number from sprint file ─────────────────────────
# Looks for "#N" references specifically in the "## Vision issues" section
# to avoid picking up cross-links or related-issue mentions earlier in the file.
# Falls back to first #N in the file if no "## Vision issues" section found.
# Args: sprint_file_path
# Output: first vision issue number found
extract_vision_issue() {
local sprint_file="$1"
# Try to extract from "## Vision issues" section first
local in_section=false
local result=""
while IFS= read -r line; do
if [[ "$line" =~ ^##[[:space:]]+Vision[[:space:]]+issues ]]; then
in_section=true
continue
fi
# Stop at next heading
if [ "$in_section" = true ] && [[ "$line" =~ ^## ]]; then
break
fi
if [ "$in_section" = true ]; then
result=$(printf '%s' "$line" | grep -oE '#[0-9]+' | head -1 | tr -d '#')
if [ -n "$result" ]; then
printf '%s' "$result"
return 0
fi
fi
done < "$sprint_file"
# Fallback: first #N in the entire file
grep -oE '#[0-9]+' "$sprint_file" | head -1 | tr -d '#'
}
# ── Extract sprint slug from file path ───────────────────────────────────
# Args: sprint_file_path
# Output: slug (filename without .md)
extract_sprint_slug() {
local sprint_file="$1"
basename "$sprint_file" .md
}
# ── Parse individual sub-issue entries from the block ────────────────────
# The block is a simple YAML-like format:
# - id: foo
# title: "..."
# labels: [backlog, priority]
# depends_on: [bar]
# body: |
# multi-line body
#
# Args: raw_block (via stdin)
# Output: JSON array of sub-issue objects
parse_subissue_entries() {
local block
block=$(cat)
# Use awk to parse the YAML-like structure into JSON
printf '%s' "$block" | awk '
BEGIN {
printf "["
first = 1
inbody = 0
id = ""; title = ""; labels = ""; depends = ""; body = ""
}
function flush_entry() {
if (id == "") return
if (!first) printf ","
first = 0
# Escape JSON special characters in body
gsub(/\\/, "\\\\", body)
gsub(/"/, "\\\"", body)
gsub(/\t/, "\\t", body)
# Replace newlines with \n for JSON
gsub(/\n/, "\\n", body)
# Remove trailing \n
sub(/\\n$/, "", body)
# Clean up title (remove surrounding quotes)
gsub(/^"/, "", title)
gsub(/"$/, "", title)
printf "{\"id\":\"%s\",\"title\":\"%s\",\"labels\":%s,\"depends_on\":%s,\"body\":\"%s\"}", id, title, labels, depends, body
id = ""; title = ""; labels = "[]"; depends = "[]"; body = ""
inbody = 0
}
/^- id:/ {
flush_entry()
sub(/^- id: */, "")
id = $0
labels = "[]"
depends = "[]"
next
}
/^ title:/ {
sub(/^ title: */, "")
title = $0
# Remove surrounding quotes
gsub(/^"/, "", title)
gsub(/"$/, "", title)
next
}
/^ labels:/ {
sub(/^ labels: */, "")
# Convert [a, b] to JSON array ["a","b"]
gsub(/\[/, "", $0)
gsub(/\]/, "", $0)
n = split($0, arr, /, */)
labels = "["
for (i = 1; i <= n; i++) {
gsub(/^ */, "", arr[i])
gsub(/ *$/, "", arr[i])
if (arr[i] != "") {
if (i > 1) labels = labels ","
labels = labels "\"" arr[i] "\""
}
}
labels = labels "]"
next
}
/^ depends_on:/ {
sub(/^ depends_on: */, "")
gsub(/\[/, "", $0)
gsub(/\]/, "", $0)
n = split($0, arr, /, */)
depends = "["
for (i = 1; i <= n; i++) {
gsub(/^ */, "", arr[i])
gsub(/ *$/, "", arr[i])
if (arr[i] != "") {
if (i > 1) depends = depends ","
depends = depends "\"" arr[i] "\""
}
}
depends = depends "]"
next
}
/^ body: *\|/ {
inbody = 1
body = ""
next
}
inbody && /^ / {
sub(/^ /, "")
body = body $0 "\n"
next
}
inbody && !/^ / && !/^$/ {
inbody = 0
# This line starts a new field or entry — re-process it
# (awk does not support re-scanning, so handle common cases)
if ($0 ~ /^- id:/) {
flush_entry()
sub(/^- id: */, "")
id = $0
labels = "[]"
depends = "[]"
}
}
END {
flush_entry()
printf "]"
}
'
}
# ── Check if sub-issue already exists (idempotency) ─────────────────────
# Searches for the decomposed-from marker in existing issues.
# Args: vision_issue_number sprint_slug subissue_id
# Returns: 0 if already exists, 1 if not
subissue_exists() {
local vision_issue="$1"
local sprint_slug="$2"
local subissue_id="$3"
local marker="<!-- decomposed-from: #${vision_issue}, sprint: ${sprint_slug}, id: ${subissue_id} -->"
# Search all issues (paginated) for the exact marker
local issues_json
issues_json=$(filer_api_all "/issues?state=all&type=issues")
if printf '%s' "$issues_json" | jq -e --arg marker "$marker" \
'[.[] | select(.body // "" | contains($marker))] | length > 0' >/dev/null 2>&1; then
return 0 # Already exists
fi
return 1 # Does not exist
}
# ── Resolve label names to IDs ───────────────────────────────────────────
# Args: label_names_json (JSON array of strings)
# Output: JSON array of label IDs
resolve_label_ids() {
local label_names_json="$1"
# Fetch all labels from project repo
local all_labels
all_labels=$(curl -sf -H "Authorization: token ${FORGE_FILER_TOKEN}" \
"${FORGE_API}/labels" 2>/dev/null) || all_labels="[]"
# Map names to IDs
printf '%s' "$label_names_json" | jq -r '.[]' | while IFS= read -r label_name; do
[ -z "$label_name" ] && continue
printf '%s' "$all_labels" | jq -r --arg name "$label_name" \
'.[] | select(.name == $name) | .id' 2>/dev/null
done | jq -Rs 'split("\n") | map(select(. != "") | tonumber)'
}
# ── Add in-progress label to vision issue ────────────────────────────────
# Args: vision_issue_number
add_inprogress_label() {
local issue_num="$1"
local labels_json
labels_json=$(curl -sf -H "Authorization: token ${FORGE_FILER_TOKEN}" \
"${FORGE_API}/labels" 2>/dev/null) || return 1
local label_id
label_id=$(printf '%s' "$labels_json" | jq -r '.[] | select(.name == "in-progress") | .id' 2>/dev/null) || true
if [ -z "$label_id" ]; then
filer_log "WARNING: in-progress label not found"
return 1
fi
if curl -sf -X POST \
-H "Authorization: token ${FORGE_FILER_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API}/issues/${issue_num}/labels" \
-d "{\"labels\": [${label_id}]}" >/dev/null 2>&1; then
filer_log "Added in-progress label to vision issue #${issue_num}"
return 0
else
filer_log "WARNING: failed to add in-progress label to vision issue #${issue_num}"
return 1
fi
}
# ── File sub-issues from a sprint file ───────────────────────────────────
# This is the main entry point. Parses the sprint file, extracts sub-issues,
# and creates them idempotently via the Forgejo API.
# Args: sprint_file_path
# Returns: 0 on success, 1 on any error (fail-fast)
file_subissues() {
local sprint_file="$1"
filer_log "Processing sprint file: ${sprint_file}"
# Extract metadata
local vision_issue sprint_slug
vision_issue=$(extract_vision_issue "$sprint_file")
sprint_slug=$(extract_sprint_slug "$sprint_file")
if [ -z "$vision_issue" ]; then
filer_log "ERROR: could not extract vision issue number from ${sprint_file}"
return 1
fi
filer_log "Vision issue: #${vision_issue}, sprint slug: ${sprint_slug}"
# Parse the sub-issues block
local raw_block
raw_block=$(parse_subissues_block "$sprint_file") || return 1
# Parse individual entries
local entries_json
entries_json=$(printf '%s' "$raw_block" | parse_subissue_entries)
# Validate parsing produced valid JSON
if ! printf '%s' "$entries_json" | jq empty 2>/dev/null; then
filer_log "ERROR: failed to parse sub-issues block as valid JSON in ${sprint_file}"
return 1
fi
local entry_count
entry_count=$(printf '%s' "$entries_json" | jq 'length')
if [ "$entry_count" -eq 0 ]; then
filer_log "WARNING: no sub-issue entries found in ${sprint_file}"
return 1
fi
filer_log "Found ${entry_count} sub-issue(s) to file"
# File each sub-issue (fail-fast on first error)
local filed_count=0
local i=0
while [ "$i" -lt "$entry_count" ]; do
local entry
entry=$(printf '%s' "$entries_json" | jq ".[$i]")
local subissue_id subissue_title subissue_body labels_json
subissue_id=$(printf '%s' "$entry" | jq -r '.id')
subissue_title=$(printf '%s' "$entry" | jq -r '.title')
subissue_body=$(printf '%s' "$entry" | jq -r '.body')
labels_json=$(printf '%s' "$entry" | jq -c '.labels')
if [ -z "$subissue_id" ] || [ "$subissue_id" = "null" ]; then
filer_log "ERROR: sub-issue entry at index ${i} has no id — aborting"
return 1
fi
if [ -z "$subissue_title" ] || [ "$subissue_title" = "null" ]; then
filer_log "ERROR: sub-issue '${subissue_id}' has no title — aborting"
return 1
fi
# Idempotency check
if subissue_exists "$vision_issue" "$sprint_slug" "$subissue_id"; then
filer_log "Sub-issue '${subissue_id}' already exists — skipping"
i=$((i + 1))
continue
fi
# Append decomposed-from marker to body
local marker="<!-- decomposed-from: #${vision_issue}, sprint: ${sprint_slug}, id: ${subissue_id} -->"
local full_body="${subissue_body}
${marker}"
# Resolve label names to IDs
local label_ids
label_ids=$(resolve_label_ids "$labels_json")
# Build issue payload using jq for safe JSON construction
local payload
payload=$(jq -n \
--arg title "$subissue_title" \
--arg body "$full_body" \
--argjson labels "$label_ids" \
'{title: $title, body: $body, labels: $labels}')
# Create the issue
local response
response=$(curl -sf -X POST \
-H "Authorization: token ${FORGE_FILER_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API}/issues" \
-d "$payload" 2>/dev/null) || {
filer_log "ERROR: failed to create sub-issue '${subissue_id}' — aborting (${filed_count}/${entry_count} filed so far)"
return 1
}
local new_issue_num
new_issue_num=$(printf '%s' "$response" | jq -r '.number // empty')
filer_log "Filed sub-issue '${subissue_id}' as #${new_issue_num}: ${subissue_title}"
filed_count=$((filed_count + 1))
i=$((i + 1))
done
# Add in-progress label to the vision issue
add_inprogress_label "$vision_issue" || true
filer_log "Successfully filed ${filed_count}/${entry_count} sub-issue(s) for sprint ${sprint_slug}"
return 0
}
# ── Vision lifecycle: close completed vision issues ──────────────────────
# Checks open vision issues and closes any whose sub-issues are all closed.
# Uses the decomposed-from marker to find sub-issues.
check_and_close_completed_visions() {
filer_log "Checking for vision issues with all sub-issues complete..."
local vision_issues_json
vision_issues_json=$(filer_api_all "/issues?labels=vision&state=open")
if [ "$vision_issues_json" = "[]" ] || [ "$vision_issues_json" = "null" ]; then
filer_log "No open vision issues found"
return 0
fi
local all_issues
all_issues=$(filer_api_all "/issues?state=all&type=issues")
local vision_nums
vision_nums=$(printf '%s' "$vision_issues_json" | jq -r '.[].number' 2>/dev/null) || return 0
local closed_count=0
while IFS= read -r vid; do
[ -z "$vid" ] && continue
# Find sub-issues with decomposed-from marker for this vision
local sub_issues
sub_issues=$(printf '%s' "$all_issues" | jq --arg vid "$vid" \
'[.[] | select(.body // "" | contains("<!-- decomposed-from: #" + $vid))]')
local sub_count
sub_count=$(printf '%s' "$sub_issues" | jq 'length')
# No sub-issues means not ready to close
[ "$sub_count" -eq 0 ] && continue
# Check if all are closed
local open_count
open_count=$(printf '%s' "$sub_issues" | jq '[.[] | select(.state != "closed")] | length')
if [ "$open_count" -gt 0 ]; then
continue
fi
# All sub-issues closed — close the vision issue
filer_log "All ${sub_count} sub-issues for vision #${vid} are closed — closing vision"
local comment_body
comment_body="## Vision Issue Completed
All sub-issues have been implemented and merged. This vision issue is now closed.
---
*Automated closure by filer-bot · $(date -u '+%Y-%m-%d %H:%M UTC')*"
local comment_payload
comment_payload=$(jq -n --arg body "$comment_body" '{body: $body}')
curl -sf -X POST \
-H "Authorization: token ${FORGE_FILER_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API}/issues/${vid}/comments" \
-d "$comment_payload" >/dev/null 2>&1 || true
curl -sf -X PATCH \
-H "Authorization: token ${FORGE_FILER_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API}/issues/${vid}" \
-d '{"state":"closed"}' >/dev/null 2>&1 || true
closed_count=$((closed_count + 1))
done <<< "$vision_nums"
if [ "$closed_count" -gt 0 ]; then
filer_log "Closed ${closed_count} vision issue(s)"
fi
}
# ── Main ─────────────────────────────────────────────────────────────────
main() {
if [ "${1:-}" = "--all" ]; then
local sprints_dir="${2:?Usage: sprint-filer.sh --all <sprints-dir>}"
local exit_code=0
for sprint_file in "${sprints_dir}"/*.md; do
[ -f "$sprint_file" ] || continue
# Only process files with filer:begin markers
if ! grep -q '<!-- filer:begin -->' "$sprint_file"; then
continue
fi
if ! file_subissues "$sprint_file"; then
filer_log "ERROR: failed to process ${sprint_file}"
exit_code=1
fi
done
# Run vision lifecycle check after filing
check_and_close_completed_visions || true
return "$exit_code"
elif [ -n "${1:-}" ]; then
file_subissues "$1"
# Run vision lifecycle check after filing
check_and_close_completed_visions || true
else
echo "Usage: sprint-filer.sh <sprint-file.md>" >&2
echo " sprint-filer.sh --all <sprints-dir>" >&2
return 1
fi
}
# Run main only when executed directly (not when sourced for testing)
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
main "$@"
fi

nomad/AGENTS.md Normal file

@@ -0,0 +1,93 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 -->
# nomad/ — Agent Instructions
Nomad + Vault HCL for the factory's single-node cluster. These files are
the source of truth that `lib/init/nomad/cluster-up.sh` copies onto a
factory box under `/etc/nomad.d/` and `/etc/vault.d/` at init time.
This directory is part of the **Nomad+Vault migration (Step 0)** — see
issues #821–#825 for the step breakdown. Jobspecs land in Step 1.
## What lives here
| File | Deployed to | Owns |
|---|---|---|
| `server.hcl` | `/etc/nomad.d/server.hcl` | agent role, bind, ports, `data_dir` (S0.2) |
| `client.hcl` | `/etc/nomad.d/client.hcl` | Docker driver cfg + `host_volume` declarations (S0.2) |
| `vault.hcl` | `/etc/vault.d/vault.hcl` | Vault storage, listener, UI, `disable_mlock` (S0.3) |
Nomad auto-merges every `*.hcl` under `-config=/etc/nomad.d/`, so the
split between `server.hcl` and `client.hcl` is for readability, not
semantics. The top-of-file header in each config documents which blocks
it owns.
## What does NOT live here yet
- **Jobspecs.** Step 0 brings up an *empty* cluster. Step 1 (and later)
adds `*.nomad.hcl` job files for forgejo, woodpecker, agents, caddy,
etc. When that lands, jobspecs will live in `nomad/jobs/` and each
will get its own header comment pointing to the `host_volume` names
it consumes (`volume = "forgejo-data"`, etc. — declared in
`client.hcl`).
- **TLS, ACLs, gossip encryption.** Deliberately absent in Step 0 —
factory traffic stays on localhost. These land in later migration
steps alongside multi-node support.
## Adding a jobspec (Step 1 and later)
1. Drop a file in `nomad/jobs/<service>.nomad.hcl`.
2. If it needs persistent state, reference a `host_volume` already
declared in `client.hcl` — *don't* add ad-hoc host paths in the
jobspec. If a new volume is needed, add it to **both**:
- `nomad/client.hcl` — the `host_volume "<name>" { path = … }` block
- `lib/init/nomad/cluster-up.sh` — the `HOST_VOLUME_DIRS` array
The two must stay in sync or nomad fingerprinting will fail and the
node stays in "initializing".
3. Pin image tags — `image = "forgejo/forgejo:1.22.5"`, not `:latest`.
4. Add the jobspec path to `.woodpecker/nomad-validate.yml`'s trigger
list so CI validates it.
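
A jobspec following these rules might look like the sketch below. The job name, group layout, and mount destination are illustrative, not a committed Step 1 file; only the pinned image tag and the `forgejo-data` volume name come from this directory's conventions.

```hcl
# nomad/jobs/forgejo.nomad.hcl (hypothetical sketch, not a real jobspec yet)
job "forgejo" {
  datacenters = ["dc1"]

  group "forgejo" {
    # Name must match a host_volume declared in nomad/client.hcl AND a
    # directory in cluster-up.sh's HOST_VOLUME_DIRS array.
    volume "forgejo-data" {
      type      = "host"
      source    = "forgejo-data"
      read_only = false
    }

    task "forgejo" {
      driver = "docker"

      config {
        image = "forgejo/forgejo:1.22.5" # pinned, never :latest
      }

      volume_mount {
        volume      = "forgejo-data"
        destination = "/data"
      }
    }
  }
}
```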
## How CI validates these files
`.woodpecker/nomad-validate.yml` runs on every PR that touches `nomad/`,
`lib/init/nomad/`, or `bin/disinto`. Four fail-closed steps:
1. **`nomad config validate nomad/server.hcl nomad/client.hcl`**
— parses the HCL, fails on unknown blocks, bad port ranges, invalid
driver config. Vault HCL is excluded (different tool).
2. **`vault operator diagnose -config=nomad/vault.hcl -skip=storage -skip=listener`**
— Vault's equivalent syntax + schema check. `-skip=storage/listener`
disables the runtime checks (CI containers don't have
`/var/lib/vault/data` or port 8200).
3. **`shellcheck --severity=warning lib/init/nomad/*.sh bin/disinto`**
— all init/dispatcher shell clean. `bin/disinto` has no `.sh`
extension so the repo-wide shellcheck in `.woodpecker/ci.yml` skips
it — this is the one place it gets checked.
4. **`bats tests/disinto-init-nomad.bats`**
— exercises the dispatcher: `disinto init --backend=nomad --dry-run`,
`… --empty --dry-run`, and the `--backend=docker` regression guard.
If a PR breaks `nomad/server.hcl` (e.g. typo in a block name), step 1
fails with a clear error; the fix makes it pass. PRs that don't touch
any of the trigger paths skip this pipeline entirely.
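
The path-based trigger can be sketched as a Woodpecker `when` block like the following; this is a hypothetical shape, and the real `.woodpecker/nomad-validate.yml` may structure it differently:

```yaml
# Illustrative only — not the committed pipeline config.
when:
  - event: [push, pull_request]
    path:
      include:
        - "nomad/**"
        - "lib/init/nomad/**"
        - "bin/disinto"
```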
## Version pinning
Nomad + Vault versions are pinned in **two** places — bumping one
without the other is a CI-caught drift:
- `lib/init/nomad/install.sh` — the apt-installed versions on factory
boxes (`NOMAD_VERSION`, `VAULT_VERSION`).
- `.woodpecker/nomad-validate.yml` — the `hashicorp/nomad:…` and
`hashicorp/vault:…` image tags used for static validation.
Bump both in the same PR. The CI pipeline will fail if the pinned
image's `config validate` rejects syntax the installed runtime would
accept (or vice versa).
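
A drift check for the two pins could be scripted roughly like this. It is a sketch: the `NOMAD_VERSION="x.y.z"` and `hashicorp/nomad:x.y.z` pin formats are assumptions about the real files, so the demo runs against stand-in temp files rather than the repo paths.

```shell
#!/usr/bin/env bash
# Hypothetical pin-drift check; adjust the regexes to the actual contents
# of lib/init/nomad/install.sh and .woodpecker/nomad-validate.yml.
set -euo pipefail

# extract_pin FILE REGEX: print the version number inside the first match.
extract_pin() {
  grep -oE "$2" "$1" | head -n1 | grep -oE '[0-9]+(\.[0-9]+)*'
}

# Demo against stand-in files (real paths would be install.sh and the CI yaml).
tmp="$(mktemp -d)"
printf 'NOMAD_VERSION="1.7.5"\n' > "${tmp}/install.sh"
printf '    image: hashicorp/nomad:1.7.5\n' > "${tmp}/ci.yml"

apt_pin="$(extract_pin "${tmp}/install.sh" 'NOMAD_VERSION="[0-9.]+"')"
ci_pin="$(extract_pin "${tmp}/ci.yml" 'hashicorp/nomad:[0-9.]+')"

if [ "$apt_pin" = "$ci_pin" ]; then
  echo "nomad pins in sync: ${apt_pin}"
else
  echo "nomad pin drift: install=${apt_pin} ci=${ci_pin}" >&2
  rm -rf "$tmp"
  exit 1
fi
rm -rf "$tmp"
```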
## Related
- `lib/init/nomad/` — installer + systemd units + cluster-up orchestrator.
- `.woodpecker/nomad-validate.yml` — this directory's CI pipeline.
- Top-of-file headers in `server.hcl` / `client.hcl` / `vault.hcl`
document the per-file ownership contract.

nomad/client.hcl Normal file

@@ -0,0 +1,88 @@
# =============================================================================
# nomad/client.hcl — Docker driver + host_volume declarations
#
# Part of the Nomad+Vault migration (S0.2, issue #822). Deployed to
# /etc/nomad.d/client.hcl on the factory dev box alongside server.hcl.
#
# This file owns: Docker driver plugin config + host_volume pre-wiring.
# server.hcl owns: agent role, bind, ports, data_dir.
#
# NOTE: Nomad merges every *.hcl under -config=/etc/nomad.d, so declaring
# a second `client { ... }` block here augments (not replaces) the one in
# server.hcl. On a single-node setup this file could be inlined into
# server.hcl — the split is for readability, not semantics.
#
# host_volume declarations let Nomad jobspecs mount factory state by name
# (volume = "forgejo-data", etc.) without coupling host paths into jobspec
# HCL. Host paths under /srv/disinto/* are created out-of-band by the
# orchestrator (S0.4) before any job references them.
# =============================================================================
client {
# forgejo git server data (repos, avatars, attachments).
host_volume "forgejo-data" {
path = "/srv/disinto/forgejo-data"
read_only = false
}
# woodpecker CI data (pipeline artifacts, sqlite db).
host_volume "woodpecker-data" {
path = "/srv/disinto/woodpecker-data"
read_only = false
}
# agent runtime data (claude config, logs, phase files).
host_volume "agent-data" {
path = "/srv/disinto/agent-data"
read_only = false
}
# per-project git clones and worktrees.
host_volume "project-repos" {
path = "/srv/disinto/project-repos"
read_only = false
}
# caddy config + ACME state.
host_volume "caddy-data" {
path = "/srv/disinto/caddy-data"
read_only = false
}
# disinto chat transcripts + attachments.
host_volume "chat-history" {
path = "/srv/disinto/chat-history"
read_only = false
}
# ops repo clone (vault actions, sprint artifacts, knowledge).
host_volume "ops-repo" {
path = "/srv/disinto/ops-repo"
read_only = false
}
}
# Docker task driver. `volumes.enabled = true` is required so jobspecs
# can mount host_volume declarations defined above. `allow_privileged`
# stays false — no factory workload needs privileged containers today,
# and flipping it is an audit-worthy change.
plugin "docker" {
config {
allow_privileged = false
volumes {
enabled = true
}
# Leave images behind when jobs stop, so short job churn doesn't thrash
# the image cache. Factory disk is not constrained; `docker system prune`
# is the escape hatch.
gc {
image = false
container = true
dangling_containers {
enabled = true
}
}
}
}

nomad/server.hcl Normal file

@@ -0,0 +1,53 @@
# =============================================================================
# nomad/server.hcl — Single-node combined server+client configuration
#
# Part of the Nomad+Vault migration (S0.2, issue #822). Deployed to
# /etc/nomad.d/server.hcl on the factory dev box alongside client.hcl.
#
# This file owns: agent role, ports, bind, data directory.
# client.hcl owns: Docker driver plugin config + host_volume declarations.
#
# NOTE: On single-node setups these two files could be merged into one
# (Nomad auto-merges every *.hcl under -config=/etc/nomad.d). The split is
# purely for readability — role/bind/port vs. plugin/volume wiring.
#
# This is a factory dev-box baseline — TLS, ACLs, gossip encryption, and
# consul/vault integration are deliberately absent and land in later steps.
# =============================================================================
data_dir = "/var/lib/nomad"
bind_addr = "127.0.0.1"
log_level = "INFO"
# All Nomad agent traffic stays on localhost — the factory box does not
# federate with peers. Ports are the Nomad defaults, pinned here so that
# future changes to these numbers are a visible diff.
ports {
http = 4646
rpc = 4647
serf = 4648
}
# Single-node combined mode: this agent is both the only server and the
# only client. bootstrap_expect=1 makes the server quorum-of-one.
server {
enabled = true
bootstrap_expect = 1
}
client {
enabled = true
}
# Advertise localhost to self to avoid surprises if the default IP
# autodetection picks a transient interface (e.g. docker0, wg0).
advertise {
http = "127.0.0.1"
rpc = "127.0.0.1"
serf = "127.0.0.1"
}
# UI on by default: same bind as http, no TLS (localhost only).
ui {
enabled = true
}

nomad/vault.hcl Normal file

@ -0,0 +1,41 @@
# =============================================================================
# nomad/vault.hcl: Single-node Vault configuration (dev-persisted seal)
#
# Part of the Nomad+Vault migration (S0.3, issue #823). Deployed to
# /etc/vault.d/vault.hcl on the factory dev box.
#
# Seal model: the single unseal key lives on disk at /etc/vault.d/unseal.key
# (0400 root) and is read by systemd ExecStartPost on every boot. This is
# the factory-dev-box-acceptable tradeoff: seal-key theft equals vault
# theft, but we avoid running a second Vault to auto-unseal the first.
#
# This is a factory dev-box baseline: TLS, HA, Raft storage, and audit
# devices are deliberately absent. Storage is the `file` backend (single
# node only). Listener is localhost-only, so no external TLS is needed.
# =============================================================================
# File storage backend: single-node only, no HA, no Raft. State lives in
# /var/lib/vault/data which is created (root:root 0700) by
# lib/init/nomad/systemd-vault.sh before the unit starts.
storage "file" {
path = "/var/lib/vault/data"
}
# Localhost-only listener. TLS is disabled because all callers are on the
# same box — flipping this to tls_disable=false is an audit-worthy change
# paired with cert provisioning.
listener "tcp" {
address = "127.0.0.1:8200"
tls_disable = true
}
# mlock prevents Vault's in-memory secrets from being swapped to disk. We
# keep it enabled; the systemd unit grants CAP_IPC_LOCK so mlock() succeeds.
disable_mlock = false
# Advertised API address used by Vault clients on this host. Matches
# the listener above.
api_addr = "http://127.0.0.1:8200"
# UI on by default: same bind as listener, no TLS (localhost only).
ui = true
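The seal model described in the header can be wired as a systemd drop-in. This is a sketch only: the drop-in path, the `sleep`, and the capability grant are assumptions, and the real unit is owned by `lib/init/nomad/systemd-vault.sh`. Note that `$$` is systemd's escape for a literal `$` in Exec lines:

```ini
# /etc/systemd/system/vault.service.d/unseal.conf  (hypothetical path)
[Service]
Environment=VAULT_ADDR=http://127.0.0.1:8200
# CAP_IPC_LOCK so mlock() succeeds with disable_mlock = false
AmbientCapabilities=CAP_IPC_LOCK
# Read the on-disk unseal key (0400 root) after the server starts.
ExecStartPost=/bin/sh -c 'sleep 1; vault operator unseal "$$(cat /etc/vault.d/unseal.key)"'
```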


@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 -->
# Planner Agent
**Role**: Strategic planning using a Prerequisite Tree (Theory of Constraints),
@ -34,7 +34,9 @@ will then sections) and marks the prerequisite as blocked-on-vault in the tree.
Deduplication: checks pending/ + approved/ + fired/ before creating.
Phase 4 (journal-and-memory): write updated prerequisite tree + daily journal
entry (committed to ops repo) and update `$OPS_REPO_ROOT/knowledge/planner-memory.md`.
Phase 5 (commit-ops): commit all ops repo changes to a `planner/run-YYYY-MM-DD`
branch, then create a PR and walk it to merge via review-bot (`pr_create`
`pr_walk_to_merge`), mirroring the architect's ops flow. No direct push to main.
AGENTS.md maintenance is handled by the Gardener.
**Artifacts use `$OPS_REPO_ROOT`**: All planner artifacts (journal,
@ -55,7 +57,7 @@ nervous system component, not work.
creates tmux session, injects formula prompt, monitors phase file, handles crash recovery, cleans up
- `formulas/run-planner.toml` — Execution spec: six steps (preflight,
prediction-triage, update-prerequisite-tree, file-at-constraints,
journal-and-memory, commit-ops-changes) with `needs` dependencies. Claude
executes all steps in a single interactive session with tool access
- `formulas/groom-backlog.toml` — Grooming formula for backlog triage and
grooming. (Note: the planner no longer dispatches breakdown mode — complex grooming. (Note: the planner no longer dispatches breakdown mode — complex


@ -10,7 +10,9 @@
# 2. Load formula (formulas/run-planner.toml)
# 3. Context: VISION.md, AGENTS.md, ops:RESOURCES.md, structural graph,
# planner memory, journal entries
# 4. Create ops branch planner/run-YYYY-MM-DD for changes
# 5. agent_run(worktree, prompt) → Claude plans, commits to ops branch
# 6. If ops branch has commits: pr_create → pr_walk_to_merge (review-bot)
#
# Usage:
# planner-run.sh [projects/disinto.toml] # project config (default: disinto)
@ -22,10 +24,11 @@ FACTORY_ROOT="$(dirname "$SCRIPT_DIR")"
# Accept project config from argument; default to disinto (planner is disinto infrastructure)
export PROJECT_TOML="${1:-$FACTORY_ROOT/projects/disinto.toml}"
# Set override BEFORE sourcing env.sh so it survives any later re-source of
# env.sh from nested shells / claude -p tools (#762, #747)
export FORGE_TOKEN_OVERRIDE="${FORGE_PLANNER_TOKEN:-}"
# shellcheck source=../lib/env.sh
source "$FACTORY_ROOT/lib/env.sh"
# Use planner-bot's own Forgejo identity (#747)
FORGE_TOKEN="${FORGE_PLANNER_TOKEN:-${FORGE_TOKEN}}"
# shellcheck source=../lib/formula-session.sh
source "$FACTORY_ROOT/lib/formula-session.sh"
# shellcheck source=../lib/worktree.sh
@ -34,6 +37,10 @@ source "$FACTORY_ROOT/lib/worktree.sh"
source "$FACTORY_ROOT/lib/guard.sh"
# shellcheck source=../lib/agent-sdk.sh
source "$FACTORY_ROOT/lib/agent-sdk.sh"
# shellcheck source=../lib/ci-helpers.sh
source "$FACTORY_ROOT/lib/ci-helpers.sh"
# shellcheck source=../lib/pr-lifecycle.sh
source "$FACTORY_ROOT/lib/pr-lifecycle.sh"
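The export-before-source ordering for `FORGE_TOKEN_OVERRIDE` can be sketched in isolation. Here `env_sh` is a stand-in function, and the assumption (not verified from this diff) is that `lib/env.sh` prefers the override whenever it is set, so a nested shell that re-sources env.sh cannot clobber the agent-specific token:

```shell
#!/usr/bin/env bash
# Sketch of why FORGE_TOKEN_OVERRIDE is exported *before* sourcing env.sh
# (#762, #747). env_sh() is a stand-in for lib/env.sh's assumed token
# selection logic.
set -euo pipefail

env_sh() {
  # Assumed behavior: the override wins whenever it is set.
  FORGE_TOKEN="${FORGE_TOKEN_OVERRIDE:-default-token}"
}

export FORGE_TOKEN_OVERRIDE="planner-token"
env_sh   # first source at script start
env_sh   # simulated re-source from a nested claude -p tool
echo "$FORGE_TOKEN"   # → planner-token
```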
LOG_FILE="${DISINTO_LOG_DIR}/planner/planner.log"
# shellcheck disable=SC2034 # consumed by agent-sdk.sh
@ -145,12 +152,69 @@ ${PROMPT_FOOTER}"
# ── Create worktree ──────────────────────────────────────────────────────
formula_worktree_setup "$WORKTREE"
# ── Prepare ops branch for PR-based merge (#765) ────────────────────────
PLANNER_OPS_BRANCH="planner/run-$(date -u +%Y-%m-%d)"
(
cd "$OPS_REPO_ROOT"
git fetch origin "${PRIMARY_BRANCH}" --quiet 2>/dev/null || true
git checkout "${PRIMARY_BRANCH}" --quiet 2>/dev/null || true
git pull --ff-only origin "${PRIMARY_BRANCH}" --quiet 2>/dev/null || true
# Create (or reset to) a fresh branch from PRIMARY_BRANCH
git checkout -B "$PLANNER_OPS_BRANCH" "origin/${PRIMARY_BRANCH}" --quiet 2>/dev/null || \
git checkout -b "$PLANNER_OPS_BRANCH" --quiet 2>/dev/null || true
)
log "ops branch: ${PLANNER_OPS_BRANCH}"
# ── Run agent ─────────────────────────────────────────────────────────────
export CLAUDE_MODEL="opus"
agent_run --worktree "$WORKTREE" "$PROMPT"
log "agent_run complete"
# ── PR lifecycle: create PR on ops repo and walk to merge (#765) ─────────
OPS_FORGE_API="${FORGE_API_BASE}/repos/${FORGE_OPS_REPO}"
ops_has_commits=false
if ! git -C "$OPS_REPO_ROOT" diff --quiet "origin/${PRIMARY_BRANCH}..${PLANNER_OPS_BRANCH}" 2>/dev/null; then
ops_has_commits=true
fi
if [ "$ops_has_commits" = "true" ]; then
log "ops branch has commits — creating PR"
# Push the branch to the ops remote
git -C "$OPS_REPO_ROOT" push origin "$PLANNER_OPS_BRANCH" --quiet 2>/dev/null || \
git -C "$OPS_REPO_ROOT" push --force-with-lease origin "$PLANNER_OPS_BRANCH" 2>/dev/null
# Temporarily point FORGE_API at the ops repo for pr-lifecycle functions
ORIG_FORGE_API="$FORGE_API"
export FORGE_API="$OPS_FORGE_API"
# Ops repo typically has no Woodpecker CI — skip CI polling
ORIG_WOODPECKER_REPO_ID="${WOODPECKER_REPO_ID:-2}"
export WOODPECKER_REPO_ID="0"
PR_NUM=$(pr_create "$PLANNER_OPS_BRANCH" \
"chore: planner run $(date -u +%Y-%m-%d)" \
"Automated planner run — updates prerequisite tree, memory, and vault items." \
"${PRIMARY_BRANCH}" \
"$OPS_FORGE_API") || true
if [ -n "$PR_NUM" ]; then
log "ops PR #${PR_NUM} created — walking to merge"
SESSION_ID=$(cat "$SID_FILE" 2>/dev/null || echo "planner-$$")
pr_walk_to_merge "$PR_NUM" "$SESSION_ID" "$OPS_REPO_ROOT" 1 2 || {
log "ops PR #${PR_NUM} walk finished: ${_PR_WALK_EXIT_REASON:-unknown}"
}
log "ops PR #${PR_NUM} result: ${_PR_WALK_EXIT_REASON:-unknown}"
else
log "WARNING: failed to create ops PR for branch ${PLANNER_OPS_BRANCH}"
fi
# Restore original FORGE_API
export FORGE_API="$ORIG_FORGE_API"
export WOODPECKER_REPO_ID="$ORIG_WOODPECKER_REPO_ID"
else
log "no ops changes — skipping PR creation"
fi
# Persist watermarks so next run can skip if nothing changed
mkdir -p "$FACTORY_ROOT/state"
echo "$CURRENT_SHA" > "$LAST_SHA_FILE"


@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 -->
# Predictor Agent
**Role**: Abstract adversary (the "goblin"). Runs a 2-step formula


@ -23,10 +23,11 @@ FACTORY_ROOT="$(dirname "$SCRIPT_DIR")"
# Accept project config from argument; default to disinto
export PROJECT_TOML="${1:-$FACTORY_ROOT/projects/disinto.toml}"
# Set override BEFORE sourcing env.sh so it survives any later re-source of
# env.sh from nested shells / claude -p tools (#762, #747)
export FORGE_TOKEN_OVERRIDE="${FORGE_PREDICTOR_TOKEN:-}"
# shellcheck source=../lib/env.sh
source "$FACTORY_ROOT/lib/env.sh"
# Use predictor-bot's own Forgejo identity (#747)
FORGE_TOKEN="${FORGE_PREDICTOR_TOKEN:-${FORGE_TOKEN}}"
# shellcheck source=../lib/formula-session.sh
source "$FACTORY_ROOT/lib/formula-session.sh"
# shellcheck source=../lib/worktree.sh


@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 -->
# Review Agent
**Role**: AI-powered PR review — post structured findings and formal


@ -1,4 +1,4 @@
<!-- last-reviewed: 2a7ae0b7eae5979b2c53e3bd1c4280dfdc9df785 -->
# Supervisor Agent
**Role**: Health monitoring and auto-remediation, executed as a formula-driven
@ -7,13 +7,11 @@ then runs an interactive Claude session (sonnet) that assesses health, auto-fixe
issues, and writes a daily journal. When blocked on external
resources or human decisions, files vault items instead of escalating directly.
**Trigger**: `supervisor-run.sh` is invoked by two polling loops:
- **Agents container** (`docker/agents/entrypoint.sh`): every `SUPERVISOR_INTERVAL` seconds (default 1200 = 20 min). Controlled by the `supervisor` role in `AGENT_ROLES` (included in the default seven-role set since P1/#801). Logs to `supervisor.log` in the agents container.
- **Edge container** (`docker/edge/entrypoint-edge.sh`): separate loop in the edge container (line 169-172). Runs independently of the agents container's polling schedule.

Both invoke the same `supervisor-run.sh`. Sources `lib/guard.sh` and calls `check_active supervisor` first — skips if `$FACTORY_ROOT/state/.supervisor-active` is absent. Then runs `claude -p` via `agent-sdk.sh`, injects `formulas/run-supervisor.toml` with pre-collected metrics as context, and cleans up on completion or timeout.
**Key files**:
- `supervisor/supervisor-run.sh` — Polling loop participant + orchestrator: lock, memory guard,
@ -39,6 +37,7 @@ P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping).
**Environment variables consumed**:
- `FORGE_TOKEN`, `FORGE_SUPERVISOR_TOKEN` (falls back to FORGE_TOKEN), `FORGE_REPO`, `FORGE_API`, `PROJECT_NAME`, `PROJECT_REPO_ROOT`, `OPS_REPO_ROOT`
- `PRIMARY_BRANCH`, `CLAUDE_MODEL` (set to sonnet by supervisor-run.sh)
- `SUPERVISOR_INTERVAL` — polling interval in seconds for agents container (default 1200 = 20 min)
- `WOODPECKER_TOKEN`, `WOODPECKER_SERVER`, `WOODPECKER_DB_PASSWORD`, `WOODPECKER_DB_USER`, `WOODPECKER_DB_HOST`, `WOODPECKER_DB_NAME` — CI database queries
**Degraded mode (Issue #544)**: When `OPS_REPO_ROOT` is not set or the directory doesn't exist, the supervisor runs in degraded mode:
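The agents-container polling loop described above reduces to an interval-defaulting pattern. A hypothetical sketch only (the real loop lives in `docker/agents/entrypoint.sh`, and `supervisor-run.sh` is not actually invoked here):

```shell
#!/usr/bin/env bash
# Shape of the polling loop, not the real entrypoint.
set -euo pipefail

interval="${SUPERVISOR_INTERVAL:-1200}"   # default 1200 s = 20 min
echo "supervisor poll interval: ${interval}s"
# Real loop shape (not run here):
#   while true; do supervisor/supervisor-run.sh || true; sleep "$interval"; done
```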


@ -25,10 +25,11 @@ FACTORY_ROOT="$(dirname "$SCRIPT_DIR")"
# Accept project config from argument; default to disinto
export PROJECT_TOML="${1:-$FACTORY_ROOT/projects/disinto.toml}"
# Set override BEFORE sourcing env.sh so it survives any later re-source of
# env.sh from nested shells / claude -p tools (#762, #747)
export FORGE_TOKEN_OVERRIDE="${FORGE_SUPERVISOR_TOKEN:-}"
# shellcheck source=../lib/env.sh
source "$FACTORY_ROOT/lib/env.sh"
# Use supervisor-bot's own Forgejo identity (#747)
FORGE_TOKEN="${FORGE_SUPERVISOR_TOKEN:-${FORGE_TOKEN}}"
# shellcheck source=../lib/formula-session.sh
source "$FACTORY_ROOT/lib/formula-session.sh"
# shellcheck source=../lib/worktree.sh


@ -0,0 +1,145 @@
#!/usr/bin/env bats
# =============================================================================
# tests/disinto-init-nomad.bats — Regression guard for `disinto init`
# backend dispatch (S0.5, issue #825).
#
# Exercises the three CLI paths the Nomad+Vault migration cares about:
# 1. --backend=nomad --dry-run → cluster-up step list
# 2. --backend=nomad --empty --dry-run → same, with "--empty" banner
# 3. --backend=docker --dry-run → docker path unaffected
#
# A throw-away `placeholder/repo` slug satisfies the CLI's positional-arg
# requirement (the nomad dispatcher never touches it). --dry-run on both
# backends short-circuits before any network/filesystem mutation, so the
# suite is hermetic — no Forgejo, no sudo, no real cluster.
# =============================================================================
setup_file() {
export DISINTO_ROOT
DISINTO_ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
export DISINTO_BIN="${DISINTO_ROOT}/bin/disinto"
[ -x "$DISINTO_BIN" ] || {
echo "disinto binary not executable: $DISINTO_BIN" >&2
return 1
}
}
# ── --backend=nomad --dry-run ────────────────────────────────────────────────
@test "disinto init --backend=nomad --dry-run exits 0 and prints the step list" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --dry-run
[ "$status" -eq 0 ]
# Dispatcher banner (cluster-up mode, no --empty).
[[ "$output" == *"nomad backend: default (cluster-up; jobs deferred to Step 1)"* ]]
# All nine cluster-up dry-run steps, in order.
[[ "$output" == *"[dry-run] Step 1/9: install nomad + vault binaries"* ]]
[[ "$output" == *"[dry-run] Step 2/9: write + enable nomad.service (NOT started)"* ]]
[[ "$output" == *"[dry-run] Step 3/9: write + enable vault.service + vault.hcl (NOT started)"* ]]
[[ "$output" == *"[dry-run] Step 4/9: create host-volume dirs under /srv/disinto/"* ]]
[[ "$output" == *"[dry-run] Step 5/9: install /etc/nomad.d/server.hcl + client.hcl from repo"* ]]
[[ "$output" == *"[dry-run] Step 6/9: first-run vault init + persist unseal.key + root.token"* ]]
[[ "$output" == *"[dry-run] Step 7/9: systemctl start vault + poll until unsealed"* ]]
[[ "$output" == *"[dry-run] Step 8/9: systemctl start nomad + poll until ≥1 node ready"* ]]
[[ "$output" == *"[dry-run] Step 9/9: write /etc/profile.d/disinto-nomad.sh"* ]]
[[ "$output" == *"Dry run complete — no changes made."* ]]
}
# ── --backend=nomad --empty --dry-run ────────────────────────────────────────
@test "disinto init --backend=nomad --empty --dry-run prints the --empty banner + step list" {
run "$DISINTO_BIN" init placeholder/repo --backend=nomad --empty --dry-run
[ "$status" -eq 0 ]
# --empty changes the dispatcher banner but not the step list — Step 1
# of the migration will branch on $empty to gate job deployment; today
# both modes invoke the same cluster-up dry-run.
[[ "$output" == *"nomad backend: --empty (cluster-up only, no jobs)"* ]]
[[ "$output" == *"[dry-run] Step 1/9: install nomad + vault binaries"* ]]
[[ "$output" == *"Dry run complete — no changes made."* ]]
}
# ── --backend=docker (regression guard) ──────────────────────────────────────
@test "disinto init --backend=docker does NOT dispatch to the nomad path" {
run "$DISINTO_BIN" init placeholder/repo --backend=docker --dry-run
[ "$status" -eq 0 ]
# Negative assertion: the nomad dispatcher banners must be absent.
[[ "$output" != *"nomad backend:"* ]]
[[ "$output" != *"[dry-run] Step 1/9: install nomad + vault binaries"* ]]
# Positive assertion: docker-path output still appears — the existing
# docker dry-run printed "=== disinto init ===" before listing the
# intended forge/compose actions.
[[ "$output" == *"=== disinto init ==="* ]]
[[ "$output" == *"── Dry-run: intended actions ────"* ]]
}
# ── Flag syntax: --flag=value vs --flag value ────────────────────────────────
# Both forms must work. The bin/disinto flag loop has separate cases for
# `--backend value` and `--backend=value`; a regression in either would
# silently route to the docker default, which is the worst failure mode
# for a mid-migration dispatcher ("loud-failing stub" lesson from S0.4).
@test "disinto init --backend nomad (space-separated) dispatches to nomad" {
run "$DISINTO_BIN" init placeholder/repo --backend nomad --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"nomad backend: default"* ]]
[[ "$output" == *"[dry-run] Step 1/9: install nomad + vault binaries"* ]]
}
# ── Flag validation ──────────────────────────────────────────────────────────
@test "--backend=bogus is rejected with a clear error" {
run "$DISINTO_BIN" init placeholder/repo --backend=bogus --dry-run
[ "$status" -ne 0 ]
[[ "$output" == *"invalid --backend value"* ]]
}
@test "--empty without --backend=nomad is rejected" {
run "$DISINTO_BIN" init placeholder/repo --backend=docker --empty --dry-run
[ "$status" -ne 0 ]
[[ "$output" == *"--empty is only valid with --backend=nomad"* ]]
}
# ── Positional vs flag-first invocation (#835) ───────────────────────────────
#
# Before the #835 fix, disinto_init eagerly consumed $1 as repo_url *before*
# argparse ran. That swallowed `--backend=nomad` as a repo_url and then
# complained that `--empty` required a nomad backend — the nonsense error
# flagged during S0.1 end-to-end verification. The cases below pin the CLI
# to the post-fix contract: the nomad path accepts flag-first invocation,
# the docker path still errors helpfully on a missing repo_url.
@test "disinto init --backend=nomad --empty --dry-run (no positional) dispatches to nomad" {
run "$DISINTO_BIN" init --backend=nomad --empty --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"nomad backend: --empty (cluster-up only, no jobs)"* ]]
[[ "$output" == *"[dry-run] Step 1/9: install nomad + vault binaries"* ]]
# The bug symptom must be absent — backend was misdetected as docker
# when --backend=nomad got swallowed as repo_url.
[[ "$output" != *"--empty is only valid with --backend=nomad"* ]]
}
@test "disinto init --backend nomad --dry-run (space-separated, no positional) dispatches to nomad" {
run "$DISINTO_BIN" init --backend nomad --dry-run
[ "$status" -eq 0 ]
[[ "$output" == *"nomad backend: default"* ]]
[[ "$output" == *"[dry-run] Step 1/9: install nomad + vault binaries"* ]]
}
@test "disinto init (no args) still errors with 'repo URL required'" {
run "$DISINTO_BIN" init
[ "$status" -ne 0 ]
[[ "$output" == *"repo URL required"* ]]
}
@test "disinto init --backend=docker (no positional) errors with 'repo URL required', not 'Unknown option'" {
run "$DISINTO_BIN" init --backend=docker
[ "$status" -ne 0 ]
[[ "$output" == *"repo URL required"* ]]
[[ "$output" != *"Unknown option"* ]]
}
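The positional-vs-flag contract these tests pin can be sketched as a standalone parser. Variable and function names are assumptions, not `bin/disinto`'s actual identifiers; only the error strings are taken from the tests above. The fix: consume `$1` as the repo URL only when it is not flag-shaped, and defer the "repo URL required" check until after the flag loop.

```shell
#!/usr/bin/env bash
# Sketch of the post-#835 argument-parsing order.
set -euo pipefail

parse_init() {
  local repo_url="" backend="docker" empty=false
  # Only consume a positional repo URL if $1 does not start with "--".
  if [ "$#" -gt 0 ] && [ "${1#--}" = "$1" ]; then
    repo_url="$1"
    shift
  fi
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --backend)   backend="$2"; shift 2 ;;
      --backend=*) backend="${1#--backend=}"; shift ;;
      --empty)     empty=true; shift ;;
      --dry-run)   shift ;;
      *) echo "Unknown option: $1" >&2; return 2 ;;
    esac
  done
  case "$backend" in
    docker|nomad) ;;
    *) echo "invalid --backend value: $backend" >&2; return 2 ;;
  esac
  if [ "$empty" = true ] && [ "$backend" != "nomad" ]; then
    echo "--empty is only valid with --backend=nomad" >&2
    return 2
  fi
  # Deferred check: the docker path still demands a positional.
  if [ "$backend" = "docker" ] && [ -z "$repo_url" ]; then
    echo "repo URL required" >&2
    return 2
  fi
  echo "backend=$backend empty=$empty repo_url=$repo_url"
}

parse_init --backend=nomad --empty --dry-run   # → backend=nomad empty=true repo_url=
```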

tests/lib-hvault.bats Normal file

@ -0,0 +1,215 @@
#!/usr/bin/env bats
# tests/lib-hvault.bats — Unit tests for lib/hvault.sh
#
# Runs against a dev-mode Vault server (single binary, no LXC needed).
# CI launches vault server -dev inline before running these tests.
VAULT_BIN="${VAULT_BIN:-vault}"
setup_file() {
export TEST_DIR
TEST_DIR="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
# Start dev-mode vault on a random port
export VAULT_DEV_PORT
VAULT_DEV_PORT="$(shuf -i 18200-18299 -n 1)"
export VAULT_ADDR="http://127.0.0.1:${VAULT_DEV_PORT}"
"$VAULT_BIN" server -dev \
-dev-listen-address="127.0.0.1:${VAULT_DEV_PORT}" \
-dev-root-token-id="test-root-token" \
-dev-no-store-token \
&>"${BATS_FILE_TMPDIR}/vault.log" &
export VAULT_PID=$!
export VAULT_TOKEN="test-root-token"
# Wait for vault to be ready (up to 10s)
local i=0
while ! curl -sf "${VAULT_ADDR}/v1/sys/health" >/dev/null 2>&1; do
sleep 0.5
i=$((i + 1))
if [ "$i" -ge 20 ]; then
echo "Vault failed to start. Log:" >&2
cat "${BATS_FILE_TMPDIR}/vault.log" >&2
return 1
fi
done
}
teardown_file() {
if [ -n "${VAULT_PID:-}" ]; then
kill "$VAULT_PID" 2>/dev/null || true
wait "$VAULT_PID" 2>/dev/null || true
fi
}
setup() {
# Source the module under test
source "${TEST_DIR}/lib/hvault.sh"
export VAULT_ADDR VAULT_TOKEN
}
# ── hvault_kv_put + hvault_kv_get ────────────────────────────────────────────
@test "hvault_kv_put writes and hvault_kv_get reads a secret" {
run hvault_kv_put "test/myapp" "username=admin" "password=s3cret"
[ "$status" -eq 0 ]
run hvault_kv_get "test/myapp"
[ "$status" -eq 0 ]
echo "$output" | jq -e '.username == "admin"'
echo "$output" | jq -e '.password == "s3cret"'
}
@test "hvault_kv_get extracts a single key" {
hvault_kv_put "test/single" "foo=bar" "baz=qux"
run hvault_kv_get "test/single" "foo"
[ "$status" -eq 0 ]
[ "$output" = "bar" ]
}
@test "hvault_kv_get fails for missing key" {
hvault_kv_put "test/keymiss" "exists=yes"
run hvault_kv_get "test/keymiss" "nope"
[ "$status" -ne 0 ]
}
@test "hvault_kv_get fails for missing path" {
run hvault_kv_get "test/does-not-exist-$(date +%s)"
[ "$status" -ne 0 ]
}
@test "hvault_kv_put fails without KEY=VAL" {
run hvault_kv_put "test/bad"
[ "$status" -ne 0 ]
echo "$output" | grep -q '"error":true' || echo "$stderr" | grep -q '"error":true'
}
@test "hvault_kv_put rejects malformed pair (no =)" {
run hvault_kv_put "test/bad2" "noequals"
[ "$status" -ne 0 ]
}
@test "hvault_kv_get fails without PATH" {
run hvault_kv_get
[ "$status" -ne 0 ]
}
# ── hvault_kv_list ───────────────────────────────────────────────────────────
@test "hvault_kv_list lists keys at a path" {
hvault_kv_put "test/listdir/a" "k=1"
hvault_kv_put "test/listdir/b" "k=2"
run hvault_kv_list "test/listdir"
[ "$status" -eq 0 ]
echo "$output" | jq -e '. | length >= 2'
echo "$output" | jq -e 'index("a")'
echo "$output" | jq -e 'index("b")'
}
@test "hvault_kv_list fails on nonexistent path" {
run hvault_kv_list "test/no-such-path-$(date +%s)"
[ "$status" -ne 0 ]
}
@test "hvault_kv_list fails without PATH" {
run hvault_kv_list
[ "$status" -ne 0 ]
}
# ── hvault_policy_apply ──────────────────────────────────────────────────────
@test "hvault_policy_apply creates a policy" {
local pfile="${BATS_TEST_TMPDIR}/test-policy.hcl"
cat > "$pfile" <<'HCL'
path "secret/data/test/*" {
capabilities = ["read"]
}
HCL
run hvault_policy_apply "test-reader" "$pfile"
[ "$status" -eq 0 ]
# Verify the policy exists via Vault API
run curl -sf -H "X-Vault-Token: ${VAULT_TOKEN}" \
"${VAULT_ADDR}/v1/sys/policies/acl/test-reader"
[ "$status" -eq 0 ]
echo "$output" | jq -e '.data.policy' | grep -q "secret/data/test"
}
@test "hvault_policy_apply is idempotent" {
local pfile="${BATS_TEST_TMPDIR}/idem-policy.hcl"
printf 'path "secret/*" { capabilities = ["list"] }\n' > "$pfile"
run hvault_policy_apply "idem-policy" "$pfile"
[ "$status" -eq 0 ]
# Apply again — should succeed
run hvault_policy_apply "idem-policy" "$pfile"
[ "$status" -eq 0 ]
}
@test "hvault_policy_apply fails with missing file" {
run hvault_policy_apply "bad-policy" "/nonexistent/policy.hcl"
[ "$status" -ne 0 ]
}
@test "hvault_policy_apply fails without args" {
run hvault_policy_apply
[ "$status" -ne 0 ]
}
# ── hvault_token_lookup ──────────────────────────────────────────────────────
@test "hvault_token_lookup returns token info" {
run hvault_token_lookup
[ "$status" -eq 0 ]
echo "$output" | jq -e '.policies'
echo "$output" | jq -e '.accessor'
echo "$output" | jq -e 'has("ttl")'
}
@test "hvault_token_lookup fails without VAULT_TOKEN" {
unset VAULT_TOKEN
run hvault_token_lookup
[ "$status" -ne 0 ]
}
@test "hvault_token_lookup fails without VAULT_ADDR" {
unset VAULT_ADDR
run hvault_token_lookup
[ "$status" -ne 0 ]
}
# ── hvault_jwt_login ─────────────────────────────────────────────────────────
@test "hvault_jwt_login fails without VAULT_ADDR" {
unset VAULT_ADDR
run hvault_jwt_login "myrole" "fakejwt"
[ "$status" -ne 0 ]
}
@test "hvault_jwt_login fails without args" {
run hvault_jwt_login
[ "$status" -ne 0 ]
}
@test "hvault_jwt_login returns error for unconfigured jwt auth" {
# JWT auth backend is not enabled in dev mode by default — expect failure
run hvault_jwt_login "myrole" "eyJhbGciOiJSUzI1NiJ9.fake.sig"
[ "$status" -ne 0 ]
}
# ── Env / prereq errors ─────────────────────────────────────────────────────
@test "all functions fail with structured JSON error when VAULT_ADDR unset" {
unset VAULT_ADDR
for fn in hvault_kv_get hvault_kv_put hvault_kv_list hvault_policy_apply hvault_token_lookup; do
run $fn "dummy" "dummy"
[ "$status" -ne 0 ]
done
}
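The suite above treats `lib/hvault.sh` as a black box. As a rough sketch of the KV v2 conventions the tests exercise (assumptions only: the mount is `secret/`, and hvault.sh's internals are not shown in this diff), reads and writes route through `<mount>/data/<path>`, lists through `<mount>/metadata/<path>`, and every write argument must be KEY=VAL:

```shell
#!/usr/bin/env bash
# Assumed KV v2 path mapping; not lib/hvault.sh's actual code.
set -euo pipefail

kv2_path() {
  local op="$1" path="$2" mount="${3:-secret}"
  local prefix
  case "$op" in
    list) prefix="metadata" ;;   # lists use the metadata endpoint
    *)    prefix="data" ;;       # get/put use the data endpoint
  esac
  printf '/v1/%s/%s/%s\n' "$mount" "$prefix" "$path"
}

# Mirror of the malformed-pair rule the tests pin: KEY=VAL or reject.
kv2_valid_pair() {
  case "$1" in
    *=*) return 0 ;;
    *)   return 1 ;;
  esac
}

kv2_path get test/myapp     # → /v1/secret/data/test/myapp
kv2_path list test/listdir  # → /v1/secret/metadata/test/listdir
```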

tests/lib-issue-claim.bats Normal file

@ -0,0 +1,183 @@
#!/usr/bin/env bats
# =============================================================================
# tests/lib-issue-claim.bats — Regression guard for the issue_claim TOCTOU
# fix landed in #830.
#
# Before the fix, two dev agents polling concurrently could both observe
# `.assignee == null`, both PATCH the assignee, and Forgejo's last-write-wins
# semantics would leave the loser believing it had claimed successfully.
# Two agents would then implement the same issue and collide at the PR/branch
# stage.
#
# The fix re-reads the assignee after the PATCH and aborts when it doesn't
# match self, with label writes moved AFTER the verification so a losing
# claim leaves no stray `in-progress` label.
#
# These tests stub `curl` with a bash function so each call tree can be
# driven through a specific response sequence (pre-check, PATCH, re-read)
# without a live Forgejo. The stub records every HTTP call to
# `$CALLS_LOG` for assertions.
# =============================================================================
setup() {
ROOT="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)"
export FACTORY_ROOT="$ROOT"
export FORGE_TOKEN="dummy-token"
export FORGE_URL="https://forge.example.test"
export FORGE_API="${FORGE_URL}/api/v1"
export CALLS_LOG="${BATS_TEST_TMPDIR}/curl-calls.log"
: > "$CALLS_LOG"
export ISSUE_GET_COUNT_FILE="${BATS_TEST_TMPDIR}/issue-get-count"
echo 0 > "$ISSUE_GET_COUNT_FILE"
# Scenario knobs — overridden per @test.
export MOCK_ME="bot"
export MOCK_INITIAL_ASSIGNEE=""
export MOCK_RECHECK_ASSIGNEE="bot"
# Stand-in for lib/env.sh's forge_api (we don't source env.sh — too
# much unrelated setup). Shape mirrors the real helper closely enough
# that _ilc_ensure_label_id() works.
forge_api() {
local method="$1" path="$2"
shift 2
curl -sf -X "$method" \
-H "Authorization: token ${FORGE_TOKEN}" \
-H "Content-Type: application/json" \
"${FORGE_API}${path}" "$@"
}
# curl shim — parses method + URL out of the argv and dispatches
# canned responses per endpoint. Every call gets logged as
# `METHOD URL` (one line) to $CALLS_LOG for later grep-based asserts.
curl() {
local method="GET" url="" arg
while [ $# -gt 0 ]; do
arg="$1"
case "$arg" in
-X) method="$2"; shift 2 ;;
-H|-d|--data-binary|-o) shift 2 ;;
-sf|-s|-f|--silent|--fail) shift ;;
*) url="$arg"; shift ;;
esac
done
printf '%s %s\n' "$method" "$url" >> "$CALLS_LOG"
case "$method $url" in
"GET ${FORGE_URL}/api/v1/user")
printf '{"login":"%s"}' "$MOCK_ME"
;;
"GET ${FORGE_API}/issues/"*)
# Distinguish pre-check (first GET) from re-read (subsequent GETs)
# via a counter file that persists across curl invocations in the
# same test.
local n
n=$(cat "$ISSUE_GET_COUNT_FILE")
n=$((n + 1))
echo "$n" > "$ISSUE_GET_COUNT_FILE"
local who
if [ "$n" -eq 1 ]; then
who="$MOCK_INITIAL_ASSIGNEE"
else
who="$MOCK_RECHECK_ASSIGNEE"
fi
if [ -z "$who" ]; then
printf '{"assignee":null}'
else
printf '{"assignee":{"login":"%s"}}' "$who"
fi
;;
"PATCH ${FORGE_API}/issues/"*)
: # accept any PATCH; body is ignored by the mock
;;
"GET ${FORGE_API}/labels")
printf '[]'
;;
"POST ${FORGE_API}/labels")
printf '{"id":99}'
;;
"POST ${FORGE_API}/issues/"*"/labels")
:
;;
"DELETE ${FORGE_API}/issues/"*"/labels/"*)
:
;;
*)
return 1
;;
esac
return 0
}
# shellcheck source=../lib/issue-lifecycle.sh
source "${ROOT}/lib/issue-lifecycle.sh"
}
# ── helpers ──────────────────────────────────────────────────────────────────
# count_calls METHOD URL — count matching lines in $CALLS_LOG.
count_calls() {
local method="$1" url="$2"
# grep -c prints "0" itself on no-match (exit status 1), so a trailing
# `|| echo 0` would emit a second line and break [ ... -eq N ] asserts;
# `|| true` only swallows the non-zero exit.
grep -cF "${method} ${url}" "$CALLS_LOG" 2>/dev/null || true
}
# ── happy path ───────────────────────────────────────────────────────────────
@test "issue_claim returns 0 when re-read confirms self (no regression, single agent)" {
export MOCK_ME="bot"
export MOCK_INITIAL_ASSIGNEE=""
export MOCK_RECHECK_ASSIGNEE="bot"
run issue_claim 42
[ "$status" -eq 0 ]
# Exactly two GETs to /issues/42 — pre-check and post-PATCH re-read.
[ "$(count_calls GET "${FORGE_API}/issues/42")" -eq 2 ]
# Assignee PATCH fired.
[ "$(count_calls PATCH "${FORGE_API}/issues/42")" -eq 1 ]
# in-progress label added (POST /issues/42/labels).
[ "$(count_calls POST "${FORGE_API}/issues/42/labels")" -eq 1 ]
}
# ── lost race ────────────────────────────────────────────────────────────────
@test "issue_claim returns 1 and leaves no stray in-progress when re-read shows another agent" {
export MOCK_ME="bot"
export MOCK_INITIAL_ASSIGNEE=""
export MOCK_RECHECK_ASSIGNEE="rival"
run issue_claim 42
[ "$status" -eq 1 ]
[[ "$output" == *"claim lost to rival"* ]]
# Re-read happened (two GETs) — this is the new verification step.
[ "$(count_calls GET "${FORGE_API}/issues/42")" -eq 2 ]
# PATCH happened (losers still PATCH before verifying).
[ "$(count_calls PATCH "${FORGE_API}/issues/42")" -eq 1 ]
# CRITICAL: no in-progress label operations on a lost claim.
# (No need to roll back what was never written.)
[ "$(count_calls POST "${FORGE_API}/issues/42/labels")" -eq 0 ]
[ "$(count_calls GET "${FORGE_API}/labels")" -eq 0 ]
}
# ── pre-check skip ──────────────────────────────────────────────────────────
@test "issue_claim skips early (no PATCH) when pre-check shows another assignee" {
export MOCK_ME="bot"
export MOCK_INITIAL_ASSIGNEE="rival"
export MOCK_RECHECK_ASSIGNEE="rival"
run issue_claim 42
[ "$status" -eq 1 ]
[[ "$output" == *"already assigned to rival"* ]]
# Only the pre-check GET — no PATCH, no re-read, no labels.
[ "$(count_calls GET "${FORGE_API}/issues/42")" -eq 1 ]
[ "$(count_calls PATCH "${FORGE_API}/issues/42")" -eq 0 ]
[ "$(count_calls POST "${FORGE_API}/issues/42/labels")" -eq 0 ]
}


@@ -29,7 +29,8 @@ cleanup() {
pkill -f "mock-forgejo.py" 2>/dev/null || true
rm -rf "$MOCK_BIN" /tmp/smoke-test-repo \
"${FACTORY_ROOT}/projects/smoke-repo.toml" \
/tmp/smoke-claude-shared /tmp/smoke-home-claude \
/tmp/smoke-env-before-rerun /tmp/smoke-env-before-dryrun
# Restore .env only if we created the backup
if [ -f "${FACTORY_ROOT}/.env.smoke-backup" ]; then
mv "${FACTORY_ROOT}/.env.smoke-backup" "${FACTORY_ROOT}/.env"
@@ -178,8 +179,30 @@ else
fail "disinto init exited non-zero"
fi
# ── Dry-run test: must not modify state ────────────────────────────────────
echo "=== Dry-run test ==="
cp "${FACTORY_ROOT}/.env" /tmp/smoke-env-before-dryrun
if bash "${FACTORY_ROOT}/bin/disinto" init \
"${TEST_SLUG}" \
--bare --yes --dry-run \
--forge-url "$FORGE_URL" \
--repo-root "/tmp/smoke-test-repo" 2>&1 | grep -q "Dry run complete"; then
pass "disinto init --dry-run exited successfully"
else
fail "disinto init --dry-run did not complete"
fi
# Verify --dry-run did not modify .env
if diff -q /tmp/smoke-env-before-dryrun "${FACTORY_ROOT}/.env" >/dev/null 2>&1; then
pass "dry-run: .env unchanged"
else
fail "dry-run: .env was modified (should be read-only)"
fi
rm -f /tmp/smoke-env-before-dryrun
# ── Idempotency test: run init again, verify .env is stable ────────────────
echo "=== Idempotency test: running disinto init again ==="
cp "${FACTORY_ROOT}/.env" /tmp/smoke-env-before-rerun
if bash "${FACTORY_ROOT}/bin/disinto" init \
"${TEST_SLUG}" \
--bare --yes \
@@ -190,6 +213,29 @@ else
fail "disinto init (re-run) exited non-zero"
fi
# Verify .env is stable across re-runs (no token churn)
if diff -q /tmp/smoke-env-before-rerun "${FACTORY_ROOT}/.env" >/dev/null 2>&1; then
pass "idempotency: .env unchanged on re-run"
else
fail "idempotency: .env changed on re-run (token churn detected)"
diff /tmp/smoke-env-before-rerun "${FACTORY_ROOT}/.env" >&2 || true
fi
rm -f /tmp/smoke-env-before-rerun
# Verify FORGE_ADMIN_TOKEN is stored in .env
if grep -q '^FORGE_ADMIN_TOKEN=' "${FACTORY_ROOT}/.env"; then
pass ".env contains FORGE_ADMIN_TOKEN"
else
fail ".env missing FORGE_ADMIN_TOKEN"
fi
# Verify HUMAN_TOKEN is stored in .env
if grep -q '^HUMAN_TOKEN=' "${FACTORY_ROOT}/.env"; then
pass ".env contains HUMAN_TOKEN"
else
fail ".env missing HUMAN_TOKEN"
fi
# ── 4. Verify Forgejo state ─────────────────────────────────────────────────
echo "=== 4/6 Verifying Forgejo state ==="

tests/smoke-load-secret.sh Normal file

@@ -0,0 +1,162 @@
#!/usr/bin/env bash
# tests/smoke-load-secret.sh — Unit tests for load_secret() precedence chain
#
# Covers the 4 precedence cases:
# 1. /secrets/<NAME>.env (Nomad template)
# 2. Current environment
# 3. secrets/<NAME>.enc (age-encrypted per-key file)
# 4. Default / empty fallback
#
# Required tools: bash, age (for case 3)
set -euo pipefail
FACTORY_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
fail() { printf 'FAIL: %s\n' "$*" >&2; FAILED=1; }
pass() { printf 'PASS: %s\n' "$*"; }
FAILED=0
# Set up a temp workspace and fake HOME so age key paths work
test_dir=$(mktemp -d)
fake_home=$(mktemp -d)
trap 'rm -rf "$test_dir" "$fake_home"' EXIT
# Minimal env for sourcing env.sh's load_secret function without the full boot
# We source the function definition directly to isolate the unit under test.
# shellcheck disable=SC2034
export USER="${USER:-test}"
export HOME="$fake_home"
# Source env.sh to get load_secret (and FACTORY_ROOT)
source "${FACTORY_ROOT}/lib/env.sh"
# ── Case 4: Default / empty fallback ────────────────────────────────────────
echo "=== 1/5 Case 4: default fallback ==="
unset TEST_SECRET_FALLBACK 2>/dev/null || true
val=$(load_secret TEST_SECRET_FALLBACK "my-default")
if [ "$val" = "my-default" ]; then
pass "load_secret returns default when nothing is set"
else
fail "Expected 'my-default', got '${val}'"
fi
val=$(load_secret TEST_SECRET_FALLBACK)
if [ -z "$val" ]; then
pass "load_secret returns empty when no default and nothing set"
else
fail "Expected empty, got '${val}'"
fi
# ── Case 2: Environment variable already set ────────────────────────────────
echo "=== 2/5 Case 2: environment variable ==="
export TEST_SECRET_ENV="from-environment"
val=$(load_secret TEST_SECRET_ENV "ignored-default")
if [ "$val" = "from-environment" ]; then
pass "load_secret returns env value over default"
else
fail "Expected 'from-environment', got '${val}'"
fi
unset TEST_SECRET_ENV
# ── Case 3: Age-encrypted per-key file ──────────────────────────────────────
echo "=== 3/5 Case 3: age-encrypted secret ==="
if command -v age &>/dev/null && command -v age-keygen &>/dev/null; then
# Generate a test age key
age_key_dir="${fake_home}/.config/sops/age"
mkdir -p "$age_key_dir"
age-keygen -o "${age_key_dir}/keys.txt" 2>/dev/null
pub_key=$(age-keygen -y "${age_key_dir}/keys.txt")
# Create encrypted secret
secrets_dir="${FACTORY_ROOT}/secrets"
mkdir -p "$secrets_dir"
printf 'age-test-value' | age -r "$pub_key" -o "${secrets_dir}/TEST_SECRET_AGE.enc"
unset TEST_SECRET_AGE 2>/dev/null || true
val=$(load_secret TEST_SECRET_AGE "fallback")
if [ "$val" = "age-test-value" ]; then
pass "load_secret decrypts age-encrypted secret"
else
fail "Expected 'age-test-value', got '${val}'"
fi
# Verify caching: call load_secret directly (not in subshell) so export propagates
unset TEST_SECRET_AGE 2>/dev/null || true
load_secret TEST_SECRET_AGE >/dev/null
if [ "${TEST_SECRET_AGE:-}" = "age-test-value" ]; then
pass "load_secret caches decrypted value in environment (direct call)"
else
fail "Decrypted value not cached in environment"
fi
# Clean up test secret
rm -f "${secrets_dir}/TEST_SECRET_AGE.enc"
rmdir "$secrets_dir" 2>/dev/null || true
unset TEST_SECRET_AGE
else
echo "SKIP: age/age-keygen not found — skipping age decryption test"
fi
# ── Case 1: Nomad template path ────────────────────────────────────────────
echo "=== 4/5 Case 1: Nomad template (/secrets/<NAME>.env) ==="
nomad_dir="/secrets"
if [ -w "$(dirname "$nomad_dir")" ] 2>/dev/null || [ -w "$nomad_dir" ] 2>/dev/null; then
mkdir -p "$nomad_dir"
printf 'TEST_SECRET_NOMAD=from-nomad-template\n' > "${nomad_dir}/TEST_SECRET_NOMAD.env"
# Even with env set, Nomad path takes precedence
export TEST_SECRET_NOMAD="from-env-should-lose"
val=$(load_secret TEST_SECRET_NOMAD "default")
if [ "$val" = "from-nomad-template" ]; then
pass "load_secret prefers Nomad template over env"
else
fail "Expected 'from-nomad-template', got '${val}'"
fi
rm -f "${nomad_dir}/TEST_SECRET_NOMAD.env"
rmdir "$nomad_dir" 2>/dev/null || true
unset TEST_SECRET_NOMAD
else
echo "SKIP: /secrets not writable — skipping Nomad template test (needs root or container)"
fi
# ── Precedence: env beats age ────────────────────────────────────────────
echo "=== 5/5 Precedence: env beats age-encrypted ==="
if command -v age &>/dev/null && command -v age-keygen &>/dev/null; then
age_key_dir="${fake_home}/.config/sops/age"
mkdir -p "$age_key_dir"
[ -f "${age_key_dir}/keys.txt" ] || age-keygen -o "${age_key_dir}/keys.txt" 2>/dev/null
pub_key=$(age-keygen -y "${age_key_dir}/keys.txt")
secrets_dir="${FACTORY_ROOT}/secrets"
mkdir -p "$secrets_dir"
printf 'age-value-should-lose' | age -r "$pub_key" -o "${secrets_dir}/TEST_SECRET_PREC.enc"
export TEST_SECRET_PREC="env-value-wins"
val=$(load_secret TEST_SECRET_PREC "default")
if [ "$val" = "env-value-wins" ]; then
pass "load_secret prefers env over age-encrypted file"
else
fail "Expected 'env-value-wins', got '${val}'"
fi
rm -f "${secrets_dir}/TEST_SECRET_PREC.enc"
rmdir "$secrets_dir" 2>/dev/null || true
unset TEST_SECRET_PREC
else
echo "SKIP: age not found — skipping precedence test"
fi
# ── Summary ───────────────────────────────────────────────────────────────
echo ""
if [ "$FAILED" -ne 0 ]; then
echo "=== SMOKE-LOAD-SECRET TEST FAILED ==="
exit 1
fi
echo "=== SMOKE-LOAD-SECRET TEST PASSED ==="
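The four-case precedence chain exercised above can be sketched as a standalone function. This is a minimal sketch, not lib/env.sh's real load_secret: the directories are parameterized here (the real helper reads /secrets and ${FACTORY_ROOT}/secrets), and it omits the export-based caching the tests also cover.

```bash
# Sketch of the precedence chain; NOMAD_SECRETS_DIR / AGE_SECRETS_DIR are
# stand-ins so the sketch runs outside a Nomad alloc.
NOMAD_SECRETS_DIR="${NOMAD_SECRETS_DIR:-/secrets}"
AGE_SECRETS_DIR="${AGE_SECRETS_DIR:-./secrets}"

load_secret_sketch() {
  local name="$1" default="${2:-}"
  # 1. Nomad template file (NAME=value) wins outright
  if [ -f "${NOMAD_SECRETS_DIR}/${name}.env" ]; then
    sed -n "s/^${name}=//p" "${NOMAD_SECRETS_DIR}/${name}.env"
    return 0
  fi
  # 2. Variable already set in the environment
  if [ -n "${!name:-}" ]; then
    printf '%s' "${!name}"
    return 0
  fi
  # 3. age-encrypted per-key file, decrypted with the sops-style key path
  if [ -f "${AGE_SECRETS_DIR}/${name}.enc" ]; then
    age -d -i "${HOME}/.config/sops/age/keys.txt" "${AGE_SECRETS_DIR}/${name}.enc"
    return 0
  fi
  # 4. Default / empty fallback
  printf '%s' "$default"
}
```

Called in a subshell the result is just printed; the real helper additionally exports the resolved value so later callers hit case 2.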


@@ -83,9 +83,12 @@ curl -sL https://raw.githubusercontent.com/disinto-admin/disinto/fix/issue-621/t
- Permissions: `root:disinto-register 0750`
3. **Installs Caddy**:
- Backs up any pre-existing `/etc/caddy/Caddyfile` to `/etc/caddy/Caddyfile.pre-disinto`
- Download Caddy with Gandi DNS plugin
- Enable admin API on `127.0.0.1:2019`
- Configure wildcard cert for `*.disinto.ai` via DNS-01
- Creates `/etc/caddy/extra.d/` for operator-owned site blocks
- Emitted Caddyfile ends with `import /etc/caddy/extra.d/*.caddy`
4. **Sets up SSH**:
- Creates `disinto-register` authorized_keys with forced command
@@ -95,6 +98,27 @@ curl -sL https://raw.githubusercontent.com/disinto-admin/disinto/fix/issue-621/t
- `/opt/disinto-edge/register.sh` — forced command handler
- `/opt/disinto-edge/lib/*.sh` — helper libraries
## Operator-Owned Site Blocks
Edge-control owns the top-level `/etc/caddy/Caddyfile` and dynamic `<project>.<DOMAIN_SUFFIX>` routes injected via the Caddy admin API. Operators own everything under `/etc/caddy/extra.d/`.
To serve non-tunnel content (apex domain, www redirect, static sites), drop `.caddy` files into `/etc/caddy/extra.d/`:
```bash
# Example: /etc/caddy/extra.d/landing.caddy
disinto.ai {
root * /home/debian/disinto-site
file_server
}
# Example: /etc/caddy/extra.d/www-redirect.caddy
www.disinto.ai {
redir https://disinto.ai{uri} permanent
}
```
These files survive across `install.sh` re-runs. The `--extra-caddyfile <path>` flag overrides the default import glob (`/etc/caddy/extra.d/*.caddy`) if needed.
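The survive-re-runs property can be demonstrated with stand-in paths (a sketch only; the real install.sh writes `/etc/caddy/Caddyfile` and never touches the contents of `extra.d/`):

```bash
# Stand-in dirs; install.sh uses /etc/caddy and /etc/caddy/extra.d.
caddy_dir=$(mktemp -d)
mkdir -p "${caddy_dir}/extra.d"

# Operator drops in a site block once
printf 'www.example.test {\n  redir https://example.test{uri} permanent\n}\n' \
  > "${caddy_dir}/extra.d/www-redirect.caddy"

emit_caddyfile() {
  # Mimics an install.sh (re-)run: only the top-level Caddyfile is rewritten
  cat > "${caddy_dir}/Caddyfile" <<EOF
{
  admin 127.0.0.1:2019
}
import ${caddy_dir}/extra.d/*.caddy
EOF
}

emit_caddyfile   # first install
emit_caddyfile   # re-run: extra.d is left alone
grep 'redir' "${caddy_dir}/extra.d/www-redirect.caddy"
```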
## Usage
### Register a Tunnel (from dev box)


@@ -43,6 +43,7 @@ INSTALL_DIR="/opt/disinto-edge"
REGISTRY_DIR="/var/lib/disinto"
CADDY_VERSION="2.8.4"
DOMAIN_SUFFIX="disinto.ai"
EXTRA_CADDYFILE="/etc/caddy/extra.d/*.caddy"
usage() {
cat <<EOF
@@ -54,6 +55,8 @@ Options:
--registry-dir <dir> Registry directory (default: /var/lib/disinto)
--caddy-version <ver> Caddy version to install (default: ${CADDY_VERSION})
--domain-suffix <suffix> Domain suffix for tunnels (default: disinto.ai)
--extra-caddyfile <path> Import path for operator-owned Caddy config
(default: /etc/caddy/extra.d/*.caddy)
-h, --help Show this help
Example:
@@ -84,6 +87,10 @@ while [[ $# -gt 0 ]]; do
DOMAIN_SUFFIX="$2"
shift 2
;;
--extra-caddyfile)
EXTRA_CADDYFILE="$2"
shift 2
;;
-h|--help)
usage
;;
@@ -225,8 +232,29 @@ EOF
chmod 600 "$GANDI_ENV"
# Create Caddyfile with admin API and wildcard cert
# Note: Caddy auto-generates server names (srv0, srv1, …). lib/caddy.sh
# discovers the server name dynamically via _discover_server_name() so we
# don't need to name the server here.
CADDYFILE="/etc/caddy/Caddyfile"
# Back up existing Caddyfile before overwriting
if [ -f "$CADDYFILE" ] && [ ! -f "${CADDYFILE}.pre-disinto" ]; then
cp "$CADDYFILE" "${CADDYFILE}.pre-disinto"
log_info "Backed up existing Caddyfile to ${CADDYFILE}.pre-disinto"
fi
# Create extra.d directory for operator-owned site blocks
EXTRA_DIR="/etc/caddy/extra.d"
mkdir -p "$EXTRA_DIR"
chmod 0755 "$EXTRA_DIR"
if getent group caddy >/dev/null 2>&1; then
chown root:caddy "$EXTRA_DIR"
else
log_warn "Group 'caddy' does not exist; extra.d owned by root:root"
fi
log_info "Created ${EXTRA_DIR} for operator-owned Caddy config"
cat > "$CADDYFILE" <<CADDYEOF
# Caddy configuration for edge control plane
# Admin API enabled on 127.0.0.1:2019
@@ -240,7 +268,10 @@ cat > "$CADDYFILE" <<EOF
dns gandi {env.GANDI_API_KEY}
}
}
# Operator-owned site blocks (apex, www, static content, etc.)
import ${EXTRA_CADDYFILE}
CADDYEOF
# Start Caddy
systemctl restart caddy 2>/dev/null || {
@@ -359,6 +390,7 @@ echo "Configuration:"
echo " Install directory: ${INSTALL_DIR}"
echo " Registry: ${REGISTRY_FILE}"
echo " Caddy admin API: http://127.0.0.1:2019"
echo " Operator site blocks: ${EXTRA_DIR}/ (import ${EXTRA_CADDYFILE})"
echo ""
echo "Users:"
echo " disinto-register - SSH forced command (runs ${INSTALL_DIR}/register.sh)"


@@ -19,6 +19,24 @@ CADDY_ADMIN_URL="${CADDY_ADMIN_URL:-http://127.0.0.1:2019}"
# Domain suffix for projects
DOMAIN_SUFFIX="${DOMAIN_SUFFIX:-disinto.ai}"
# Discover the Caddy server name that listens on :80/:443
# Usage: _discover_server_name
_discover_server_name() {
local server_name
server_name=$(curl -sS "${CADDY_ADMIN_URL}/config/apps/http/servers" \
| jq -r 'to_entries | map(select(.value.listen[]? | test(":(80|443)$"))) | .[0].key // empty') || {
echo "Error: could not query Caddy admin API for servers" >&2
return 1
}
if [ -z "$server_name" ]; then
echo "Error: could not find a Caddy server listening on :80/:443" >&2
return 1
fi
echo "$server_name"
}
# Add a route for a project
# Usage: add_route <project> <port>
add_route() {
@@ -26,6 +44,9 @@ add_route() {
local port="$2"
local fqdn="${project}.${DOMAIN_SUFFIX}"
local server_name
server_name=$(_discover_server_name) || return 1
# Build the route configuration (partial config)
local route_config
route_config=$(cat <<EOF
@@ -58,16 +79,21 @@
EOF
)
# Append route via admin API, checking HTTP status
local response status body
response=$(curl -sS -w '\n%{http_code}' -X POST \
"${CADDY_ADMIN_URL}/config/apps/http/servers/${server_name}/routes" \
-H "Content-Type: application/json" \
-d "$route_config") || {
echo "Error: failed to add route for ${fqdn}" >&2
return 1
}
status=$(echo "$response" | tail -n1)
body=$(echo "$response" | sed '$d')
if [ "$status" -ge 400 ]; then
echo "Error: Caddy admin API returned ${status}: ${body}" >&2
return 1
fi
echo "Added route: ${fqdn} → 127.0.0.1:${port}" >&2
}
@@ -78,31 +104,45 @@ remove_route() {
local project="$1"
local fqdn="${project}.${DOMAIN_SUFFIX}"
local server_name
server_name=$(_discover_server_name) || return 1
# First, get current routes, checking HTTP status
local response status body
response=$(curl -sS -w '\n%{http_code}' \
"${CADDY_ADMIN_URL}/config/apps/http/servers/${server_name}/routes") || {
echo "Error: failed to get current routes" >&2
return 1
}
status=$(echo "$response" | tail -n1)
body=$(echo "$response" | sed '$d')
if [ "$status" -ge 400 ]; then
echo "Error: Caddy admin API returned ${status}: ${body}" >&2
return 1
fi
# Find the route index that matches our fqdn using jq
local route_index
route_index=$(echo "$body" | jq -r "to_entries[] | select(.value.match[]?.host[]? == \"${fqdn}\") | .key" 2>/dev/null | head -1)
if [ -z "$route_index" ] || [ "$route_index" = "null" ]; then
echo "Warning: route for ${fqdn} not found" >&2
return 0
fi
# Delete the route at the found index, checking HTTP status
response=$(curl -sS -w '\n%{http_code}' -X DELETE \
"${CADDY_ADMIN_URL}/config/apps/http/servers/${server_name}/routes/${route_index}" \
-H "Content-Type: application/json") || {
echo "Error: failed to remove route for ${fqdn}" >&2
return 1
}
status=$(echo "$response" | tail -n1)
body=$(echo "$response" | sed '$d')
if [ "$status" -ge 400 ]; then
echo "Error: Caddy admin API returned ${status}: ${body}" >&2
return 1
fi
echo "Removed route: ${fqdn}" >&2
}
@@ -110,13 +150,18 @@ remove_route() {
# Reload Caddy to apply configuration changes
# Usage: reload_caddy
reload_caddy() {
local response status body
response=$(curl -sS -w '\n%{http_code}' -X POST \
"${CADDY_ADMIN_URL}/reload") || {
echo "Error: failed to reload Caddy" >&2
return 1
}
status=$(echo "$response" | tail -n1)
body=$(echo "$response" | sed '$d')
if [ "$status" -ge 400 ]; then
echo "Error: Caddy reload returned ${status}: ${body}" >&2
return 1
fi
echo "Caddy reloaded" >&2
}
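The discovery filter in `_discover_server_name()` can be exercised offline against a canned payload (hypothetical JSON shaped like Caddy's `GET /config/apps/http/servers` response; the server names here are illustrative):

```bash
# Canned stand-in for the admin API response: one HTTPS server plus a
# metrics listener that must NOT be picked.
payload='{"srv0":{"listen":[":443"]},"metrics":{"listen":["127.0.0.1:9090"]}}'

# Same jq filter as _discover_server_name(): keep servers whose listen
# addresses end in :80 or :443, take the first key.
name=$(echo "$payload" | jq -r \
  'to_entries | map(select(.value.listen[]? | test(":(80|443)$"))) | .[0].key // empty')
echo "$name"   # → srv0
```

The `// empty` fallback is what lets the caller distinguish "no matching server" (empty string) from a real answer.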