From 174e2a63bf0f632b3afb7f1a12825718dd599089 Mon Sep 17 00:00:00 2001
From: architect-bot
Date: Thu, 9 Apr 2026 08:33:51 +0000
Subject: [PATCH] sprint: add vault-blast-radius-tiers.md

---
 sprints/vault-blast-radius-tiers.md | 216 ++++++++----------------------
 1 file changed, 58 insertions(+), 158 deletions(-)

diff --git a/sprints/vault-blast-radius-tiers.md b/sprints/vault-blast-radius-tiers.md
index 548b4f5..cf07c58 100644
--- a/sprints/vault-blast-radius-tiers.md
+++ b/sprints/vault-blast-radius-tiers.md
@@ -1,177 +1,77 @@
-# Sprint: Vault blast-radius tiers
+# Sprint: vault blast-radius tiers
 
 ## Vision issues
 
 - #419 — Vault: blast-radius based approval tiers
 
 ## What this enables
 
-After this sprint, vault operations are classified by blast radius — low-risk operations
-(docs, feature-branch edits) flow through without human gating; medium-risk operations
-(CI config, Dockerfile changes) queue for async review; high-risk operations (production
-deploys, secrets rotation, agent self-modification) hard-block as today.
+After this sprint, low-tier vault actions execute without waiting for a human. The dispatcher
+auto-approves and merges vault PRs classified as `low` in `policy.toml`. Medium and high tiers
+are unchanged: medium notifies and allows async review; high blocks until admin approves.
 
-The practical effect: the dev loop no longer stalls waiting for human approval of routine
-operations. Agents can move autonomously through 80%+ of vault requests while preserving
-the safety contract on irreversible operations.
+This removes the bottleneck on low-risk bookkeeping operations while preserving the hard gate
+on production deploys, secret operations, and agent self-modification.
 
 ## What exists today
 
-The vault redesign (#73-#77) is complete and all five issues are closed:
-- lib/vault.sh - idempotent vault PR creation via Forgejo API
-- docker/edge/dispatcher.sh - polls merged vault PRs, verifies admin approval, launches runners
-- vault/vault-env.sh - TOML validation for vault action files
-- vault/SCHEMA.md - vault action TOML schema
-- lib/branch-protection.sh - admin-only merge enforcement on ops repo
-Currently every vault request goes through the same hard-block path regardless of risk.
-No classification layer exists. All formulas share the same single approval tier.
+The tier infrastructure is fully built. Only the enforcement is missing.
+
+- `vault/policy.toml` — Maps every formula to low/medium/high. Current low tier: groom-backlog,
+  triage, reproduce, review-pr. Medium: dev, run-planner, run-gardener, run-predictor,
+  run-supervisor, run-architect, upgrade-dependency. High: run-publish-site, run-rent-a-human,
+  add-rpc-method, release.
+- `vault/classify.sh` — Shell classifier called by `vault-env.sh`. Returns tier for a given formula.
+- `vault/SCHEMA.md` — Documents `blast_radius` override field (string: "low"/"medium"/"high")
+  that vault action TOMLs can use to override policy defaults.
+- `vault/validate.sh` — Validates vault action TOML fields including blast_radius.
+- `docker/edge/dispatcher.sh` — Edge dispatcher. Polls ops repo for merged vault PRs and executes
+  them. Currently fires ALL merged vault PRs without tier differentiation.
+
+What's missing: the dispatcher does not read blast_radius, does not auto-approve low-tier PRs,
+and does not differentiate notification behavior for medium vs high tier.
 
 ## Complexity
-Files touched: ~14 (7 new, 7 modified)
-Gluecode vs greenfield: ~60% gluecode, ~40% greenfield.
-Estimated sub-issues: 4-7 depending on fork choices.
+
+Files touched: 3
+- `docker/edge/dispatcher.sh` — read blast_radius from vault action TOML; for low tier, call
+  Forgejo API to approve + merge the PR directly (admin token); for medium, post "pending async
+  review" comment; for high, leave pending (existing behavior)
+- `lib/vault.sh` `vault_request()` — include blast_radius in the PR body so the dispatcher
+  can read it without re-parsing the TOML
+- `docs/VAULT.md` — document the three-tier behavior for operators
+
+Sub-issues: 3
+Gluecode ratio: ~70% gluecode (dispatcher reads existing classify.sh output), ~30% new (auto-approve API call, comment logic)
 
 ## Risks
-1. Classification errors on consequential operations. Default-deny mitigates: unknown formula → high.
-2. Dispatcher complexity. Mitigation: extract to classify.sh, dispatcher delegates.
-3. Branch-protection interaction (primary design fork, see below).
-## Cost - new infra to maintain
-- vault/policy.toml or blast_radius fields — operators update when adding formulas.
-- vault/classify.sh — one shell script, shellcheck-covered, no runtime daemon.
-- No new services, cron jobs, or agent roles.
+- Admin token for auto-approve: the dispatcher needs an admin-level Forgejo token to approve
+  and merge PRs. Currently `FORGE_TOKEN` is used; branch protection has `admin_enforced: true`
+  which means even admin bots are blocked from bypassing the approval gate. This is the core
+  design fork: either (a) relax admin_enforced for low-tier PRs, or (b) use a separate
+  Forgejo "auto-approver" account with admin rights, or (c) bypass the PR workflow entirely
+  for low-tier actions (execute directly without a PR).
+- Policy drift: as new formulas are added, policy.toml must be updated. If a formula is missing,
+  classify.sh should default to "high" (fail safe). Currently the default behavior is unknown —
+  this needs to be hardened.
+- Audit trail: low-tier auto-approvals should still leave a record. Auto-approve comment
+  ("auto-approved: low blast radius") satisfies this.
+
+## Cost — new infra to maintain
+
+- One new Forgejo account or token (if auto-approver route chosen) — needs rotation policy
+- `policy.toml` maintenance: every new formula must be classified before shipping
+- No new services, cron jobs, or containers
 
 ## Recommendation
-Worth it. Vault redesign done; blast-radius tiers are the natural next step. Primary reason
-agents cannot operate continuously is that every vault action blocks on human availability.
----
+Worth it, but the design fork on auto-approve mechanism must be resolved before implementation
+begins — this is the questions step.
 
-## Design forks
+The cleanest approach is option (c): bypass the PR workflow for low-tier actions entirely.
+The dispatcher detects blast_radius=low, executes the formula immediately without creating
+a PR, and writes to `vault/fired/` directly. This avoids the admin token problem, preserves
+the PR workflow for medium/high, and keeps the audit trail in git. However, it changes the
+blast_radius=low behavior from "PR exists but auto-merges" to "no PR, just executes" — operators
+need to understand the difference.
 
-Three decisions must be made before implementation begins.
-
-### Fork 1 (Critical): Auto-approve merge mechanism
-
-Branch protection on the ops repo requires `required_approvals: 1` and `admin_enforced: true`.
-For low-tier vault PRs, the dispatcher must merge without a human approval.
-
-**A. Skip PR entirely for low-tier**
-vault-bot commits directly to `vault/actions/` on main using admin token. No PR created.
-Dispatcher detects new TOML file by absence of `.result.json`.
-- Simplest dispatcher code
-- No PR audit trail for low-tier executions
-- `FORGE_ADMIN_TOKEN` already exists in vault env (used by `is_user_admin()`)
-
-**B. Dispatcher self-approves low-tier PRs**
-vault-bot creates PR as today, then immediately posts an APPROVED review using its own token,
-then merges. vault-bot needs Forgejo admin role so `admin_enforced: true` does not block it.
-- Full PR audit trail for all tiers
-- Requires granting vault-bot admin role on Forgejo
-
-**C. Tier-aware branch protection**
-Create a separate Forgejo protection rule for `vault/*` branch pattern with `required_approvals: 0`.
-Main branch protection stays unchanged. vault-bot merges low-tier PRs directly.
-- No new accounts or elevated role for vault-bot
-- Protection rules are in Forgejo admin UI, not code (harder to version)
-- Forgejo `vault/*` glob support needs verification
-
-**D. Dedicated auto-approve bot**
-Create a `vault-auto-bot` Forgejo account with admin role that auto-approves low-tier PRs.
-Cleanest trust separation; most operational overhead.
-
----
-
-### Fork 2 (Secondary): Policy storage format
-
-Where does the formula → tier mapping live?
-
-**A. `vault/policy.toml` in disinto repo**
-Flat TOML: `formula = "tier"`. classify.sh reads it at runtime.
-Unknown formulas default to `high`. Changing policy requires a disinto PR.
-
-**B. `blast_radius` field in each `formulas/*.toml`**
-Add `blast_radius = "low"|"medium"|"high"` to each formula TOML.
-classify.sh reads the target formula TOML for its tier.
-Co-located with formula — impossible to add a formula without declaring its risk.
-
-**C. `vault/policy.toml` in ops repo**
-Same format as A but lives in the ops repo. Operators update without a disinto PR.
-Useful for per-deployment overrides.
-
-**D. Hybrid: formula TOML default + ops override**
-Formula TOML carries a default tier. Ops `vault/policy.toml` can override per-deployment.
-Most flexible; classify.sh must merge two sources.
-
----
-
-### Fork 3 (Secondary): Medium-tier dev-loop behavior
-
-When dev-agent creates a vault PR for a medium-tier action, what does it do while waiting?
-
-**A. Non-blocking: fire and continue immediately**
-Agent creates vault PR and moves to next issue without waiting.
-Maximum autonomy; sequencing becomes unpredictable.
-
-**B. Soft-block with 2-hour timeout**
-Agent waits up to 2 hours polling for vault PR merge. If no response, continues.
-Balances oversight with velocity.
-
-**C. Status-quo block (medium = high)**
-Medium-tier blocks the agent loop like high-tier today. Only low-tier actions unblocked.
-Simplest behavior change — no modification to dev-agent flow needed.
-
-**D. Label-based approval signal**
-Agent polls for a `vault-approved` label on the vault PR instead of waiting for merge.
-Decouples "approved to continue" from "PR merged and executed."
-
----
-
-## Proposed sub-issues
-
-### Core (always filed regardless of fork choices)
-
-**Sub-issue 1: vault/classify.sh — classification engine**
-Implement `vault/classify.sh`: reads formula name, secrets, optional `blast_radius` override,
-applies policy rules, outputs tier (`low|medium|high`). Default-deny: unknown → `high`.
-Files: `vault/classify.sh` (new), `vault/vault-env.sh` (call classify)
-
-**Sub-issue 2: docs/BLAST-RADIUS.md and SCHEMA.md update**
-Write `docs/BLAST-RADIUS.md`. Add optional `blast_radius` field to `vault/SCHEMA.md`
-and validator.
-Files: `docs/BLAST-RADIUS.md` (new), `vault/SCHEMA.md`, `vault/vault-env.sh`
-
-**Sub-issue 3: Update prerequisites.md**
-Mark vault redesign (#73-#77) as DONE (stale). Add blast-radius tiers to the tree.
-Files: `disinto-ops/prerequisites.md`
-
-### Fork 1 variants (pick one)
-
-**1A** — Modify `lib/vault.sh` to skip PR for low-tier, commit directly to main.
-Modify `dispatcher.sh` to skip `verify_admin_merged()` for low-tier TOMLs.
-
-**1B** — Modify `dispatcher.sh` to post APPROVED review + merge for low-tier.
-Grant vault-bot admin role in Forgejo setup scripts.
-
-**1C** — Add `setup_vault_branch_protection_tiered()` to `lib/branch-protection.sh`
-with `required_approvals: 0` for `vault/*` pattern (verify Forgejo glob support first).
-
-**1D** — Add `vault-auto-bot` account to `forge-setup.sh`. Implement approval watcher.
-
-### Fork 2 variants (pick one)
-
-**2A** — Create `vault/policy.toml` in disinto repo. classify.sh reads it.
-
-**2B** — Add `blast_radius` field to all 15 `formulas/*.toml`. classify.sh reads formula TOML.
-
-**2C** — Create `disinto-ops/vault/policy.toml`. classify.sh reads ops copy at runtime.
-
-**2D** — Two-pass classify.sh: formula TOML default, ops policy override.
-
-### Fork 3 variants (pick one)
-
-**3A** — Non-blocking: `lib/vault.sh` returns immediately after PR creation for all tiers.
-
-**3B** — Soft-block: poll medium-tier PR every 15 min for up to 2 hours.
-
-**3C** — No change: medium-tier behavior unchanged (only low-tier unblocked).
-
-**3D** — Create `vault-approved` label. Modify `lib/vault.sh` medium path to poll label.
+The PR route (option b) is more visible but requires a dedicated account.
\ No newline at end of file
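
The fail-safe default called for under Risks ("if a formula is missing, classify.sh should default to high") and the three-tier behavior could be sketched roughly as follows. This is a minimal illustration only, not the actual `vault/classify.sh` or dispatcher code: the function names `classify_formula` and `tier_action` are hypothetical, and the policy file is assumed to be flat, one `formula = "tier"` entry per line.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resolve a formula's tier from a flat policy file and
# fail safe to "high" when the formula is missing or its value is not valid.
# Assumption: the policy file holds one `formula = "tier"` entry per line.

classify_formula() {
  local formula="$1" policy_file="$2" tier=""
  if [ -r "$policy_file" ]; then
    # Strip spaces, tabs, and quotes from key and value, then match the key exactly.
    tier=$(awk -F'=' -v key="$formula" '
      { k = $1; v = $2
        gsub(/[ \t"]/, "", k); gsub(/[ \t"]/, "", v)
        if (k == key) { print v; exit } }' "$policy_file")
  fi
  case "$tier" in
    low|medium|high) printf '%s\n' "$tier" ;;
    *)               printf 'high\n' ;;  # unknown formula or bad value: fail safe
  esac
}

# Map a tier to the behavior this sprint describes; the action names are
# placeholders for whatever the dispatcher actually does.
tier_action() {
  case "$1" in
    low)    echo "auto-approve" ;;
    medium) echo "notify-async-review" ;;
    *)      echo "block-until-admin-approval" ;;
  esac
}
```

Treating any unrecognized tier value the same as a missing formula keeps a typo in the policy file from silently downgrading an action to `low`.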