diff --git a/sprints/vault-blast-radius-tiers.md b/sprints/vault-blast-radius-tiers.md
new file mode 100644
index 0000000..548b4f5
--- /dev/null
+++ b/sprints/vault-blast-radius-tiers.md
@@ -0,0 +1,177 @@
+# Sprint: Vault blast-radius tiers
+
+## Vision issues
+- #419 — Vault: blast-radius based approval tiers
+
+## What this enables
+After this sprint, vault operations are classified by blast radius. Low-risk operations
+(docs, feature-branch edits) flow through without human gating; medium-risk operations
+(CI config, Dockerfile changes) queue for async review; high-risk operations (production
+deploys, secrets rotation, agent self-modification) hard-block as they do today.
+
+The practical effect: the dev loop no longer stalls waiting for human approval of routine
+operations. Agents can move autonomously through an estimated 80%+ of vault requests while
+preserving the safety contract on irreversible operations.
+
+## What exists today
+The vault redesign (#73-#77) is complete and all five issues are closed:
+- lib/vault.sh - idempotent vault PR creation via Forgejo API
+- docker/edge/dispatcher.sh - polls merged vault PRs, verifies admin approval, launches runners
+- vault/vault-env.sh - TOML validation for vault action files
+- vault/SCHEMA.md - vault action TOML schema
+- lib/branch-protection.sh - admin-only merge enforcement on ops repo
+
+Currently every vault request goes through the same hard-block path regardless of risk.
+No classification layer exists; all formulas share a single approval tier.
+
+## Complexity
+Files touched: ~14 (7 new, 7 modified)
+Glue code vs. greenfield: ~60% glue code, ~40% greenfield.
+Estimated sub-issues: 4-7 depending on fork choices.
+
+## Risks
+1. Classification errors on consequential operations. Default-deny mitigates: unknown formula → high.
+2. Dispatcher complexity. Mitigation: extract classification into classify.sh; the dispatcher delegates to it.
+3. Branch-protection interaction (primary design fork, see below).
+
+## Cost: new infra to maintain
+- vault/policy.toml or blast_radius fields — operators must update them when adding formulas.
+- vault/classify.sh — one shell script, shellcheck-covered, no runtime daemon.
+- No new services, cron jobs, or agent roles.
+
+## Recommendation
+Worth it. The vault redesign is done and blast-radius tiers are the natural next step. The
+primary reason agents cannot operate continuously is that every vault action blocks on human availability.
+
+---
+
+## Design forks
+
+Three decisions must be made before implementation begins.
+
+### Fork 1 (Critical): Auto-approve merge mechanism
+
+Branch protection on the ops repo requires `required_approvals: 1` and `admin_enforced: true`.
+For low-tier vault PRs, the dispatcher must be able to merge without human approval.
+
+**A. Skip PR entirely for low-tier**
+vault-bot commits directly to `vault/actions/` on main using the admin token. No PR is created.
+The dispatcher detects a new TOML file by the absence of a corresponding `.result.json`.
+- Simplest dispatcher code
+- No PR audit trail for low-tier executions
+- `FORGE_ADMIN_TOKEN` already exists in vault env (used by `is_user_admin()`)
+
+**B. Dispatcher self-approves low-tier PRs**
+vault-bot creates the PR as today, immediately posts an APPROVED review using its own token,
+then merges. vault-bot needs the Forgejo admin role so that `admin_enforced: true` does not block it.
+- Full PR audit trail for all tiers
+- Requires granting vault-bot admin role on Forgejo; whether Forgejo accepts an approval from the PR's own author needs verification
+
+**C. Tier-aware branch protection**
+Create a separate Forgejo protection rule for the `vault/*` branch pattern with `required_approvals: 0`.
+Main branch protection stays unchanged. vault-bot merges low-tier PRs directly.
+- No new accounts or elevated role for vault-bot
+- Protection rules are in Forgejo admin UI, not code (harder to version)
+- Forgejo `vault/*` glob support needs verification; rules match a PR's base branch, so low-tier PRs would have to target a `vault/*` branch, not main
+
+**D. Dedicated auto-approve bot**
+Create a `vault-auto-bot` Forgejo account with the admin role that auto-approves low-tier PRs.
+Cleanest trust separation; most operational overhead.
+
+---
+
+### Fork 2 (Secondary): Policy storage format
+
+Where does the formula → tier mapping live?
+
+**A. `vault/policy.toml` in disinto repo**
+Flat TOML: `formula = "tier"`. classify.sh reads it at runtime.
+Unknown formulas default to `high`. Changing policy requires a disinto PR.
+
+**B. `blast_radius` field in each `formulas/*.toml`**
+Add `blast_radius = "low"|"medium"|"high"` to each formula TOML.
+classify.sh reads the target formula TOML for its tier.
+Co-located with the formula; if the field is made mandatory, no formula can be added without declaring its risk.
+
+**C. `vault/policy.toml` in ops repo**
+Same format as A but lives in the ops repo, so operators can update it without a disinto PR.
+Useful for per-deployment overrides.
+
+**D. Hybrid: formula TOML default + ops override**
+Each formula TOML carries a default tier; the ops `vault/policy.toml` can override it per deployment.
+Most flexible; classify.sh must merge two sources.
+
+---
+
+### Fork 3 (Secondary): Medium-tier dev-loop behavior
+
+When the dev-agent creates a vault PR for a medium-tier action, what does it do while waiting?
+
+**A. Non-blocking: fire and continue immediately**
+The agent creates the vault PR and moves to the next issue without waiting.
+Maximum autonomy; sequencing becomes unpredictable.
+
+**B. Soft-block with 2-hour timeout**
+The agent polls for the vault PR merge for up to 2 hours; if the PR is still unmerged, it continues.
+Balances oversight with velocity.
+
+**C. Status-quo block (medium = high)**
+Medium-tier actions block the agent loop as high-tier actions do today; only low-tier actions are unblocked.
+Simplest behavior change: no modification to the dev-agent flow is needed.
+
+**D. Label-based approval signal**
+The agent polls for a `vault-approved` label on the vault PR instead of waiting for the merge.
+Decouples "approved to continue" from "PR merged and executed."
+
+---
+
+## Proposed sub-issues
+
+### Core (always filed regardless of fork choices)
+
+**Sub-issue 1: vault/classify.sh — classification engine**
+Implement `vault/classify.sh`: it reads the formula name, secrets, and an optional `blast_radius` override,
+applies the policy rules, and outputs a tier (`low|medium|high`). Default-deny: unknown → `high`.
+Files: `vault/classify.sh` (new), `vault/vault-env.sh` (call classify)
+
+**Sub-issue 2: docs/BLAST-RADIUS.md and SCHEMA.md update**
+Write `docs/BLAST-RADIUS.md`. Add the optional `blast_radius` field to `vault/SCHEMA.md`
+and to the validator.
+Files: `docs/BLAST-RADIUS.md` (new), `vault/SCHEMA.md`, `vault/vault-env.sh`
+
+**Sub-issue 3: Update prerequisites.md**
+Mark the vault redesign (#73-#77) as DONE (its current entry is stale). Add blast-radius tiers to the tree.
+Files: `disinto-ops/prerequisites.md`
+
+### Fork 1 variants (pick one)
+
+**1A** — Modify `lib/vault.sh` to skip the PR for low-tier actions and commit directly to main.
+Modify `dispatcher.sh` to skip `verify_admin_merged()` for low-tier TOMLs.
+
+**1B** — Modify `dispatcher.sh` to post an APPROVED review and merge for low-tier actions.
+Grant vault-bot the admin role in the Forgejo setup scripts.
+
+**1C** — Add `setup_vault_branch_protection_tiered()` to `lib/branch-protection.sh`
+with `required_approvals: 0` for the `vault/*` pattern (verify Forgejo glob support first).
+
+**1D** — Add a `vault-auto-bot` account to `forge-setup.sh` and implement the approval watcher.
+
+### Fork 2 variants (pick one)
+
+**2A** — Create `vault/policy.toml` in the disinto repo. classify.sh reads it.
+
+**2B** — Add a `blast_radius` field to all 15 `formulas/*.toml` files. classify.sh reads the formula TOML.
+
+**2C** — Create `disinto-ops/vault/policy.toml`. classify.sh reads the ops copy at runtime.
+
+**2D** — Two-pass classify.sh: formula TOML default, ops policy override.
+
+### Fork 3 variants (pick one)
+
+**3A** — Non-blocking: `lib/vault.sh` returns immediately after PR creation for all tiers.
+
+**3B** — Soft-block: poll the medium-tier PR every 15 min for up to 2 hours.
+
+**3C** — No change: medium-tier behavior stays as it is today (only low-tier is unblocked).
+
+**3D** — Create the `vault-approved` label. Modify the `lib/vault.sh` medium-tier path to poll for it.
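The following is a minimal sketch of the two-pass lookup that Sub-issue 1 and variant 2D describe. The formula TOML supplies a `blast_radius` default, the ops policy file overrides it, and anything missing or unrecognized resolves to `high`. It assumes both files use flat `key = "value"` lines. The plan only specifies `formula = "tier"` for the policy file and `blast_radius = "..."` for formula TOMLs. `classify_tier`, `toml_get`, and `valid_tier` are hypothetical names. The sketch omits the secrets input mentioned in Sub-issue 1, and the line-based parser stands in for a real TOML parser.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of vault/classify.sh for variant 2D; not the project's implementation.
# Assumes flat  key = "value"  lines (no tables, arrays, or multi-line strings).
set -euo pipefail

# toml_get FILE KEY: print KEY's string value from FILE; print nothing if FILE or KEY is missing.
toml_get() {
  local file=$1 key=$2 line k v
  [[ -r $file ]] || return 0
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%%#*}                    # drop comments (naive: also cuts a '#' inside quotes)
    [[ $line == *=* ]] || continue
    k=${line%%=*}
    v=${line#*=}
    k=${k//[[:space:]]/}                # compare keys without surrounding whitespace
    [[ $k == "$key" ]] || continue
    v=${v#"${v%%[![:space:]]*}"}        # trim leading whitespace
    v=${v%"${v##*[![:space:]]}"}        # trim trailing whitespace
    v=${v#\"}
    v=${v%\"}                           # strip the surrounding double quotes
    printf '%s\n' "$v"
    return 0
  done <"$file"
  return 0
}

# valid_tier TIER: succeed only for the three known tiers.
valid_tier() {
  case $1 in
    low|medium|high) return 0 ;;
    *) return 1 ;;
  esac
}

# classify_tier FORMULA FORMULA_TOML OPS_POLICY
# Pass 1: default tier from the formula TOML's blast_radius field.
# Pass 2: per-deployment override keyed by formula name in the ops policy file.
# Missing or unrecognized values fall through to "high" (default-deny).
classify_tier() {
  local formula=$1 formula_toml=$2 ops_policy=$3 tier="" override
  tier=$(toml_get "$formula_toml" blast_radius)
  override=$(toml_get "$ops_policy" "$formula")
  if [[ -n $override ]]; then
    tier=$override
  fi
  if valid_tier "$tier"; then
    printf '%s\n' "$tier"
  else
    printf 'high\n'
  fi
}

# Run only when executed directly with arguments, so the functions can also be sourced.
if [[ $# -gt 0 && ${BASH_SOURCE[0]:-$0} == "$0" ]]; then
  classify_tier "$@"
fi
```

Invoked as `vault/classify.sh <formula> <formula.toml> <ops policy.toml>`, the script prints one tier. If either path does not exist, the lookup reduces to a single source, which gives variant 2A/2C or 2B. An invalid override resolves to `high` rather than to the formula default, which keeps the default-deny rule strict. Falling back to the formula default would be an equally defensible choice.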
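The Fork 3B soft-block (variant 3B: poll every 15 min for up to 2 hours) can be sketched as a deadline loop around an arbitrary check command. The sketch uses bash's `SECONDS` counter for elapsed time. `soft_block_wait` and `pr_is_merged` are hypothetical names, and `FORGE_URL`, `FORGE_TOKEN`, and `OPS_REPO` (as `owner/repo`) are assumed variables that this plan does not define. The example check relies on the Gitea-compatible Forgejo endpoint `GET /api/v1/repos/{owner}/{repo}/pulls/{index}/merge`, which answers 204 when the PR is merged and 404 when it is not.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Fork 3B soft-block; not part of the current codebase.
set -euo pipefail

# soft_block_wait TIMEOUT_S INTERVAL_S CMD [ARGS...]
# Runs CMD until it succeeds (return 0) or TIMEOUT_S seconds pass without success (return 1).
soft_block_wait() {
  local timeout_s=$1 interval_s=$2
  shift 2
  local deadline=$((SECONDS + timeout_s)) remaining
  while true; do
    if "$@"; then
      return 0                                  # approved: caller proceeds normally
    fi
    remaining=$((deadline - SECONDS))
    if (( remaining <= 0 )); then
      return 1                                  # timed out: caller moves on without approval
    fi
    sleep $(( remaining < interval_s ? remaining : interval_s ))  # never sleep past the deadline
  done
}

# Example check with assumed variable names. curl -f turns the 404 ("not merged")
# into a non-zero exit status; -s keeps repeated polls quiet.
pr_is_merged() {
  curl -fs -o /dev/null \
    -H "Authorization: token ${FORGE_TOKEN:?}" \
    "${FORGE_URL:?}/api/v1/repos/${OPS_REPO:?}/pulls/$1/merge"
}

# Usage with the 3B values (2 hours, 15 minutes):
#   soft_block_wait 7200 900 pr_is_merged "$pr_number" || echo "no approval yet; continuing"
```

The check is passed in as a command, so the same loop also fits variant 3D if you substitute a label lookup for `pr_is_merged`. A timeout returns 1 so the caller can continue instead of failing, which matches 3B's "if no response, continues" behavior.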