2026-04-08 19:08:05 +00:00
1 changed files with 87 additions and 0 deletions
--- a/sprints/vault-blast-radius-tiers.md
+++ b/sprints/vault-blast-radius-tiers.md
@@ -0,0 +1,87 @@
 # Sprint: Vault blast-radius tiers
 ## Vision issues
 - #419 — Vault: blast-radius based approval tiers
 ## What this enables
 After this sprint, vault operations are classified by blast radius — low-risk operations
 (docs, feature-branch edits) flow through without human gating; medium-risk operations
 (CI config, Dockerfile changes) queue for async review; high-risk operations (production
 deploys, secrets rotation, agent self-modification) hard-block as today.
 The practical effect: the dev loop no longer stalls waiting for human approval of routine
 operations. Agents can move autonomously through 80%+ of vault requests while preserving
 the safety contract on irreversible operations.
 ## What exists today
 The vault redesign (#73-#77) is complete and all five issues are closed:
 - lib/vault.sh - idempotent vault PR creation via Forgejo API
 - docker/edge/dispatcher.sh - polls merged vault PRs, verifies admin approval, launches runners
 - vault/vault-env.sh - TOML validation for vault action files
 - vault/SCHEMA.md - vault action TOML schema
 - lib/branch-protection.sh - admin-only merge enforcement on ops repo
 Currently every vault request goes through the same hard-block path regardless of risk.
 No classification layer exists. All formulas share the same single approval tier.
 Note: prerequisites.md says the vault redesign is incomplete. That statement is stale:
 issues #73-#77 are all closed.
 ## Complexity
 Files touched: ~14 (7 new, 7 modified)
 New files:
 - vault/classify.sh - classification engine, ~150 lines: path glob matching, secret risk scoring, formula risk lookup
 - vault/policy.toml - human-editable policy rules mapping path patterns to tier assignments
 - vault/formula-risks.toml - formula-to-risk-level mapping (release=high, gardener=low)
 - docs/BLAST-RADIUS.md - documentation
 - 2-3 test helpers
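 The path-matching part of vault/classify.sh could look roughly like the sketch below. It is
 an illustration only: the function names (`classify_path`, `classify_paths`, `tier_rank`) and
 the hardcoded patterns, including `.ci/*`, are assumptions, and the real script would
 presumably read its patterns from vault/policy.toml. Any path that matches no pattern falls
 through to high, and a request takes the highest tier among its paths.

```shell
# Hypothetical sketch, not the actual classify.sh.
# Map one changed path to a tier; unknown paths default to "high".
classify_path() {
  case "$1" in
    docs/*)                         echo "low" ;;
    Dockerfile|*/Dockerfile|.ci/*)  echo "medium" ;;  # .ci/* is an assumed CI config location
    *)                              echo "high" ;;    # default-deny
  esac
}

# Numeric rank so tiers can be compared.
tier_rank() {
  case "$1" in
    low)    echo 0 ;;
    medium) echo 1 ;;
    *)      echo 2 ;;
  esac
}

# A request is as risky as its riskiest path. No paths at all counts as "high".
classify_paths() {
  local worst="low" path tier
  if [ "$#" -eq 0 ]; then
    echo "high"
    return 0
  fi
  for path in "$@"; do
    tier="$(classify_path "$path")"
    if [ "$(tier_rank "$tier")" -gt "$(tier_rank "$worst")" ]; then
      worst="$tier"
    fi
  done
  echo "$worst"
}
```

 For example, `classify_paths docs/a.md Dockerfile` prints `medium`, because the Dockerfile
 change outranks the docs change.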
 Modified files:
 - vault/SCHEMA.md - optional blast_radius override field
 - vault/vault-env.sh - call classify.sh, validate classification
 - lib/vault.sh - attach computed tier as PR label; skip PR creation for auto-approve tier
 - docker/edge/dispatcher.sh - enforce tier policy: auto-merge low, queue medium for async review, hard-block high
 - lib/branch-protection.sh - potentially vary required-approvals by tier
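 On the dispatcher side, the tier-to-action mapping can stay very small if classification
 happens elsewhere. The sketch below assumes a hypothetical helper `tier_action` and
 hypothetical action names; neither exists in dispatcher.sh today. An unrecognized tier is
 treated the same as high.

```shell
# Hypothetical sketch: translate a computed tier into the dispatcher's action.
# Action names are illustrative placeholders.
tier_action() {
  case "$1" in
    low)    echo "auto-merge" ;;
    medium) echo "queue-review" ;;
    *)      echo "hard-block" ;;   # high, empty, or unrecognized tier
  esac
}
```

 Keeping the fallback branch as hard-block means a typo in a label or a missing label cannot
 widen what gets merged automatically.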
 Glue code vs greenfield: ~60% glue code, ~40% greenfield (classification engine and policy format).
 Estimated sub-issues: 4-5
 ## Risks
 1. Classification errors on consequential operations. Classification is deterministic
   (pattern matching, not AI judgment), but a misconfigured policy.toml could auto-approve
   something that should hard-block. Mitigation: default-deny all unknown patterns; policy
   changes require human review.
 2. Dispatcher complexity. dispatcher.sh is already 1005 lines. Adding three code paths adds
   ~150 lines. Mitigation: extract classification to classify.sh so dispatcher delegates,
   not decides.
 3. Branch-protection interaction. Auto-approve tier means the dispatcher merges without human
   approval. branch-protection.sh currently requires 1 approval, so the dispatcher must bypass
   it for the auto-approve tier. That requires either an admin token in the vault runner or
   tier-aware branch protection. This is the primary design fork.
 4. Stale prerequisites.md. Should be updated as part of execution.
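 The Risk 1 mitigation that policy changes require human review could be enforced
 mechanically, as in the sketch below. The helper names `touches_policy` and `effective_tier`
 are hypothetical, and treating vault/formula-risks.toml and vault/classify.sh as part of
 "the policy" is an assumption that goes beyond what the mitigation states.

```shell
# Hypothetical sketch: succeed (return 0) if any changed path is part of the
# classification policy itself.
touches_policy() {
  local path
  for path in "$@"; do
    case "$path" in
      vault/policy.toml|vault/formula-risks.toml|vault/classify.sh) return 0 ;;
    esac
  done
  return 1
}

# Usage: effective_tier <computed-tier> <path>...
# Force "high" whenever the policy is being edited, whatever tier was computed.
effective_tier() {
  local computed="$1"
  shift
  if touches_policy "$@"; then
    echo "high"
  else
    echo "$computed"
  fi
}
```

 With this guard, a request that edits vault/policy.toml alongside docs is still forced to
 high, so a policy edit cannot approve itself.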
 ## Cost - new infra to maintain
 - vault/policy.toml - operators must keep it current as new formulas are added. Unknown formulas
  default to HIGH (safe, forces manual approval).
 - vault/classify.sh - one shell script, shellcheck-covered, no runtime daemon.
 - No new services, cron jobs, or agent roles.
 Ongoing cost is low.
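 The "unknown formulas default to HIGH" rule is what keeps that maintenance burden safe. A
 minimal sketch, assuming a hypothetical `formula_risk` function and using only the two
 mappings this document names (release=high, gardener=low); a real implementation would
 presumably look the formula up in vault/formula-risks.toml instead of hardcoding it:

```shell
# Hypothetical sketch: look up the risk level of a formula by name.
formula_risk() {
  case "$1" in
    release)  echo "high" ;;
    gardener) echo "low" ;;
    *)        echo "high" ;;   # formulas not listed yet fall back to manual approval
  esac
}
```

 A newly added formula that nobody has classified yet therefore behaves the same as it does
 today: it hard-blocks until an operator assigns it a lower tier.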
 ## Recommendation
 Worth it. The vault redesign is done; blast-radius tiers are the logical next step to make
 it usable in practice. Today every vault action requires human approval, which is the
 primary reason agents cannot operate continuously. This sprint has clear scope
 (~14 files, 4-5 sub-issues), low new maintenance cost, and directly unblocks autonomous
 operation in the Foundation phase.
 The branch-protection and admin-token interaction (Risk 3) is the only design fork worth
 resolving before implementation. Everything else is straightforward.
 ---
 Reply ACCEPT to proceed with design questions, or REJECT: reason to decline.