# Sprint: Vault blast-radius tiers

## Vision issues

- #419 - Vault: blast-radius based approval tiers

## What this enables

After this sprint, vault operations are classified by blast radius: low-risk operations
(docs, feature-branch edits) flow through without human gating; medium-risk operations
(CI config, Dockerfile changes) queue for async review; high-risk operations (production
deploys, secrets rotation, agent self-modification) hard-block as they do today.
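
As a concrete illustration of path-based tiering, here is a minimal Bash sketch. It is not the planned vault/classify.sh: the rules are inlined as an array instead of being read from vault/policy.toml, and the function name `classify_path` and every glob in it are hypothetical. Paths that match no rule fall through to high, in line with the default-deny mitigation listed under Risks.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of blast-radius path classification, not vault/classify.sh.
# Rules are inlined instead of parsed from vault/policy.toml, and every glob is
# an illustrative guess rather than the project's real policy.

# Ordered "GLOB TIER" rules, strictest first; the first matching rule wins.
# The .forgejo/workflows/* entry stands in for CI config (path assumed).
POLICY_RULES=(
  "vault/* high"
  ".forgejo/workflows/* medium"
  "Dockerfile medium"
  "*/Dockerfile medium"
  "docs/* low"
  "*.md low"
)

# classify_path PATH: prints low, medium, or high.
classify_path() {
  local path=$1 rule glob tier
  for rule in "${POLICY_RULES[@]}"; do
    glob=${rule% *}   # everything before the last space
    tier=${rule##* }  # everything after the last space
    # An unquoted right-hand side of == inside [[ ]] is matched as a glob;
    # unlike filename expansion, * here also matches "/".
    if [[ $path == $glob ]]; then
      printf '%s\n' "$tier"
      return 0
    fi
  done
  # Default-deny: a path that no rule recognises is treated as high risk.
  printf 'high\n'
}

# Usage: classify each path touched by a hypothetical vault request.
for p in docs/BLAST-RADIUS.md docker/edge/Dockerfile vault/SCHEMA.md lib/new.sh; do
  printf '%s -> %s\n' "$p" "$(classify_path "$p")"
done
```

Rules are checked strictest-first so that a broad low-risk glob such as `*.md` cannot shadow a high-risk one like `vault/*` (otherwise vault/SCHEMA.md would classify as low). A real policy format might instead evaluate every matching rule and keep the strictest result.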

The practical effect: the dev loop no longer stalls waiting for human approval of routine
operations. Agents can move autonomously through an estimated 80%+ of vault requests while
preserving the safety contract on irreversible operations.

## What exists today

The vault redesign (#73-#77) is complete and all five issues are closed:

- lib/vault.sh - idempotent vault PR creation via the Forgejo API
- docker/edge/dispatcher.sh - polls merged vault PRs, verifies admin approval, launches runners
- vault/vault-env.sh - TOML validation for vault action files
- vault/SCHEMA.md - vault action TOML schema
- lib/branch-protection.sh - admin-only merge enforcement on the ops repo

Currently every vault request goes through the same hard-block path regardless of risk.
No classification layer exists, and all formulas share a single approval tier.

Note: prerequisites.md says the vault redesign is incomplete. This is stale: issues #73-#77
are all closed as of the current state.

## Complexity

Files touched: ~14 (7 new, 7 modified)

New files:

- vault/classify.sh - classification engine, ~150 lines: path glob matching, secret risk scoring, formula risk lookup
- vault/policy.toml - human-editable policy rules mapping path patterns to tier assignments
- vault/formula-risks.toml - formula-to-risk-level mapping (release=high, gardener=low)
- docs/BLAST-RADIUS.md - documentation
- 2-3 test helpers
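
classify.sh is described as combining three signals (path glob matching, secret risk scoring, formula risk lookup), but the proposal does not say how they combine. One conservative choice, assumed in the Bash sketch below, is that the strictest signal wins and any unrecognised value counts as high. `tier_rank` and `max_tier` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: combine per-signal tiers so the strictest one wins.

# tier_rank TIER: prints 0 for low, 1 for medium, 2 for high or anything else.
tier_rank() {
  case $1 in
    low)    echo 0 ;;
    medium) echo 1 ;;
    *)      echo 2 ;;  # high, or an unrecognised value: fail closed
  esac
}

# max_tier TIER...: prints the most restrictive tier among the arguments.
# With no arguments nothing justifies a lower tier, so it prints high.
max_tier() {
  if (( $# == 0 )); then
    echo high
    return 0
  fi
  local best=low t
  for t in "$@"; do
    if (( $(tier_rank "$t") > $(tier_rank "$best") )); then
      best=$t
    fi
  done
  # Normalise an unrecognised winner (for example a typo) to high.
  case $best in
    low|medium|high) ;;
    *) best=high ;;
  esac
  echo "$best"
}

# Usage: path tier, secret tier, formula tier (values are illustrative).
max_tier low medium low   # prints: medium
```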

Modified files:

- vault/SCHEMA.md - optional blast_radius override field
- vault/vault-env.sh - call classify.sh, validate classification
- lib/vault.sh - attach computed tier as PR label; skip PR creation for auto-approve tier
- docker/edge/dispatcher.sh - enforce tier policy: auto-merge low, route medium to async review, hard-block high
- lib/branch-protection.sh - possibly vary the number of required approvals by tier
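
The dispatcher change above amounts to a three-way branch on a tier the classifier has already computed. A hypothetical Bash sketch of that shape follows; `dispatch_for_tier` and the action strings are placeholders, not functions in the existing dispatcher.sh, and a real implementation would call merge, review-queue, or block logic instead of printing a label.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of tier enforcement in docker/edge/dispatcher.sh.
# The dispatcher only acts on a precomputed tier; it does no classification.

# dispatch_for_tier TIER: prints the action the dispatcher would take.
dispatch_for_tier() {
  case $1 in
    low)    echo auto-merge ;;    # no human gate
    medium) echo queue-review ;;  # async human review
    high)   echo hard-block ;;    # admin approval required, as today
    *)      echo hard-block ;;    # missing or unknown tier fails closed
  esac
}

# Usage: in practice the tier would come from the classifier (e.g. a PR label).
for tier in low medium high ""; do
  printf '%-6s -> %s\n' "${tier:-<none>}" "$(dispatch_for_tier "$tier")"
done
```

Falling through to hard-block for a missing or malformed tier keeps the dispatcher's failure mode identical to today's behaviour.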

Glue code vs greenfield: ~60% glue code, ~40% greenfield (classification engine and policy format).

Estimated sub-issues: 4-5

## Risks

1. Classification errors on consequential operations. Classification is deterministic
(pattern matching, not AI judgment), but a misconfigured policy.toml could auto-approve
something that should hard-block. Mitigation: default-deny all unknown patterns; policy
changes require human review.

2. Dispatcher complexity. dispatcher.sh is already 1005 lines, and adding three tier code
paths would add ~150 more. Mitigation: extract classification into classify.sh so the
dispatcher delegates rather than decides.

3. Branch-protection interaction. In the auto-approve tier the dispatcher merges without
human approval, but branch-protection.sh currently requires 1 approval, so the dispatcher
must bypass it for that tier. This requires either an admin token in the vault runner or
tier-aware branch protection. This is the primary design fork.

4. Stale prerequisites.md. It should be updated as part of execution.

## Cost - new infra to maintain

- vault/policy.toml (and vault/formula-risks.toml) - operators must keep these current as
new formulas are added. Unknown formulas default to HIGH (safe: forces manual approval).
- vault/classify.sh - one shell script, shellcheck-covered, no runtime daemon.
- No new services, cron jobs, or agent roles.

Ongoing cost is low.
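
The unknown-formula default can be sketched as a lookup with a fail-closed fallback. This Bash sketch assumes Bash 4+ associative arrays and inlines the two example mappings the proposal gives (release=high, gardener=low) instead of parsing vault/formula-risks.toml; `formula_risk` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the formula risk lookup (requires Bash 4+).
# The real mapping would be read from vault/formula-risks.toml.
declare -A FORMULA_RISK=(
  [release]=high
  [gardener]=low
)

# formula_risk NAME: prints the mapped tier, or high for unknown formulas.
formula_risk() {
  local name=${1:-}
  # Check for an empty name first: "" is not a valid array subscript.
  if [[ -n $name && -n ${FORMULA_RISK[$name]:-} ]]; then
    printf '%s\n' "${FORMULA_RISK[$name]}"
  else
    printf 'high\n'  # unknown formula: force manual approval
  fi
}

formula_risk gardener      # prints: low
formula_risk new-formula   # prints: high
```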

## Recommendation

Worth it. The vault redesign is done; blast-radius tiers are the logical next step to make
it usable in practice. Today every vault action requires human approval, and that bottleneck
is the primary reason agents cannot operate continuously. This sprint has clear scope
(~14 files, 4-5 sub-issues), low new maintenance cost, and directly unblocks autonomous
operation in the Foundation phase.

The branch-protection and admin-token interaction (Risk 3) is the only design fork worth
resolving before implementation. Everything else is straightforward.

---

Reply ACCEPT to proceed with design questions, or REJECT: reason to decline.