Sprint: Vault blast-radius tiers

Vision issues

#419 — Vault: blast-radius based approval tiers

What this enables

After this sprint, vault operations are classified by blast radius. Low-risk operations (docs, feature-branch edits) flow through without human gating; medium-risk operations (CI config, Dockerfile changes) queue for async review; high-risk operations (production deploys, secrets rotation, agent self-modification) stay hard-blocked, as they are today.

The practical effect: the dev loop no longer stalls waiting for human approval of routine operations. Agents can move autonomously through 80%+ of vault requests while preserving the safety contract on irreversible operations.
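Because classification is deterministic pattern matching, the path part of it can be expressed with plain bash `case` globs. The sketch below is illustrative only. The function name `classify_path` and the specific patterns (`docs/*`, `.woodpecker/*`, `Dockerfile`) are assumptions, not the contents of the planned policy.toml. Any path that matches no rule falls through to high, which matches the default-deny mitigation listed under Risks.

```bash
# Hypothetical sketch: map one changed file path to a blast-radius tier.
# The patterns are illustrative stand-ins for rules that would live in
# vault/policy.toml; they are not the actual policy.
classify_path() {
  local path=$1
  case "$path" in
    # Low risk: documentation. In a case pattern, * also matches "/",
    # so files in subdirectories of docs/ match too.
    docs/*)
      echo low
      ;;
    # Medium risk: CI config and container builds (assumed locations).
    .woodpecker/* | .woodpecker.yml | Dockerfile | */Dockerfile)
      echo medium
      ;;
    # Default-deny: any path no rule recognises is high risk.
    *)
      echo high
      ;;
  esac
}
```

With these rules, `classify_path docker/edge/Dockerfile` prints `medium`. `classify_path lib/vault.sh` matches nothing and prints `high`.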

What exists today

The vault redesign (#73-#77) is complete and all five issues are closed:

lib/vault.sh - idempotent vault PR creation via Forgejo API
docker/edge/dispatcher.sh - polls merged vault PRs, verifies admin approval, launches runners
vault/vault-env.sh - TOML validation for vault action files
vault/SCHEMA.md - vault action TOML schema
lib/branch-protection.sh - admin-only merge enforcement on ops repo

Currently every vault request goes through the same hard-block path regardless of risk. No classification layer exists. All formulas share a single approval tier.

Note: prerequisites.md still says the vault redesign is incomplete; this is stale. Issues #73-#77 are all closed.

Complexity

Files touched: ~14 (7 new, 7 modified)

New files:

vault/classify.sh - classification engine, ~150 lines: path glob matching, secret risk scoring, formula risk lookup
vault/policy.toml - human-editable policy rules mapping path patterns to tier assignments
vault/formula-risks.toml - formula-to-risk-level mapping (release=high, gardener=low)
docs/BLAST-RADIUS.md - documentation
2-3 test helpers
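classify.sh has three inputs: path globs, secret risk, and formula risk. Each one yields a tier, so one conservative way to combine them is to take the most severe. The sketch below assumes tiers are the strings low/medium/high. The names `tier_rank` and `max_tier` are hypothetical. An unrecognised tier string ranks as high, so a typo in policy can only make a request stricter.

```bash
# Hypothetical helper: numeric severity for a tier name.
# Anything other than low/medium ranks as high (default-deny).
tier_rank() {
  case "$1" in
    low) echo 0 ;;
    medium) echo 1 ;;
    *) echo 2 ;;
  esac
}

# Print the most severe tier among all arguments.
# With no arguments at all, fail closed and report high.
max_tier() {
  local max=0 t r
  (( $# > 0 )) || max=2
  for t in "$@"; do
    r=$(tier_rank "$t")
    (( r > max )) && max=$r
  done
  case $max in
    0) echo low ;;
    1) echo medium ;;
    *) echo high ;;
  esac
}
```

For example, take a docs edit (`low`) that also touches a production secret (`high`). `max_tier low high` prints `high`, so the request stays hard-blocked.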

Modified files:

vault/SCHEMA.md - optional blast_radius override field
vault/vault-env.sh - call classify.sh, validate classification
lib/vault.sh - attach computed tier as PR label; skip PR creation for auto-approve tier
docker/edge/dispatcher.sh - enforce tier policy: auto-merge low, route medium, hard-block high
lib/branch-protection.sh - possibly vary the required approval count by tier
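Once the tier is computed, the dispatcher only needs to switch on it rather than re-derive risk, which keeps the decision in classify.sh. The sketch below is minimal. `route_by_tier` and the printed action names are hypothetical placeholders for whatever dispatcher.sh actually does. Also, since lib/vault.sh is slated to skip PR creation for the auto-approve tier, the low branch may in practice be handled before the dispatcher sees a PR.

```bash
# Hypothetical sketch of the dispatcher's tier switch. The printed action
# names are placeholders, not existing dispatcher.sh functions.
route_by_tier() {
  local tier=$1
  case "$tier" in
    low)
      echo auto-merge      # no human gate
      ;;
    medium)
      echo queue-review    # async human review
      ;;
    high)
      echo hard-block      # today's behaviour: admin approval required
      ;;
    *)
      echo hard-block      # missing or unknown tier: fail closed
      ;;
  esac
}
```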

Glue code vs. greenfield: ~60% glue code, ~40% greenfield (the classification engine and policy format).

Estimated sub-issues: 4-5

Risks

Classification errors on consequential operations. Classification is deterministic (pattern matching, not AI judgment), but a misconfigured policy.toml could auto-approve something that should hard-block. Mitigation: default-deny all unknown patterns; policy changes require human review.
Dispatcher complexity. dispatcher.sh is already 1005 lines, and adding three tier-specific code paths would add roughly 150 more. Mitigation: put classification in classify.sh so the dispatcher delegates rather than decides.
Branch-protection interaction. In the auto-approve tier the dispatcher merges without human approval. However, branch-protection.sh currently requires one approval, so the dispatcher must bypass it for that tier. Either the vault runner holds an admin token, or branch protection becomes tier-aware. This is the primary design fork.
Stale prerequisites.md. It should be updated as part of this sprint.

Cost - new infra to maintain

vault/policy.toml - operators must keep current as new formulas are added. Unknown formulas default to HIGH (safe, forces manual approval).
vault/classify.sh - one shell script, shellcheck-covered, no runtime daemon.
No new services, cron jobs, or agent roles.
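The default-to-HIGH rule for formulas amounts to a table lookup with a fallback. The sketch below is minimal and uses a bash 4+ associative array as a hypothetical stand-in for vault/formula-risks.toml; parsing the real TOML file is out of scope here. The entries reuse the examples given for that file (release=high, gardener=low). The name `formula_risk` is an assumption.

```bash
# Hypothetical stand-in for vault/formula-risks.toml (requires bash 4+).
declare -A FORMULA_RISK=(
  [release]=high
  [gardener]=low
)

# Print the risk tier for a formula name.
# Unknown or empty names default to high, forcing manual approval.
formula_risk() {
  local name=$1
  # Guard: bash rejects an empty subscript on associative arrays.
  if [[ -z $name ]]; then
    echo high
    return
  fi
  echo "${FORMULA_RISK[$name]:-high}"
}
```

With this fallback, forgetting to register a new formula is safe but slow: it keeps going through manual approval until an operator adds it to the table.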

Ongoing cost is low.

Recommendation

Worth it. The vault redesign is done; blast-radius tiers are the logical next step to make it usable in practice. Today every vault action requires human approval, and that bottleneck is the primary reason agents cannot operate continuously. This sprint has clear scope (~14 files, 4-5 sub-issues), low new maintenance cost, and directly unblocks autonomous operation in the Foundation phase.

The branch-protection and admin-token interaction (Risk 3) is the only design fork worth resolving before implementation. Everything else is straightforward.

Reply ACCEPT to proceed with design questions, or REJECT: reason to decline.
