architect: vault blast-radius tiers #9

Merged
disinto-admin merged 2 commits from architect/vault-blast-radius-tiers into main 2026-04-08 19:08:05 +00:00
Showing only changes of commit 708901e4e2

@@ -24,64 +24,154 @@ The vault redesign (#73-#77) is complete and all five issues are closed:
Currently every vault request goes through the same hard-block path regardless of risk.
No classification layer exists. All formulas share the same single approval tier.
Note: prerequisites.md still lists the vault redesign as incomplete. This is stale: all of #73-#77
are closed.
## Complexity
Files touched: ~14 (7 new, 7 modified)
Gluecode vs greenfield: ~60% gluecode, ~40% greenfield (classification engine and policy format).
Estimated sub-issues: 4-7, depending on fork choices.
## Risks
1. Classification errors on consequential operations. Classification is deterministic (pattern
matching, not AI judgment), but a misconfigured policy could auto-approve an action that should
hard-block. Default-deny mitigates this: an unknown formula classifies as high.
2. Dispatcher complexity. dispatcher.sh is already 1005 lines. Mitigation: extract classification
to classify.sh so the dispatcher delegates rather than decides.
3. Branch-protection interaction. Auto-approve means merging without a human approval, while branch
protection currently requires 1 approval. This is the primary design fork (see Fork 1 below).
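A sketch of a dispatcher that delegates rather than decides, assuming classify.sh takes an action TOML path and prints exactly one tier. The `CLASSIFY` override, the function name and the printed action labels (auto-merge low, route medium, hard-block high) are hypothetical stand-ins for the Fork 1 and Fork 3 code paths. A classifier failure is treated as `high`, in line with default-deny.

```shell
# Hypothetical sketch: the dispatcher asks the classifier for a tier and only
# routes on the answer. CLASSIFY defaults to vault/classify.sh; its
# "one TOML path in, one tier out" interface is an assumption.
dispatch_action() {
  local action_toml=$1 tier
  # Any classifier failure is treated as high (default-deny).
  tier=$("${CLASSIFY:-vault/classify.sh}" "$action_toml") || tier=high
  case $tier in
    low)    printf 'auto-merge %s\n' "$action_toml" ;;        # Fork 1 path
    medium) printf 'route-for-review %s\n' "$action_toml" ;;  # Fork 3 path
    *)      printf 'hard-block %s\n' "$action_toml" ;;        # high or anything unexpected
  esac
}
```

Keeping all tier logic behind one call is what keeps the dispatcher change small: the three branches are the only new code paths it owns.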
## Cost - new infra to maintain
- vault/policy.toml or per-formula blast_radius fields: operators update these when adding formulas.
  Unknown formulas default to high (safe; forces manual approval).
- vault/classify.sh: one shell script, shellcheck-covered, no runtime daemon.
- No new services, cron jobs, or agent roles.
Ongoing cost is low.
## Recommendation
Worth it. The vault redesign is done, and blast-radius tiers are the natural next step. The primary
reason agents cannot operate continuously is that every vault action blocks on human availability;
tiering lets low-risk actions proceed without that wait. Scope is clear (~14 files, 4-7 sub-issues)
and new maintenance cost is low.
---
Reply `ACCEPT` to proceed with the design questions, or `REJECT: <reason>` to decline.
## Design forks
Three decisions must be made before implementation begins.
### Fork 1 (Critical): Auto-approve merge mechanism
Branch protection on the ops repo requires `required_approvals: 1` and `admin_enforced: true`.
For low-tier vault PRs, the dispatcher must merge without a human approval.
**A. Skip PR entirely for low-tier**
vault-bot commits directly to `vault/actions/` on main using the admin token; no PR is created.
The dispatcher detects a new action TOML by the absence of its `.result.json`.
- Simplest dispatcher code
- No PR audit trail for low-tier executions
- `FORGE_ADMIN_TOKEN` already exists in vault env (used by `is_user_admin()`)
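With no PR to watch, the dispatcher has to find work by scanning the directory. A minimal sketch, assuming each action `<name>.toml` gets its result written next to it as `<name>.result.json`; that naming, and the function name, are assumptions, since the option only says the dispatcher keys on the absence of `.result.json`.

```shell
# Hypothetical sketch: print every action TOML in a directory that does not
# yet have a sibling <name>.result.json.
list_pending_actions() {
  local actions_dir=$1 toml
  for toml in "$actions_dir"/*.toml; do
    [ -e "$toml" ] || continue        # no .toml files: the glob stayed literal
    [ -e "${toml%.toml}.result.json" ] || printf '%s\n' "$toml"
  done
}
```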
**B. Dispatcher self-approves low-tier PRs**
vault-bot creates PR as today, then immediately posts an APPROVED review using its own token,
then merges. vault-bot needs Forgejo admin role so `admin_enforced: true` does not block it.
- Full PR audit trail for all tiers
- Requires granting vault-bot admin role on Forgejo
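A sketch of the approve-then-merge step against Forgejo's Gitea-compatible REST API (`POST /repos/{owner}/{repo}/pulls/{index}/reviews`, then `.../merge`). `FORGE_URL`, `FORGE_TOKEN` and the `DRY_RUN` switch are assumed names, not existing disinto variables. One thing to check before choosing B: Gitea-derived forges may reject an `APPROVED` review submitted by the PR's own author, in which case vault-bot could not approve PRs it opened itself, which would push this option toward D's separate account.

```shell
# Hypothetical sketch: FORGE_URL, FORGE_TOKEN and DRY_RUN are assumed names.
# With DRY_RUN=1 the requests are printed instead of sent.
forge_api() {
  local method=$1 api_path=$2 body=$3
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s %s %s\n' "$method" "$api_path" "$body"
    return 0
  fi
  curl -sf -X "$method" \
    -H "Authorization: token ${FORGE_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$body" \
    "${FORGE_URL}/api/v1${api_path}"
}

# Approve, then merge, a low-tier vault PR; stop if the approval is refused.
approve_and_merge_low_tier() {
  local repo=$1 pr=$2
  forge_api POST "/repos/${repo}/pulls/${pr}/reviews" \
    '{"event":"APPROVED","body":"auto-approved: low blast radius"}' || return 1
  forge_api POST "/repos/${repo}/pulls/${pr}/merge" '{"Do":"merge"}'
}
```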
**C. Tier-aware branch protection**
Create a separate Forgejo protection rule for `vault/*` branch pattern with `required_approvals: 0`.
Main branch protection stays unchanged. vault-bot merges low-tier PRs directly.
- No new accounts or elevated role for vault-bot
- Protection rules are in Forgejo admin UI, not code (harder to version)
- Forgejo `vault/*` glob support needs verification
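A sketch of creating the extra rule through the Gitea-compatible `POST /repos/{owner}/{repo}/branch_protections` endpoint, which accepts `rule_name` and `required_approvals`. `FORGE_URL` is an assumed variable; `FORGE_ADMIN_TOKEN` is the token mentioned under Option A. Worth verifying alongside glob support: in Gitea-derived forges the approval requirement is, to our understanding, read from the rule protecting the PR's base branch, so a `vault/*` rule would only relax approvals if vault PRs merge into a `vault/*` branch rather than into main.

```shell
# Hypothetical sketch: JSON body for a protection rule matching vault/* with
# no required approvals. Whether Forgejo treats the name as a glob is the
# open question noted above.
vault_protection_payload() {
  printf '{"rule_name":"%s","required_approvals":%d}\n' "vault/*" 0
}

# Create the rule on a repo given as owner/name (needs an admin token).
apply_vault_protection() {
  local repo=$1
  curl -sf -X POST \
    -H "Authorization: token ${FORGE_ADMIN_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(vault_protection_payload)" \
    "${FORGE_URL}/api/v1/repos/${repo}/branch_protections"
}
```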
**D. Dedicated auto-approve bot**
Create a `vault-auto-bot` Forgejo account with admin role that auto-approves low-tier PRs.
Cleanest trust separation; most operational overhead.
---
### Fork 2 (Secondary): Policy storage format
Where does the formula → tier mapping live?
**A. `vault/policy.toml` in disinto repo**
Flat TOML: `formula = "tier"`. classify.sh reads it at runtime.
Unknown formulas default to `high`. Changing policy requires a disinto PR.
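A minimal sketch of the runtime lookup for the flat format, assuming one `formula = "tier"` pair per line and no tables. It is a line reader for that one shape, not a TOML parser; the function name is hypothetical.

```shell
# Sketch: look up a formula's tier in a flat policy file whose lines look
# like `release = "high"`. Missing file, missing key, or invalid value -> high.
policy_tier() {
  local formula=$1 policy=$2 key value
  [ -r "$policy" ] || { printf 'high\n'; return 0; }
  # "|| [ -n "$key" ]" still processes a last line without a trailing newline.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    key=$(printf '%s' "$key" | tr -d ' \t')
    value=$(printf '%s' "$value" | tr -d ' \t"')
    if [ "$key" = "$formula" ]; then
      case $value in
        low|medium|high) printf '%s\n' "$value"; return 0 ;;
      esac
    fi
  done < "$policy"
  printf 'high\n'
}
```

`policy_tier release vault/policy.toml` prints `high` both when the file says `release = "high"` and when it does not mention `release` at all, which is the default-deny behavior the option describes.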
**B. `blast_radius` field in each `formulas/*.toml`**
Add `blast_radius = "low"|"medium"|"high"` to each formula TOML.
classify.sh reads the target formula TOML for its tier.
Co-located with formula — impossible to add a formula without declaring its risk.
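Reading the tier from the formula itself needs only the one field. A sketch, assuming `blast_radius` is written as a quoted string on its own line (the reader does not check which TOML table the line sits in); a formula without a valid field falls back to `high`. The function name is hypothetical.

```shell
# Sketch: print the blast_radius declared in a formula TOML, or high if the
# field is missing, unreadable, or not one of the three tiers.
formula_tier() {
  local file=$1 value
  value=$(sed -n 's/^[[:space:]]*blast_radius[[:space:]]*=[[:space:]]*"\([a-z]*\)".*/\1/p' \
    "$file" 2>/dev/null | head -n 1)
  case $value in
    low|medium|high) printf '%s\n' "$value" ;;
    *) printf 'high\n' ;;
  esac
}
```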
**C. `vault/policy.toml` in ops repo**
Same format as A but lives in the ops repo. Operators update without a disinto PR.
Useful for per-deployment overrides.
**D. Hybrid: formula TOML default + ops override**
Formula TOML carries a default tier. Ops `vault/policy.toml` can override per-deployment.
Most flexible; classify.sh must merge two sources.
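Once both sources have been read into plain strings (as in options A and B), the merge itself is small. A sketch, assuming an empty string means "not set"; the function names are hypothetical.

```shell
# Sketch of the hybrid merge: a valid ops override wins, otherwise the
# formula's own default, otherwise high (default-deny).
valid_tier() {
  case $1 in low|medium|high) return 0 ;; *) return 1 ;; esac
}

resolve_tier() {
  local formula_default=$1 ops_override=$2
  if valid_tier "$ops_override"; then
    printf '%s\n' "$ops_override"        # per-deployment override
  elif valid_tier "$formula_default"; then
    printf '%s\n' "$formula_default"     # default declared in the formula TOML
  else
    printf 'high\n'
  fi
}
```

As written, an ops override can lower a tier as well as raise it, which is what "override per-deployment" suggests. Restricting overrides to raising the tier is an alternative if lowering proves too permissive.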
---
### Fork 3 (Secondary): Medium-tier dev-loop behavior
When dev-agent creates a vault PR for a medium-tier action, what does it do while waiting?
**A. Non-blocking: fire and continue immediately**
Agent creates vault PR and moves to next issue without waiting.
Maximum autonomy; sequencing becomes unpredictable.
**B. Soft-block with 2-hour timeout**
Agent waits up to 2 hours polling for vault PR merge. If no response, continues.
Balances oversight with velocity.
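The soft block is a bounded poll. A sketch, assuming a caller-supplied check command that succeeds once the vault PR is merged; `vault_pr_merged` below is a hypothetical name. With the 15-minute interval and 2-hour limit from variant 3B, the call would be `poll_until 900 7200 vault_pr_merged "$pr"`.

```shell
# Sketch: run a check command until it succeeds or the time budget runs out.
# Returns 0 on success, 1 on timeout, 2 on a non-positive interval.
poll_until() {
  local interval=$1 timeout=$2 waited=0
  shift 2
  [ "$interval" -gt 0 ] || return 2     # a zero interval would never reach the timeout
  while ! "$@"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

On timeout the agent continues as 3B describes; the PR stays open for a human to act on later.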
**C. Status-quo block (medium = high)**
Medium-tier blocks the agent loop like high-tier today. Only low-tier actions unblocked.
Simplest behavior change — no modification to dev-agent flow needed.
**D. Label-based approval signal**
Agent polls for a `vault-approved` label on the vault PR instead of waiting for merge.
Decouples "approved to continue" from "PR merged and executed."
---
## Proposed sub-issues
### Core (always filed regardless of fork choices)
**Sub-issue 1: vault/classify.sh — classification engine**
Implement `vault/classify.sh`: reads formula name, secrets, optional `blast_radius` override,
applies policy rules, outputs tier (`low|medium|high`). Default-deny: unknown → `high`.
Files: `vault/classify.sh` (new), `vault/vault-env.sh` (call classify)
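A minimal sketch of the combining step, assuming the formula's tier has already been looked up. The secret rules are placeholders, since the scoring rules are not specified here. Letting the request's own `blast_radius` override only raise the tier is a design assumption: a request that could lower its own tier would bypass default-deny.

```shell
# Hypothetical sketch of classify.sh's core: result = highest of the formula
# tier, the riskiest secret, and the optional blast_radius override.
tier_rank() {
  case $1 in
    low) echo 0 ;; medium) echo 1 ;; high) echo 2 ;;
    *) echo 2 ;;                          # unknown or empty counts as high
  esac
}

rank_tier() {
  case $1 in 0) echo low ;; 1) echo medium ;; *) echo high ;; esac
}

secret_rank() {
  case $1 in
    *ADMIN*) echo 2 ;;                    # placeholder: admin credentials are high risk
    *) echo 1 ;;                          # placeholder: any other secret is at least medium
  esac
}

# Usage: classify <formula-tier> <override-or-empty> [secret-name ...]
classify() {
  local formula_tier=$1 override=$2 best r secret
  shift 2
  best=$(tier_rank "$formula_tier")
  for secret in "$@"; do
    r=$(secret_rank "$secret")
    [ "$r" -gt "$best" ] && best=$r
  done
  if [ -n "$override" ]; then
    r=$(tier_rank "$override")
    [ "$r" -gt "$best" ] && best=$r       # override may raise the tier, never lower it
  fi
  rank_tier "$best"
}
```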
**Sub-issue 2: docs/BLAST-RADIUS.md and SCHEMA.md update**
Write `docs/BLAST-RADIUS.md`. Add optional `blast_radius` field to `vault/SCHEMA.md`
and validator.
Files: `docs/BLAST-RADIUS.md` (new), `vault/SCHEMA.md`, `vault/vault-env.sh`
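For the validator half, a sketch that accepts a vault request without the field and rejects any value other than the three tiers. It assumes the value is written as a quoted string on its own line, as TOML requires for strings; the function name is hypothetical.

```shell
# Sketch: validate the optional blast_radius field of a vault request TOML.
# Returns 0 if absent or valid, 1 (with a message on stderr) otherwise.
validate_blast_radius() {
  local file=$1 value
  [ -r "$file" ] || { printf 'cannot read %s\n' "$file" >&2; return 1; }
  grep -q '^[[:space:]]*blast_radius[[:space:]]*=' "$file" || return 0
  value=$(sed -n 's/^[[:space:]]*blast_radius[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' \
    "$file" | head -n 1)
  case $value in
    low|medium|high) return 0 ;;
    *) printf 'invalid blast_radius: "%s"\n' "$value" >&2; return 1 ;;
  esac
}
```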
**Sub-issue 3: Update prerequisites.md**
Mark the vault redesign (#73-#77) as DONE, since the current entry is stale. Add blast-radius tiers to the tree.
Files: `disinto-ops/prerequisites.md`
### Fork 1 variants (pick one)
**1A** — Modify `lib/vault.sh` to skip PR for low-tier, commit directly to main.
Modify `dispatcher.sh` to skip `verify_admin_merged()` for low-tier TOMLs.
**1B** — Modify `dispatcher.sh` to post APPROVED review + merge for low-tier.
Grant vault-bot admin role in Forgejo setup scripts.
**1C** — Add `setup_vault_branch_protection_tiered()` to `lib/branch-protection.sh`
with `required_approvals: 0` for `vault/*` pattern (verify Forgejo glob support first).
**1D** — Add `vault-auto-bot` account to `forge-setup.sh`. Implement approval watcher.
### Fork 2 variants (pick one)
**2A** — Create `vault/policy.toml` in disinto repo. classify.sh reads it.
**2B** — Add `blast_radius` field to all 15 `formulas/*.toml`. classify.sh reads formula TOML.
**2C** — Create `disinto-ops/vault/policy.toml`. classify.sh reads ops copy at runtime.
**2D** — Two-pass classify.sh: formula TOML default, ops policy override.
### Fork 3 variants (pick one)
**3A** — Non-blocking: `lib/vault.sh` returns immediately after PR creation for all tiers.
**3B** — Soft-block: poll medium-tier PR every 15 min for up to 2 hours.
**3C** — No change: medium-tier behavior unchanged (only low-tier unblocked).
**3D** — Create `vault-approved` label. Modify `lib/vault.sh` medium path to poll label.