architect: vault blast-radius tiers #12
1 changed file with 58 additions and 158 deletions
|
|
@@ -1,177 +1,77 @@
|
||||||
# Sprint: Vault blast-radius tiers
|
# Sprint: vault blast-radius tiers
|
||||||
|
|
||||||
## Vision issues
|
## Vision issues
|
||||||
- #419 — Vault: blast-radius based approval tiers
|
- #419 — Vault: blast-radius based approval tiers
|
||||||
|
|
||||||
## What this enables
|
## What this enables
|
||||||
After this sprint, vault operations are classified by blast radius — low-risk operations
|
After this sprint, low-tier vault actions execute without waiting for a human. The dispatcher
|
||||||
(docs, feature-branch edits) flow through without human gating; medium-risk operations
|
auto-approves and merges vault PRs classified as `low` in `policy.toml`. Medium and high tiers
|
||||||
(CI config, Dockerfile changes) queue for async review; high-risk operations (production
|
are unchanged: medium notifies and allows async review; high blocks until admin approves.
|
||||||
deploys, secrets rotation, agent self-modification) hard-block as today.
|
|
||||||
|
|
||||||
The practical effect: the dev loop no longer stalls waiting for human approval of routine
|
This removes the bottleneck on low-risk bookkeeping operations while preserving the hard gate
|
||||||
operations. Agents can move autonomously through 80%+ of vault requests while preserving
|
on production deploys, secret operations, and agent self-modification.
|
||||||
the safety contract on irreversible operations.
|
|
||||||
|
|
||||||
## What exists today
|
## What exists today
|
||||||
The vault redesign (#73-#77) is complete and all five issues are closed:
|
|
||||||
- lib/vault.sh - idempotent vault PR creation via Forgejo API
|
|
||||||
- docker/edge/dispatcher.sh - polls merged vault PRs, verifies admin approval, launches runners
|
|
||||||
- vault/vault-env.sh - TOML validation for vault action files
|
|
||||||
- vault/SCHEMA.md - vault action TOML schema
|
|
||||||
- lib/branch-protection.sh - admin-only merge enforcement on ops repo
|
|
||||||
|
|
||||||
Currently every vault request goes through the same hard-block path regardless of risk.
|
The tier infrastructure is fully built. Only the enforcement is missing.
|
||||||
No classification layer exists. All formulas share the same single approval tier.
|
|
||||||
|
- `vault/policy.toml` — Maps every formula to low/medium/high. Current low tier: groom-backlog,
|
||||||
|
triage, reproduce, review-pr. Medium: dev, run-planner, run-gardener, run-predictor,
|
||||||
|
run-supervisor, run-architect, upgrade-dependency. High: run-publish-site, run-rent-a-human,
|
||||||
|
add-rpc-method, release.
|
||||||
|
- `vault/classify.sh` — Shell classifier called by `vault-env.sh`. Returns tier for a given formula.
|
||||||
|
- `vault/SCHEMA.md` — Documents `blast_radius` override field (string: "low"/"medium"/"high")
|
||||||
|
that vault action TOMLs can use to override policy defaults.
|
||||||
|
- `vault/validate.sh` — Validates vault action TOML fields including blast_radius.
|
||||||
|
- `docker/edge/dispatcher.sh` — Edge dispatcher. Polls ops repo for merged vault PRs and executes
|
||||||
|
them. Currently fires ALL merged vault PRs without tier differentiation.
|
||||||
|
|
||||||
|
What's missing: the dispatcher does not read blast_radius, does not auto-approve low-tier PRs,
|
||||||
|
and does not differentiate notification behavior for medium vs high tier.
|
||||||
|
|
||||||
## Complexity
|
## Complexity
|
||||||
Files touched: ~14 (7 new, 7 modified)
|
|
||||||
Gluecode vs greenfield: ~60% gluecode, ~40% greenfield.
|
Files touched: 3
|
||||||
Estimated sub-issues: 4-7 depending on fork choices.
|
- `docker/edge/dispatcher.sh` — read blast_radius from vault action TOML; for low tier, call
|
||||||
|
Forgejo API to approve + merge the PR directly (admin token); for medium, post "pending async
|
||||||
|
review" comment; for high, leave pending (existing behavior)
|
||||||
|
- `lib/vault.sh` `vault_request()` — include blast_radius in the PR body so the dispatcher
|
||||||
|
can read it without re-parsing the TOML
|
||||||
|
- `docs/VAULT.md` — document the three-tier behavior for operators
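
The three-way dispatcher behavior listed above can be sketched as a small tier switch. This is a minimal illustration, not the actual `dispatcher.sh` code: `read_blast_radius` and `plan_for_tier` are hypothetical names, and the sketch assumes `blast_radius` appears as a plain `key = "value"` line in the vault action TOML. A missing or unrecognized value is treated as high here; a real dispatcher would presumably fall back to the `classify.sh` result before doing so.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not taken from dispatcher.sh.

# Print the blast_radius value from a vault action TOML, or nothing if absent.
read_blast_radius() {
  sed -n 's/^[[:space:]]*blast_radius[[:space:]]*=[[:space:]]*"\([a-z]*\)".*$/\1/p' "$1" | head -n 1
}

# Map a tier to the dispatcher step described in the list above.
plan_for_tier() {
  case "$1" in
    low)    echo "approve-and-merge" ;;
    medium) echo "comment-pending-async-review" ;;
    *)      echo "leave-pending" ;;   # high, missing, or unrecognized
  esac
}
```

Keeping the tier-to-step mapping in one `case` statement means the fail-safe branch is visible in a single place.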
|
||||||
|
|
||||||
|
Sub-issues: 3
|
||||||
|
Gluecode ratio: ~70% gluecode (dispatcher reads existing classify.sh output), ~30% new (auto-approve API call, comment logic)
|
||||||
|
|
||||||
## Risks
|
## Risks
|
||||||
1. Classification errors on consequential operations. Default-deny mitigates: unknown formula → high.
|
|
||||||
2. Dispatcher complexity. Mitigation: extract to classify.sh, dispatcher delegates.
|
|
||||||
3. Branch-protection interaction (primary design fork, see below).
|
|
||||||
|
|
||||||
## Cost - new infra to maintain
|
- Admin token for auto-approve: the dispatcher needs an admin-level Forgejo token to approve
|
||||||
- vault/policy.toml or blast_radius fields — operators update when adding formulas.
|
and merge PRs. Currently `FORGE_TOKEN` is used; branch protection has `admin_enforced: true`
|
||||||
- vault/classify.sh — one shell script, shellcheck-covered, no runtime daemon.
|
which means even admin bots are blocked from bypassing the approval gate. This is the core
|
||||||
- No new services, cron jobs, or agent roles.
|
design fork: either (a) relax admin_enforced for low-tier PRs, or (b) use a separate
|
||||||
|
Forgejo "auto-approver" account with admin rights, or (c) bypass the PR workflow entirely
|
||||||
|
for low-tier actions (execute directly without a PR).
|
||||||
|
- Policy drift: as new formulas are added, policy.toml must be updated. If a formula is missing,
|
||||||
|
classify.sh should default to "high" (fail safe). Currently the default behavior is unknown —
|
||||||
|
this needs to be hardened.
|
||||||
|
- Audit trail: low-tier auto-approvals should still leave a record. Auto-approve comment
|
||||||
|
("auto-approved: low blast radius") satisfies this.
|
||||||
|
|
||||||
|
## Cost — new infra to maintain
|
||||||
|
|
||||||
|
- One new Forgejo account or token (if auto-approver route chosen) — needs rotation policy
|
||||||
|
- `policy.toml` maintenance: every new formula must be classified before shipping
|
||||||
|
- No new services, cron jobs, or containers
|
||||||
|
|
||||||
## Recommendation
|
## Recommendation
|
||||||
Worth it. The vault redesign is done; blast-radius tiers are the natural next step. The primary reason
|
|
||||||
agents cannot operate continuously is that every vault action blocks on human availability.
|
|
||||||
|
|
||||||
---
|
Worth it, but the design fork on the auto-approve mechanism must be resolved before implementation
|
||||||
|
begins — this is the questions step.
|
||||||
|
|
||||||
## Design forks
|
The cleanest approach is option (c): bypass the PR workflow for low-tier actions entirely.
|
||||||
|
The dispatcher detects blast_radius=low, executes the formula immediately without creating
|
||||||
|
a PR, and writes to `vault/fired/` directly. This avoids the admin token problem, preserves
|
||||||
|
the PR workflow for medium/high, and keeps the audit trail in git. However, it changes the
|
||||||
|
blast_radius=low behavior from "PR exists but auto-merges" to "no PR, just executes" — operators
|
||||||
|
need to understand the difference.
|
||||||
|
|
||||||
Three decisions must be made before implementation begins.
|
The PR route (option b) is more visible but requires a dedicated account.
|
||||||
|
|
||||||
### Fork 1 (Critical): Auto-approve merge mechanism
|
|
||||||
|
|
||||||
Branch protection on the ops repo requires `required_approvals: 1` and `admin_enforced: true`.
|
|
||||||
For low-tier vault PRs, the dispatcher must merge without a human approval.
|
|
||||||
|
|
||||||
**A. Skip PR entirely for low-tier**
|
|
||||||
vault-bot commits directly to `vault/actions/` on main using admin token. No PR created.
|
|
||||||
Dispatcher detects a new TOML file by the absence of `.result.json`.
|
|
||||||
- Simplest dispatcher code
|
|
||||||
- No PR audit trail for low-tier executions
|
|
||||||
- `FORGE_ADMIN_TOKEN` already exists in vault env (used by `is_user_admin()`)
|
|
||||||
|
|
||||||
**B. Dispatcher self-approves low-tier PRs**
|
|
||||||
vault-bot creates PR as today, then immediately posts an APPROVED review using its own token,
|
|
||||||
then merges. vault-bot needs Forgejo admin role so `admin_enforced: true` does not block it.
|
|
||||||
- Full PR audit trail for all tiers
|
|
||||||
- Requires granting vault-bot admin role on Forgejo
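
A rough sketch of what option B's approve-then-merge sequence might look like against the Forgejo REST API. The wrapper `forge_post`, the function `auto_approve_low_tier`, and the `FORGE_URL`, `FORGE_REPO`, and `DRY_RUN` variables are assumptions made for illustration; only `FORGE_TOKEN` is named in this document. The endpoints used are the pull-request review and merge routes Forgejo inherits from Gitea; verify them against the deployed version before relying on this.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: POST a JSON body to the repo API.
# With DRY_RUN=1 it prints the request instead of sending it.
forge_post() {
  local path="$1" body="$2"
  local url="${FORGE_URL:-http://localhost:3000}/api/v1/repos/${FORGE_REPO:-owner/ops}${path}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'POST %s %s\n' "$url" "$body"
    return 0
  fi
  curl -sf -X POST \
    -H "Authorization: token ${FORGE_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$body" "$url"
}

# Approve a low-tier vault PR, then merge it only if the approval succeeded.
auto_approve_low_tier() {
  local pr="$1"
  forge_post "/pulls/${pr}/reviews" \
    '{"event":"APPROVED","body":"auto-approved: low blast radius"}' &&
  forge_post "/pulls/${pr}/merge" '{"Do":"merge"}'
}
```

Running `DRY_RUN=1 auto_approve_low_tier 7` prints the two requests without touching the forge, which is a cheap way to review the sequence before granting the admin role.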
|
|
||||||
|
|
||||||
**C. Tier-aware branch protection**
|
|
||||||
Create a separate Forgejo protection rule for `vault/*` branch pattern with `required_approvals: 0`.
|
|
||||||
Main branch protection stays unchanged. vault-bot merges low-tier PRs directly.
|
|
||||||
- No new accounts or elevated role for vault-bot
|
|
||||||
- Protection rules are in Forgejo admin UI, not code (harder to version)
|
|
||||||
- Forgejo `vault/*` glob support needs verification
|
|
||||||
|
|
||||||
**D. Dedicated auto-approve bot**
|
|
||||||
Create a `vault-auto-bot` Forgejo account with admin role that auto-approves low-tier PRs.
|
|
||||||
Cleanest trust separation; most operational overhead.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Fork 2 (Secondary): Policy storage format
|
|
||||||
|
|
||||||
Where does the formula → tier mapping live?
|
|
||||||
|
|
||||||
**A. `vault/policy.toml` in disinto repo**
|
|
||||||
Flat TOML: `formula = "tier"`. classify.sh reads it at runtime.
|
|
||||||
Unknown formulas default to `high`. Changing policy requires a disinto PR.
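
A minimal sketch of the lookup option A implies, assuming the policy file holds one `formula = "tier"` pair per line with no tables or inline comments. `classify_tier` is a hypothetical name and the parsing is deliberately simplistic; it is not the real `classify.sh`. Anything unknown, unreadable, or malformed falls through to `high`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: look up a formula's tier in a flat `formula = "tier"` file.
classify_tier() {
  local formula="$1" policy_file="$2" tier=""
  if [ -r "$policy_file" ]; then
    # Exact key match; whitespace and quotes are stripped from the value.
    tier=$(awk -F'=' -v key="$formula" '
      /^[[:space:]]*#/ { next }
      {
        k = $1; v = $2
        gsub(/[[:space:]]/, "", k)
        gsub(/[[:space:]"]/, "", v)
        if (k == key) { print v; exit }
      }' "$policy_file")
  fi
  case "$tier" in
    low|medium|high) printf '%s\n' "$tier" ;;
    *)               printf 'high\n' ;;   # default-deny: unknown or malformed
  esac
}
```

Validating the parsed value against the three known tiers, rather than echoing whatever the file contains, keeps a typo in the policy file from silently downgrading a formula.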
|
|
||||||
|
|
||||||
**B. `blast_radius` field in each `formulas/*.toml`**
|
|
||||||
Add `blast_radius = "low"|"medium"|"high"` to each formula TOML.
|
|
||||||
classify.sh reads the target formula TOML for its tier.
|
|
||||||
Co-located with formula — impossible to add a formula without declaring its risk.
|
|
||||||
|
|
||||||
**C. `vault/policy.toml` in ops repo**
|
|
||||||
Same format as A but lives in the ops repo. Operators update without a disinto PR.
|
|
||||||
Useful for per-deployment overrides.
|
|
||||||
|
|
||||||
**D. Hybrid: formula TOML default + ops override**
|
|
||||||
Formula TOML carries a default tier. Ops `vault/policy.toml` can override per-deployment.
|
|
||||||
Most flexible; classify.sh must merge two sources.
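
The merge step in option D could be as small as the following sketch. `resolve_tier` is a hypothetical helper that takes the two values after each source has already been read; it assumes the ops override wins whenever it is set, and that an invalid result from either source falls back to `high`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: combine a formula's default tier with an optional ops override.
resolve_tier() {
  local formula_default="$1" ops_override="$2" candidate
  candidate="${ops_override:-$formula_default}"   # override wins when non-empty
  case "$candidate" in
    low|medium|high) printf '%s\n' "$candidate" ;;
    *)               printf 'high\n' ;;           # fail safe on empty or invalid values
  esac
}
```

One consequence of this precedence is that an ops override can lower a tier as well as raise it; whether that should be allowed is part of the fork decision.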
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Fork 3 (Secondary): Medium-tier dev-loop behavior
|
|
||||||
|
|
||||||
When dev-agent creates a vault PR for a medium-tier action, what does it do while waiting?
|
|
||||||
|
|
||||||
**A. Non-blocking: fire and continue immediately**
|
|
||||||
Agent creates vault PR and moves to next issue without waiting.
|
|
||||||
Maximum autonomy; sequencing becomes unpredictable.
|
|
||||||
|
|
||||||
**B. Soft-block with 2-hour timeout**
|
|
||||||
Agent waits up to 2 hours polling for vault PR merge. If no response, continues.
|
|
||||||
Balances oversight with velocity.
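
A sketch of the soft-block loop in option B, assuming the caller supplies a command that exits 0 once the vault PR is merged. `wait_for_merge` and its arguments are hypothetical; the defaults mirror the 2-hour limit and a 15-minute polling interval.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: poll until check_cmd succeeds or the timeout elapses.
# Returns 0 on success, 1 on timeout, 2 on an invalid interval.
wait_for_merge() {
  local check_cmd="$1" timeout_s="${2:-7200}" interval_s="${3:-900}" waited=0
  [ "$interval_s" -gt 0 ] || return 2       # a zero interval would never time out
  while [ "$waited" -lt "$timeout_s" ]; do
    if "$check_cmd"; then
      return 0                              # merged: caller proceeds normally
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  return 1                                  # timed out: caller continues without approval
}
```

The distinct return code for a timeout lets the agent log that it continued without a merge, which keeps the soft-block auditable.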
|
|
||||||
|
|
||||||
**C. Status-quo block (medium = high)**
|
|
||||||
Medium-tier blocks the agent loop like high-tier today. Only low-tier actions unblocked.
|
|
||||||
Simplest behavior change — no modification to dev-agent flow needed.
|
|
||||||
|
|
||||||
**D. Label-based approval signal**
|
|
||||||
Agent polls for a `vault-approved` label on the vault PR instead of waiting for merge.
|
|
||||||
Decouples "approved to continue" from "PR merged and executed."
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Proposed sub-issues
|
|
||||||
|
|
||||||
### Core (always filed regardless of fork choices)
|
|
||||||
|
|
||||||
**Sub-issue 1: vault/classify.sh — classification engine**
|
|
||||||
Implement `vault/classify.sh`: reads formula name, secrets, optional `blast_radius` override,
|
|
||||||
applies policy rules, outputs tier (`low|medium|high`). Default-deny: unknown → `high`.
|
|
||||||
Files: `vault/classify.sh` (new), `vault/vault-env.sh` (call classify)
|
|
||||||
|
|
||||||
**Sub-issue 2: docs/BLAST-RADIUS.md and SCHEMA.md update**
|
|
||||||
Write `docs/BLAST-RADIUS.md`. Add optional `blast_radius` field to `vault/SCHEMA.md`
|
|
||||||
and validator.
|
|
||||||
Files: `docs/BLAST-RADIUS.md` (new), `vault/SCHEMA.md`, `vault/vault-env.sh`
|
|
||||||
|
|
||||||
**Sub-issue 3: Update prerequisites.md**
|
|
||||||
Mark vault redesign (#73-#77) as DONE (the current entry is stale). Add blast-radius tiers to the tree.
|
|
||||||
Files: `disinto-ops/prerequisites.md`
|
|
||||||
|
|
||||||
### Fork 1 variants (pick one)
|
|
||||||
|
|
||||||
**1A** — Modify `lib/vault.sh` to skip PR for low-tier, commit directly to main.
|
|
||||||
Modify `dispatcher.sh` to skip `verify_admin_merged()` for low-tier TOMLs.
|
|
||||||
|
|
||||||
**1B** — Modify `dispatcher.sh` to post APPROVED review + merge for low-tier.
|
|
||||||
Grant vault-bot admin role in Forgejo setup scripts.
|
|
||||||
|
|
||||||
**1C** — Add `setup_vault_branch_protection_tiered()` to `lib/branch-protection.sh`
|
|
||||||
with `required_approvals: 0` for `vault/*` pattern (verify Forgejo glob support first).
|
|
||||||
|
|
||||||
**1D** — Add `vault-auto-bot` account to `forge-setup.sh`. Implement approval watcher.
|
|
||||||
|
|
||||||
### Fork 2 variants (pick one)
|
|
||||||
|
|
||||||
**2A** — Create `vault/policy.toml` in disinto repo. classify.sh reads it.
|
|
||||||
|
|
||||||
**2B** — Add `blast_radius` field to all 15 `formulas/*.toml`. classify.sh reads formula TOML.
|
|
||||||
|
|
||||||
**2C** — Create `disinto-ops/vault/policy.toml`. classify.sh reads ops copy at runtime.
|
|
||||||
|
|
||||||
**2D** — Two-pass classify.sh: formula TOML default, ops policy override.
|
|
||||||
|
|
||||||
### Fork 3 variants (pick one)
|
|
||||||
|
|
||||||
**3A** — Non-blocking: `lib/vault.sh` returns immediately after PR creation for all tiers.
|
|
||||||
|
|
||||||
**3B** — Soft-block: poll medium-tier PR every 15 min for up to 2 hours.
|
|
||||||
|
|
||||||
**3C** — No change: medium-tier behavior unchanged (only low-tier unblocked).
|
|
||||||
|
|
||||||
**3D** — Create `vault-approved` label. Modify `lib/vault.sh` medium path to poll label.
|
|
||||||