disinto-ops/sprints/vault-blast-radius-tiers.md

Sprint: Vault blast-radius tiers

Vision issues

  • #419 — Vault: blast-radius based approval tiers

What this enables

After this sprint, vault operations are classified by blast radius — low-risk operations (docs, feature-branch edits) flow through without human gating; medium-risk operations (CI config, Dockerfile changes) queue for async review; high-risk operations (production deploys, secrets rotation, agent self-modification) hard-block as today.

The practical effect: the dev loop no longer stalls waiting for human approval of routine operations. Agents can handle the bulk of vault requests (an estimated 80%+) autonomously, while the safety contract on irreversible operations stays intact.

What exists today

The vault redesign (#73-#77) is complete and all five issues are closed:

  • lib/vault.sh - idempotent vault PR creation via Forgejo API
  • docker/edge/dispatcher.sh - polls merged vault PRs, verifies admin approval, launches runners
  • vault/vault-env.sh - TOML validation for vault action files
  • vault/SCHEMA.md - vault action TOML schema
  • lib/branch-protection.sh - admin-only merge enforcement on ops repo

Currently every vault request goes through the same hard-block path regardless of risk. No classification layer exists, and all formulas share a single approval tier.

Complexity

  • Files touched: ~14 (7 new, 7 modified)
  • Gluecode vs greenfield: ~60% gluecode, ~40% greenfield
  • Estimated sub-issues: 4-7, depending on fork choices

Risks

  1. Classification errors on consequential operations. Mitigation: default-deny, so an unknown formula is classified as high.
  2. Dispatcher complexity. Mitigation: extract the classification logic into vault/classify.sh and have the dispatcher delegate to it.
  3. Branch-protection interaction (the primary design fork; see Fork 1 below).

Cost - new infra to maintain

  • vault/policy.toml or blast_radius fields: operators must update the tier mapping whenever they add a formula.
  • vault/classify.sh: one shell script, covered by shellcheck, with no runtime daemon.
  • No new services, cron jobs, or agent roles.

Recommendation

Worth it. The vault redesign is done, and blast-radius tiers are the natural next step. The primary reason agents cannot operate continuously is that every vault action blocks on human availability.


Design forks

Three decisions must be made before implementation begins.

Fork 1 (Critical): Auto-approve merge mechanism

Branch protection on the ops repo sets required_approvals: 1 and admin_enforced: true. For low-tier vault PRs, the dispatcher must be able to merge without human approval.

A. Skip the PR entirely for low-tier. vault-bot commits directly to vault/actions/ on main using the admin token; no PR is created. The dispatcher detects a new TOML file by the absence of its .result.json.

  • Simplest dispatcher code
  • No PR audit trail for low-tier executions
  • FORGE_ADMIN_TOKEN already exists in vault env (used by is_user_admin())
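A minimal sketch of option A's detection step follows. It is written in Python for readability and is not the actual dispatcher.sh logic. It assumes a hypothetical layout: each action lives at vault/actions/<id>.toml, and the runner writes vault/actions/<id>.result.json beside it when finished. The function name pending_actions is illustrative.

```python
from pathlib import Path


def pending_actions(actions_dir: Path) -> list[Path]:
    """List action TOMLs that do not yet have a matching result file.

    Hypothetical layout: <id>.toml is the action, and <id>.result.json is
    written next to it once the runner has executed the action.
    """
    pending = []
    for toml_path in sorted(actions_dir.glob("*.toml")):
        # Build <id>.result.json from the stem so the two-part suffix is explicit.
        result_path = toml_path.parent / (toml_path.stem + ".result.json")
        if not result_path.exists():
            pending.append(toml_path)
    return pending
```

The result file doubles as the "already ran" marker. A runner that crashes before writing it would therefore be picked up again on the next poll. Whether that retry behavior is desirable for low-tier actions is worth deciding alongside this option.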

B. Dispatcher self-approves low-tier PRs. vault-bot creates the PR as today, immediately posts an APPROVED review using its own token, then merges. vault-bot needs the Forgejo admin role so that admin_enforced: true does not block it.

  • Full PR audit trail for all tiers
  • Requires granting vault-bot admin role on Forgejo
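The two API calls that option B adds to the dispatcher can be sketched as request descriptions. The endpoints follow the Gitea-compatible Forgejo REST API:

  • POST .../pulls/{index}/reviews with event APPROVED
  • POST .../pulls/{index}/merge with a Do field

Confirm both against your instance's /api/swagger before relying on them. One thing to verify early: Gitea-lineage forges may reject an APPROVED review from the PR's own author. A vault-bot that both opens and approves the PR could be refused even with the admin role. The helper name and the tier guard below are hypothetical.

```python
def low_tier_merge_requests(owner: str, repo: str, index: int, tier: str) -> list:
    """Return (method, path, json_body) tuples for approve-then-merge.

    Hypothetical helper: only low-tier PRs may take this path.
    """
    if tier != "low":
        raise ValueError(f"refusing to self-approve a {tier}-tier vault PR")
    base = f"/api/v1/repos/{owner}/{repo}/pulls/{index}"
    return [
        # 1. Post an approving review on the PR.
        ("POST", f"{base}/reviews",
         {"event": "APPROVED", "body": "Auto-approved: low blast-radius tier"}),
        # 2. Merge it with a plain merge commit.
        ("POST", f"{base}/merge", {"Do": "merge"}),
    ]
```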

C. Tier-aware branch protection. Create a separate Forgejo protection rule for the vault/* branch pattern with required_approvals: 0. Main branch protection stays unchanged, and vault-bot merges low-tier PRs directly.

  • No new accounts or elevated role for vault-bot
  • Protection rules are in Forgejo admin UI, not code (harder to version)
  • Forgejo vault/* glob support needs verification
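A sketch of the rule option C would add, expressed as the body of a POST /api/v1/repos/{owner}/{repo}/branch_protections request. The field names rule_name and required_approvals are Gitea-compatible; check them against your Forgejo version. The helper name is hypothetical.

A caveat worth testing alongside the glob question: Gitea evaluates a PR's protection against its base (target) branch. If Forgejo behaves the same way, a vault/* rule only relaxes approvals for PRs whose target branch matches vault/*. It would not help PRs that merely come from a vault/* branch and merge into main.

```python
def tiered_protection_request(owner: str, repo: str, pattern: str = "vault/*") -> tuple:
    """Return (method, path, json_body) for a zero-approval protection rule."""
    body = {
        "rule_name": pattern,     # glob pattern; Forgejo support needs verification
        "required_approvals": 0,  # matching branches accept merges without a review
    }
    return ("POST", f"/api/v1/repos/{owner}/{repo}/branch_protections", body)
```

Issuing this from lib/branch-protection.sh, as variant 1C proposes, keeps the rule's definition in version control even though Forgejo stores it server-side.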

D. Dedicated auto-approve bot. Create a vault-auto-bot Forgejo account with the admin role that auto-approves low-tier PRs. This gives the cleanest trust separation but the most operational overhead.


Fork 2 (Secondary): Policy storage format

Where does the formula → tier mapping live?

A. vault/policy.toml in the disinto repo. A flat TOML table of formula = "tier" entries that classify.sh reads at runtime. Unknown formulas default to high. Changing policy requires a disinto PR.
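Option A's lookup reduces to a dictionary read with a default-deny fallback. This Python sketch assumes the policy has already been parsed from vault/policy.toml into a flat dict, for example with the standard-library tomllib on Python 3.11+. The formula names are hypothetical.

```python
VALID_TIERS = ("low", "medium", "high")

# Hypothetical contents of vault/policy.toml:
#   docs-edit = "low"
#   ci-config = "medium"
#   prod-deploy = "high"


def tier_from_policy(formula: str, policy: dict) -> str:
    """Look up a formula's tier; unknown formulas or bad values deny to high."""
    tier = policy.get(formula, "high")
    return tier if tier in VALID_TIERS else "high"
```

Treating a misspelled tier the same as a missing entry keeps a typo in the policy file from silently loosening the gate.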

B. blast_radius field in each formulas/*.toml. Add blast_radius = "low"|"medium"|"high" to each formula TOML; classify.sh reads the target formula's TOML for its tier. The tier is co-located with the formula. Provided the validator rejects a missing field, a formula cannot be added without declaring its risk.
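For option B, the guarantee that every formula declares its risk depends on the validator rejecting a missing or malformed field. A minimal check over one parsed formula TOML might look like this. The function name and error wording are hypothetical.

```python
VALID_TIERS = ("low", "medium", "high")


def blast_radius_errors(name: str, formula: dict) -> list:
    """Return validation errors for the blast_radius field (empty if valid)."""
    if "blast_radius" not in formula:
        return [f"{name}: missing required field blast_radius"]
    value = formula["blast_radius"]
    if value not in VALID_TIERS:
        return [f"{name}: blast_radius must be one of low|medium|high, got {value!r}"]
    return []
```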

C. vault/policy.toml in the ops repo. Same format as A, but it lives in the ops repo, so operators can update it without a disinto PR. Useful for per-deployment overrides.

D. Hybrid: formula TOML default plus ops override. The formula TOML carries a default tier, and the ops repo's vault/policy.toml can override it per deployment. Most flexible, but classify.sh must merge two sources.
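The two-source merge in option D can be sketched as an ordered lookup: an ops override wins, then the formula's own default, then default-deny. Arguments are assumed to be already-parsed TOML, and passing None for formula_toml stands for "no such formula". Names are hypothetical.

```python
from typing import Optional

VALID_TIERS = ("low", "medium", "high")


def resolve_tier(formula: str, formula_toml: Optional[dict], ops_policy: dict) -> str:
    """Ops override first, then the formula's default, then deny to high."""
    if formula in ops_policy:
        tier = ops_policy[formula]
    elif formula_toml is not None:
        tier = formula_toml.get("blast_radius", "high")
    else:
        tier = "high"  # unknown formula
    return tier if tier in VALID_TIERS else "high"
```

As written, an ops override can lower a formula's default as well as raise it. If per-deployment overrides should only ever tighten policy, the override branch would instead take the stricter of the two tiers.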


Fork 3 (Secondary): Medium-tier dev-loop behavior

When dev-agent creates a vault PR for a medium-tier action, what does it do while waiting?

A. Non-blocking: fire and continue. The agent creates the vault PR and moves to the next issue immediately without waiting. Maximum autonomy, but sequencing becomes unpredictable.

B. Soft-block with a 2-hour timeout. The agent polls the vault PR for a merge for up to 2 hours; if none arrives, it continues. Balances oversight with velocity.

C. Status-quo block (medium = high). Medium-tier actions block the agent loop as high-tier actions do today; only low-tier actions are unblocked. This is the simplest behavior change: the dev-agent flow needs no modification.

D. Label-based approval signal. Instead of waiting for a merge, the agent polls for a vault-approved label on the vault PR. This decouples "approved to continue" from "PR merged and executed."
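Option D's agent-side check separates the two signals described above. This sketch reads a pull-request object as returned by the Forgejo API, assuming it carries a labels list (each entry with a name) and a merged boolean. The status strings and function name are hypothetical.

```python
def vault_pr_status(pr: dict, label: str = "vault-approved") -> str:
    """Classify a vault PR as 'merged', 'approved', or 'waiting'."""
    if pr.get("merged"):
        return "merged"  # the dispatcher can now execute the action
    label_names = {lbl.get("name") for lbl in pr.get("labels") or []}
    if label in label_names:
        return "approved"  # the agent may continue; execution is still pending
    return "waiting"
```

The agent would continue on either "approved" or "merged". Only "merged" lets the dispatcher run the action.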


Proposed sub-issues

Core (always filed regardless of fork choices)

Sub-issue 1: vault/classify.sh (classification engine)
Implement vault/classify.sh. It reads the formula name, the requested secrets, and an optional blast_radius override, applies the policy rules, and outputs a tier (low|medium|high). Default-deny: unknown → high.
Files: vault/classify.sh (new), vault/vault-env.sh (call classify)
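The classification rules are left open above. One possible shape is sketched here in Python under the following hypothetical assumptions:

  • The policy is an already-resolved formula-to-tier dict.
  • Any requested secret in a caller-supplied high-risk set forces high.
  • A blast_radius override may raise the tier but never lower it, so an action file cannot downgrade itself.

```python
from typing import Iterable, Optional

TIER_RANK = {"low": 0, "medium": 1, "high": 2}


def classify(formula: str,
             secrets: Iterable[str],
             override: Optional[str],
             policy: dict,
             high_risk_secrets: set) -> str:
    """Return 'low', 'medium', or 'high' for one vault action (default-deny)."""
    # 1. Base tier from policy; an unknown formula or bad value becomes high.
    tier = policy.get(formula, "high")
    if tier not in TIER_RANK:
        tier = "high"
    # 2. Requesting any high-risk secret escalates straight to high.
    if any(s in high_risk_secrets for s in secrets):
        tier = "high"
    # 3. An override can only escalate; an invalid override denies.
    if override is not None:
        if override not in TIER_RANK:
            return "high"
        tier = max(tier, override, key=TIER_RANK.__getitem__)
    return tier
```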

Sub-issue 2: docs/BLAST-RADIUS.md and SCHEMA.md update
Write docs/BLAST-RADIUS.md. Add the optional blast_radius field to vault/SCHEMA.md and to the validator.
Files: docs/BLAST-RADIUS.md (new), vault/SCHEMA.md, vault/vault-env.sh

Sub-issue 3: Update prerequisites.md
Mark the vault redesign (#73-#77) as DONE (its current entry is stale). Add blast-radius tiers to the tree.
Files: disinto-ops/prerequisites.md

Fork 1 variants (pick one)

1A — Modify lib/vault.sh to skip PR for low-tier, commit directly to main. Modify dispatcher.sh to skip verify_admin_merged() for low-tier TOMLs.

1B — Modify dispatcher.sh to post APPROVED review + merge for low-tier. Grant vault-bot admin role in Forgejo setup scripts.

1C — Add setup_vault_branch_protection_tiered() to lib/branch-protection.sh with required_approvals: 0 for vault/* pattern (verify Forgejo glob support first).

1D — Add vault-auto-bot account to forge-setup.sh. Implement approval watcher.

Fork 2 variants (pick one)

2A — Create vault/policy.toml in disinto repo. classify.sh reads it.

2B — Add blast_radius field to all 15 formulas/*.toml. classify.sh reads formula TOML.

2C — Create disinto-ops/vault/policy.toml. classify.sh reads ops copy at runtime.

2D — Two-pass classify.sh: formula TOML default, ops policy override.

Fork 3 variants (pick one)

3A — Non-blocking: lib/vault.sh returns immediately after PR creation for all tiers.

3B — Soft-block: poll medium-tier PR every 15 min for up to 2 hours.
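Variant 3B's schedule (check every 15 minutes, give up after 2 hours) can be sketched as a bounded polling loop. Two assumptions apply here: is_merged stands in for whatever API check lib/vault.sh would perform, and sleep is injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable


def wait_for_merge(is_merged: Callable[[], bool],
                   interval_s: int = 15 * 60,
                   timeout_s: int = 2 * 60 * 60,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until merged (True) or the timeout elapses (False)."""
    waited = 0
    while True:
        if is_merged():
            return True
        if waited >= timeout_s:
            return False  # soft-block expired: the caller continues anyway
        sleep(interval_s)
        waited += interval_s
```

With the defaults, this checks nine times (at 0, 15, ..., 120 minutes) before returning False.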

3C — No change: medium-tier behavior unchanged (only low-tier unblocked).

3D — Create the vault-approved label. Modify the lib/vault.sh medium-tier path to poll for it.