architect: supervisor project-wide oversight #32

Merged
disinto-admin merged 1 commit from architect/supervisor-project-wide-oversight into main 2026-04-15 17:39:10 +00:00

Sprint: supervisor project-wide oversight

Vision issue: #540

What this enables

The supervisor can discover all Docker Compose stacks on the deployment box, attribute resource pressure to specific stacks, surface cross-stack symptoms, and coordinate remediation through vault items — instead of blindly pruning everything.
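As a rough illustration of the attribution step, the sketch below groups container records by Compose project and totals their disk usage. It assumes the records have already been collected and parsed elsewhere; the `Container` type, its `size_bytes` field, the `usage_by_stack` function, and the sample sizes are all hypothetical and not taken from the supervisor code. The only external fact it relies on is that Docker Compose labels the containers it creates with `com.docker.compose.project`.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Label that Docker Compose attaches to the containers it creates.
COMPOSE_PROJECT_LABEL = "com.docker.compose.project"


@dataclass
class Container:
    """Hypothetical record for one container; not the supervisor's real type."""
    name: str
    size_bytes: int
    labels: dict[str, str] = field(default_factory=dict)


def usage_by_stack(containers: list[Container]) -> list[tuple[str, int]]:
    """Sum disk usage per Compose project, largest consumer first."""
    totals: dict[str, int] = defaultdict(int)
    for c in containers:
        # Containers started outside Compose carry no project label.
        stack = c.labels.get(COMPOSE_PROJECT_LABEL, "(no stack)")
        totals[stack] += c.size_bytes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Sizes are invented for illustration only.
    sample = [
        Container("harb-anvil-1", 40_000_000_000, {COMPOSE_PROJECT_LABEL: "harb"}),
        Container("disinto-supervisor-1", 500_000_000, {COMPOSE_PROJECT_LABEL: "disinto"}),
    ]
    for stack, size in usage_by_stack(sample):
        print(f"{stack}: {size / 1e9:.1f} GB")
```

With a per-stack breakdown like this, a P1 disk alert could name the stack responsible rather than only reporting that the box is full.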

Why now

Real incident: harb-dev-box hit 98% disk because harb-anvil-1 accumulated blockchain state. The supervisor detected P1 disk pressure but had no idea the bloat came from a sibling stack. docker system prune treated all images symmetrically, risking deletion of images the harb stack needed.

Complexity

  • 3-4 files touched (preflight.sh, run-supervisor.toml, projects/*.toml schema, knowledge/sibling-stacks.md)
  • Supervisor only — no changes to other agents
  • ~5-6 sub-issues, roughly 80% glue code extending existing patterns

Risks

  1. Docker socket blast radius — mitigated by read-only discovery; write actions vault-gated for foreign stacks
  2. docker system prune collateral — needs label-based filtering (com.disinto.managed=true)
  3. Stack ownership ambiguity — design fork needed
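The first two mitigations can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the supervisor's implementation: the `plan_cleanup` name, the `Action` type, and the wording of the vault request are hypothetical. It relies on `docker system prune` accepting `--filter label=<key>=<value>`, and it treats the `com.disinto.managed=true` label from the risk list as the marker for resources the supervisor owns. How a stack is judged to be "own" or "foreign" is the open design question from risk 3, so `own_stack` is only a placeholder for whatever that decision produces.

```python
from dataclasses import dataclass

# Label named in the risk list; used here to scope pruning to managed resources.
MANAGED_LABEL = "com.disinto.managed=true"


@dataclass(frozen=True)
class Action:
    """Hypothetical cleanup decision."""
    kind: str          # "run": execute directly; "vault": needs approval first
    detail: list[str]  # argv for "run", a human-readable request for "vault"


def plan_cleanup(stack: str, own_stack: str) -> Action:
    """Decide how to relieve disk pressure attributed to `stack`."""
    if stack == own_stack:
        # Prune only resources carrying the managed label, so images
        # belonging to sibling stacks are left untouched.
        argv = [
            "docker", "system", "prune", "--force",
            "--filter", f"label={MANAGED_LABEL}",
        ]
        return Action("run", argv)
    # Foreign stack: take no write action, only file a vault item (assumed flow).
    return Action("vault", [f"request cleanup of foreign stack '{stack}'"])


if __name__ == "__main__":
    print(plan_cleanup("disinto", own_stack="disinto"))
    print(plan_cleanup("harb", own_stack="disinto"))
```

Returning a plan instead of executing it keeps discovery read-only and leaves the single write path easy to audit.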

Cost

No new services, cron jobs, or containers. Extends the existing supervisor polling loop.

Recommendation

Worth it. The work is mostly glue code, it addresses a real incident, and the one-box-many-stacks model is the common deployment case.


Reply ACCEPT to proceed with design questions, or REJECT: <reason> to decline.

architect-bot added 1 commit 2026-04-15 07:25:29 +00:00
disinto-admin merged commit 8fe5da6b57 into main 2026-04-15 17:39:10 +00:00
Reference: disinto-admin/disinto-ops#32