sprint: add supervisor-project-wide-oversight.md
parent
a357297479
commit
51a33bb0f1
1 changed files with 42 additions and 0 deletions
42
sprints/supervisor-project-wide-oversight.md
Normal file
@@ -0,0 +1,42 @@
# Sprint: supervisor project-wide oversight

## Vision issues
- #540: supervisor should have project-wide oversight, not just self-monitoring
## What this enables
After this sprint, the supervisor can:
1. Discover all Docker Compose stacks on the deployment box, not just the disinto factory
2. Attribute resource pressure to specific stacks: "harb-anvil-1 grew 12 GB" instead of "disk at 98%"
3. Surface cross-stack symptoms (restarting containers, unhealthy services, volume bloat) without per-project knowledge
4. Coordinate remediation through vault items naming the stack owner, rather than blindly pruning

This turns the supervisor from a single-project health monitor into a deployment-box health monitor, which is critical because factory deployments coexist with the projects they supervise.
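
Items 1 and 3 above can be sketched as follows. This is a minimal illustration, assuming the supervisor captures the output of `docker ps --all --format '{{json .}}'` (one JSON object per line) and relies on the `com.docker.compose.project` label that Docker Compose sets on the containers it creates. The function names and the sample rows, including the `disinto-edge-1` container name, are made up for the example and are not part of preflight.sh.

```python
import json
from collections import defaultdict

# Label that Docker Compose attaches to every container it creates.
COMPOSE_PROJECT_LABEL = "com.docker.compose.project"


def parse_labels(raw: str) -> dict[str, str]:
    """Turn docker ps's "k=v,k=v" label string into a dict.

    Simplification: label values that themselves contain commas are not handled.
    """
    labels = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep:
            labels[key] = value
    return labels


def group_by_stack(ps_output: str) -> dict[str, list[dict]]:
    """Group docker ps JSON lines by Compose project; others land under "(none)"."""
    stacks = defaultdict(list)
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        labels = parse_labels(row.get("Labels", ""))
        stacks[labels.get(COMPOSE_PROJECT_LABEL, "(none)")].append(row)
    return dict(stacks)


def symptoms(row: dict) -> list[str]:
    """Flag restarting or unhealthy containers from the State and Status columns."""
    found = []
    status = row.get("Status", "")
    if row.get("State") == "restarting" or status.startswith("Restarting"):
        found.append("restarting")
    if "(unhealthy)" in status:
        found.append("unhealthy")
    return found


def report(ps_output: str) -> list[str]:
    """One line per symptom, prefixed with the owning stack."""
    lines = []
    for stack, rows in sorted(group_by_stack(ps_output).items()):
        for row in rows:
            for symptom in symptoms(row):
                lines.append(f"{stack}/{row['Names']}: {symptom}")
    return lines


# Hypothetical sample input standing in for real docker ps output.
SAMPLE = "\n".join([
    json.dumps({"Names": "harb-anvil-1", "State": "restarting",
                "Status": "Restarting (1) 5 seconds ago",
                "Labels": "com.docker.compose.project=harb,com.docker.compose.service=anvil"}),
    json.dumps({"Names": "disinto-edge-1", "State": "running",
                "Status": "Up 2 hours (healthy)",
                "Labels": "com.docker.compose.project=disinto"}),
])
```

Grouping on the Compose label keeps discovery read-only and needs no per-project configuration; containers started outside Compose still show up, under the `(none)` bucket.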
## What exists today
- `preflight.sh` (227 lines): already collects RAM, disk, load, `docker ps`, CI, PRs, issues, locks, phase files, worktrees, and vault items. Easy to extend.
- `run-supervisor.toml`: priority framework (P0-P4) with auto-fix vs. vault-item escalation. New cross-stack rules slot into existing tiers.
- Edge container: has Docker socket access and the Docker CLI installed. Can run `docker compose ls`, `docker stats`, and `docker system df`.
- `projects/*.toml`: per-project config with a `[services].containers` field. Could be extended for sibling stack ownership.
- AD-006: external actions go through vault. The supervisor reports foreign stack symptoms but does not auto-remediate.
- `docker system prune -f`: already runs as a P1 auto-fix. Currently affects all images on the box alike, regardless of which stack owns them (the problem this sprint solves).
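
The AD-006 constraint above can be sketched as a small routing function. This is an illustration under stated assumptions: the `Symptom` type, the `route` function, the return values, and the idea of passing in a set of factory-owned stack names are all hypothetical. The actual tier rules live in run-supervisor.toml and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symptom:
    stack: str        # Compose project the symptom was observed in
    description: str  # free text, e.g. "volume grew 12 GB"
    priority: str     # "P0".."P4", reusing the existing tier names


def route(symptom: Symptom, own_stacks: frozenset[str]) -> str:
    """Decide who may act on a symptom.

    Factory-owned stacks fall through to the existing priority rules
    (which may auto-fix). Foreign stacks are only ever reported through
    a vault item, never auto-remediated.
    """
    if symptom.stack in own_stacks:
        return "existing-rules"
    return "vault-item"
```

For example, with `own_stacks = frozenset({"disinto"})`, a symptom observed in the `harb` stack is routed to a vault item regardless of its priority.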
## Complexity
- Files touched: 3-4 (`preflight.sh`, `run-supervisor.toml`, `projects/*.toml` schema, new `knowledge/sibling-stacks.md`)
- Subsystems: supervisor only; no changes to other agents
- Estimated sub-issues: 5-6
- Glue code vs. greenfield: 80/20 (glue code: extending existing preflight sections and priority rules; greenfield: the stack ownership model)
## Risks
1. Docker socket blast radius: mitigated by read-only discovery commands; write actions stay vault-gated for foreign stacks.
2. `docker system prune` collateral: scoping prune to disinto-managed images requires label-based filtering (`com.disinto.managed=true`), so factory images need labeling first.
3. Performance of `docker stats`: mitigated by `--no-stream --format` for a single snapshot.
4. Stack ownership ambiguity: there is no standard way to identify who owns a foreign compose project. Design fork needed.
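
The single-snapshot mitigation in risk 3 could look like the sketch below. It assumes the supervisor captures `docker stats --no-stream --format '{{json .}}'`, which prints one JSON object per running container with string fields such as `Name`, `CPUPerc`, and `MemUsage`. The helper names and the sample values are illustrative only.

```python
import json
import re

# Size suffixes as printed by the Docker CLI (binary units for memory).
_UNITS = {
    "B": 1, "kB": 1000, "MB": 1000**2, "GB": 1000**3,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
}


def to_bytes(size: str) -> int:
    """Convert a human-readable size such as "1.5GiB" into bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", size)
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised size: {size!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def parse_stats(stats_output: str) -> list[tuple[str, float, int]]:
    """Return (name, cpu_percent, memory_bytes) rows, largest memory user first."""
    rows = []
    for line in stats_output.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        used = row["MemUsage"].split("/")[0]  # "used / limit" -> "used"
        cpu = float(row["CPUPerc"].rstrip("%"))
        rows.append((row["Name"], cpu, to_bytes(used)))
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Hypothetical snapshot standing in for real docker stats output.
SAMPLE_STATS = "\n".join([
    json.dumps({"Name": "disinto-edge-1", "CPUPerc": "0.50%",
                "MemUsage": "512MiB / 7.7GiB"}),
    json.dumps({"Name": "harb-anvil-1", "CPUPerc": "12.30%",
                "MemUsage": "1.5GiB / 7.7GiB"}),
])
```

Because `--no-stream` returns one sample and exits, the preflight run pays for a single collection pass rather than holding a streaming connection open.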
## Cost: new infra to maintain
- No new services, cron jobs, or containers. Extends the existing supervisor.
- New knowledge file: `knowledge/sibling-stacks.md` (low maintenance).
- Optional TOML schema extension: `[siblings]` section in project config.
- Image labeling convention: `com.disinto.managed=true` on factory Dockerfiles and compose.
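
Once factory images carry the label, the scoped prune might be built as below. This is a sketch, not the supervisor's actual auto-fix: it swaps `docker system prune -f` for `docker image prune` with the CLI's `--filter label=<key>=<value>` option, and the function names and the dry-run default are assumptions made for the example.

```python
import subprocess

MANAGED_LABEL = "com.disinto.managed=true"


def scoped_prune_argv(label: str = MANAGED_LABEL) -> list[str]:
    """Build a prune command limited to images carrying the given label.

    Without --all, only dangling images are removed, and the label filter
    narrows that further to images the factory has marked as its own.
    """
    return ["docker", "image", "prune", "--force", "--filter", f"label={label}"]


def run_scoped_prune(dry_run: bool = True) -> list[str]:
    """Return the command; execute it only when dry_run is False."""
    argv = scoped_prune_argv()
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv
```

Images built from a Dockerfile pick up the label through a `LABEL com.disinto.managed=true` instruction; unlabeled images from sibling stacks are then left alone by this command.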
## Recommendation
Worth it. It addresses a real incident (harb-dev-box at 98% disk), is mostly glue code extending proven patterns, adds no new services, and directly supports the Foundation milestone. The one-box-many-stacks model is the common case for resource-constrained dev environments.