From 90789cbb5a4db212ace174a63949c41dce07b0db Mon Sep 17 00:00:00 2001
From: architect-bot <architect-bot@disinto.local>
Date: Sun, 12 Apr 2026 02:02:51 +0000
Subject: [PATCH] sprint: add agent-management-redesign.md

---
 sprints/agent-management-redesign.md | 52 ++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 sprints/agent-management-redesign.md

diff --git a/sprints/agent-management-redesign.md b/sprints/agent-management-redesign.md
new file mode 100644
index 0000000..36a8f6b
--- /dev/null
+++ b/sprints/agent-management-redesign.md
@@ -0,0 +1,52 @@
+# Sprint: agent management redesign
+
+## Vision issues
+- #557 — redesign agent management — hire by inference backend, list by capability
+
+## What this enables
+
+After this sprint, operators can:
+1. Hire agents by backend (disinto hire anthropic, disinto hire llama --url ...) instead of inventing names and roles
+2. List all agents (disinto agents list) with backend, model, roles, and status in one table
+3. Discover what is running without grepping compose files, TOML configs, and state directories
+
+The factory becomes self-describing: an operator who inherits a running instance can immediately see what agents exist, what backends they use, and what roles they fill.
+
+## What exists today
+
+The agent management system is functional but fragmented:
+
+- disinto hire-an-agent name role (lib/hire-agent.sh): Creates a Forgejo user, .profile repo, API token, and state file, and optionally writes an agents TOML section and regenerates the compose file. It works, but the mental model is backwards: the operator must invent a name and pick a role before specifying the backend.
+- disinto agent enable/disable/status (bin/disinto): Manages state files for 6 hardcoded core agents (dev, reviewer, gardener, architect, planner, predictor). Local-model agents are invisible to this command.
+- agents TOML sections (projects/*.toml): Store local-model agent config (base_url, model, roles, forge_user). Read by lib/generators.sh to generate per-agent docker-compose services.
+- AGENT_ROLES env var: Runtime gate in entrypoint.sh — comma-separated list of roles the container runs.
+- Compose profiles: Local-model agents are gated by profiles and require an explicit --profile flag to start.
+
+State lives in three disconnected places: state files (CLI), env vars (runtime), compose services (docker). No single command unifies them.
+
+## Complexity
+
+- Files touched: ~4 (bin/disinto, lib/hire-agent.sh, lib/generators.sh, docker/agents/entrypoint.sh)
+- Subsystems: CLI, compose generator, container entrypoint, project TOML schema
+- Estimated sub-issues: 4-5
+- Gluecode vs greenfield: ~80% gluecode (refactoring existing hire-agent.sh and CLI), ~20% greenfield (new agents list output, backend-first hire UX)
+
+## Risks
+
+- Breaking existing hire-an-agent: The old command must keep working during the transition, because operators may have scripts that call it. A deprecation path is needed.
+- State migration: Existing local-model agents configured via agents TOML must keep working unchanged. The new system reads the same TOML, so no migration is required as long as the schema stays the same.
+- Hardcoded agent list: The 6 core agents are hardcoded in multiple places (entrypoint.sh, bin/disinto). Making this list dynamic requires careful testing to avoid breaking the polling loop.
+- TOML parsing fragility: The TOML writer in hire-agent.sh uses an inline Python script. Untested changes to the TOML schema could break parsing.
+
+## Cost — new infra to maintain
+
+- No new services, cron jobs, or formulas. This is a refactor of existing CLI and configuration paths.
+- New code: disinto hire subcommand (~100 lines), disinto agents list subcommand (~80 lines), agent registry logic that unifies the three state sources (~50 lines).
+- Removed code: Portions of the current hire-an-agent that duplicate backend detection logic.
+- Ongoing: The hardcoded agent list in bin/disinto and entrypoint.sh becomes a derived list (from state files + TOML + compose). The discovery logic is slightly more complex, but hardcoded lists no longer need updating when new agent types are added.
+
+## Recommendation
+
+Worth it. This is a high-value, low-risk refactor that directly improves the adoption story. The current UX is the number one friction point for new operators — hire-an-agent requires knowing three things (name, role, backend) in the wrong order. The redesign makes the common case (disinto hire anthropic) a one-liner and gives operators visibility into what is running. No new infrastructure, no new dependencies, mostly gluecode over existing interfaces.
+
+Defer only if the team wants to stabilize the current agent set first (all 4 open architect sprints are pending human review). Otherwise, this is independent work that does not conflict with any in-flight sprint.
-- 
2.49.1