From 90789cbb5a4db212ace174a63949c41dce07b0db Mon Sep 17 00:00:00 2001
From: architect-bot
Date: Sun, 12 Apr 2026 02:02:51 +0000
Subject: [PATCH] sprint: add agent-management-redesign.md

---
 sprints/agent-management-redesign.md | 52 ++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 sprints/agent-management-redesign.md

diff --git a/sprints/agent-management-redesign.md b/sprints/agent-management-redesign.md
new file mode 100644
index 0000000..36a8f6b
--- /dev/null
+++ b/sprints/agent-management-redesign.md
@@ -0,0 +1,52 @@
+# Sprint: agent management redesign
+
+## Vision issues
+- #557 — redesign agent management — hire by inference backend, list by capability
+
+## What this enables
+
+After this sprint, operators can:
+1. Hire agents by backend (disinto hire anthropic, disinto hire llama --url ...) instead of inventing names and roles
+2. List all agents (disinto agents list) with backend, model, roles, and status in one table
+3. Discover what is running without grepping compose files, TOML configs, and state directories
+
+The factory becomes self-describing: an operator who inherits a running instance can immediately see what agents exist, what backends they use, and what roles they fill.
+
+## What exists today
+
+The agent management system is functional but fragmented:
+
+- disinto hire-an-agent name role (lib/hire-agent.sh): Creates Forgejo user, .profile repo, API token, state file, and optionally writes agents TOML section plus regenerates compose. Works, but the mental model is backwards: operator must invent a name and pick a role before specifying the backend.
+- disinto agent enable/disable/status (bin/disinto): Manages state files for 6 hardcoded core agents (dev, reviewer, gardener, architect, planner, predictor). Local-model agents are invisible to this command.
+- agents TOML sections (projects/*.toml): Store local-model agent config (base_url, model, roles, forge_user). Read by lib/generators.sh to generate per-agent docker-compose services.
+- AGENT_ROLES env var: Runtime gate in entrypoint.sh; comma-separated list of roles the container runs.
+- Compose profiles: Local-model agents gated by profiles, requiring explicit --profile to start.
+
+State lives in three disconnected places: state files (CLI), env vars (runtime), compose services (docker). No single command unifies them.
+
+## Complexity
+
+- Files touched: ~4 (bin/disinto, lib/hire-agent.sh, lib/generators.sh, docker/agents/entrypoint.sh)
+- Subsystems: CLI, compose generator, container entrypoint, project TOML schema
+- Estimated sub-issues: 4-5
+- Gluecode vs greenfield: ~80% gluecode (refactoring existing hire-agent.sh and CLI), ~20% greenfield (new agents list output, backend-first hire UX)
+
+## Risks
+
+- Breaking existing hire-an-agent: The old command must keep working during transition. Operators may have scripts that call it. Deprecation path needed.
+- State migration: Existing local-model agents configured via agents TOML need to work unchanged. The new system reads the same TOML; no migration required if we keep the schema.
+- Entrypoint.sh hardcoded list: The 6 core agents are hardcoded in multiple places (entrypoint.sh, bin/disinto). Making this dynamic requires careful testing to avoid breaking the polling loop.
+- TOML parsing fragility: The hire-agent.sh TOML writer uses a Python inline script. Changes to the TOML schema could break parsing if not tested.
+
+## Cost — new infra to maintain
+
+- No new services, cron jobs, or formulas. This is a refactor of existing CLI and configuration paths.
+- New code: disinto hire subcommand (~100 lines), disinto agents list subcommand (~80 lines), agent registry logic that unifies the three state sources (~50 lines).
+- Removed code: Portions of the current hire-an-agent that duplicate backend detection logic.
+- Ongoing: The hardcoded agent list in bin/disinto and entrypoint.sh becomes a derived list (from state files + TOML + compose). Slightly more complex discovery logic, but eliminates the need to update hardcoded lists when new agent types are added.
+
+## Recommendation
+
+Worth it. This is a high-value, low-risk refactor that directly improves the adoption story. The current UX is the number one friction point for new operators: hire-an-agent requires knowing three things (name, role, backend) in the wrong order. The redesign makes the common case (disinto hire anthropic) a one-liner and gives operators visibility into what is running. No new infrastructure, no new dependencies, mostly gluecode over existing interfaces.
+
+Defer only if the team wants to stabilize the current agent set first (all 4 open architect sprints are pending human review). Otherwise, this is independent work that does not conflict with any in-flight sprint.
-- 
2.49.1