# Sprint: agent management redesign

## Vision issues

- #557: redesign agent management (hire by inference backend, list by capability)

## What this enables

After this sprint, operators can:

1. Hire agents by backend (`disinto hire anthropic`, `disinto hire llama --url ...`) instead of inventing names and roles up front.
2. List all agents (`disinto agents list`) with backend, model, roles, and status in one table.
3. Discover what is running without grepping compose files, TOML configs, and state directories.

The factory becomes self-describing: an operator who inherits a running instance can immediately see which agents exist, which backends they use, and which roles they fill.
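To make the backend-first UX concrete, here is a minimal sketch of how `hire` could treat the backend as the only required input and turn the name and roles into defaults. The real command lives in `bin/disinto` and `lib/hire-agent.sh`; Python is used here only as illustration. Only the `anthropic` and `llama` backends and `--url` come from this proposal. The `BACKEND_DEFAULTS` table, the `--name`/`--roles` overrides, the default `dev` role, and the `<backend>-<n>` naming scheme are hypothetical.

```python
# Hypothetical sketch of backend-first argument handling for `disinto hire`.
# Only the backends `anthropic`/`llama` and `--url` come from the sprint text;
# BACKEND_DEFAULTS, --name, --roles, and the naming scheme are assumptions.
import argparse
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumed per-backend defaults: which backends need an endpoint URL and which
# roles a new agent gets when the operator does not pass --roles.
BACKEND_DEFAULTS = {
    "anthropic": {"needs_url": False, "roles": ["dev"]},
    "llama": {"needs_url": True, "roles": ["dev"]},
}


@dataclass
class HireRequest:
    name: str
    backend: str
    roles: List[str]
    url: Optional[str] = None


def next_free_name(backend: str, existing: Iterable[str]) -> str:
    """Return the first unused '<backend>-<n>' name (hypothetical scheme)."""
    taken = set(existing)
    n = 1
    while f"{backend}-{n}" in taken:
        n += 1
    return f"{backend}-{n}"


def parse_hire(argv: List[str], existing_names: Iterable[str]) -> HireRequest:
    parser = argparse.ArgumentParser(prog="disinto hire")
    parser.add_argument("backend", choices=sorted(BACKEND_DEFAULTS))
    parser.add_argument("--url", help="inference endpoint (local backends)")
    parser.add_argument("--name", help="override the generated agent name")
    parser.add_argument("--roles", help="comma-separated roles")
    args = parser.parse_args(argv)

    defaults = BACKEND_DEFAULTS[args.backend]
    if defaults["needs_url"] and not args.url:
        # parser.error prints usage to stderr and exits with status 2.
        parser.error(f"backend {args.backend!r} requires --url")

    if args.roles:
        roles = [r.strip() for r in args.roles.split(",") if r.strip()]
    else:
        roles = list(defaults["roles"])
    name = args.name or next_free_name(args.backend, existing_names)
    return HireRequest(name=name, backend=args.backend, roles=roles, url=args.url)
```

For example, `parse_hire(["anthropic"], ["anthropic-1"])` yields an agent named `anthropic-2`, while `parse_hire(["llama"], [])` exits with a usage error because `llama` needs `--url`. The operator only has to name a name or role when they want to override the default.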
## What exists today

The agent management system works, but it is fragmented:

- `disinto hire-an-agent <name> <role>` (`lib/hire-agent.sh`): creates the Forgejo user, `.profile` repo, API token, and state file, and optionally writes an agents TOML section and regenerates the compose file. It works, but the mental model is backwards: the operator must invent a name and pick a role before specifying the backend.
- `disinto agent enable/disable/status` (`bin/disinto`): manages state files for 6 hardcoded core agents (dev, reviewer, gardener, architect, planner, predictor). Local-model agents are invisible to this command.
- `agents` TOML sections (`projects/*.toml`): store local-model agent config (`base_url`, `model`, `roles`, `forge_user`). `lib/generators.sh` reads them to generate per-agent docker-compose services.
- `AGENT_ROLES` env var: a runtime gate in `entrypoint.sh`, holding a comma-separated list of the roles the container runs.
- Compose profiles: local-model agent services are gated behind profiles, so they start only when `--profile` is passed explicitly.

State lives in three disconnected places: state files (CLI), env vars (runtime), and compose services (Docker). No single command unifies them.
## Complexity

- Files touched: ~4 (`bin/disinto`, `lib/hire-agent.sh`, `lib/generators.sh`, `docker/agents/entrypoint.sh`)
- Subsystems: CLI, compose generator, container entrypoint, project TOML schema
- Estimated sub-issues: 4-5
- Gluecode vs. greenfield: ~80% gluecode (refactoring the existing `hire-agent.sh` and CLI), ~20% greenfield (new `agents list` output, backend-first hire UX)

## Risks

- Breaking the existing `hire-an-agent`: the old command must keep working during the transition, since operators may have scripts that call it. A deprecation path is needed.
- State migration: existing local-model agents configured via agents TOML must keep working unchanged. The new system reads the same TOML, so no migration is required as long as the schema stays the same.
- Hardcoded agent list in `entrypoint.sh`: the 6 core agents are hardcoded in multiple places (`entrypoint.sh`, `bin/disinto`). Making the list dynamic requires careful testing to avoid breaking the polling loop.
- TOML parsing fragility: the TOML writer in `hire-agent.sh` uses an inline Python script, so schema changes could break parsing if they are not tested.
## Cost: new infra to maintain

- No new services, cron jobs, or formulas. This is a refactor of existing CLI and configuration paths.
- New code: `disinto hire` subcommand (~100 lines), `disinto agents list` subcommand (~80 lines), and agent registry logic that unifies the three state sources (~50 lines).
- Removed code: the portions of the current `hire-an-agent` that duplicate backend detection logic.
- Ongoing: the hardcoded agent list in `bin/disinto` and `entrypoint.sh` becomes a list derived from state files, TOML, and compose. Discovery logic gets slightly more complex, but the hardcoded lists no longer need updating when new agent types are added.
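The registry logic budgeted above can stay small if each source is first reduced to a plain mapping. The Python sketch below merges core-agent state, agents TOML sections, and the names of generated per-agent compose services into one sorted table for `agents list`, flagging services that have no matching config. The input shapes, the status labels, the `local` default backend for TOML agents, and the assumption that compose service names equal agent names are all hypothetical.

```python
# Hypothetical sketch of the registry behind `disinto agents list`. Inputs are
# assumed to be pre-parsed: core_state {name: "enabled"|"disabled"} from state
# files, toml_agents {name: {model, roles, ...}} from agents TOML sections, and
# compose_services, the per-agent service names (assumed to equal agent names).
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class AgentRow:
    name: str
    backend: str = "-"
    model: str = "-"
    roles: List[str] = field(default_factory=list)
    status: str = "unknown"


def build_registry(core_state: Dict[str, str], toml_agents: Dict[str, dict],
                   compose_services: Iterable[str]) -> List[AgentRow]:
    services = set(compose_services)
    rows: Dict[str, AgentRow] = {}
    # Core agents: the state file gives status; the role is assumed to match the name.
    for name, state in core_state.items():
        rows[name] = AgentRow(name=name, roles=[name], status=state)
    # Local-model agents: config from TOML, status from whether a service exists.
    for name, cfg in toml_agents.items():
        row = rows.setdefault(name, AgentRow(name=name))
        row.backend = cfg.get("backend", "local")
        row.model = cfg.get("model") or "-"
        row.roles = list(cfg.get("roles", []))
        if row.status == "unknown":
            row.status = "service" if name in services else "no-service"
    # Services with no matching config are surfaced instead of hidden.
    for name in services - set(rows):
        rows[name] = AgentRow(name=name, status="orphan-service")
    return [rows[n] for n in sorted(rows)]


def render_table(rows: List[AgentRow]) -> str:
    headers = ("NAME", "BACKEND", "MODEL", "ROLES", "STATUS")
    cells = [(r.name, r.backend, r.model, ",".join(r.roles) or "-", r.status)
             for r in rows]
    # Each column is as wide as its longest header or cell.
    widths = [max([len(h)] + [len(c[i]) for c in cells])
              for i, h in enumerate(headers)]
    lines = ["  ".join(v.ljust(w) for v, w in zip(line, widths)).rstrip()
             for line in [headers, *cells]]
    return "\n".join(lines)
```

The `no-service` and `orphan-service` rows are the point of the design: disagreements between the sources show up in the table instead of requiring a grep through compose files and state directories.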
## Recommendation

Worth it. This is a high-value, low-risk refactor that directly improves the adoption story. The current UX is the number one friction point for new operators: `hire-an-agent` requires knowing three things (name, role, backend) in the wrong order. The redesign makes the common case (`disinto hire anthropic`) a one-liner and gives operators visibility into what is running. No new infrastructure, no new dependencies, mostly gluecode over existing interfaces.

Defer only if the team wants to stabilize the current agent set first (all 4 open architect sprints are pending human review). Otherwise, this is independent work that does not conflict with any in-flight sprint.