# Agent Design Principles

> **Status:** Active design principle. All agents, reviewers, and planners should follow this.

## The Determinism / Judgment Split

Every agent has two kinds of work. The architecture should separate them cleanly.

### Deterministic (bash orchestrator)

Mechanical operations that always work the same way. These belong in bash scripts:

- Create and destroy tmux sessions
- Create and destroy git worktrees
- Phase file watching (the event loop)
- Lock files and concurrency guards
- Environment setup and teardown
- Session lifecycle (start, monitor, kill)

**Properties:** No judgment required. Never fails differently based on interpretation. Easy to test. Hard to break.

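As an illustration of the "lock files and concurrency guards" item, here is a minimal sketch of a guard built on `mkdir`, which either creates the directory or fails atomically. The function names and the lock path are hypothetical and do not come from the project's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a deterministic concurrency guard.
# acquire_lock, release_lock, and with_lock are illustrative names.

# acquire_lock DIR: mkdir is atomic, so only one caller can succeed.
acquire_lock() {
  mkdir "$1" 2>/dev/null
}

# release_lock DIR: remove the lock directory; harmless if already gone.
release_lock() {
  rmdir "$1" 2>/dev/null || true
}

# with_lock DIR CMD...: run CMD only if the lock is free, then release it
# and return CMD's exit status. Returns 1 without running CMD if busy.
with_lock() {
  local lock_dir=$1 status=0
  shift
  if ! acquire_lock "$lock_dir"; then
    echo "lock busy: $lock_dir" >&2
    return 1
  fi
  "$@" || status=$?
  release_lock "$lock_dir"
  return "$status"
}

# Example (hypothetical path and command):
#   with_lock /tmp/dev-agent.lock run_one_session
```

Nothing in this guard depends on interpreting context, which is why it can stay in bash and be tested in isolation.
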
### Judgment (Claude via formula)

Operations that require understanding context, making decisions, or adapting to novel situations. These belong in the formula, the prompt Claude executes inside the tmux session:

- Read and understand the task (fetch issue body + comments, parse intent)
- Assess dependencies ("does the code this depends on actually exist?")
- Implement the solution
- Create PR with meaningful title and description
- Read review feedback, decide what to address vs push back on
- Handle CI failures (read logs, decide: fix, retry, or escalate)
- Choose rebase strategy (rebase, merge, or start over)
- Decide when to refuse vs implement

**Properties:** Benefits from context. Improves when the formula is refined. Adapts to novel situations without new bash code.

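To make the boundary concrete, the sketch below shows the only part of this work that might reasonably stay in bash: gluing a formula and the task context into one prompt without interpreting either. It assumes the formula and the fetched issue text are plain files; `build_prompt`, the file names, and the `## Task context` separator are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: assemble a prompt from a formula file and task context.
# The orchestrator only concatenates; it never inspects the content.

# build_prompt FORMULA_FILE ISSUE_FILE: print the formula, a separator
# heading, then the task context. Returns 1 if either file is unreadable.
build_prompt() {
  local formula_file=$1 issue_file=$2
  [ -r "$formula_file" ] || return 1
  [ -r "$issue_file" ] || return 1
  cat -- "$formula_file"
  printf '\n## Task context\n\n'
  cat -- "$issue_file"
}

# Example (hypothetical paths):
#   prompt=$(build_prompt formulas/dev.md /tmp/issue.txt)
```

The resulting text could then be injected into the tmux session or passed as the argument to `claude -p`; every decision about what the issue means is left to the formula.
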
## Why This Matters

### Today's problem

Agent scripts grow by accretion. Every new lesson becomes another `if/elif/else` in bash:

- "CI failed with this pattern → retry with this flag"
- "Review comment mentions X → rebase before addressing"
- "Merge conflict in this file → apply this strategy"

This makes agents brittle, hard to modify, and impossible to generalize across projects.

### The alternative

A thin bash orchestrator handles session lifecycle. Everything that requires judgment lives in the formula, a structured prompt that Claude interprets. Learnings become formula refinements, not bash patches.

```
┌─────────────────────────────────────────┐
│ Bash orchestrator (thin, deterministic) │
│                                         │
│ - tmux session lifecycle                │
│ - worktree create/destroy               │
│ - phase file monitoring                 │
│ - lock files                            │
│ - environment setup                     │
└────────────────┬────────────────────────┘
                 │ inject formula / invoke claude -p
                 ▼
┌────────────────────────────────────────────────────────────────┐
│ Judgment layer                                                 │
│                                                                │
│ ┌─────────────────────────────┐  ┌───────────────────────────┐ │
│ │ Claude in tmux (interactive)│  │ claude -p (one-shot)      │ │
│ │                             │  │                           │ │
│ │ Multi-turn sessions with    │  │ Single prompt+response.   │ │
│ │ phase protocol, CI/review   │  │ No persistent session.    │ │
│ │ feedback loops, tool use.   │  │ Suited for classify-and-  │ │
│ │                             │  │ route decisions that      │ │
│ │ Used by: dev, review,       │  │ don't need interaction.   │ │
│ │ gardener, action, planner,  │  │                           │ │
│ │ predictor, supervisor       │  │ Used by: vault            │ │
│ └─────────────────────────────┘  └───────────────────────────┘ │
│                                                                │
│ Both patterns keep judgment out of bash. Choose based on       │
│ whether the agent needs multi-turn interaction (tmux) or       │
│ a single classify/decide pass (claude -p).                     │
└────────────────────────────────────────────────────────────────┘
```

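The choice between the two judgment patterns is itself mechanical, so it can live in the orchestrator. The sketch below maps an agent name to a mode using the "Used by" lists in the diagram; the function name and the `tmux`/`oneshot` labels are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick the judgment pattern for an agent.
# Agent names follow the "Used by" lists in the diagram above.

# judgment_mode AGENT: print "tmux" for multi-turn agents, "oneshot" for
# agents that need a single classify/decide pass. Returns 1 if unknown.
judgment_mode() {
  case "$1" in
    dev|review|gardener|action|planner|predictor|supervisor)
      echo tmux
      ;;
    vault)
      echo oneshot
      ;;
    *)
      echo "unknown agent: $1" >&2
      return 1
      ;;
  esac
}

# Example:
#   mode=$(judgment_mode vault)   # mode is "oneshot"
```

A fixed lookup like this never reads an issue or a log, so it adds no judgment to bash.
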
### Benefits

- **Adaptive:** Formula refinements propagate instantly. No bash deploy needed.
- **Learnable:** When an agent handles a new situation well, capture it in the formula.
- **Debuggable:** Formula steps are human-readable. Bash state machines are not.
- **Generalizable:** Same orchestrator, different formulas for different agents.

### Risks and mitigations

- **Fragility:** Claude might misinterpret a formula step → Phase protocol is the safety net. No phase signal = stall detected = supervisor escalates.
- **Cost:** More Claude turns = more tokens → Offset by eliminating bash dead-ends that waste whole sessions.
- **Non-determinism:** Same formula might produce different results → Success criteria in each step make pass/fail unambiguous.

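The "no phase signal = stall detected" safety net can be sketched as a polling loop with a deadline. This is only an illustration: it assumes the phase signal is the first line of a file that stays empty until the agent writes to it, and the function name, file layout, and one-second poll interval are hypothetical rather than taken from the phase protocol itself.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of stall detection on a phase file.

# wait_for_phase FILE TIMEOUT_SECS: poll once per second. Print the first
# line of FILE as soon as it is non-empty and return 0; return 1 if nothing
# arrives within TIMEOUT_SECS (a stall the supervisor would escalate).
wait_for_phase() {
  local phase_file=$1 timeout=$2 waited=0 phase=""
  while :; do
    if [ -s "$phase_file" ]; then
      # read fails on a missing trailing newline but still fills $phase
      IFS= read -r phase < "$phase_file" || true
      printf '%s\n' "$phase"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example (hypothetical path and handler):
#   wait_for_phase /tmp/dev-agent.phase 600 || escalate_to_supervisor
```

The loop only observes whether a signal arrived; deciding what to do about a stall remains a judgment call outside this function.
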
## Applying This Principle

When reviewing PRs or designing new agents, ask:

1. **Does this bash code make a judgment call?** → Move it to the formula.
2. **Does this formula step do something mechanical?** → Move it to the orchestrator.
3. **Is a new `if/else` being added to handle an edge case?** → That's a formula learning, not an orchestrator feature.
4. **Can this agent's bash be reused by another agent type?** → Good sign: the orchestrator is properly thin.

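Question 3 can be illustrated with a small helper that records an edge case as a formula learning instead of a new bash branch. This sketch assumes the formula is a plain Markdown file with a `## Learnings` section, which the source does not state; `record_learning` and the section name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: capture a lesson in the formula, not in bash.
# Assumes a Markdown formula file; the "## Learnings" heading is invented.

# record_learning FORMULA_FILE TEXT: append TEXT as a bullet under a
# "## Learnings" heading, creating the heading on first use.
record_learning() {
  local formula_file=$1 learning=$2
  if ! grep -q '^## Learnings$' "$formula_file" 2>/dev/null; then
    printf '\n## Learnings\n\n' >> "$formula_file"
  fi
  printf -- '- %s\n' "$learning" >> "$formula_file"
}

# Example (hypothetical path and text):
#   record_learning formulas/dev.md "If CI fails on lint only, fix and push."
```

The orchestrator code stays the same size no matter how many learnings accumulate, which is the property question 4 checks for.
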
## Current State

| Agent | Lines | Judgment in bash | Target |
|-------|-------|------------------|--------|
| dev-agent | 2246 (agent 791 + phase-handler 786 + dev-poll 669) | Heavy: deps, CI retry, review parsing, merge strategy, recovery mode; dev-poll adds dependency resolution, CI retry tracking, approved-PR merging, orphaned session recovery | Thin orchestrator + formula |
| review-agent | 870 | Heavy: diff analysis, review decision, approve/request-changes logic | Needs assessment |
| supervisor | 877 | Heavy: multi-project health checks, CI stall detection, container monitoring | Partially justified (monitoring is deterministic, but escalation decisions are judgment) |
| gardener | 1242 (agent 471 + poll 771) | Medium: backlog triage, duplicate detection, tech-debt scoring | Poll is heavy orchestration; agent is prompt-driven |
| vault | 442 (4 scripts) | Medium: approval flow, human gate decisions | Intentionally bash-heavy (security gate should be deterministic) |
| planner | 382 | Medium: AGENTS.md update, gap analysis | Tmux+formula (done, #232) |

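Combined figures in the "Lines" column, such as 2246 = 791 + 786 + 669 for dev-agent, are sums across several scripts. One way such a total might be recomputed is sketched below; the function name is hypothetical and the script paths in the example are placeholders, not the repository's actual layout.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: recompute a combined line count for several scripts.

# total_lines FILE...: print the combined number of lines in the given files.
total_lines() {
  local count
  count=$(cat -- "$@" | wc -l)
  # Arithmetic expansion strips the padding some wc implementations add.
  echo $((count))
}

# Example (placeholder paths):
#   total_lines dev/agent.sh dev/phase-handler.sh dev/poll.sh
```
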