disinto/docs/AGENT-DESIGN.md

# Agent Design Principles

> **Status:** Active design principle. All agents, reviewers, and planners should follow this.

## The Determinism / Judgment Split

Every agent has two kinds of work. The architecture should separate them cleanly.

### Deterministic (bash orchestrator)

Mechanical operations that always work the same way. These belong in bash scripts:

- Create and destroy tmux sessions
- Create and destroy git worktrees
- Phase file watching (the event loop)
- Lock files and concurrency guards
- Environment setup and teardown
- Session lifecycle (start, monitor, kill)

**Properties:** No judgment required. Failure modes do not depend on interpretation. Easy to test. Hard to break.
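As a rough illustration of what this layer looks like, here is a minimal sketch of a lock guard plus session setup and teardown. The function names, lock path, and session names are hypothetical and not taken from the existing agent scripts; only standard `mkdir`, `git worktree`, and `tmux` commands are used.

```shell
#!/usr/bin/env bash
# Sketch only: function names, paths, and session names are hypothetical.

# acquire_lock DIR: mkdir is atomic, so exactly one concurrent caller succeeds.
acquire_lock() {
  local lock_dir="$1"
  mkdir "$lock_dir" 2>/dev/null || return 1
  echo "$$" > "$lock_dir/pid"   # record the owner PID for debugging
}

# release_lock DIR: remove the lock directory and its contents.
release_lock() {
  rm -rf -- "$1"
}

# setup_session REPO WORKTREE BRANCH SESSION
# Create an isolated worktree on a new branch, then a detached tmux session in it.
setup_session() {
  local repo="$1" worktree="$2" branch="$3" session="$4"
  git -C "$repo" worktree add -b "$branch" "$worktree" &&
    tmux new-session -d -s "$session" -c "$worktree"
}

# teardown_session REPO WORKTREE SESSION
# Tear down in reverse order and tolerate pieces that are already gone.
teardown_session() {
  local repo="$1" worktree="$2" session="$3"
  tmux kill-session -t "$session" 2>/dev/null || true
  git -C "$repo" worktree remove --force "$worktree" 2>/dev/null || true
}
```

None of these functions inspects issue text, CI logs, or review comments. A caller would typically pair `acquire_lock` with `trap 'release_lock "$lock_dir"' EXIT` so the lock is released on every exit path.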

### Judgment (Claude via formula)

Operations that require understanding context, making decisions, or adapting to novel situations. These belong in the formula — the prompt Claude executes inside the tmux session:

- Read and understand the task (fetch issue body + comments, parse intent)
- Assess dependencies ("does the code this depends on actually exist?")
- Implement the solution
- Create PR with meaningful title and description
- Read review feedback, decide what to address vs push back on
- Handle CI failures (read logs, decide: fix, retry, or escalate)
- Choose rebase strategy (rebase, merge, or start over)
- Decide when to refuse vs implement

**Properties:** Benefits from context. Improves when the formula is refined. Adapts to novel situations without new bash code.

## Why This Matters

### Today's problem

Agent scripts grow by accretion. Every new lesson becomes another `if/elif/else` in bash:
- "CI failed with this pattern → retry with this flag"
- "Review comment mentions X → rebase before addressing"
- "Merge conflict in this file → apply this strategy"

This makes agents brittle, hard to modify, and impossible to generalize across projects.

### The alternative

A thin bash orchestrator handles session lifecycle. Everything that requires judgment lives in the formula — a structured prompt that Claude interprets. Learnings become formula refinements, not bash patches.
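One way to picture the thin orchestrator is an event-loop step that branches only on the phase signal. The sketch below assumes the phase file holds a single phase name on its first line; the phase names (`done`, `failed`) and the function names are hypothetical placeholders, not the actual phase protocol.

```shell
#!/usr/bin/env bash
# Sketch only: phase names and function names are hypothetical.

# read_phase FILE: print the first line of the phase file, or nothing if absent.
read_phase() {
  [ -f "$1" ] && head -n 1 "$1" || true
}

# next_action PHASE: map a phase signal to a lifecycle action.
# Note what is missing: no branch looks at CI output, review text, or diffs.
next_action() {
  case "$1" in
    done)   echo "teardown" ;;
    failed) echo "escalate" ;;
    *)      echo "wait" ;;      # no signal yet, or a working phase: keep watching
  esac
}
```

Under this shape, a new lesson about CI failures or review handling changes the formula text, while the `case` statement stays the same.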

```
┌─────────────────────────────────────────┐
│ Bash orchestrator (thin, deterministic) │
│                                         │
│  - tmux session lifecycle               │
│  - worktree create/destroy              │
│  - phase file monitoring                │
│  - lock files                           │
│  - environment setup                    │
└────────────────┬────────────────────────┘
                 │ inject formula / invoke claude -p
                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Judgment layer                             │
│                                                                 │
│  ┌─────────────────────────────┐ ┌───────────────────────────┐  │
│  │ Claude in tmux (interactive)│ │ claude -p (one-shot)      │  │
│  │                             │ │                           │  │
│  │  Multi-turn sessions with   │ │  Single prompt+response.  │  │
│  │  phase protocol, CI/review  │ │  No persistent session.   │  │
│  │  feedback loops, tool use.  │ │  Suited for classify-and- │  │
│  │                             │ │  route decisions that     │  │
│  │  Used by: dev, review,      │ │  don't need interaction.  │  │
│  │  gardener, action, planner, │ │                           │  │
│  │  predictor, supervisor      │ │  Used by: vault           │  │
│  └─────────────────────────────┘ └───────────────────────────┘  │
│                                                                 │
│  Both patterns keep judgment out of bash. Choose based on       │
│  whether the agent needs multi-turn interaction (tmux) or       │
│  a single classify/decide pass (claude -p).                     │
└─────────────────────────────────────────────────────────────────┘
```
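The choice between the two patterns can stay a static lookup in bash, since it depends only on the agent type. The following is a sketch under assumptions: `launch_agent`, `run`, `judgment_mode`, and the `agent-<name>` session naming are hypothetical, and how the formula is injected into an interactive session is deliberately left out. The agent-to-mode mapping is the one shown in the diagram.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and session naming are hypothetical.

# run CMD...: print the command instead of executing it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# judgment_mode AGENT: static mapping from the diagram above.
judgment_mode() {
  case "$1" in
    vault) echo "oneshot" ;;
    dev|review|gardener|action|planner|predictor|supervisor) echo "tmux" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# launch_agent AGENT FORMULA_FILE WORKDIR
launch_agent() {
  local agent="$1" formula_file="$2" workdir="$3" mode
  mode="$(judgment_mode "$agent")" || return 1
  case "$mode" in
    oneshot)
      # Single prompt and response; no persistent session.
      run claude -p "$(cat "$formula_file")"
      ;;
    tmux)
      # Detached multi-turn session; formula injection is not shown here.
      run tmux new-session -d -s "agent-$agent" -c "$workdir"
      ;;
  esac
}
```

Setting `DRY_RUN=1` prints the command that would be executed, which keeps the dispatch testable without starting tmux or calling Claude.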

### Benefits

- **Adaptive:** Formula refinements take effect without a bash deploy.
- **Learnable:** When an agent handles a new situation well, capture it in the formula.
- **Debuggable:** Formula steps are human-readable. Bash state machines are not.
- **Generalizable:** Same orchestrator, different formulas for different agents.

### Risks and mitigations

- **Fragility:** Claude might misinterpret a formula step → The phase protocol is the safety net: if no phase signal arrives, the stall is detected and the supervisor escalates.
- **Cost:** More Claude turns = more tokens → Offset by eliminating bash dead-ends that waste whole sessions.
- **Non-determinism:** Same formula might produce different results → Success criteria in each step make pass/fail unambiguous.
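The stall check behind the first mitigation can be sketched as a purely mechanical test. This assumes a stall is defined by the phase file's modification time, and that a missing phase file also counts as a stall; both are assumptions for illustration, as are the function names. It relies on `find -mmin`, which GNU and BSD `find` support but POSIX does not require.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the mtime-based stall definition are assumptions.

# phase_is_stalled FILE MAX_IDLE_MINUTES
# Returns 0 (stalled) when FILE is missing or was last modified
# more than MAX_IDLE_MINUTES minutes ago; returns 1 otherwise.
phase_is_stalled() {
  local phase_file="$1" max_idle_min="$2"
  [ -f "$phase_file" ] || return 0
  [ -n "$(find "$phase_file" -mmin "+$max_idle_min" 2>/dev/null)" ]
}

# check_session FILE MAX_IDLE_MINUTES: print the orchestrator's verdict.
check_session() {
  if phase_is_stalled "$1" "$2"; then
    echo "stalled"   # a real orchestrator would hand off to the supervisor here
  else
    echo "active"
  fi
}
```

The check never reads what Claude wrote, only whether a signal arrived in time, so it stays on the deterministic side of the split.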

## Applying This Principle

When reviewing PRs or designing new agents, ask:

1. **Does this bash code make a judgment call?** → Move it to the formula.
2. **Does this formula step do something mechanical?** → Move it to the orchestrator.
3. **Is a new `if/else` being added to handle an edge case?** → That's a formula learning, not an orchestrator feature.
4. **Can this agent's bash be reused by another agent type?** → Good sign — the orchestrator is properly thin.

## Current State

| Agent | Script lines | Judgment in bash | Target |
|-------|-------|------------------|--------|
| dev-agent | 2246 (agent 791 + phase-handler 786 + dev-poll 669) | Heavy — deps, CI retry, review parsing, merge strategy, recovery mode; dev-poll adds dependency resolution, CI retry tracking, approved-PR merging, orphaned session recovery | Thin orchestrator + formula |
| review-agent | 870 | Heavy — diff analysis, review decision, approve/request-changes logic | Needs assessment |
| supervisor | 877 | Heavy — multi-project health checks, CI stall detection, container monitoring | Partially justified (monitoring is deterministic, but escalation decisions are judgment) |
| gardener | 1242 (agent 471 + poll 771) | Medium — backlog triage, duplicate detection, tech-debt scoring | Poll is heavy orchestration; agent is prompt-driven |
| vault | 442 (4 scripts) | Medium — approval flow, human gate decisions | Intentionally bash-heavy (security gate should be deterministic) |
| planner | 382 | Medium — AGENTS.md update, gap analysis | Tmux+formula (done, #232) |