disinto/docs/AGENT-DESIGN.md
Agent dc545a817b
fix: chore(26a): delete action-agent.sh, action-poll.sh, and action/AGENTS.md (#65)
- Delete action/ directory and all its contents
- Remove action-bot from bin/disinto bot token mapping and collaborator lists
- Remove FORGE_ACTION_TOKEN from lib/env.sh and .env.example
- Remove action-bot from FORGE_BOT_USERNAMES in lib/env.sh and .env.example
- Update .woodpecker/agent-smoke.sh to remove action script checks
- Update AGENTS.md: remove action agent from description and table
- Update lib/AGENTS.md: remove action-agent references from sourced by columns
- Update docs/PHASE-PROTOCOL.md: remove action-agent reference
- Update docs/AGENT-DESIGN.md: remove action-agent from agent table
- Update planner/AGENTS.md: update action formula execution reference
- Update README.md: update formula-driven execution reference

Part of #26 — retire action-agent system.
2026-03-31 19:42:25 +00:00


Agent Design Principles

Status: Active design principle. All agents, reviewers, and planners should follow this.

The Determinism / Judgment Split

Every agent has two kinds of work. The architecture should separate them cleanly.

Deterministic (bash orchestrator)

Mechanical operations that always work the same way. These belong in bash scripts:

  • Create and destroy tmux sessions
  • Create and destroy git worktrees
  • Phase file watching (the event loop)
  • Lock files and concurrency guards
  • Environment setup and teardown
  • Session lifecycle (start, monitor, kill)

Properties: No judgment required. Behavior never depends on interpretation. Easy to test. Hard to break.
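
The operations listed above could be sketched as a small library of bash functions. This is a minimal sketch under stated assumptions, not the disinto implementation: every function name here (`acquire_lock`, `create_worktree`, `start_session`, and so on) and the choice of a `mkdir`-based lock are hypothetical. Only the `git worktree` and `tmux` subcommands are standard CLI calls.

```bash
# Hypothetical orchestrator primitives. Names are illustrative, not from the codebase.
# Meant to be sourced, so it deliberately sets no shell options.

# Concurrency guard: mkdir is atomic, so exactly one caller can create the lock dir.
acquire_lock() {
  local lock_dir="$1"
  mkdir "$lock_dir" 2>/dev/null
}

release_lock() {
  local lock_dir="$1"
  rmdir "$lock_dir" 2>/dev/null || true
}

# Worktree lifecycle: one isolated checkout on a new branch per task.
create_worktree() {
  local repo="$1" path="$2" branch="$3"
  git -C "$repo" worktree add -b "$branch" "$path"
}

destroy_worktree() {
  local repo="$1" path="$2"
  git -C "$repo" worktree remove --force "$path"
}

# Session lifecycle: a detached tmux session rooted in the worktree.
start_session() {
  local name="$1" workdir="$2"
  tmux new-session -d -s "$name" -c "$workdir"
}

kill_session() {
  local name="$1"
  tmux kill-session -t "$name" 2>/dev/null || true
}
```

None of these functions inspects issue text, CI logs, or review comments, which is what keeps them easy to test and reusable across agent types.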

Judgment (Claude via formula)

Operations that require understanding context, making decisions, or adapting to novel situations. These belong in the formula — the prompt Claude executes inside the tmux session:

  • Read and understand the task (fetch issue body + comments, parse intent)
  • Assess dependencies ("does the code this depends on actually exist?")
  • Implement the solution
  • Create PR with meaningful title and description
  • Read review feedback, decide what to address vs push back on
  • Handle CI failures (read logs, decide: fix, retry, or escalate)
  • Choose rebase strategy (rebase, merge, or start over)
  • Decide when to refuse vs implement

Properties: Benefits from context. Improves when the formula is refined. Adapts to novel situations without new bash code.

Why This Matters

Today's problem

Agent scripts grow by accretion. Every new lesson becomes another if/elif/else in bash:

  • "CI failed with this pattern → retry with this flag"
  • "Review comment mentions X → rebase before addressing"
  • "Merge conflict in this file → apply this strategy"

This makes agents brittle, hard to modify, and impossible to generalize across projects.
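
As an illustration of that accretion, the pattern tends to look something like the sketch below. The function name, log patterns, and actions are invented for this example and are not taken from dev-agent.

```bash
# Illustrative anti-pattern: judgment encoded as bash branches.
# Patterns and actions are hypothetical, not copied from any agent script.
decide_ci_action() {
  local log_file="$1"
  if grep -q "connection timed out" "$log_file"; then
    echo "retry"       # lesson 1: flaky network
  elif grep -q "CONFLICT" "$log_file"; then
    echo "rebase"      # lesson 2: stale branch
  elif grep -q "No space left on device" "$log_file"; then
    echo "escalate"    # lesson 3: runner disk full
  else
    echo "escalate"    # everything nobody has seen yet
  fi
}
```

Each new lesson adds a branch keyed to one project's log format, and any failure that matches no branch falls through to the default, which is why this style is hard to carry to another project.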

The alternative

A thin bash orchestrator handles session lifecycle. Everything that requires judgment lives in the formula — a structured prompt that Claude interprets. Learnings become formula refinements, not bash patches.

┌─────────────────────────────────────────┐
│ Bash orchestrator (thin, deterministic) │
│                                         │
│  - tmux session lifecycle               │
│  - worktree create/destroy              │
│  - phase file monitoring                │
│  - lock files                           │
│  - environment setup                    │
└────────────────┬────────────────────────┘
                 │ inject formula / invoke claude -p
                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Judgment layer                             │
│                                                                 │
│  ┌─────────────────────────────┐ ┌───────────────────────────┐  │
│  │ Claude in tmux (interactive)│ │ claude -p (one-shot)      │  │
│  │                             │ │                           │  │
│  │  Multi-turn sessions with   │ │  Single prompt+response.  │  │
│  │  phase protocol, CI/review  │ │  No persistent session.   │  │
│  │  feedback loops, tool use.  │ │  Suited for classify-and- │  │
│  │                             │ │  route decisions that     │  │
│  │  Used by: dev, review,      │ │  don't need interaction.  │  │
│  │  gardener, planner,         │ │                           │  │
│  │  predictor, supervisor      │ │  Used by: vault           │  │
│  └─────────────────────────────┘ └───────────────────────────┘  │
│                                                                 │
│  Both patterns keep judgment out of bash. Choose based on       │
│  whether the agent needs multi-turn interaction (tmux) or       │
│  a single classify/decide pass (claude -p).                     │
└─────────────────────────────────────────────────────────────────┘
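
The two invocation patterns in the diagram might be wired up roughly as follows. This is a sketch under assumptions: the function names, the session name, and the wording sent to the session are hypothetical, and how disinto actually injects a formula is not shown in this document. The `tmux` subcommands and the `claude -p` print mode are the only real CLI surface used.

```bash
# Hypothetical helpers. The prompt wording and argument layout are assumptions.

# Multi-turn pattern: run Claude in a detached tmux session and point it at the formula.
run_in_tmux() {
  local session="$1" workdir="$2" formula_file="$3"
  tmux new-session -d -s "$session" -c "$workdir" claude
  # -l sends the text literally instead of interpreting it as key names.
  tmux send-keys -t "$session" -l "Read $formula_file and follow it step by step."
  tmux send-keys -t "$session" Enter
}

# One-shot pattern: one prompt in, one response on stdout, no persistent session.
run_one_shot() {
  local formula_file="$1" input="$2"
  claude -p "$(cat "$formula_file")

Input:
$input"
}
```

A caller would pick `run_in_tmux` when the agent needs CI or review feedback loops and `run_one_shot` for a single classify-and-route pass. A real orchestrator would probably also wait for Claude to finish starting before sending keys, which this sketch omits.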

Benefits

  • Adaptive: Formula refinements propagate instantly. No bash deploy needed.
  • Learnable: When an agent handles a new situation well, capture it in the formula.
  • Debuggable: Formula steps are human-readable. Bash state machines are not.
  • Generalizable: Same orchestrator, different formulas for different agents.

Risks and mitigations

  • Fragility: Claude might misinterpret a formula step → Phase protocol is the safety net. No phase signal = stall detected = supervisor escalates.
  • Cost: More Claude turns = more tokens → Offset by eliminating bash dead-ends that waste whole sessions.
  • Non-determinism: Same formula might produce different results → Success criteria in each step make pass/fail unambiguous.
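
The "no phase signal = stall detected" safety net could be sketched as a modification-time check on the phase file. The sketch rests on several assumptions: the function names, the `done` phase value, and the polling interval are hypothetical; the orchestrator is assumed to create the phase file at session start; and `stat -c %Y` is the GNU coreutils form (BSD/macOS would use `stat -f %m`).

```bash
# Hypothetical stall detection. Names and the "done" phase value are assumptions.

# Succeeds (exit 0) when the phase file is missing or has not changed within the timeout.
is_stalled() {
  local phase_file="$1" timeout_secs="$2" now mtime
  [ -f "$phase_file" ] || return 0
  now="$(date +%s)"
  mtime="$(stat -c %Y "$phase_file")"   # GNU stat; assumption about the platform
  [ $(( now - mtime )) -gt "$timeout_secs" ]
}

# Minimal event loop: report phase changes until "done" or until a stall is detected.
watch_phase() {
  local phase_file="$1" timeout_secs="$2" poll_secs="${3:-5}" last="" current
  while ! is_stalled "$phase_file" "$timeout_secs"; do
    current="$(cat "$phase_file" 2>/dev/null || true)"
    if [ "$current" != "$last" ]; then
      echo "phase: $current"
      last="$current"
    fi
    [ "$current" = "done" ] && return 0
    sleep "$poll_secs"
  done
  echo "stall detected" >&2
  return 1
}
```

The loop only compares timestamps and strings. Deciding what to do about a stall stays outside it, so a nonzero exit is simply the signal a supervisor would act on.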

Applying This Principle

When reviewing PRs or designing new agents, ask:

  1. Does this bash code make a judgment call? → Move it to the formula.
  2. Does this formula step do something mechanical? → Move it to the orchestrator.
  3. Is a new if/else being added to handle an edge case? → That's a formula learning, not an orchestrator feature.
  4. Can this agent's bash be reused by another agent type? → Good sign — the orchestrator is properly thin.

Current State

| Agent | Lines | Judgment in bash | Target |
|---|---|---|---|
| dev-agent | 2246 (agent 791 + phase-handler 786 + dev-poll 669) | Heavy: deps, CI retry, review parsing, merge strategy, recovery mode; dev-poll adds dependency resolution, CI retry tracking, approved-PR merging, orphaned session recovery | Thin orchestrator + formula |
| review-agent | 870 | Heavy: diff analysis, review decision, approve/request-changes logic | Needs assessment |
| supervisor | 877 | Heavy: multi-project health checks, CI stall detection, container monitoring | Partially justified (monitoring is deterministic, but escalation decisions are judgment) |
| gardener | 1242 (agent 471 + poll 771) | Medium: backlog triage, duplicate detection, tech-debt scoring | Poll is heavy orchestration; agent is prompt-driven |
| vault | 442 (4 scripts) | Medium: approval flow, human gate decisions | Intentionally bash-heavy (security gate should be deterministic) |
| planner | 382 | Medium: AGENTS.md update, gap analysis | Tmux+formula (done, #232) |