openhands d63a7402df fix: dev-poll.sh contains heavy judgment-in-bash not captured in the Current State table (#250 )

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-21 07:20:12 +00:00

Agent Design Principles

Status: Active design principle. All agents, reviewers, and planners should follow this.

The Determinism / Judgment Split

Every agent has two kinds of work. The architecture should separate them cleanly.

Deterministic (bash orchestrator)

Mechanical operations that always work the same way. These belong in bash scripts:

Create and destroy tmux sessions
Create and destroy git worktrees
Phase file watching (the event loop)
Lock files and concurrency guards
Environment setup and teardown
Session lifecycle (start, monitor, kill)

Properties: No judgment required. Never fails differently based on interpretation. Easy to test. Hard to break.
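As an illustration of how mechanical these operations are, here is a minimal sketch of a concurrency guard. The lock directory (`LOCK_DIR`) and the helper names are hypothetical, not taken from the agent scripts. The guard relies on `mkdir` being atomic, so exactly one concurrent caller can create the directory. A real guard would also need to clear stale locks left by a crashed run, which this sketch does not attempt.

```bash
#!/usr/bin/env bash
# Hypothetical concurrency guard: LOCK_DIR and the function names are
# illustrative, not the real agent scripts.
set -euo pipefail

LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/dev-agent.lock}"

acquire_lock() {
  # mkdir either creates the directory or fails; two concurrent callers
  # can never both succeed.
  if mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "$$" > "$LOCK_DIR/pid"   # record the owning shell's PID for debugging
    return 0
  fi
  return 1
}

release_lock() {
  rm -rf "$LOCK_DIR"
}

run_exclusive() {
  if ! acquire_lock; then
    echo "skip: another run holds $LOCK_DIR" >&2
    return 1
  fi
  # The EXIT trap fires when the subshell ends, on success or failure.
  ( trap release_lock EXIT; "$@" )
}
```

`run_exclusive some_command args` runs the command only if no other invocation holds the lock. There is no interpretation involved: the outcome depends only on whether the directory exists.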

Judgment (Claude via formula)

Operations that require understanding context, making decisions, or adapting to novel situations. These belong in the formula — the prompt Claude executes inside the tmux session:

Read and understand the task (fetch issue body + comments, parse intent)
Assess dependencies ("does the code this depends on actually exist?")
Implement the solution
Create PR with meaningful title and description
Read review feedback, decide what to address vs push back on
Handle CI failures (read logs, decide: fix, retry, or escalate)
Choose rebase strategy (rebase, merge, or start over)
Decide when to refuse vs implement

Properties: Benefits from context. Improves when the formula is refined. Adapts to novel situations without new bash code.
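A sketch of the hand-off between the two layers follows, under stated assumptions. The names `build_prompt`, `hand_off`, and `JUDGE_CMD`, as well as the plain-text context header, are hypothetical. The orchestrator forwards only identifiers (issue number, worktree path) and the formula text. Everything in the list above, from reading the issue to deciding whether to refuse, happens in whatever executes the prompt. The one-shot `claude -p` default keeps the sketch short; interactive agents receive the formula inside their tmux session instead.

```bash
#!/usr/bin/env bash
# Hypothetical hand-off: bash forwards identifiers and the formula, and
# leaves reading, assessing, and deciding to the formula.
set -euo pipefail

# Command that executes the prompt; `claude -p` is the one-shot form.
JUDGE_CMD=(claude -p)

build_prompt() {
  local formula_file="$1" issue="$2" worktree="$3"
  # Mechanical: pass the identifiers through without parsing what they refer to.
  printf 'Context:\n  issue: %s\n  worktree: %s\n\n' "$issue" "$worktree"
  cat "$formula_file"
}

hand_off() {
  local prompt
  prompt="$(build_prompt "$@")"
  "${JUDGE_CMD[@]}" "$prompt"
}
```

Because bash never inspects the issue body, a new kind of issue calls for a formula change rather than a new branch in this script.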

Why This Matters

Today's problem

Agent scripts grow by accretion. Every new lesson becomes another if/elif/else in bash:

"CI failed with this pattern → retry with this flag"
"Review comment mentions X → rebase before addressing"
"Merge conflict in this file → apply this strategy"

This makes agents brittle, hard to modify, and impossible to generalize across projects.

The alternative

A thin bash orchestrator handles session lifecycle. Everything that requires judgment lives in the formula — a structured prompt that Claude interprets. Learnings become formula refinements, not bash patches.

┌─────────────────────────────────────────┐
│ Bash orchestrator (thin, deterministic) │
│                                         │
│  - tmux session lifecycle               │
│  - worktree create/destroy              │
│  - phase file monitoring                │
│  - lock files                           │
│  - environment setup                    │
└────────────────┬────────────────────────┘
                 │ inject formula / invoke claude -p
                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Judgment layer                             │
│                                                                 │
│  ┌─────────────────────────────┐ ┌───────────────────────────┐  │
│  │ Claude in tmux (interactive)│ │ claude -p (one-shot)      │  │
│  │                             │ │                           │  │
│  │  Multi-turn sessions with   │ │  Single prompt+response.  │  │
│  │  phase protocol, CI/review  │ │  No persistent session.   │  │
│  │  feedback loops, tool use.  │ │  Suited for classify-and- │  │
│  │                             │ │  route decisions that     │  │
│  │  Used by: dev, review,      │ │  don't need interaction.  │  │
│  │  gardener, action, planner, │ │                           │  │
│  │  predictor, supervisor      │ │  Used by: vault           │  │
│  └─────────────────────────────┘ └───────────────────────────┘  │
│                                                                 │
│  Both patterns keep judgment out of bash. Choose based on       │
│  whether the agent needs multi-turn interaction (tmux) or       │
│  a single classify/decide pass (claude -p).                     │
└─────────────────────────────────────────────────────────────────┘
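For the one-shot pattern, the division of labor can be sketched as follows. Claude makes the classification, and bash only normalizes the reply and maps it onto a fixed set of routes. Anything unexpected, including a failed call, falls back to a human. The labels `auto`, `human`, and `reject`, the prompt wording, and the function names are hypothetical and not taken from the vault scripts.

```bash
#!/usr/bin/env bash
# Hypothetical classify-and-route: Claude classifies, bash routes.
set -euo pipefail

CLASSIFY_CMD=(claude -p)

classify() {
  local request="$1"
  "${CLASSIFY_CMD[@]}" "Classify this request. Answer with exactly one word: auto, human, or reject.

Request: $request"
}

route() {
  local answer
  # Normalize whitespace and case: parsing, not judgment.
  # A failed call leaves the answer empty.
  answer="$(classify "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')" || answer=""
  case "$answer" in
    auto|human|reject) printf '%s\n' "$answer" ;;
    # Fixed, deterministic fallback for any unexpected output.
    *) printf 'human\n' ;;
  esac
}
```

The `case` statement is the only logic bash contributes. It adds no rules about *which* requests are risky; those live in the prompt.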

Benefits

Adaptive: Formula refinements take effect the next time an agent loads the formula. No bash deploy needed.
Learnable: When an agent handles a new situation well, capture it in the formula.
Debuggable: Formula steps are human-readable. Bash state machines are not.
Generalizable: Same orchestrator, different formulas for different agents.
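The "same orchestrator, different formulas" point can be sketched as a single lifecycle script in which the formula path is the only per-agent input. Several details are hypothetical: the `/tmp/worktrees` layout, the `origin/main` base branch, the session naming, the formula paths, and passing the formula path through `tmux set-environment`. The `git worktree` and `tmux` subcommands themselves are standard. `DRY_RUN=1` (the default) prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical generic lifecycle: only the formula path varies per agent.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

run() {
  # In dry-run mode, print the command instead of executing it.
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

start_agent() {
  local agent="$1" issue="$2" formula="$3"
  local session="${agent}-${issue}"
  local wt="/tmp/worktrees/${session}"
  run git worktree add -b "${agent}/${issue}" "$wt" origin/main
  run tmux new-session -d -s "$session" -c "$wt"
  # Hypothetical: expose the formula path to the session. How Claude
  # consumes it is out of scope for this sketch.
  run tmux set-environment -t "$session" FORMULA "$formula"
}

stop_agent() {
  local agent="$1" issue="$2"
  local session="${agent}-${issue}"
  run tmux kill-session -t "$session"
  run git worktree remove --force "/tmp/worktrees/${session}"
}
```

Under these assumptions, `start_agent dev 250 formulas/dev.md` and `start_agent review 251 formulas/review.md` go through identical bash. The formula argument is the only thing that differs.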

Risks and mitigations

Fragility: Claude might misinterpret a formula step → The phase protocol is the safety net: if no phase signal arrives, a stall is detected and the supervisor escalates.
Cost: More Claude turns = more tokens → Offset by eliminating bash dead-ends that waste whole sessions.
Non-determinism: Same formula might produce different results → Success criteria in each step make pass/fail unambiguous.
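The phase-protocol safety net from the first risk can be sketched as a polling loop. This sketch makes three assumptions: there is a phase file that the Claude session overwrites with its current phase name, the terminal phase is called `done`, and exit status 2 signals a stall. All three are hypothetical. The loop makes no judgment about what a phase means. It reports changes, stops on the terminal phase, and returns a distinct status when the file has not changed for `stall_secs` seconds, leaving escalation to the supervisor.

```bash
#!/usr/bin/env bash
# Hypothetical phase-file watcher with stall detection.
set -euo pipefail

# watch_phase FILE STALL_SECS POLL_SECS
# Prints each new phase. Returns 0 on the "done" phase and 2 if the file
# content does not change for STALL_SECS seconds.
watch_phase() {
  local file="$1" stall_secs="$2" poll_secs="$3"
  local last="" current="" last_change
  last_change="$(date +%s)"
  while true; do
    # A missing file reads as an empty phase.
    current="$(cat "$file" 2>/dev/null || true)"
    if [[ "$current" != "$last" ]]; then
      last="$current"
      last_change="$(date +%s)"
      printf 'phase: %s\n' "$current"
      if [[ "$current" == "done" ]]; then
        return 0
      fi
    elif (( $(date +%s) - last_change >= stall_secs )); then
      printf 'stall: no phase change for %ss\n' "$stall_secs" >&2
      return 2
    fi
    sleep "$poll_secs"
  done
}
```

Polling with `sleep` is chosen here for portability. The loop is deterministic in the sense this document means: the same file history always produces the same result.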

Applying This Principle

When reviewing PRs or designing new agents, ask:

Does this bash code make a judgment call? → Move it to the formula.
Does this formula step do something mechanical? → Move it to the orchestrator.
Is a new if/else being added to handle an edge case? → That's a formula learning, not an orchestrator feature.
Can this agent's bash be reused by another agent type? → Good sign — the orchestrator is properly thin.

Current State

| Agent | Lines | Judgment in bash | Target |
|---|---|---|---|
| dev-agent | 2246 (agent 791 + phase-handler 786 + dev-poll 669) | Heavy: deps, CI retry, review parsing, merge strategy, recovery mode; dev-poll adds dependency resolution, CI retry tracking, approved-PR merging, orphaned session recovery | Thin orchestrator + formula |
| review-agent | 870 | Heavy: diff analysis, review decision, approve/request-changes logic | Needs assessment |
| supervisor | 877 | Heavy: multi-project health checks, CI stall detection, container monitoring | Partially justified (monitoring is deterministic, but escalation decisions are judgment) |
| gardener | 1242 (agent 471 + poll 771) | Medium: backlog triage, duplicate detection, tech-debt scoring | Poll is heavy orchestration; agent is prompt-driven |
| vault | 442 (4 scripts) | Medium: approval flow, human gate decisions | Intentionally bash-heavy (security gate should be deterministic) |
| planner | 382 | Medium: AGENTS.md update, gap analysis | Tmux+formula (done, #232) |
| action-agent | 192 | Light: formula execution | Close to target |
