fix: feat: planner reads issue comments to detect bounced/stuck work — delegates spec-out to formula (#595)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
openhands 2026-03-23 12:16:33 +00:00
parent 1c909e58b3
commit 9f0a81145b
2 changed files with 134 additions and 3 deletions


@@ -1,16 +1,67 @@
# formulas/groom-backlog.toml — Groom the backlog: triage all tech-debt with verify loop
name = "groom-backlog"
description = "Triage and process all tech-debt issues — blockers first, then by impact score, verify to zero"
version = 1
description = "Triage tech-debt issues OR break down bounced issues dispatched by the planner"
version = 2
[context]
files = ["README.md", "AGENTS.md", "VISION.md"]
[[steps]]
id = "check-mode"
title = "Determine operating mode: grooming vs breakdown"
description = """
Check the YAML front matter of the dispatching action issue (if any) for
a `mode` field. Two modes are supported:
1. **breakdown** mode (dispatched by planner for bounced/stuck issues):
The front matter will contain:
formula: groom-backlog
vars:
target_issue: <number>
mode: breakdown
reason: "<why the issue bounced>"
In this mode, skip the normal tech-debt grooming pipeline. Instead:
a. Fetch the target issue:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues/<target_issue>"
b. Fetch ALL comments on the target issue to understand scope and
prior bounce reasons:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues/<target_issue>/comments?limit=50"
c. Read the affected files listed in the issue body to understand
the actual code scope.
d. Break the issue into 2-5 sub-issues, each sized for a single
dev-agent session. Each sub-issue MUST include:
- ## Problem (scoped piece of the parent issue)
- ## Affected files (specific files for this sub-task)
- ## Acceptance criteria (at least one checkbox)
- ## Dependencies (reference parent or sibling sub-issues if ordered)
e. Create the sub-issues via API with the `backlog` label.
f. Update the parent issue body to include a "## Sub-issues" section
linking to all created sub-issues.
g. Remove the `underspecified` label from the parent issue (if present).
h. If the parent issue is a meta-issue that is fully covered by
sub-issues, add a comment noting it is now tracked via sub-issues.
i. Signal completion:
echo "ACTION: broke down #<target_issue> into <N> sub-issues" >> "$RESULT_FILE"
echo 'PHASE:done' > "$PHASE_FILE"
After creating sub-issues in breakdown mode, the formula is DONE;
do not proceed to the normal tech-debt grooming steps.
2. **grooming** mode (default: no mode field, or mode: grooming):
Proceed to the inventory step as normal.
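The mode check above could be sketched roughly as follows. This is a minimal illustration, not the formula's actual implementation: `parse_front_matter` and `operating_mode` are hypothetical names, the front matter is assumed to be delimited by `---` lines at the top of the action issue body, and falling back to grooming for an unrecognized mode value is an assumption.

```python
def parse_front_matter(body):
    # Collect "key: value" pairs found between the first two '---' lines.
    # Nesting is ignored, so keys under 'vars:' land in the same flat dict.
    lines = body.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return fields
        key, sep, value = stripped.partition(":")
        value = value.strip().strip('"')
        if sep and value:  # skip section headers such as 'vars:'
            fields[key.strip()] = value
    return {}  # no closing '---': treat the body as having no front matter


def operating_mode(body):
    # Default to grooming when the mode field is missing.
    # Assumption: an unrecognized mode value also falls back to grooming.
    mode = parse_front_matter(body).get("mode", "grooming")
    return mode if mode in ("breakdown", "grooming") else "grooming"
```

A body without front matter yields `grooming`, so the inventory step runs as before.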
"""
[[steps]]
id = "inventory"
title = "Fetch, score, and classify all tech-debt issues"
needs = ["check-mode"]
description = """
This step only runs in grooming mode. Skip if in breakdown mode.
Fetch all open tech-debt issues:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues?type=issues&state=open&limit=50" | \


@@ -213,6 +213,36 @@ Read these inputs:
- Planner memory (loaded in preflight)
- Promoted predictions from prediction-triage (add as prerequisites if relevant)
### Comment scanning for bounce/stuck detection
For each issue referenced in the prerequisite tree (by #number), fetch its
recent comments to detect signals that the issue is stuck or bouncing:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues/<number>/comments?limit=10"
Scan each comment body for these signals:
- **BOUNCED**: body contains "too large for single session", "too_large",
"underspecified", or "needs splitting" (case-insensitive). This means
the dev-agent refused the issue; it needs breakdown before retry.
- **ESCALATED**: body contains "escalating for human decision", "needs
human decision", or "escalate" from a non-human author. The issue
needs steering input.
- **UNBLOCKED**: body matches the pattern "dependency .* is now closed"
or contains "unblocked". The issue may be ready to work.
- **LABEL_CHURN**: the issue has been relabeled between backlog and
underspecified (or blocked) 3+ times. Check via label change events
or multiple bounce comments. This indicates a ping-pong loop.
Track detected signals in a list: `stuck_issues[]` where each entry is:
{ issue: <number>, signal: BOUNCED|ESCALATED|LABEL_CHURN, count: <N>,
reason: "<summary from comments>" }
These signals feed into the file-at-constraints step to prevent the
planner from re-promoting stuck issues and to dispatch formula-based
breakdown instead.
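The signal scan above might look something like the sketch below. It is an illustration under stated assumptions: `classify_comment` and `stuck_entry` are hypothetical names, the caller is assumed to know whether each comment author is human, all matching is done case-insensitively (the text only states this for BOUNCED), bounce phrases are checked first when several signals match, and three or more bounce comments are used as a stand-in for label-change events.

```python
import re

# Phrases taken from the signal list above; matching is done on lowercased text.
BOUNCE_PHRASES = ("too large for single session", "too_large",
                  "underspecified", "needs splitting")
ESCALATE_PHRASES = ("escalating for human decision", "needs human decision",
                    "escalate")


def classify_comment(body, author_is_human=False):
    # Return BOUNCED, ESCALATED, UNBLOCKED, or None for one comment body.
    text = body.lower()
    if any(phrase in text for phrase in BOUNCE_PHRASES):
        return "BOUNCED"
    if not author_is_human and any(phrase in text for phrase in ESCALATE_PHRASES):
        return "ESCALATED"
    if "unblocked" in text or re.search("dependency .* is now closed", text):
        return "UNBLOCKED"
    return None


def stuck_entry(number, comments):
    # comments: list of (body, author_is_human) tuples, oldest first.
    # Returns a stuck_issues[] entry, or None when no stuck signal is found.
    labelled = [(classify_comment(body, human), body) for body, human in comments]
    bounces = [body for signal, body in labelled if signal == "BOUNCED"]
    escalations = [body for signal, body in labelled if signal == "ESCALATED"]
    if len(bounces) >= 3:
        # Assumption: 3+ bounce comments stand in for 3+ label changes.
        signal, matches = "LABEL_CHURN", bounces
    elif bounces:
        signal, matches = "BOUNCED", bounces
    elif escalations:
        signal, matches = "ESCALATED", escalations
    else:
        return None
    return {"issue": number, "signal": signal, "count": len(matches),
            "reason": matches[-1].strip()}
```

UNBLOCKED comments are classified but never produce an entry, which matches the entry shape above where only BOUNCED, ESCALATED, and LABEL_CHURN are tracked.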
Update the tree by applying these operations:
1. **Mark resolved prerequisites**: For each prerequisite in the tree,
@@ -306,7 +336,51 @@ Action issues count toward the 3-issue constraint budget — they are
strategic investments, not maintenance. The planner decides what data
matters based on current constraints, not what formulas exist.
Filing gate for each constraint:
### Stuck issue handling — dispatch to groom-backlog formula
Before filing, cross-reference the top 3 constraints against the
`stuck_issues[]` list from the update-prerequisite-tree step.
If a constraint issue was detected as BOUNCED or LABEL_CHURN:
- Do NOT re-promote it to backlog or add the priority label; this
would restart the ping-pong loop.
- Instead, dispatch the groom-backlog formula to break it down.
Create an action issue that invokes groom-backlog with the stuck
issue as target:
Title: "chore: break down #<number> — bounced <count>x, needs splitting"
Body:
---
formula: groom-backlog
vars:
target_issue: <number>
mode: breakdown
reason: "<reason from stuck_issues entry>"
---
## Problem
Issue #<number> has bounced <count> time(s) between backlog and
underspecified. The dev-agent reports it is too large for a single
session. It needs to be broken into dev-agent-sized subtasks.
## Affected files
- formulas/groom-backlog.toml
## Acceptance criteria
- [ ] #<number> is split into implementable sub-issues
- [ ] Sub-issues have acceptance criteria and affected files
- [ ] Original issue updated with links to sub-issues
Label this action issue with the `action` label (not `backlog`).
This counts toward the 3-issue-per-run limit.
If a constraint issue was detected as ESCALATED:
- Do NOT file new work. Add a comment to the issue noting the
escalation was seen, and mark the prerequisite in the tree as:
`[ ] <name>: escalated, awaiting human decision`
- Do NOT count this against the 3-issue limit.
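As a rough sketch of the routing described above, the cross-reference between the top constraints and `stuck_issues[]` could be expressed as follows. The names `route_constraints` and `breakdown_front_matter` are hypothetical, constraints are assumed to be identified by issue number, and the two-space indentation under `vars:` is an assumption about the front matter layout.

```python
def breakdown_front_matter(entry):
    # Lines of the YAML front matter for a breakdown action issue.
    # Assumption: the reason text contains no double quotes.
    return [
        "---",
        "formula: groom-backlog",
        "vars:",
        "  target_issue: {}".format(entry["issue"]),
        "  mode: breakdown",
        '  reason: "{}"'.format(entry["reason"]),
        "---",
    ]


def route_constraints(constraint_issues, stuck_issues):
    # Split the top 3 constraint issues into three groups:
    #   dispatch      -> BOUNCED or LABEL_CHURN, gets a groom-backlog action issue
    #   escalated     -> comment and tree annotation only, no new work filed
    #   file_normally -> continues to the regular filing gate
    stuck_by_issue = {entry["issue"]: entry for entry in stuck_issues}
    plan = {"dispatch": [], "escalated": [], "file_normally": []}
    for number in constraint_issues[:3]:
        entry = stuck_by_issue.get(number)
        signal = entry["signal"] if entry else None
        if signal in ("BOUNCED", "LABEL_CHURN"):
            plan["dispatch"].append(number)
        elif signal == "ESCALATED":
            plan["escalated"].append(number)
        else:
            plan["file_normally"].append(number)
    # Only dispatched action issues are counted here; the filing gate
    # decides whether the remaining constraints consume more of the limit.
    plan["budget_used"] = len(plan["dispatch"])
    return plan
```

Escalated issues never add to `budget_used`, mirroring the rule that they do not count against the 3-issue limit.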
Filing gate for each constraint (that is NOT stuck):
1. Check if an issue already exists for this constraint (match by issue
number reference in the tree, or search open issues by title).
@@ -467,6 +541,12 @@ Format:
2. <prerequisite> blocks N objectives issue #NNN
3. <prerequisite> blocks N objectives issue #NNN
## Stuck issues detected
- #NNN: BOUNCED (Nx) — dispatched groom-backlog as #MMM
- #NNN: ESCALATED — awaiting human decision
- #NNN: LABEL_CHURN (Nx) — dispatched groom-backlog as #MMM
(or "No stuck issues detected" if none)
## Issues created
- #NNN: title — why (constraint for objectives X, Y)
(or "No new issues — constraints already have open issues" if none)