fix: feat: planner reads issue comments to detect bounced/stuck work — delegates spec-out to formula (#595)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
openhands 2026-03-23 12:16:33 +00:00
parent 1c909e58b3
commit 9f0a81145b
2 changed files with 134 additions and 3 deletions

View file

@ -1,16 +1,67 @@
# formulas/groom-backlog.toml — Groom the backlog: triage all tech-debt with verify loop # formulas/groom-backlog.toml — Groom the backlog: triage all tech-debt with verify loop
name = "groom-backlog" name = "groom-backlog"
description = "Triage and process all tech-debt issues — blockers first, then by impact score, verify to zero" description = "Triage tech-debt issues OR break down bounced issues dispatched by the planner"
version = 1 version = 2
[context] [context]
files = ["README.md", "AGENTS.md", "VISION.md"] files = ["README.md", "AGENTS.md", "VISION.md"]
[[steps]]
id = "check-mode"
title = "Determine operating mode: grooming vs breakdown"
description = """
Check the YAML front matter of the dispatching action issue (if any) for
a `mode` field. Two modes are supported:
1. **breakdown** mode (dispatched by planner for bounced/stuck issues):
The front matter will contain:
formula: groom-backlog
vars:
target_issue: <number>
mode: breakdown
reason: "<why the issue bounced>"
In this mode, skip the normal tech-debt grooming pipeline. Instead:
a. Fetch the target issue:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues/<target_issue>"
b. Fetch ALL comments on the target issue to understand scope and
prior bounce reasons:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues/<target_issue>/comments?limit=50"
c. Read the affected files listed in the issue body to understand
the actual code scope.
d. Break the issue into 2-5 sub-issues, each sized for a single
dev-agent session. Each sub-issue MUST include:
- ## Problem (scoped piece of the parent issue)
- ## Affected files (specific files for this sub-task)
- ## Acceptance criteria (at least one checkbox)
- ## Dependencies (reference parent or sibling sub-issues if ordered)
e. Create the sub-issues via API with the `backlog` label.
f. Update the parent issue body to include a "## Sub-issues" section
linking to all created sub-issues.
g. Remove the `underspecified` label from the parent issue (if present).
h. If the parent issue is a meta-issue that is fully covered by
sub-issues, add a comment noting it is now tracked via sub-issues.
i. Signal completion:
echo "ACTION: broke down #<target_issue> into <N> sub-issues" >> "$RESULT_FILE"
echo 'PHASE:done' > "$PHASE_FILE"
After creating sub-issues in breakdown mode, the formula is DONE
do not proceed to the normal tech-debt grooming steps.
2. **grooming** mode (default no mode field, or mode: grooming):
Proceed to the inventory step as normal.
"""
[[steps]] [[steps]]
id = "inventory" id = "inventory"
title = "Fetch, score, and classify all tech-debt issues" title = "Fetch, score, and classify all tech-debt issues"
needs = ["check-mode"]
description = """ description = """
This step only runs in grooming mode. Skip if in breakdown mode.
Fetch all open tech-debt issues: Fetch all open tech-debt issues:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \ curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues?type=issues&state=open&limit=50" | \ "$CODEBERG_API/issues?type=issues&state=open&limit=50" | \

View file

@ -213,6 +213,36 @@ Read these inputs:
- Planner memory (loaded in preflight) - Planner memory (loaded in preflight)
- Promoted predictions from prediction-triage (add as prerequisites if relevant) - Promoted predictions from prediction-triage (add as prerequisites if relevant)
### Comment scanning for bounce/stuck detection
For each issue referenced in the prerequisite tree (by #number), fetch its
recent comments to detect signals that the issue is stuck or bouncing:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues/<number>/comments?limit=10"
Scan each comment body for these signals:
- **BOUNCED**: body contains "too large for single session", "too_large",
"underspecified", or "needs splitting" (case-insensitive). This means
the dev-agent refused the issue it needs breakdown before retry.
- **ESCALATED**: body contains "escalating for human decision", "needs
human decision", or "escalate" from a non-human author. The issue
needs steering input.
- **UNBLOCKED**: body contains "dependency .* is now closed" or
"unblocked". The issue may be ready to work.
- **LABEL_CHURN**: the issue has been relabeled between backlog and
underspecified (or blocked) 3+ times. Check via label change events
or multiple bounce comments. This indicates a ping-pong loop.
Track detected signals in a list: `stuck_issues[]` where each entry is:
{ issue: <number>, signal: BOUNCED|ESCALATED|LABEL_CHURN, count: <N>,
reason: "<summary from comments>" }
These signals feed into the file-at-constraints step to prevent the
planner from re-promoting stuck issues and to dispatch formula-based
breakdown instead.
Update the tree by applying these operations: Update the tree by applying these operations:
1. **Mark resolved prerequisites**: For each prerequisite in the tree, 1. **Mark resolved prerequisites**: For each prerequisite in the tree,
@ -306,7 +336,51 @@ Action issues count toward the 3-issue constraint budget — they are
strategic investments, not maintenance. The planner decides what data strategic investments, not maintenance. The planner decides what data
matters based on current constraints, not what formulas exist. matters based on current constraints, not what formulas exist.
Filing gate for each constraint: ### Stuck issue handling — dispatch to groom-backlog formula
Before filing, cross-reference the top 3 constraints against the
`stuck_issues[]` list from the update-prerequisite-tree step.
If a constraint issue was detected as BOUNCED or LABEL_CHURN:
- Do NOT re-promote it to backlog or add the priority label this
would restart the ping-pong loop.
- Instead, dispatch the groom-backlog formula to break it down.
Create an action issue that invokes groom-backlog with the stuck
issue as target:
Title: "chore: break down #<number> — bounced <count>x, needs splitting"
Body:
---
formula: groom-backlog
vars:
target_issue: <number>
mode: breakdown
reason: "<reason from stuck_issues entry>"
---
## Problem
Issue #<number> has bounced <count> time(s) between backlog and
underspecified. The dev-agent reports it is too large for a single
session. It needs to be broken into dev-agent-sized subtasks.
## Affected files
- formulas/groom-backlog.toml
## Acceptance criteria
- [ ] #<number> is split into implementable sub-issues
- [ ] Sub-issues have acceptance criteria and affected files
- [ ] Original issue updated with links to sub-issues
Label this action issue with the `action` label (not `backlog`).
This counts toward the 3-issue-per-run limit.
If a constraint issue was detected as ESCALATED:
- Do NOT file new work. Add a comment to the issue noting the
escalation was seen, and mark the prerequisite in the tree as:
`[ ] <name> escalated awaiting human decision`
- Do NOT count this against the 3-issue limit.
Filing gate for each constraint (that is NOT stuck):
1. Check if an issue already exists for this constraint (match by issue 1. Check if an issue already exists for this constraint (match by issue
number reference in the tree, or search open issues by title). number reference in the tree, or search open issues by title).
@ -467,6 +541,12 @@ Format:
2. <prerequisite> blocks N objectives issue #NNN 2. <prerequisite> blocks N objectives issue #NNN
3. <prerequisite> blocks N objectives issue #NNN 3. <prerequisite> blocks N objectives issue #NNN
## Stuck issues detected
- #NNN: BOUNCED (Nx) — dispatched groom-backlog as #MMM
- #NNN: ESCALATED — awaiting human decision
- #NNN: LABEL_CHURN (Nx) — dispatched groom-backlog as #MMM
(or "No stuck issues detected" if none)
## Issues created ## Issues created
- #NNN: title — why (constraint for objectives X, Y) - #NNN: title — why (constraint for objectives X, Y)
(or "No new issues — constraints already have open issues" if none) (or "No new issues — constraints already have open issues" if none)