From 86c8ef47201cc54dd8af43df6316bdb67b4c5d86 Mon Sep 17 00:00:00 2001
From: openhands
Date: Wed, 25 Mar 2026 17:16:13 +0000
Subject: [PATCH] =?UTF-8?q?fix:=20feat:=20kill=20prediction/backlog=20?=
 =?UTF-8?q?=E2=80=94=20planner=20must=20act=20or=20dismiss,=20with=20actio?=
 =?UTF-8?q?n=20budget=20(#686)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 AGENTS.md                   |  2 +-
 dev/dev-poll.sh             |  4 ++--
 formulas/run-planner.toml   | 19 ++++++++++++-------
 formulas/run-predictor.toml | 15 +++++----------
 planner/AGENTS.md           |  6 ++++--
 5 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 2c2fc37..da3bfe8 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -106,7 +106,7 @@ Issues flow: `backlog` → `in-progress` → PR → CI → review → merge →
 | `underspecified` | Dev-agent refused the issue as too large or vague. | dev-poll.sh (on preflight `too_large`), dev-agent.sh (on mid-run `too_large` refusal) |
 | `vision` | Goal anchors — high-level objectives from VISION.md. | Planner, humans |
 | `prediction/unreviewed` | Unprocessed prediction filed by predictor. | predictor-run.sh |
-| `prediction/backlog` | Prediction triaged as WATCH — not urgent, tracked. | Planner (triage-predictions step) |
+| `prediction/dismissed` | Prediction triaged as DISMISS — planner disagrees, closed with reason. | Planner (triage-predictions step) |
 | `prediction/actioned` | Prediction promoted or dismissed by planner. | Planner (triage-predictions step) |
 | `action` | Operational task for the action-agent to execute via formula. | Planner, humans |
 
diff --git a/dev/dev-poll.sh b/dev/dev-poll.sh
index ef03a9b..e1aa644 100755
--- a/dev/dev-poll.sh
+++ b/dev/dev-poll.sh
@@ -389,7 +389,7 @@ if [ "$ORPHAN_COUNT" -gt 0 ]; then
   # Formula guard: formula-labeled issues should not be worked on by dev-agent.
   # Remove in-progress label and skip to prevent infinite respawn cycle (#115).
   ORPHAN_LABELS=$(echo "$ORPHANS_JSON" | jq -r '.[0].labels[].name' 2>/dev/null) || true
-  SKIP_LABEL=$(echo "$ORPHAN_LABELS" | grep -oE '^(formula|action|prediction/backlog|prediction/unreviewed)$' | head -1) || true
+  SKIP_LABEL=$(echo "$ORPHAN_LABELS" | grep -oE '^(formula|action|prediction/dismissed|prediction/unreviewed)$' | head -1) || true
   if [ -n "$SKIP_LABEL" ]; then
     log "issue #${ISSUE_NUM} has '${SKIP_LABEL}' label — removing in-progress, skipping"
     curl -sf -X DELETE -H "Authorization: token ${FORGE_TOKEN}" \
@@ -640,7 +640,7 @@ for i in $(seq 0 $((BACKLOG_COUNT - 1))); do
   # Formula guard: formula-labeled issues must not be picked up by dev-agent.
   # A formula issue that accidentally acquires the backlog label should be skipped.
   ISSUE_LABELS=$(echo "$BACKLOG_JSON" | jq -r ".[$i].labels[].name" 2>/dev/null) || true
-  SKIP_LABEL=$(echo "$ISSUE_LABELS" | grep -oE '^(formula|action|prediction/backlog|prediction/unreviewed)$' | head -1) || true
+  SKIP_LABEL=$(echo "$ISSUE_LABELS" | grep -oE '^(formula|action|prediction/dismissed|prediction/unreviewed)$' | head -1) || true
   if [ -n "$SKIP_LABEL" ]; then
     log "issue #${ISSUE_NUM} has '${SKIP_LABEL}' label — skipping in backlog scan"
     continue
diff --git a/formulas/run-planner.toml b/formulas/run-planner.toml
index 95fc0f2..af4488e 100644
--- a/formulas/run-planner.toml
+++ b/formulas/run-planner.toml
@@ -71,15 +71,18 @@ and file-at-constraints steps.
 3. Read available formulas:
    $FACTORY_ROOT/formulas/*.toml and $PROJECT_REPO_ROOT/formulas/*.toml
 
-4. For each prediction, choose one action:
-   - PROMOTE_ACTION: maps to a formula -> create action issue, close prediction
-   - PROMOTE_BACKLOG: warrants dev work -> create backlog issue, close prediction
-   - WATCH: not urgent -> comment why, relabel to prediction/backlog, keep open
-   - DISMISS: noise or covered -> comment reasoning, close prediction
+4. For each prediction, choose one of two actions (no fence-sitting):
+   - ACTION: agree with the prediction -> create a backlog or action issue,
+     relabel to prediction/actioned, close the prediction
+   - DISMISS: disagree or noise -> comment reasoning, relabel to
+     prediction/dismissed, close the prediction
+
+   If a dismissed prediction is real, the predictor will re-file it with
+   stronger evidence next run. The issue history is the predictor's memory.
 
 5. Execute triage using tea helpers:
    - Create issues: tea_file_issue "" "<body>" "backlog" (or "action")
-   - Relabel: tea_relabel <num> "prediction/actioned" (or "prediction/backlog")
+   - Relabel: tea_relabel <num> "prediction/actioned" (or "prediction/dismissed")
    - Comment: tea_comment <num> "<reasoning>"
    - Close: tea_close <num>
 
@@ -163,7 +166,9 @@ Vault procurement: if a constraint needs a resource not in RESOURCES.md with
 recurring cost, create vault/pending/<resource-id>.md instead of an issue.
 
 Rules:
-- Maximum 5 items per run (issues + procurement combined)
+- Action budget: the planner may create at most (predictions_addressed + 1)
+  new issues per run if any predictions were triaged, or 4 if no predictions.
+  This covers issues from Part A promotions + Part C constraint filing combined.
 - No issues filed past the bottleneck
 - Leave existing premature issues as-is
 - Only reference formulas that exist on disk
diff --git a/formulas/run-predictor.toml b/formulas/run-predictor.toml
index fcbe849..c4da028 100644
--- a/formulas/run-predictor.toml
+++ b/formulas/run-predictor.toml
@@ -37,7 +37,7 @@ Set up the working environment and load your prediction history.
    curl -sf -H "Authorization: token $FORGE_TOKEN" \
      "$FORGE_API/issues?state=open&type=issues&labels=prediction%2Funreviewed&limit=50"
    curl -sf -H "Authorization: token $FORGE_TOKEN" \
-     "$FORGE_API/issues?state=open&type=issues&labels=prediction%2Fbacklog&limit=50"
+     "$FORGE_API/issues?state=closed&type=issues&labels=prediction%2Fdismissed&limit=50&sort=updated&direction=desc"
    curl -sf -H "Authorization: token $FORGE_TOKEN" \
      "$FORGE_API/issues?state=closed&type=issues&labels=prediction%2Factioned&limit=50&sort=updated&direction=desc"
 
@@ -45,7 +45,7 @@ Set up the working environment and load your prediction history.
    - What you predicted (title + body)
    - What the planner decided (comments — look for triage reasoning)
    - Outcome: actioned (planner valued it), dismissed (planner rejected it),
-     watching (planner deferred it), unreviewed (planner hasn't seen it yet)
+     unreviewed (planner hasn't seen it yet)
 
 3. Read the prerequisite tree:
    cat "$PROJECT_REPO_ROOT/planner/prerequisite-tree.md"
@@ -75,12 +75,8 @@ Review your prediction history from the preflight step:
 - Which predictions did the planner action? Those are areas where your
   instincts were right. The planner values those signals.
 - Which were dismissed? You were wrong or the planner disagreed. Don't
-  repeat the same theory without new evidence.
-- Which are watching (prediction/backlog)? Check if conditions changed.
-  If changed → file a new prediction superseding it (close the old one
-  as prediction/actioned with "superseded by #NNN").
-  If stale (30+ days, unchanged) → close it.
-  If recent and unchanged → leave it.
+  repeat the same theory without new evidence. If conditions have changed
+  since the dismissal, you may re-file with stronger evidence.
 
 ## Finding weaknesses
 
@@ -169,7 +165,7 @@ tea is pre-configured with login "$TEA_LOGIN" and repo "$FORGE_REPO".
      --body "Superseded by #NNN"
 
 5. Do NOT duplicate existing open predictions. If your theory matches
-   an open prediction/unreviewed or prediction/backlog issue, skip it.
+   an open prediction/unreviewed issue, skip it.
 
 ## Rules
 
@@ -177,7 +173,6 @@ tea is pre-configured with login "$TEA_LOGIN" and repo "$FORGE_REPO".
 - Each exploit counts as 2 (prediction + action dispatch)
 - So: 5 explores, or 2 exploits + 1 explore, or 1 exploit + 3 explores
 - Never re-file a dismissed prediction without new evidence
-- When superseding a prediction/backlog issue, close the old one properly
 - Action issues must reference existing formulas — don't invent formulas
 - Be specific: name the file, the metric, the threshold, the formula
 - If no weaknesses found, file nothing — that's a strong signal the project is healthy
diff --git a/planner/AGENTS.md b/planner/AGENTS.md
index 08b6260..9f0a56f 100644
--- a/planner/AGENTS.md
+++ b/planner/AGENTS.md
@@ -8,8 +8,10 @@ tree from `planner/MEMORY.md` and `planner/prerequisite-tree.md`. Also reads
 all available formulas: factory formulas (`$FACTORY_ROOT/formulas/*.toml`) and
 project-specific formulas (`$PROJECT_REPO_ROOT/formulas/*.toml`). Phase 1
 (prediction-triage): triage `prediction/unreviewed` issues filed by the
-Predictor — for each prediction: promote to action, promote to backlog, watch
-(relabel to prediction/backlog), or dismiss with reasoning. Phase 2
+Predictor — for each prediction: action (create issue, relabel to
+prediction/actioned, close) or dismiss (comment reason, relabel to
+prediction/dismissed, close). No fence-sitting — dismissed predictions get
+re-filed by the predictor with stronger evidence if still valid. Phase 2
 (update-prerequisite-tree): scan repo state + open/closed issues, mark resolved
 prerequisites, discover new ones, update the tree. **Also scans comments on
 referenced issues for bounce/stuck signals** (BOUNCED, ESCALATED, LABEL_CHURN)
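
Reviewer note: the action budget rule added to formulas/run-planner.toml is stated only in prose, so a small arithmetic sketch may help when checking it. The function names `action_budget` and `within_budget` are hypothetical and do not appear in this patch. The sketch also assumes that "predictions_addressed" counts every prediction triaged in the run, whether actioned or dismissed, which the patch does not spell out.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the action budget rule; not part of the patch.

# action_budget <predictions_addressed>
# Prints the maximum number of new issues the planner may create in one run:
# (predictions_addressed + 1) if any predictions were triaged, otherwise 4.
action_budget() {
  local addressed="${1:-0}"
  if [ "$addressed" -gt 0 ]; then
    echo $((addressed + 1))
  else
    echo 4
  fi
}

# within_budget <predictions_addressed> <issues_created>
# Succeeds (exit 0) when the number of issues created so far fits the budget.
within_budget() {
  local budget
  budget=$(action_budget "$1")
  [ "$2" -le "$budget" ]
}
```

Read literally, the rule gives a run that triages one or two predictions a smaller budget (2 or 3) than a run that triages none (4). If that is not intended, the wording in the formula may need a minimum.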