From a225b05070fbfb2d88ac009bb4f745d0f862a974 Mon Sep 17 00:00:00 2001
From: openhands
Date: Mon, 23 Mar 2026 11:51:43 +0000
Subject: [PATCH] =?UTF-8?q?fix:=20feat:=20predictor=20re-evaluates=20predi?=
 =?UTF-8?q?ction/backlog=20issues=20=E2=80=94=20evolve=20stale=20watches?=
 =?UTF-8?q?=20into=20targeted=20warnings=20(#588)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a re-evaluate-backlog step to the predictor formula between
collect-signals and analyze-and-predict. For each open prediction/backlog
issue, the predictor now reads the original context and planner comments,
extracts the assumptions that made it "watch, don't act", and re-checks
those conditions against current system state.

Three outcomes:
- CONDITIONS_CHANGED → file new prediction/unreviewed, close old as superseded
- STALE (30+ days, conditions stable) → close as prediction/actioned
- UNCHANGED_RECENT → skip (existing behavior)

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 formulas/run-predictor.toml | 121 ++++++++++++++++++++++++++++++++++--
 predictor/AGENTS.md         |  14 +++--
 2 files changed, 125 insertions(+), 10 deletions(-)

diff --git a/formulas/run-predictor.toml b/formulas/run-predictor.toml
index 2508d72..ac78e4a 100644
--- a/formulas/run-predictor.toml
+++ b/formulas/run-predictor.toml
@@ -4,7 +4,7 @@
 # predictor-run.sh creates a tmux session with Claude (sonnet) and injects
 # this formula as context. Claude executes all steps autonomously.
 #
-# Steps: preflight → collect-signals → analyze-and-predict
+# Steps: preflight → collect-signals → re-evaluate-backlog → analyze-and-predict
 #
 # Signal sources (three categories):
 # Health signals:
@@ -178,11 +178,123 @@ Look for:
 """
 needs = ["preflight"]
 
+[[steps]]
+id = "re-evaluate-backlog"
+title = "Re-evaluate open prediction/backlog watches"
+description = """
+Re-check prediction/backlog issues to detect changed conditions or stale watches.
+The collect-signals step already fetched prediction/backlog issues (step 5).
+Now actively re-evaluate each one instead of just using them for dedup.
+
+For each open prediction/backlog issue:
+
+### 1. Read context
+
+Fetch the issue body and all comments:
+  curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
+    "$CODEBERG_API/issues/"
+  curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
+    "$CODEBERG_API/issues//comments"
+
+Pay attention to:
+- The original prediction body (signal source, confidence, suggested action)
+- The planner's triage comment (the "Watching — ..." comment with reasoning)
+- Any subsequent comments with updated context
+- The issue's created_at and updated_at timestamps
+
+### 2. Extract conditions
+
+From the planner's triage comment and original prediction body, identify the
+specific assumptions that made this a "watch, don't act" decision. Examples:
+- "static site config, no FastCGI" (Caddy CVE watch)
+- "RAM stable above 3GB" (resource pressure watch)
+- "no reverse proxy configured" (security exposure watch)
+- "dependency not in use yet" (CVE watch for unused feature)
+
+### 3. Re-check conditions
+
+Verify each assumption still holds by checking current system state:
+- Config files: read relevant configs in $PROJECT_REPO_ROOT
+- Versions: check installed versions of referenced tools/dependencies
+- Infrastructure: re-run relevant resource/health checks from collect-signals
+- Code changes: check git log for changes to affected files since the issue was created:
+    git log --oneline --since="" -- 
+
+### 4. Decide
+
+For each prediction/backlog issue, choose one action:
+
+**CONDITIONS_CHANGED** — one or more assumptions no longer hold:
+  a. Resolve the prediction/unreviewed and prediction/actioned label IDs (take the prediction/backlog ID from the old issue's labels array):
+     curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
+       "$CODEBERG_API/labels" | jq '.[] | select(.name == "prediction/unreviewed") | .id'
+     curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
+       "$CODEBERG_API/labels" | jq '.[] | select(.name == "prediction/actioned") | .id'
+  b. File a NEW prediction/unreviewed issue with updated context:
+     curl -sf -X POST -H "Authorization: token $CODEBERG_TOKEN" \
+       -H "Content-Type: application/json" \
+       "$CODEBERG_API/issues" \
+       -d '{"title":" — CONDITIONS CHANGED",
+            "body":"Re-evaluation of #: conditions have changed.\\n\\n\\n\\nOriginal prediction: #\\n\\n---\\n**Signal source:** re-evaluation of prediction/backlog #\\n**Confidence:** \\n**Suggested action:** ",
+            "labels":[]}'
+  c. Comment on the OLD issue explaining what changed:
+     curl -sf -X POST -H "Authorization: token $CODEBERG_TOKEN" \
+       -H "Content-Type: application/json" \
+       "$CODEBERG_API/issues//comments" \
+       -d '{"body":"Superseded by # — conditions changed: "}'
+  d. Relabel old issue: remove prediction/backlog, add prediction/actioned:
+     curl -sf -X DELETE -H "Authorization: token $CODEBERG_TOKEN" \
+       "$CODEBERG_API/issues//labels/"
+     curl -sf -X POST -H "Authorization: token $CODEBERG_TOKEN" \
+       -H "Content-Type: application/json" \
+       "$CODEBERG_API/issues//labels" \
+       -d '{"labels":[]}'
+  e. Close the old issue:
+     curl -sf -X PATCH -H "Authorization: token $CODEBERG_TOKEN" \
+       -H "Content-Type: application/json" \
+       "$CODEBERG_API/issues/" \
+       -d '{"state":"closed"}'
+
+**STALE** — 30+ days since last update AND conditions unchanged:
+  a. Comment explaining the closure:
+     curl -sf -X POST -H "Authorization: token $CODEBERG_TOKEN" \
+       -H "Content-Type: application/json" \
+       "$CODEBERG_API/issues//comments" \
+       -d '{"body":"Closing stale watch — conditions stable for 30+ days. Will re-file if conditions change."}'
+  b. Relabel: remove prediction/backlog, add prediction/actioned:
+     curl -sf -X DELETE -H "Authorization: token $CODEBERG_TOKEN" \
+       "$CODEBERG_API/issues//labels/"
+     curl -sf -X POST -H "Authorization: token $CODEBERG_TOKEN" \
+       -H "Content-Type: application/json" \
+       "$CODEBERG_API/issues//labels" \
+       -d '{"labels":[]}'
+  c. Close the issue:
+     curl -sf -X PATCH -H "Authorization: token $CODEBERG_TOKEN" \
+       -H "Content-Type: application/json" \
+       "$CODEBERG_API/issues/" \
+       -d '{"state":"closed"}'
+
+**UNCHANGED_RECENT** — conditions unchanged AND last update < 30 days ago:
+  Skip — no action needed. This is the current behavior.
+
+## Rules
+- Process ALL open prediction/backlog issues (already fetched in collect-signals step 5)
+- New predictions filed here count toward the 5-prediction cap in analyze-and-predict
+- Track how many new predictions were filed so analyze-and-predict can adjust its cap
+- Be conservative: only mark CONDITIONS_CHANGED when you have concrete evidence
+- Use the updated_at timestamp from the issue API to determine staleness
+"""
+needs = ["collect-signals"]
+
 [[steps]]
 id = "analyze-and-predict"
 title = "Analyze signals and file prediction issues"
 description = """
-Analyze the collected signals for patterns and file up to 5 prediction issues.
+Analyze the collected signals for patterns and file prediction issues.
+
+The re-evaluate-backlog step may have already filed new predictions from changed
+conditions. Subtract those from the 5-prediction cap: if re-evaluation filed N
+predictions, you may file at most (5 - N) new predictions in this step.
 
 ## What to look for
 
@@ -259,14 +371,15 @@ For each prediction, create a Codeberg issue with the `prediction/unreviewed` la
 Use matrix_send if available, or skip if MATRIX_TOKEN is not set.
 
 ## Rules
-- Max 5 predictions total
+- Max 5 predictions total (including any filed during re-evaluate-backlog)
 - Do NOT predict feature work — only health observations, outcome measurements, and external risk/opportunity signals
 - Do NOT duplicate existing open predictions (checked in collect-signals)
+- Do NOT duplicate predictions just filed by re-evaluate-backlog for changed conditions
 - Be specific: name the metric, the value, the threshold
 - Prefer high-confidence predictions backed by concrete data
 - External signals must name the specific dependency/tool and the advisory/change
 - If no meaningful patterns found, file zero issues — that is a valid outcome
 """
-needs = ["collect-signals"]
+needs = ["re-evaluate-backlog"]
diff --git a/predictor/AGENTS.md b/predictor/AGENTS.md
index ddb764d..6a637ce 100644
--- a/predictor/AGENTS.md
+++ b/predictor/AGENTS.md
@@ -1,9 +1,9 @@
 # Predictor Agent
 
-**Role**: Risk oracle and opportunity spotter (the "goblin"). Runs a 3-step
-formula (preflight → collect-signals → analyze-and-predict) via interactive
-tmux Claude session (sonnet). Collects three categories of signals:
+**Role**: Risk oracle and opportunity spotter (the "goblin"). Runs a 4-step
+formula (preflight → collect-signals → re-evaluate-backlog → analyze-and-predict)
+via interactive tmux Claude session (sonnet). Collects three categories of signals:
 
 1. **Health signals** — CI pipeline trends (Woodpecker), stale issues, agent
    health (tmux sessions + logs), resource patterns (RAM, disk, load, containers)
@@ -27,9 +27,10 @@ memory check (skips if available RAM < 2000 MB).
   sources disinto project config, builds prompt with formula + Codeberg API
   reference, creates tmux session (sonnet), monitors phase file, handles crash
   recovery via `run_formula_and_monitor`
-- `formulas/run-predictor.toml` — Execution spec: three steps (preflight,
-  collect-signals, analyze-and-predict) with `needs` dependencies. Claude
-  collects signals and files prediction issues in a single interactive session
+- `formulas/run-predictor.toml` — Execution spec: four steps (preflight,
+  collect-signals, re-evaluate-backlog, analyze-and-predict) with `needs`
+  dependencies. Claude collects signals, re-evaluates watched predictions,
+  and files prediction issues in a single interactive session
 
 **Environment variables consumed**:
 - `CODEBERG_TOKEN`, `CODEBERG_REPO`, `CODEBERG_API`, `PROJECT_NAME`, `PROJECT_REPO_ROOT`
@@ -42,5 +43,6 @@ load formula + context → create tmux session →
 Claude collects signals (health: CI trends, stale issues, agent health, resources;
 outcomes: output freshness, capacity utilization, throughput; external: dependency
 advisories, ecosystem changes via web search) → dedup against existing open predictions →
+re-evaluate prediction/backlog watches (close stale, supersede changed) →
 file `prediction/unreviewed` issues → `PHASE:done`.
 The planner's Phase 1 later triages these predictions.
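
The decision rules of the re-evaluate-backlog step (changed conditions take priority, an unchanged watch with no update for 30+ days is stale, and the 5-prediction cap is shared with analyze-and-predict) can be sketched as a small bash helper. This is a hypothetical illustration, not part of the patch: the names `classify_watch`, `remaining_cap`, `STALE_DAYS` and `PREDICTION_CAP` are invented here, deciding whether conditions changed remains Claude's judgment (passed in as a 0/1 flag), and GNU `date -d` is assumed for parsing the `updated_at` timestamp.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the re-evaluate-backlog decision rules.
# Not shipped with the formula; Claude applies these rules itself.
# Assumes GNU date, which parses ISO 8601 timestamps such as `updated_at`.

STALE_DAYS=30      # "30+ days since last update" threshold from the formula
PREDICTION_CAP=5   # cap shared by re-evaluate-backlog and analyze-and-predict

# classify_watch UPDATED_AT CONDITIONS_CHANGED [NOW_EPOCH]
#   UPDATED_AT          ISO 8601 timestamp, e.g. 2026-02-01T10:00:00Z
#   CONDITIONS_CHANGED  1 if any watched assumption no longer holds, else 0
#   NOW_EPOCH           optional reference time in Unix seconds (default: now)
# Prints CONDITIONS_CHANGED, STALE or UNCHANGED_RECENT.
classify_watch() {
  local updated_at=$1 changed=$2 now=${3:-$(date +%s)}
  local updated age_days
  updated=$(date -d "$updated_at" +%s) || return 1
  age_days=$(( (now - updated) / 86400 ))   # whole days since last update

  if [ "$changed" -eq 1 ]; then
    # Concrete evidence of change wins regardless of age.
    echo CONDITIONS_CHANGED
  elif [ "$age_days" -ge "$STALE_DAYS" ]; then
    echo STALE
  else
    echo UNCHANGED_RECENT
  fi
}

# remaining_cap FILED_DURING_REEVALUATION
# Prints how many predictions analyze-and-predict may still file (5 - N, floored at 0).
remaining_cap() {
  local left=$(( PREDICTION_CAP - $1 ))
  if [ "$left" -lt 0 ]; then
    left=0
  fi
  echo "$left"
}

# Usage with a fixed reference time so the results are reproducible:
now=$(date -d 2026-03-23T12:00:00Z +%s)
classify_watch 2026-02-01T10:00:00Z 0 "$now"   # STALE (50 days)
classify_watch 2026-03-10T08:00:00Z 0 "$now"   # UNCHANGED_RECENT (13 days)
classify_watch 2026-03-10T08:00:00Z 1 "$now"   # CONDITIONS_CHANGED
remaining_cap 2                                # 3
```

Checking the changed-conditions flag first mirrors the formula's wording: STALE requires both age and stable conditions, so an old watch whose assumptions broke is superseded rather than closed as stale. Flooring the cap at zero is this sketch's own choice; the formula only states 5 - N.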