disinto/formulas/run-predictor.toml
openhands a225b05070 feat: predictor re-evaluates prediction/backlog issues — evolve stale watches into targeted warnings (#588)
Add a re-evaluate-backlog step to the predictor formula between
collect-signals and analyze-and-predict. For each open prediction/backlog
issue, the predictor now reads the original context and planner comments,
extracts the assumptions that made it "watch, don't act", and re-checks
those conditions against current system state.

Three outcomes:
- CONDITIONS_CHANGED → file new prediction/unreviewed, close old as superseded
- STALE (30+ days, conditions stable) → close as prediction/actioned
- UNCHANGED_RECENT → skip (existing behavior)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 11:51:43 +00:00


# formulas/run-predictor.toml — Predictor formula (disinto-specific signals)
#
# Executed by predictor/predictor-run.sh via cron — no action issues.
# predictor-run.sh creates a tmux session with Claude (sonnet) and injects
# this formula as context. Claude executes all steps autonomously.
#
# Steps: preflight → collect-signals → re-evaluate-backlog → analyze-and-predict
#
# Signal sources (three categories):
# Health signals:
# - CI pipeline trends (Woodpecker)
# - Stale issues (open issues with no recent activity)
# - Agent health (tmux sessions, recent logs)
# - Resource patterns (RAM, disk, load, containers)
# Outcome signals:
# - Output freshness (formula evidence/artifacts)
# - Capacity utilization (idle agents vs dispatchable work)
# - Throughput (recently closed issues, merged PRs)
# External signals:
# - Dependency security advisories
# - Upstream breaking changes and deprecations
name = "run-predictor"
description = "Evidence-based prediction: health, outcome measurement, external environment signals"
version = 2
model = "sonnet"
[context]
files = ["AGENTS.md", "RESOURCES.md"]
[[steps]]
id = "preflight"
title = "Pull latest code and gather environment"
description = """
Set up the working environment for this prediction run.
1. Change to the project repository:
cd "$PROJECT_REPO_ROOT"
2. Pull the latest code:
git fetch origin "$PRIMARY_BRANCH" --quiet
git checkout "$PRIMARY_BRANCH" --quiet
git pull --ff-only origin "$PRIMARY_BRANCH" --quiet
"""
[[steps]]
id = "collect-signals"
title = "Collect disinto-specific signals"
description = """
Gather raw signal data for pattern analysis. Collect each signal category
and store the results for the analysis step.
### 1. CI pipeline trends (Woodpecker)
Fetch recent builds from Woodpecker CI:
curl -sf -H "Authorization: Bearer $WOODPECKER_TOKEN" \
"${WOODPECKER_SERVER}/api/repos/${WOODPECKER_REPO_ID}/pipelines?page=1&perPage=20"
Look for:
- Build failure rate over last 20 builds
- Repeated failures on the same step
- Builds stuck in running/pending state
- Time since last successful build
If WOODPECKER_TOKEN or WOODPECKER_SERVER is not set, skip CI signals and note
"CI signals unavailable: WOODPECKER_TOKEN not configured".
### 2. Stale issues
Fetch all open issues:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues?state=open&type=issues&limit=50&sort=updated&direction=asc"
Identify:
- Issues with no update in 14+ days (stale)
- Issues with no update in 30+ days (very stale)
- Issues labeled 'action' or 'backlog' that are stale (work not progressing)
- Blocked issues where the blocker may have been resolved
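Staleness in days can be derived from each issue's `updated_at` timestamp; a
minimal sketch, assuming GNU `date`:

```shell
# Sketch: whole days elapsed since an ISO-8601 timestamp (GNU date assumed).
# Optional second argument fixes "now" as an epoch value, for testing.
days_since() {
  local then now
  then=$(date -d "$1" +%s)
  now=${2:-$(date +%s)}
  echo $(( (now - then) / 86400 ))
}
```

An issue is stale when `days_since "$updated_at"` reaches 14, very stale at 30.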
### 3. Agent health
Check active tmux sessions:
tmux list-sessions 2>/dev/null || echo "no sessions"
Check recent agent logs (last 24h of activity):
for log in supervisor/supervisor.log planner/planner.log planner/prediction.log \
           gardener/gardener.log dev/dev.log review/review.log; do
  if [ -f "$PROJECT_REPO_ROOT/$log" ]; then
    echo "=== $log (last 20 lines) ==="
    tail -20 "$PROJECT_REPO_ROOT/$log"
  fi
done
Look for:
- Agents that haven't run recently (missing log entries in last 24h)
- Repeated errors or failures in logs
- Sessions stuck or crashed (tmux sessions present but no recent activity)
- Lock files that may be stale: /tmp/*-poll.lock, /tmp/*-run.lock
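The stale-lock check could look like the sketch below, which assumes (purely
for illustration, not a convention the formula defines) that each lock file
stores the owning PID on its first line:

```shell
# Sketch: report lock files older than 60 minutes whose recorded PID is gone.
# The PID-on-first-line layout is an assumption, not a documented convention.
stale_locks() {
  local lock pid
  for lock in /tmp/*-poll.lock /tmp/*-run.lock; do
    [ -f "$lock" ] || continue
    pid=$(head -1 "$lock" 2>/dev/null)
    if ! kill -0 "$pid" 2>/dev/null && [ -n "$(find "$lock" -mmin +60)" ]; then
      echo "stale: $lock"
    fi
  done
}
```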
### 4. Resource patterns
Collect current resource state:
free -m # RAM
df -h / # Disk
cat /proc/loadavg # Load average
docker ps --format '{{.Names}} {{.Status}}' 2>/dev/null || true # Containers
Look for:
- Available RAM < 2000MB (agents will skip runs)
- Disk usage > 80% (approaching danger zone)
- Load average > 3.0 (box overloaded)
- Containers in unhealthy or restarting state
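The numeric thresholds above can be folded into one check; a sketch that takes
explicit readings so it can be tested without live system state:

```shell
# Sketch: emit one flag per threshold breached (thresholds from the list above).
# Callers would pass e.g. available RAM from `free -m`, disk use% from `df -h /`,
# and the 1-minute load average from /proc/loadavg.
resource_flags() {
  local ram_mb=$1 disk_pct=$2 load=$3
  [ "$ram_mb" -lt 2000 ] && echo "ram-low: ${ram_mb}MB available"
  [ "$disk_pct" -gt 80 ] && echo "disk-high: ${disk_pct}% used"
  awk -v l="$load" 'BEGIN { exit !(l > 3.0) }' && echo "load-high: $load"
  return 0
}
```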
### 5. Already-open predictions (deduplication)
Fetch existing open predictions to avoid duplicates:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues?state=open&type=issues&labels=prediction%2Funreviewed&limit=50"
Also check prediction/backlog (watched but not yet actioned):
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues?state=open&type=issues&labels=prediction%2Fbacklog&limit=50"
Record their titles so you can avoid duplicating them.
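Titles from both responses can be collapsed into one dedup list, assuming jq:

```shell
# Sketch: flatten issue titles from a JSON array into a sorted, deduped list.
open_prediction_titles() {
  jq -r '.[].title' <<<"$1" | sort -u
}
```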
### 6. Outcome measurement
Check whether the factory is producing results, not just running:
- Read RESOURCES.md for available formulas and capabilities
- Read $PROJECT_REPO_ROOT/formulas/*.toml for dispatchable work
- Check evidence/output directories for freshness:
find "$PROJECT_REPO_ROOT" -maxdepth 3 -name "*.log" -o -name "journal" -type d | \
  while read -r f; do
    echo "=== $f ==="
    find "$f" -maxdepth 1 -type f -printf '%T+ %p\n' 2>/dev/null | sort -r | head -5
  done
- Check recently closed issues — is work completing or just cycling?
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues?state=closed&type=issues&limit=20&sort=updated&direction=desc"
- Check recently merged PRs — what's the throughput?
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/pulls?state=closed&sort=updated&direction=desc&limit=20" | \
jq '[.[] | select(.merged)]'
- Compare available capacity vs actual utilization:
tmux list-sessions 2>/dev/null | wc -l # active sessions
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues?state=open&type=issues&labels=backlog&limit=50" | jq 'length'
Look for:
- Formulas that haven't produced output recently (stale journals/logs)
- Idle compute when dispatchable work exists (backlog items but no active sessions)
- High churn (issues opened and closed rapidly without merged PRs)
- Low throughput relative to available agents
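The capacity comparison reduces to two numbers; a sketch taking the backlog
count and session count produced by the commands above:

```shell
# Sketch: flag idle capacity given a backlog item count (jq 'length') and an
# active tmux session count (tmux list-sessions | wc -l).
idle_capacity() {
  local backlog=$1 sessions=$2
  if [ "$backlog" -gt 0 ] && [ "$sessions" -eq 0 ]; then
    echo "idle-capacity: $backlog backlog items, 0 active sessions"
  fi
}
```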
### 7. External environment scan
Look outside the box for signals that could affect the project:
- Identify key dependencies from the project (package.json, go.mod, Cargo.toml,
requirements.txt, or similar — whatever exists in $PROJECT_REPO_ROOT)
- Identify key tools (Claude CLI version, Woodpecker CI, Caddy, Docker, etc.)
- For each major dependency or tool, use web search to check for:
- Security advisories or CVEs
- Breaking changes in recent releases
- Deprecation notices
- Major version bumps that could require migration
Use WebSearch to gather these signals. Be targeted — search for specific
dependencies and tools used by the project, not general news.
Limit to 5 web searches maximum to keep the run fast.
Look for:
- CVEs or security advisories mentioning project dependencies
- Major version releases of key tools (could break CI, require migration)
- Deprecation notices for APIs or services in use
- Ecosystem shifts that could obsolete current approaches
"""
needs = ["preflight"]
[[steps]]
id = "re-evaluate-backlog"
title = "Re-evaluate open prediction/backlog watches"
description = """
Re-check prediction/backlog issues to detect changed conditions or stale watches.
The collect-signals step already fetched prediction/backlog issues (step 5).
Now actively re-evaluate each one instead of just using them for dedup.
For each open prediction/backlog issue:
### 1. Read context
Fetch the issue body and all comments:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues/<issue_number>"
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues/<issue_number>/comments"
Pay attention to:
- The original prediction body (signal source, confidence, suggested action)
- The planner's triage comment (the "Watching ..." comment with reasoning)
- Any subsequent comments with updated context
- The issue's created_at and updated_at timestamps
### 2. Extract conditions
From the planner's triage comment and original prediction body, identify the
specific assumptions that made this a "watch, don't act" decision. Examples:
- "static site config, no FastCGI" (Caddy CVE watch)
- "RAM stable above 3GB" (resource pressure watch)
- "no reverse proxy configured" (security exposure watch)
- "dependency not in use yet" (CVE watch for unused feature)
### 3. Re-check conditions
Verify each assumption still holds by checking current system state:
- Config files: read relevant configs in $PROJECT_REPO_ROOT
- Versions: check installed versions of referenced tools/dependencies
- Infrastructure: re-run relevant resource/health checks from collect-signals
- Code changes: check git log for changes to affected files since the issue was created:
git log --oneline --since="<issue_created_at>" -- <affected_files>
### 4. Decide
For each prediction/backlog issue, choose one action:
**CONDITIONS_CHANGED** — one or more assumptions no longer hold:
a. Resolve the prediction/unreviewed, prediction/actioned, and prediction/backlog label IDs:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
  "$CODEBERG_API/labels" | jq '.[] | select(.name == "prediction/unreviewed") | .id'
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
  "$CODEBERG_API/labels" | jq '.[] | select(.name == "prediction/actioned") | .id'
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
  "$CODEBERG_API/labels" | jq '.[] | select(.name == "prediction/backlog") | .id'
b. File a NEW prediction/unreviewed issue with updated context:
curl -sf -X POST -H "Authorization: token $CODEBERG_TOKEN" \
-H "Content-Type: application/json" \
"$CODEBERG_API/issues" \
-d '{"title":"<original title> — CONDITIONS CHANGED",
"body":"Re-evaluation of #<old_number>: conditions have changed.\\n\\n<what changed and why risk level is different now>\\n\\nOriginal prediction: #<old_number>\\n\\n---\\n**Signal source:** re-evaluation of prediction/backlog #<old_number>\\n**Confidence:** <high|medium|low>\\n**Suggested action:** <concrete next step>",
"labels":[<unreviewed_label_id>]}'
c. Comment on the OLD issue explaining what changed:
curl -sf -X POST -H "Authorization: token $CODEBERG_TOKEN" \
-H "Content-Type: application/json" \
"$CODEBERG_API/issues/<old_number>/comments" \
-d '{"body":"Superseded by #<new_number> — conditions changed: <summary>"}'
d. Relabel old issue: remove prediction/backlog, add prediction/actioned:
curl -sf -X DELETE -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues/<old_number>/labels/<backlog_label_id>"
curl -sf -X POST -H "Authorization: token $CODEBERG_TOKEN" \
-H "Content-Type: application/json" \
"$CODEBERG_API/issues/<old_number>/labels" \
-d '{"labels":[<actioned_label_id>]}'
e. Close the old issue:
curl -sf -X PATCH -H "Authorization: token $CODEBERG_TOKEN" \
-H "Content-Type: application/json" \
"$CODEBERG_API/issues/<old_number>" \
-d '{"state":"closed"}'
**STALE** — 30+ days since last update AND conditions unchanged:
a. Comment explaining the closure:
curl -sf -X POST -H "Authorization: token $CODEBERG_TOKEN" \
-H "Content-Type: application/json" \
"$CODEBERG_API/issues/<issue_number>/comments" \
-d '{"body":"Closing stale watch — conditions stable for 30+ days. Will re-file if conditions change."}'
b. Relabel: remove prediction/backlog, add prediction/actioned:
curl -sf -X DELETE -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues/<issue_number>/labels/<backlog_label_id>"
curl -sf -X POST -H "Authorization: token $CODEBERG_TOKEN" \
-H "Content-Type: application/json" \
"$CODEBERG_API/issues/<issue_number>/labels" \
-d '{"labels":[<actioned_label_id>]}'
c. Close the issue:
curl -sf -X PATCH -H "Authorization: token $CODEBERG_TOKEN" \
-H "Content-Type: application/json" \
"$CODEBERG_API/issues/<issue_number>" \
-d '{"state":"closed"}'
**UNCHANGED_RECENT** — conditions unchanged AND last update < 30 days ago:
Skip — no action needed. This is the current behavior.
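The three-way decision above can be stated as one function, driven by the
assumptions check (step 3) and the issue's `updated_at` staleness in days:

```shell
# Sketch of the decision table: first argument is "yes"/"no" for whether any
# assumption no longer holds, second is whole days since updated_at.
classify_watch() {
  local conditions_changed=$1 days_stale=$2
  if [ "$conditions_changed" = "yes" ]; then
    echo "CONDITIONS_CHANGED"
  elif [ "$days_stale" -ge 30 ]; then
    echo "STALE"
  else
    echo "UNCHANGED_RECENT"
  fi
}
```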
## Rules
- Process ALL open prediction/backlog issues (already fetched in collect-signals step 5)
- New predictions filed here count toward the 5-prediction cap in analyze-and-predict
- Track how many new predictions were filed so analyze-and-predict can adjust its cap
- Be conservative: only mark CONDITIONS_CHANGED when you have concrete evidence
- Use the updated_at timestamp from the issue API to determine staleness
"""
needs = ["collect-signals"]
[[steps]]
id = "analyze-and-predict"
title = "Analyze signals and file prediction issues"
description = """
Analyze the collected signals for patterns and file prediction issues.
The re-evaluate-backlog step may have already filed new predictions from changed
conditions. Subtract those from the 5-prediction cap: if re-evaluation filed N
predictions, you may file at most (5 - N) new predictions in this step.
## What to look for
**CI regression** — Build failure rate increasing or repeated failures:
- Failure rate > 30% over last 20 builds → high confidence
- Same step failing 3+ times in a row → high confidence
- No successful build in 24+ hours → medium confidence
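These rules map directly to a small classifier; a sketch using the thresholds
above:

```shell
# Sketch: CI-regression confidence from the failure rate (%), consecutive
# failures of the same step, and hours since the last successful build.
ci_confidence() {
  local rate=$1 same_step=$2 hours=$3
  if [ "$rate" -gt 30 ] || [ "$same_step" -ge 3 ]; then
    echo high
  elif [ "$hours" -ge 24 ]; then
    echo medium
  else
    echo none
  fi
}
```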
**Stale work** — Issues not progressing:
- Action issues stale 7+ days → the action agent may be stuck
- Backlog issues stale 14+ days → work not being picked up
- Blocked issues whose blockers are now closed → can be unblocked
**Agent health** — Agents not running or failing:
- Agent log with no entries in 24+ hours → agent may be down
- Repeated errors in agent logs → systemic problem
- Stale lock files (process not running but lock exists)
**Resource pressure** — System approaching limits:
- RAM < 2000MB → agents will start skipping runs
- Disk > 80% → approaching critical threshold
- Load sustained > 3.0 → box is overloaded, queued work backing up
**Opportunity** — Good conditions for expensive work:
- Box idle (RAM > 3000MB, load < 1.0, few active sessions) → good time
for expensive operations if any are pending
**Low throughput** — Factory running but not producing:
- No issues closed in 7+ days despite available backlog → pipeline may be stuck
- PRs merged but no issues closed → work not tracked properly
- Agent sessions active but no PRs created → agents may be spinning
- Formulas with no recent journal entries → agent may not be running
**Idle capacity** — Dispatchable work not being picked up:
- Backlog items available but no in-progress issues → dev-poll may be stuck
- Multiple agents idle (few tmux sessions) with work queued → scheduling problem
- High churn: issues opened and closed quickly without PRs → busy but not productive
**External risk** — Threats or opportunities from outside:
- CVE or security advisory for a project dependency → patch urgently
- Major version release of a key tool → may require migration planning
- Deprecation notice for an API or service in use → plan transition
- Breaking change upstream that could affect CI or builds → investigate
**External opportunity** — Beneficial changes in the ecosystem:
- New tool release that could accelerate work → consider adoption
- Upstream improvement that simplifies current workarounds → refactor opportunity
- Security patch available for a known vulnerability → apply proactively
## Filing predictions
For each prediction, create a Codeberg issue with the `prediction/unreviewed` label.
1. Look up the label ID:
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/labels" | jq '.[] | select(.name == "prediction/unreviewed") | .id'
2. For each prediction, create an issue:
curl -sf -X POST -H "Authorization: token $CODEBERG_TOKEN" \
-H "Content-Type: application/json" \
"$CODEBERG_API/issues" \
-d '{"title":"<title>","body":"<body>","labels":[<label_id>]}'
Body format:
<2-4 sentence description of what was observed, why it matters,
what the planner should consider>
---
**Signal source:** <which signal triggered this>
**Confidence:** <high|medium|low>
**Suggested action:** <concrete next step for the planner>
3. Send a Matrix notification for each prediction created (optional):
Use matrix_send if available, or skip if MATRIX_TOKEN is not set.
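Building the `-d` payload with `jq -n` keeps quoting safe when the body
contains newlines and markdown; a sketch with placeholder values (the title,
body text, and label id 123 are illustrative, not real data):

```shell
# Sketch: assemble the issue payload; --arg handles all JSON string escaping.
payload=$(jq -n \
  --arg title "CI failure rate 45% over last 20 builds" \
  --arg body "$(printf '%s\n' \
    'Failure rate crossed the 30% threshold over the last 20 builds.' \
    '' '---' \
    '**Signal source:** Woodpecker pipeline trends' \
    '**Confidence:** high' \
    '**Suggested action:** inspect the repeated step failure')" \
  --argjson labels '[123]' \
  '{title: $title, body: $body, labels: $labels}')
# Then: curl -sf -X POST ... "$CODEBERG_API/issues" -d "$payload"
```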
## Rules
- Max 5 predictions total (including any filed during re-evaluate-backlog)
- Do NOT predict feature work; file only health observations, outcome measurements,
and external risk/opportunity signals
- Do NOT duplicate existing open predictions (checked in collect-signals)
- Do NOT duplicate predictions just filed by re-evaluate-backlog for changed conditions
- Be specific: name the metric, the value, the threshold
- Prefer high-confidence predictions backed by concrete data
- External signals must name the specific dependency/tool and the advisory/change
- If no meaningful patterns are found, file zero issues; that is a valid outcome
"""
needs = ["re-evaluate-backlog"]