Merge pull request 'fix: feat: predictor v2 — outcome measurement + external signal scanning (#547)' (#551) from fix/issue-547 into main

This commit is contained in:
johba 2026-03-22 13:19:02 +01:00
commit b017b37e94
3 changed files with 120 additions and 19 deletions

View file

@ -6,15 +6,23 @@
# #
# Steps: preflight → collect-signals → analyze-and-predict # Steps: preflight → collect-signals → analyze-and-predict
# #
# Disinto-specific signal sources: # Signal sources (three categories):
# Health signals:
# - CI pipeline trends (Woodpecker) # - CI pipeline trends (Woodpecker)
# - Stale issues (open issues with no recent activity) # - Stale issues (open issues with no recent activity)
# - Agent health (tmux sessions, recent logs) # - Agent health (tmux sessions, recent logs)
# - Resource patterns (RAM, disk, load, containers) # - Resource patterns (RAM, disk, load, containers)
# Outcome signals:
# - Output freshness (formula evidence/artifacts)
# - Capacity utilization (idle agents vs dispatchable work)
# - Throughput (recently closed issues, merged PRs)
# External signals:
# - Dependency security advisories
# - Upstream breaking changes and deprecations
name = "run-predictor" name = "run-predictor"
description = "Evidence-based prediction: CI trends, stale issues, agent health, resource patterns" description = "Evidence-based prediction: health, outcome measurement, external environment signals"
version = 1 version = 2
model = "sonnet" model = "sonnet"
[context] [context]
@ -114,6 +122,59 @@ Also check prediction/backlog (watched but not yet actioned):
"$CODEBERG_API/issues?state=open&type=issues&labels=prediction%2Fbacklog&limit=50" "$CODEBERG_API/issues?state=open&type=issues&labels=prediction%2Fbacklog&limit=50"
Record their titles so you can avoid duplicating them. Record their titles so you can avoid duplicating them.
### 6. Outcome measurement
Check whether the factory is producing results, not just running:
- Read RESOURCES.md for available formulas and capabilities
- Read $PROJECT_REPO_ROOT/formulas/*.toml for dispatchable work
- Check evidence/output directories for freshness:
find "$PROJECT_REPO_ROOT" -maxdepth 3 -name "*.log" -o -name "journal" -type d | \
while read -r f; do
echo "=== $f ==="
find "$f" -maxdepth 1 -type f -printf '%T+ %p\n' 2>/dev/null | sort -r | head -5
done
- Check recently closed issues is work completing or just cycling?
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues?state=closed&type=issues&limit=20&sort=updated&direction=desc"
- Check recently merged PRs what's the throughput?
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/pulls?state=closed&sort=updated&direction=desc&limit=20" | \
jq '[.[] | select(.merged)]'
- Compare available capacity vs actual utilization:
tmux list-sessions 2>/dev/null | wc -l # active sessions
curl -sf -H "Authorization: token $CODEBERG_TOKEN" \
"$CODEBERG_API/issues?state=open&type=issues&labels=backlog&limit=50" | jq 'length'
Look for:
- Formulas that haven't produced output recently (stale journals/logs)
- Idle compute when dispatchable work exists (backlog items but no active sessions)
- High churn (issues opened and closed rapidly without merged PRs)
- Low throughput relative to available agents
### 7. External environment scan
Look outside the box for signals that could affect the project:
- Identify key dependencies from the project (package.json, go.mod, Cargo.toml,
requirements.txt, or similar whatever exists in $PROJECT_REPO_ROOT)
- Identify key tools (Claude CLI version, Woodpecker CI, Caddy, Docker, etc.)
- For each major dependency or tool, use web search to check for:
- Security advisories or CVEs
- Breaking changes in recent releases
- Deprecation notices
- Major version bumps that could require migration
Use WebSearch to gather these signals. Be targeted search for specific
dependencies and tools used by the project, not general news.
Limit to 5 web searches maximum to keep the run fast.
Look for:
- CVEs or security advisories mentioning project dependencies
- Major version releases of key tools (could break CI, require migration)
- Deprecation notices for APIs or services in use
- Ecosystem shifts that could obsolete current approaches
""" """
needs = ["preflight"] needs = ["preflight"]
@ -149,6 +210,28 @@ Analyze the collected signals for patterns and file up to 5 prediction issues.
- Box idle (RAM > 3000MB, load < 1.0, few active sessions) good time - Box idle (RAM > 3000MB, load < 1.0, few active sessions) good time
for expensive operations if any are pending for expensive operations if any are pending
**Low throughput** Factory running but not producing:
- No issues closed in 7+ days despite available backlog pipeline may be stuck
- PRs merged but no issues closed work not tracked properly
- Agent sessions active but no PRs created agents may be spinning
- Formulas with no recent journal entries agent may not be running
**Idle capacity** Dispatchable work not being picked up:
- Backlog items available but no in-progress issues dev-poll may be stuck
- Multiple agents idle (few tmux sessions) with work queued scheduling problem
- High churn: issues opened and closed quickly without PRs busy but not productive
**External risk** Threats or opportunities from outside:
- CVE or security advisory for a project dependency patch urgently
- Major version release of a key tool may require migration planning
- Deprecation notice for an API or service in use plan transition
- Breaking change upstream that could affect CI or builds investigate
**External opportunity** Beneficial changes in the ecosystem:
- New tool release that could accelerate work consider adoption
- Upstream improvement that simplifies current workarounds refactor opportunity
- Security patch available for a known vulnerability apply proactively
## Filing predictions ## Filing predictions
For each prediction, create a Codeberg issue with the `prediction/unreviewed` label. For each prediction, create a Codeberg issue with the `prediction/unreviewed` label.
@ -177,10 +260,12 @@ For each prediction, create a Codeberg issue with the `prediction/unreviewed` la
## Rules ## Rules
- Max 5 predictions total - Max 5 predictions total
- Do NOT predict feature work only infrastructure/health/metric observations - Do NOT predict feature work only health observations, outcome measurements,
and external risk/opportunity signals
- Do NOT duplicate existing open predictions (checked in collect-signals) - Do NOT duplicate existing open predictions (checked in collect-signals)
- Be specific: name the metric, the value, the threshold - Be specific: name the metric, the value, the threshold
- Prefer high-confidence predictions backed by concrete data - Prefer high-confidence predictions backed by concrete data
- External signals must name the specific dependency/tool and the advisory/change
- If no meaningful patterns found, file zero issues that is a valid outcome - If no meaningful patterns found, file zero issues that is a valid outcome
""" """

View file

@ -1,14 +1,22 @@
<!-- last-reviewed: 251d160e213b19a4fcc0cd8f8e3be9ea3283887f --> <!-- last-reviewed: 251d160e213b19a4fcc0cd8f8e3be9ea3283887f -->
# Predictor Agent # Predictor Agent
**Role**: Infrastructure pattern detection (the "goblin"). Runs a 3-step **Role**: Risk oracle and opportunity spotter (the "goblin"). Runs a 3-step
formula (preflight → collect-signals → analyze-and-predict) via interactive formula (preflight → collect-signals → analyze-and-predict) via interactive
tmux Claude session (sonnet). Collects disinto-specific signals: CI pipeline tmux Claude session (sonnet). Collects three categories of signals:
trends (Woodpecker), stale issues, agent health (tmux sessions + logs), and
resource patterns (RAM, disk, load, containers). Files up to 5 1. **Health signals** — CI pipeline trends (Woodpecker), stale issues, agent
`prediction/unreviewed` issues for the Planner to triage. The predictor MUST health (tmux sessions + logs), resource patterns (RAM, disk, load, containers)
NOT emit feature work — only observations about CI health, issue staleness, 2. **Outcome signals** — output freshness (formula journals/artifacts), capacity
agent status, and system conditions. utilization (idle agents vs dispatchable backlog), throughput (closed issues,
merged PRs, churn detection)
3. **External signals** — dependency security advisories, upstream breaking
changes, deprecation notices, ecosystem shifts (via targeted web search)
Files up to 5 `prediction/unreviewed` issues for the Planner to triage.
Predictions cover both "things going wrong" and "opportunities being missed".
The predictor MUST NOT emit feature work — only observations about health,
outcomes, and external risks/opportunities.
**Trigger**: `predictor-run.sh` runs daily at 06:00 UTC via cron (1h before **Trigger**: `predictor-run.sh` runs daily at 06:00 UTC via cron (1h before
the planner at 07:00). Guarded by PID lock (`/tmp/predictor-run.lock`) and the planner at 07:00). Guarded by PID lock (`/tmp/predictor-run.lock`) and
@ -31,6 +39,8 @@ memory check (skips if available RAM < 2000 MB).
**Lifecycle**: predictor-run.sh (daily 06:00 cron) → lock + memory guard → **Lifecycle**: predictor-run.sh (daily 06:00 cron) → lock + memory guard →
load formula + context → create tmux session → Claude collects signals load formula + context → create tmux session → Claude collects signals
(CI trends, stale issues, agent health, resources) → dedup against existing (health: CI trends, stale issues, agent health, resources; outcomes: output
open predictions → file `prediction/unreviewed` issues → `PHASE:done`. freshness, capacity utilization, throughput; external: dependency advisories,
ecosystem changes via web search) → dedup against existing open predictions →
file `prediction/unreviewed` issues → `PHASE:done`.
The planner's Phase 1 later triages these predictions. The planner's Phase 1 later triages these predictions.

View file

@ -57,10 +57,16 @@ build_prompt_footer
# shellcheck disable=SC2034 # consumed by run_formula_and_monitor # shellcheck disable=SC2034 # consumed by run_formula_and_monitor
PROMPT="You are the prediction agent (goblin) for ${CODEBERG_REPO}. Work through the formula below. You MUST write PHASE:done to '${PHASE_FILE}' when finished — the orchestrator will time you out if you return to the prompt without signalling. PROMPT="You are the prediction agent (goblin) for ${CODEBERG_REPO}. Work through the formula below. You MUST write PHASE:done to '${PHASE_FILE}' when finished — the orchestrator will time you out if you return to the prompt without signalling.
Your role: spot patterns in infrastructure signals and file them as prediction issues. Your role: spot patterns across three signal categories and file them as prediction issues:
1. Health signals — CI trends, agent status, resource pressure, stale issues
2. Outcome signals — output freshness, capacity utilization, throughput
3. External signals — dependency advisories, upstream changes, ecosystem shifts
The planner (adult) will triage every prediction before acting. The planner (adult) will triage every prediction before acting.
You MUST NOT emit feature work or implementation issues — only predictions You MUST NOT emit feature work or implementation issues — only predictions
about CI health, issue staleness, agent status, and system conditions. about health, outcomes, and external risks/opportunities.
Use WebSearch for external signal scanning — be targeted (project dependencies
and tools only, not general news). Limit to 5 web searches per run.
## Project context ## Project context
${CONTEXT_BLOCK} ${CONTEXT_BLOCK}