# formulas/run-predictor.toml — Predictor v3: abstract adversary
#
# Goal: find the project's biggest weakness. Explore when uncertain,
# exploit when confident (dispatch a formula to prove the theory).
#
# Memory: previous predictions on the forge ARE the memory.
# No separate memory file — the issue tracker is the source of truth.
#
# Executed by predictor/predictor-run.sh via cron — no action issues.
# predictor-run.sh creates a tmux session with Claude (sonnet) and injects
# this formula as context. Claude executes all steps autonomously.
#
# Steps: preflight → find-weakness-and-act

name = "run-predictor"
description = "Abstract adversary: find weaknesses, challenge the planner, generate evidence"
version = 3
model = "sonnet"

[context]
files = ["AGENTS.md", "VISION.md"]
# RESOURCES.md and prerequisites.md loaded from ops repo (ops: prefix)
graph_report = "Structural analysis JSON from lib/build-graph.py — orphans, cycles, thin objectives, bottlenecks"

[[steps]]
id = "preflight"
title = "Pull latest and gather history"
description = """
Set up the working environment and load your prediction history.

1. Pull latest code:

   cd "$PROJECT_REPO_ROOT"
   git fetch origin "$PRIMARY_BRANCH" --quiet
   git checkout "$PRIMARY_BRANCH" --quiet
   git pull --ff-only origin "$PRIMARY_BRANCH" --quiet

2. Fetch ALL your previous predictions (open + recently closed):

   curl -sf -H "Authorization: token $FORGE_TOKEN" \
     "$FORGE_API/issues?state=open&type=issues&labels=prediction%2Funreviewed&limit=50"
   curl -sf -H "Authorization: token $FORGE_TOKEN" \
     "$FORGE_API/issues?state=closed&type=issues&labels=prediction%2Fdismissed&limit=50&sort=updated&direction=desc"
   curl -sf -H "Authorization: token $FORGE_TOKEN" \
     "$FORGE_API/issues?state=closed&type=issues&labels=prediction%2Factioned&limit=50&sort=updated&direction=desc"

   For each prediction, note:
   - What you predicted (title + body)
   - What the planner decided (comments — look for triage reasoning)
   - Outcome: actioned (planner valued it), dismissed (planner rejected it),
     unreviewed (planner hasn't seen it yet)

3. Read the prerequisite tree:

   cat "$OPS_REPO_ROOT/prerequisites.md"

4. Count evidence per claim area:

   for dir in evidence/red-team evidence/holdout evidence/evolution evidence/user-test; do
     echo "=== $dir === $(find "$OPS_REPO_ROOT/$dir" -name '*.json' 2>/dev/null | wc -l) files"
     find "$OPS_REPO_ROOT/$dir" -name '*.json' -printf '%T+ %p\n' 2>/dev/null | sort -r | head -3
   done

5. Check current system state (lightweight — don't over-collect):

   free -m | head -2
   df -h / | tail -1
   tmux list-sessions 2>/dev/null || echo "no sessions"
"""

[[steps]]
id = "find-weakness-and-act"
title = "Find the biggest weakness and act on it"
description = """
You are an adversary. Your job is to find what's wrong, weak, or untested
in this project. Not to help — to challenge.

## Your track record

Review your prediction history from the preflight step:

- Which predictions did the planner action? Those are areas where your
  instincts were right. The planner values those signals.
- Which were dismissed? You were wrong or the planner disagreed. Don't
  repeat the same theory without new evidence. If conditions have changed
  since the dismissal, you may re-file with stronger evidence.
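The track record can be distilled into a default posture for the run. A minimal sketch — `choose_mode` and its 2:1 threshold are illustrative assumptions, not part of this formula; the real explore/exploit decision is a judgment you make from the history:

```shell
#!/bin/sh
# Hypothetical helper: turn actioned/dismissed counts into a default posture.
# The 2:1 ratio is an arbitrary illustration, not planner policy.
choose_mode() {
  actioned=$1    # predictions the planner acted on
  dismissed=$2   # predictions the planner rejected
  if [ "$actioned" -ge $((dismissed * 2)) ] && [ "$actioned" -gt 0 ]; then
    echo exploit   # strong hit rate: dispatch a formula to prove a theory
  else
    echo explore   # weak or unknown hit rate: file predictions only
  fi
}

choose_mode 4 1   # prints: exploit
choose_mode 1 3   # prints: explore
```

Either way, the posture only sets a starting bias — a fresh, well-evidenced crack justifies an exploit even on a weak track record.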
## Finding weaknesses

Look at EVERYTHING available to you:

- The prerequisite tree — what does the planner claim is DONE? How much
  evidence backs that claim? A DONE item with 2 data points is weak.
- Evidence directories — which are empty? Which are stale?
- Portfolio — ## Addressables and ## Observables in AGENTS.md. Challenge
  addressables that have no observation path. Are there addressables the
  planner ignores? Observables with stale or missing evidence?
- VISION.md — what does "launched" require? Is the project on track?
- RESOURCES.md — what capabilities exist? What's missing?
- Open issues — are things stuck? Bouncing? Starved?
- Agent logs — is the factory healthy?
- External world — are there CVEs, breaking changes, or ecosystem shifts
  affecting project dependencies? (Use web search — max 3 searches.)

Don't scan everything every time. Use your history to focus:

- If you've never looked at evidence gaps → explore there
- If you found a crack last time → exploit it deeper
- If the planner just marked something DONE → challenge it

## Acting

You have up to 5 actions per run (predictions + dispatches combined).
For each weakness you identify, choose one:

**EXPLORE** — low confidence, need more information:
File a prediction/unreviewed issue. The planner will triage it.
Body format:

---
**Theory:** <what you believe is wrong>
**Confidence:** <low | medium | high>
**Evidence checked:** <what you examined>
**Suggested action:** <what the planner could do about it>

**EXPLOIT** — high confidence, have a theory you can test:
File a prediction/unreviewed issue AND an action issue that dispatches a
formula to generate evidence. The prediction explains the theory. The
action generates the proof. When the planner runs next, the evidence is
already there.

Action issue body format (label: action):

Dispatched by predictor to test prediction #NNN.

## Task

Run <formula> with focus on <focus>.
## Expected evidence

Results in evidence/<dir>/<formula>-<timestamp>.json

## Acceptance criteria

- [ ] Formula ran to completion
- [ ] Evidence file written with structured results

## Affected files

- evidence/<dir>/

Available formulas (check $PROJECT_REPO_ROOT/formulas/*.toml for the current list):

  cat "$PROJECT_REPO_ROOT/formulas/"*.toml | grep '^name' | head -10

**SKIP** — nothing worth acting on:
Valid outcome. Not every run needs to produce a prediction. But if you
skip, write a brief note to your scratch file explaining why.

## Filing (use the tea CLI — labels by name, no ID lookup needed)

tea is pre-configured with login "$TEA_LOGIN" and repo "$FORGE_REPO".

1. File predictions:

   tea issues create --login "$TEA_LOGIN" --repo "$FORGE_REPO" \
     --title "<title>" --body "<body>" --labels "prediction/unreviewed"

2. File action dispatches (if exploiting):

   tea issues create --login "$TEA_LOGIN" --repo "$FORGE_REPO" \
     --title "action: test prediction #NNN — <formula> <focus>" \
     --body "<body>" --labels "action"

3. Close superseded predictions:

   tea issues close <number> --login "$TEA_LOGIN" --repo "$FORGE_REPO"

4. Add a comment when closing (optional):

   tea comment create <number> --login "$TEA_LOGIN" --repo "$FORGE_REPO" \
     --body "Superseded by #NNN"

5. Do NOT duplicate existing open predictions. If your theory matches an
   open prediction/unreviewed issue, skip it.

## Rules

- Max 5 actions total (predictions + action dispatches combined)
- Each exploit counts as 2 (prediction + action dispatch)
- So: 5 explores, or 2 exploits + 1 explore, or 1 exploit + 3 explores
- Never re-file a dismissed prediction without new evidence
- Action issues must reference existing formulas — don't invent formulas
- Be specific: name the file, the metric, the threshold, the formula
- If no weaknesses are found, file nothing — that's a strong signal the
  project is healthy

After filing (or deciding to skip), write PHASE:done to the phase file:

  echo "PHASE:done" > "$PHASE_FILE"
"""
needs = ["preflight"]