disinto/formulas/review-pr.toml
Agent 045df63d07
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
fix: fix: review formula misses cross-cutting consequences and under-files tech-debt (#483)
2026-04-09 08:01:29 +00:00

252 lines
11 KiB
TOML

# formulas/review-pr.toml — PR review formula
#
# Defines the review agent's judgment: understand the change, assess quality,
# decide verdict, write structured output. The bash orchestrator
# (review/review-pr.sh) handles session lifecycle, metadata fetching,
# API posting, and cleanup.
#
# The orchestrator injects PR context (diff, metadata, previous review)
# alongside this formula. Claude follows the steps in a single session.
name = "review-pr"
description = "AI-powered PR review: understand change, assess quality, decide verdict"
version = 1
model = "sonnet"
[context]
files = ["AGENTS.md"]
[[steps]]
id = "review"
title = "Review the PR and write structured output"
description = """
You have full repo access — you are in a checkout of the PR branch.
Use this to verify claims, check existing code, and understand context
before flagging issues. Read files rather than guessing.
## 1. Understand the change
Read the diff and PR description injected by the orchestrator.
What is this PR doing and why? Identify the scope:
- Contracts, frontend, backend, docs, infra, formulas, mixed?
- New feature, bug fix, refactor, config change?
## 2. CI relevance
If CI has not passed, decide whether CI matters for this PR.
Non-code changes (docs, formulas, TOML config, markdown-only) do NOT
need CI — note this in your review rather than blocking. The orchestrator
already skips the CI gate for non-code PRs, but you should also mention
CI relevance in the review body when it applies.
## 3. Review checklist
Adapt based on scope. Check for:
- **Bugs & logic errors**: off-by-one, nil/null dereference, missing edge cases,
wrong return type, incorrect conditionals
- **Security**: command injection, unquoted bash variables, path traversal, XSS,
secret leakage in logs or comments
- **Imports & dependencies**: broken imports, undefined variables, missing deps
- **Architecture**: verify patterns match AGENTS.md and project conventions
- **Bash specifics** (for .sh files): ShellCheck compliance, set -euo pipefail,
proper quoting, error handling with || true where appropriate, no echo of secrets
- **Dead code**: unused variables, unreachable branches, leftover debug prints
- **Claim verification**: if docs/README/AGENTS.md changed, verify claims
against actual code — read the files to confirm
Do NOT flag:
- Style preferences with no correctness impact
- Missing tests unless the change is clearly untested and risky
- Things that look wrong but actually work — verify by reading the code first
- Files that were truncated from the diff (the orchestrator notes truncation)
## 3b. Architecture and documentation consistency
For each BEHAVIORAL change in the diff (not pure bug fixes or formatting):
1. Identify what behavior changed (e.g., scheduling mechanism, auth flow,
container lifecycle, secret handling)
2. Search AGENTS.md for claims about that behavior:
grep -n '<keyword>' AGENTS.md
Also check docs/ and any per-directory AGENTS.md files.
3. Search for Architecture Decision references (AD-001 through AD-006):
grep -n 'AD-0' AGENTS.md
Read each AD and check if the PR's changes contradict it.
4. If the PR changes behavior described in AGENTS.md or contradicts an AD
but does NOT update the documentation in the same PR:
REQUEST_CHANGES — require the documentation update in the same PR.
This check is SKIPPED for pure bug fixes where the intended behavior is
unchanged (the code was wrong, not the documentation).
## 4. Vault item quality (conditional)
If the PR adds or modifies vault item files (`vault/pending/*.md` in the ops repo), apply these
additional checks. These criteria apply ON TOP of the normal review —
a vault PR must also pass the standard checklist above.
### Required sections
Every vault item must contain all five of these sections with non-empty
content:
- **What** — what is needed
- **Why** — what this unblocks and why it matters now
- **Unblocks** — specific issue numbers (e.g. #123)
- **Human Action** — specific steps the human should take
- **Factory Will Then** — what happens after approval
If any section is missing or empty, REQUEST_CHANGES. Quote the missing
section names in your review.
### Human Action must be specific
The Human Action section must contain concrete, actionable steps — not
open-ended decisions or vague asks. The filing agent must have done
the research before escalating to the human.
PASS examples:
- "Go to github.com/organizations/new, create org named disinto"
- "Add SMTP_HOST=mail.example.com to .env.enc via sops"
- "Run: ssh root@prod apt install redis-server"
FAIL examples:
- "Decide the approach for #466"
- "Make a decision about branding"
- "Figure out which provider to use"
If the human action is vague or pushes decision-making to the human,
REQUEST_CHANGES. Explain that the filing agent must research and
propose a specific action.
### Dedup check
Check whether `$OPS_REPO_ROOT/vault/pending/`, `$OPS_REPO_ROOT/vault/approved/`, or `$OPS_REPO_ROOT/vault/fired/`
already contains a similar item (same resource, same ask). List the
vault directories in the ops repo to inspect existing items. If a duplicate or
near-duplicate exists, REQUEST_CHANGES and reference the existing item.
## 5. External action detection (token separation)
Agents must NEVER execute external actions directly. Any action that touches
an external system (publish, deploy, post, push to external registry, API
calls to third-party services) MUST go through vault dispatch i.e., the
agent files a vault item (`$OPS_REPO_ROOT/vault/pending/*.json`) and the runner
container executes it with injected secrets.
Scan the diff for these patterns:
- **Direct publish commands**: `clawhub publish`, `npm publish`,
`cargo publish`, `docker push`, `gh release create`, or similar
- **Direct deploy commands**: `ssh ... deploy`, `rsync` to external hosts,
`kubectl apply`, `helm install`, cloud CLI deploy commands
- **Direct external API calls**: `curl`/`wget` to external services with
auth tokens, posting to social media APIs, sending emails
- **Token usage**: Direct references to `GITHUB_TOKEN`, `CLAWHUB_TOKEN`,
or other vault-only secrets in agent code (outside of vault/ directory)
If ANY of these patterns appear in agent code (scripts in `dev/`, `action/`,
`planner/`, `gardener/`, `supervisor/`, `predictor/`, `review/`, `formulas/`,
`lib/`) WITHOUT routing through vault dispatch (file a vault PR on ops repo see #73-#77), **REQUEST_CHANGES**.
Explain that external actions must use vault dispatch per AD-006. The agent
should file a vault item instead of executing directly.
**Exceptions** (do NOT flag these):
- Code inside `vault/` the vault system itself is allowed to handle secrets
- References in comments or documentation explaining the architecture
- `bin/disinto` setup commands that manage `.env.vault.enc` and the `run` subcommand
- Local operations (git push to forge, forge API calls with `FORGE_TOKEN`)
## 6. Re-review (if previous review is provided)
If the orchestrator injected a previous review and incremental diff:
1. For each finding in the previous review, check if it was addressed
2. Report status: fixed / not_fixed / partial with explanation
3. Check for new issues introduced by the fix commits
4. Be fair if feedback was addressed well, acknowledge it
Focus on the incremental diff for finding-by-finding status, but use the
full diff to check overall correctness.
## 7. Follow-up issues (pre-existing tech debt)
If you discover pre-existing issues (NOT introduced by this PR), create
tech-debt issues via API so they are tracked separately:
# Look up tech-debt label ID (create if missing):
TECH_DEBT_ID=$(curl -sf -H "Authorization: token $FORGE_TOKEN" \
"$FORGE_API/labels" | jq -r '.[] | select(.name=="tech-debt") | .id')
if [ -z "$TECH_DEBT_ID" ]; then
TECH_DEBT_ID=$(curl -sf -X POST \
-H "Authorization: token $FORGE_TOKEN" \
-H "Content-Type: application/json" \
"$FORGE_API/labels" \
-d '{"name":"tech-debt","color":"#6B7280","description":"Pre-existing tech debt flagged by AI review"}' | jq -r '.id')
fi
# Check for duplicate before creating:
EXISTING=$(curl -sf -H "Authorization: token $FORGE_TOKEN" \
"$FORGE_API/issues?state=open&labels=tech-debt&limit=50" | \
jq --arg t "TITLE" '[.[] | select(.title == $t)] | length')
# Create only if no duplicate:
curl -sf -X POST -H "Authorization: token $FORGE_TOKEN" \
-H "Content-Type: application/json" "$FORGE_API/issues" \
-d '{"title":"...","body":"Flagged by AI reviewer in PR #NNN.\n\n## Problem\n...\n\n---\n*Auto-created from AI review*","labels":[TECH_DEBT_ID]}'
File a tech-debt issue for every finding rated **medium** or higher that
is pre-existing (not introduced by this PR). Also file for **low** findings
that represent correctness risks (dead code that masks bugs, misleading
documentation, unguarded variables under set -u).
Do NOT file for: style preferences, naming opinions, missing comments,
or speculative improvements with no concrete failure mode.
When in doubt, file. A closed-as-wontfix tech-debt issue costs nothing;
an unfiled bug costs a future debugging session.
## 8. Verdict
Choose one:
- **APPROVE**: Change is correct, complete, and follows conventions
- **REQUEST_CHANGES**: Has real issues that must be fixed before merge
- **DISCUSS**: Unclear intent, design question, or needs human conversation
Bias toward APPROVE for small, correct changes. Use REQUEST_CHANGES only
for actual problems (bugs, security issues, broken functionality, missing
required behavior). Use DISCUSS sparingly.
Note: The bias toward APPROVE applies to code correctness and style decisions.
It does NOT apply to documentation consistency (step 3b) or tech-debt filing
(step 7) those are separate concerns that should be handled regardless of
the change's correctness.
## 9. Output
Write a single JSON object to the file path from REVIEW_OUTPUT_FILE.
Use jq to ensure proper JSON escaping of the markdown content:
jq -n \
--arg verdict "APPROVE" \
--arg verdict_reason "one-line explanation" \
--arg review_markdown "### Section\\n- **severity** \\`location\\`: description" \
'{verdict: $verdict, verdict_reason: $verdict_reason, review_markdown: $review_markdown}' \
> "$REVIEW_OUTPUT_FILE"
The review_markdown field must contain the complete, formatted review in
markdown. Use ### headings for sections. Use this format for findings:
- **severity** `file:line`: description
For a re-review, structure the markdown as:
### Previous Findings
- finding summary FIXED / NOT FIXED / PARTIAL: explanation
### New Issues (if any)
- **severity** `location`: description
After writing the JSON file, signal completion:
echo "PHASE:done" > "$PHASE_FILE"
Then STOP and wait. The orchestrator will post your review to the forge.
"""