investigation: reviewer agent approved destructive compose rewrite in PR #683 — why? #685
Reference: disinto-admin/disinto#685
Description
Investigation: PR #683 was reviewed and APPROVED by the reviewer agent (Claude in disinto-agents) at 2026-04-11 17:59:57 UTC, despite containing a destructive rewrite of `docker-compose.yml` that would break the production deployment if anyone ran `docker compose up -d` (see follow-up issue for the compose-file regression). The reviewer should have caught this. Why didn't it?
Reviewer outcome to explain
From `/home/agent/data/logs/review/review.log`:
The reviewer Claude session ran for ~1 minute and approved without flagging that the compose-file changes:
- [...] `restart: unless-stopped` and `security_opt: apparmor=unconfined`
- [...] (`agent-data`, `project-repos`) with bind mounts to nonexistent paths
- [...] (`CLAUDE_SHARED_DIR`, `.claude.json`, `.ssh`, sops age key, woodpecker-data, the versioned claude binary)
- [...] `FORGE_PASS`, `FORGE_PASS_LLAMA`, `FORGE_TOKEN_LLAMA`, `WOODPECKER_REPO_ID`, `FORGE_BOT_USERNAMES`
- [...] `DISINTO_AGENTS=review,gardener` env var the entrypoint doesn't read
Each of these is a regression that a careful reviewer with context about the running deployment would have flagged.
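
The regressions listed above are the kind of drift a mechanical comparison against the running container can surface. The following is a minimal sketch, not part of the existing review tooling. It assumes the proposed service definition has been parsed from `docker compose config --format json` and the container state from `docker inspect`; the function name `compose_drift` and the `image_env` parameter are hypothetical.

```python
from typing import Any


def compose_drift(
    service: dict[str, Any],
    inspected: dict[str, Any],
    image_env: frozenset[str] = frozenset({"PATH", "HOME", "HOSTNAME"}),
) -> list[str]:
    """List settings the running container has that the proposed service would drop or change.

    service:   one entry of "services" from `docker compose config --format json`.
    inspected: one element of the JSON array printed by `docker inspect <container>`.
    image_env: env keys assumed to come from the image or runtime (hypothetical default).
    """
    findings: list[str] = []
    host = inspected.get("HostConfig") or {}

    # Restart policy: an unset policy is treated as "no" on both sides.
    running_restart = (host.get("RestartPolicy") or {}).get("Name") or "no"
    proposed_restart = service.get("restart") or "no"
    if running_restart != proposed_restart:
        findings.append(f"restart: running={running_restart!r} proposed={proposed_restart!r}")

    # Security options present on the container but missing from the compose file.
    dropped_opts = set(host.get("SecurityOpt") or []) - set(service.get("security_opt") or [])
    for opt in sorted(dropped_opts):
        findings.append(f"security_opt dropped: {opt}")

    # Mounts are matched by in-container path, because named volumes carry a
    # project prefix in `docker inspect` and would not match by source name.
    proposed = {v["target"]: v.get("type") for v in service.get("volumes") or []}
    for mount in inspected.get("Mounts") or []:
        dest, kind = mount["Destination"], mount.get("Type")
        if dest not in proposed:
            findings.append(f"mount dropped: {dest}")
        elif proposed[dest] != kind:
            findings.append(f"mount type changed at {dest}: {kind} -> {proposed[dest]}")

    # Env keys set on the container but absent from the compose file.
    running_env = {e.split("=", 1)[0] for e in (inspected.get("Config") or {}).get("Env") or []}
    proposed_env = set(service.get("environment") or {})
    for key in sorted(running_env - proposed_env - image_env):
        findings.append(f"env dropped: {key}")

    return findings
```

A non-empty result would be a reason to block or escalate the review rather than proof of a bug: volumes declared by the image itself and env vars baked into the image would show up as false positives unless they are filtered out.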
Things to investigate
1. Did the reviewer see the full diff?
- `/tmp/project-review-graph-681.json`-style artifacts for PR #683: does the graph report include the docker-compose.yml hunks, or did it summarize/elide them?
- [...] `review/review-pr.sh` for how the diff is fetched and presented to Claude. Is there a size cap or truncation? Does it pass the full unified diff or a summary?
2. Did the reviewer have ground-truth context?
- [...] `docker inspect` for truth.
- [...] `docker inspect` for the affected service when reviewing changes to `docker-compose.yml`, or consult an `ops/expected-config.json` file as ground truth.
3. What's in the reviewer's prompt / lessons-learned?
- `dev-bot/.profile/lessons-learned.md` is loaded each run (we saw `loaded lessons-learned.md (2048 bytes)` in the log). What's in there?
4. Did the reviewer apply scope discipline?
- [...] `PLANNER_INTERVAL` plumbing. That's roughly: ~3 lines in compose, ~1 line in entrypoint.sh, ~1 line in lib/generators.sh, ~5 lines of doc. Total well under 20 lines.
- Does `review/review-pr.sh` (or its prompt) compare diff size against issue scope? If not, that's a structural gap.
5. Was the reviewer rushing?
- `agent_run` log for #683 review session: how many turns, what tools used, what context window utilization?
Suggested investigation steps
- [...] `review/review-pr.sh` to understand what gets passed to Claude (full diff vs summary, any caps)
- [...] `/home/agent/data/.profile/dev-bot/lessons-learned.md` to see existing reviewer guidance
- [...] `~/.claude/projects/<hash>/<sid>.jsonl`) and read what the reviewer actually said and what tools it used
- [...] `review/review.log` for the full session summary including turn count and tool calls
Output
A short writeup of root cause + at least one concrete improvement to either:
- [...] `review-pr.sh`
- [...] `docker compose config` against the running container's `docker inspect` and fails on incompatible changes)
Why this matters
The factory's whole self-development premise depends on reviewers catching dev-agent mistakes. Today's incident demonstrates that the reviewer has at least one major class of blind spot — out-of-scope rewrites of infrastructure files. This isn't a one-off; the next time dev-qwen "improves" the compose file or a Dockerfile or a CI config, it'll happen again unless the reviewer learns to defend against it.
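
One way to defend against out-of-scope rewrites is to measure the diff before the reviewer reads it. The sketch below is an illustration under stated assumptions, not existing behavior of `review-pr.sh`: it counts added and removed lines per file in a unified diff and flags infrastructure files that exceed a budget. The names `changed_lines_per_file`, `scope_findings`, and `INFRA_GLOBS`, the glob list, and the budget parameters are all hypothetical.

```python
import re
from collections import Counter
from fnmatch import fnmatch

# Hypothetical, illustrative list of file names treated as infrastructure.
INFRA_GLOBS = ("docker-compose*.yml", "docker-compose*.yaml", "Dockerfile*")

# Hunk header, e.g. "@@ -12,3 +12,4 @@"; an omitted length means 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def _path(field: str) -> str:
    """Strip the a/ or b/ prefix from a ---/+++ header; /dev/null becomes ''."""
    name = field.split("\t")[0]
    return "" if name == "/dev/null" else re.sub(r"^[ab]/", "", name)


def changed_lines_per_file(diff: str) -> Counter:
    """Count added plus removed lines per file in a unified diff."""
    counts: Counter = Counter()
    path = old_path = ""
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk, so a removed line that starts with "-- " is
            # counted as content and not mistaken for a "---" file header.
            if line.startswith("+"):
                new_left -= 1
                counts[path] += 1
            elif line.startswith("-"):
                old_left -= 1
                counts[path] += 1
            elif not line.startswith("\\"):  # skip "\ No newline at end of file"
                old_left -= 1
                new_left -= 1
            continue
        match = HUNK.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
        elif line.startswith("--- "):
            old_path = _path(line[4:])
        elif line.startswith("+++ "):
            path = _path(line[4:]) or old_path  # deleted files keep the old name
    return counts


def scope_findings(diff: str, total_budget: int, infra_budget: int,
                   infra_globs: tuple = INFRA_GLOBS) -> list:
    """Return warnings when the diff is larger than the issue's expected scope."""
    counts = changed_lines_per_file(diff)
    findings = []
    total = sum(counts.values())
    if total > total_budget:
        findings.append(f"diff changes {total} lines, budget is {total_budget}")
    for path, n in sorted(counts.items()):
        name = path.rsplit("/", 1)[-1]
        if n > infra_budget and any(fnmatch(name, g) for g in infra_globs):
            findings.append(f"{path}: {n} changed lines in an infrastructure file, "
                            f"budget is {infra_budget}")
    return findings
```

For a change expected to stay well under 20 lines, a call such as `scope_findings(diff, total_budget=20, infra_budget=5)` would return a non-empty list for a wholesale compose rewrite. Where the budgets come from (issue labels, a reviewer prompt, a config file) is left open here.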