Root cause: review formula had no infrastructure-file-specific checklist and no scope discipline check. The reviewer treated a docker-compose.yml rewrite the same as any code change, and lessons-learned biased toward approval. Changes: - Add step 3c (infrastructure file review) to formulas/review-pr.toml: compose-specific checklist for volumes, bind mounts, env vars, restart policy, security options - Add step 3d (scope discipline) to formulas/review-pr.toml: compare actual diff size against issue scope, block on infra-file scope violations - Add investigation writeup in docs/ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
526928dca8
commit
527731da53
2 changed files with 181 additions and 0 deletions
123
docs/investigation-685-reviewer-approved-destructive-compose.md
Normal file
123
docs/investigation-685-reviewer-approved-destructive-compose.md
Normal file
|
|
@ -0,0 +1,123 @@
|
|||
# Investigation: Reviewer approved destructive compose rewrite in PR #683
|
||||
|
||||
**Issue**: #685
|
||||
**Date**: 2026-04-11
|
||||
**PR under investigation**: #683 (fix: config: gardener=1h, architect=9m, planner=11m)
|
||||
|
||||
## Summary
|
||||
|
||||
The reviewer agent approved PR #683 in ~1 minute without flagging that it
|
||||
contained a destructive rewrite of `docker-compose.yml` — dropping named
|
||||
volumes, bind mounts, env vars, restart policy, and security options. Five
|
||||
structural gaps in the review pipeline allowed this to pass.
|
||||
|
||||
## Root causes
|
||||
|
||||
### 1. No infrastructure-file-specific review checklist
|
||||
|
||||
The review formula (`formulas/review-pr.toml`) has a generic review checklist
|
||||
(bugs, security, imports, architecture, bash specifics, dead code). It has
|
||||
**no special handling for infrastructure files** — `docker-compose.yml`,
|
||||
`Dockerfile`, CI configs, or `entrypoint.sh` are reviewed with the same
|
||||
checklist as application code.
|
||||
|
||||
Infrastructure files have a different failure mode: a single dropped line
|
||||
(a volume mount, an env var, a restart policy) can break a running deployment
|
||||
without any syntax error or linting failure. The generic checklist doesn't
|
||||
prompt the reviewer to check for these regressions.
|
||||
|
||||
**Fix applied**: Added step 3c "Infrastructure file review" to
|
||||
`formulas/review-pr.toml` with a compose-specific checklist covering named
|
||||
volumes, bind mounts, env vars, restart policy, and security options.
|
||||
|
||||
### 2. No scope discipline
|
||||
|
||||
Issue #682 asked for ~3 env var changes + `PLANNER_INTERVAL` plumbing — roughly
|
||||
10-15 lines across 3-4 files. PR #683's diff rewrote the entire compose service
|
||||
block (~50+ lines changed in `docker-compose.yml` alone).
|
||||
|
||||
The review formula **does not instruct the reviewer to compare diff size against
|
||||
issue scope**. A scope-aware reviewer would flag: "this PR changes more lines
|
||||
than the issue scope warrants — request justification for out-of-scope changes."
|
||||
|
||||
**Fix applied**: Added step 3d "Scope discipline" to `formulas/review-pr.toml`
|
||||
requiring the reviewer to compare actual changes against stated issue scope and
|
||||
flag out-of-scope modifications to infrastructure files.
|
||||
|
||||
### 3. Lessons-learned bias toward approval
|
||||
|
||||
The reviewer's `.profile/knowledge/lessons-learned.md` contains multiple entries
|
||||
that systematically bias toward approval:
|
||||
|
||||
- "Approval means 'ready to ship,' not 'perfect.'"
|
||||
- "'Different from how I'd write it' is not a blocker."
|
||||
- "Reserve request_changes for genuinely blocking concerns."
|
||||
|
||||
These lessons are well-intentioned (they prevent nit-picking and false blocks)
|
||||
but they create a blind spot: the reviewer suppresses its instinct to flag
|
||||
suspicious-looking changes because the lessons tell it not to block on
|
||||
"taste-based" concerns. A compose service block rewrite *looks* like a style
|
||||
preference ("the dev reorganized the file") but is actually a correctness
|
||||
regression.
|
||||
|
||||
**Recommendation**: The lessons-learned are not wrong — they should stay. But
|
||||
the review formula now explicitly carves out infrastructure files from the
|
||||
"bias toward APPROVE" guidance, making it clear that dropped infra
|
||||
configuration is a blocking concern, not a style preference.
|
||||
|
||||
### 4. No ground-truth for infrastructure files
|
||||
|
||||
The reviewer only sees the diff. It has no way to compare against the running
|
||||
container's actual volume/env config. When dev-qwen rewrote a 30-line service
|
||||
block from scratch, the reviewer saw a 30-line addition and a 30-line deletion
|
||||
with no reference point.
|
||||
|
||||
**Recommendation (future work)**: Maintain a `docker/expected-compose-config.yml`
|
||||
or have the reviewer fetch `docker compose config` output as ground truth when
|
||||
reviewing compose changes. This would let the reviewer diff the proposed config
|
||||
against the known-good config.
|
||||
|
||||
### 5. Structural analysis blind spot
|
||||
|
||||
`lib/build-graph.py` tracks changes to files in `formulas/`, agent directories
|
||||
(`dev/`, `review/`, etc.), and `evidence/`. It does **not track infrastructure
|
||||
files** (`docker-compose.yml`, `docker/`, `.woodpecker/`). Changes to these
|
||||
files produce no alerts in the graph report — the reviewer gets no
|
||||
"affected objectives" signal for infrastructure changes.
|
||||
|
||||
**Recommendation (future work)**: Add infrastructure file tracking to
|
||||
`build-graph.py` so that compose/Dockerfile/CI changes surface in the
|
||||
structural analysis.
|
||||
|
||||
### 6. Model and time budget
|
||||
|
||||
Reviews use Sonnet (`CLAUDE_MODEL="sonnet"` at `review-pr.sh:229`) with a
|
||||
15-minute timeout. The PR #683 review completed in ~1 minute. Sonnet is
|
||||
optimized for speed, which is appropriate for most code reviews, but
|
||||
infrastructure changes benefit from the deeper reasoning of a more capable
|
||||
model.
|
||||
|
||||
**Recommendation (future work)**: Consider escalating to a more capable model
|
||||
when the diff includes infrastructure files (compose, Dockerfiles, CI configs).
|
||||
|
||||
## Changes made
|
||||
|
||||
1. **`formulas/review-pr.toml`** — Added two new review steps:
|
||||
- **Step 3c: Infrastructure file review** — When the diff touches
|
||||
`docker-compose.yml`, `Dockerfile*`, `.woodpecker/`, or `docker/`,
|
||||
requires checking for dropped volumes, bind mounts, env vars, restart
|
||||
policy, security options, and network config. Instructs the reviewer to
|
||||
read the full file (not just the diff) and compare against the base branch.
|
||||
- **Step 3d: Scope discipline** — Requires comparing the actual diff
|
||||
footprint against the stated issue scope. Flags out-of-scope rewrites of
|
||||
infrastructure files as blocking concerns.
|
||||
|
||||
## What would have caught this
|
||||
|
||||
With the changes above, the reviewer would have:
|
||||
|
||||
1. Seen step 3c trigger for `docker-compose.yml` changes
|
||||
2. Read the full compose file and compared against the base branch
|
||||
3. Noticed the dropped named volumes, bind mounts, env vars, restart policy
|
||||
4. Seen step 3d flag that a 3-env-var issue produced a 50+ line compose rewrite
|
||||
5. Issued REQUEST_CHANGES citing specific dropped configuration
|
||||
|
|
@ -80,6 +80,64 @@ For each BEHAVIORAL change in the diff (not pure bug fixes or formatting):
|
|||
This check is SKIPPED for pure bug fixes where the intended behavior is
|
||||
unchanged (the code was wrong, not the documentation).
|
||||
|
||||
## 3c. Infrastructure file review (conditional)
|
||||
|
||||
If the diff touches ANY of these files, apply this additional checklist:
|
||||
- `docker-compose.yml` or `docker-compose.*.yml`
|
||||
- `Dockerfile` or `docker/*`
|
||||
- `.woodpecker/` CI configs
|
||||
- `docker/agents/entrypoint.sh`
|
||||
|
||||
Infrastructure files have a different failure mode from application code:
|
||||
a single dropped line (a volume mount, an env var, a restart policy) can
|
||||
break a running deployment with no syntax error. Treat dropped
|
||||
infrastructure configuration as a **blocking defect**, not a style choice.
|
||||
|
||||
### For docker-compose.yml changes:
|
||||
|
||||
1. **Read the full file** in the PR branch — do not rely only on the diff.
|
||||
2. Run `git diff <base>..HEAD -- docker-compose.yml` to see the complete
|
||||
change, not just the truncated diff.
|
||||
3. Check that NONE of the following were dropped without explicit
|
||||
justification in the PR description:
|
||||
- Named volumes (e.g. `agent-data`, `project-repos`)
|
||||
- Bind mounts (especially for config, secrets, SSH keys, shared dirs)
|
||||
- Environment variables (compare the full `environment:` block against
|
||||
the base branch)
|
||||
- `restart:` policy (should be `unless-stopped` for production services)
|
||||
- `security_opt:` settings
|
||||
- Network configuration
|
||||
- Resource limits / deploy constraints
|
||||
4. If ANY production configuration was dropped and the PR description does
|
||||
not explain why, **REQUEST_CHANGES**. List each dropped item explicitly.
|
||||
|
||||
### For Dockerfile / entrypoint changes:
|
||||
|
||||
1. Check that base image, installed packages, and runtime deps are preserved.
|
||||
2. Verify that entrypoint/CMD changes don't break the container startup.
|
||||
|
||||
### For CI config changes:
|
||||
|
||||
1. Check that pipeline steps aren't silently removed.
|
||||
2. Verify that secret references still match available secrets.
|
||||
|
||||
## 3d. Scope discipline
|
||||
|
||||
Compare the actual diff footprint against the stated issue scope:
|
||||
|
||||
1. Read the PR title and description to identify what the issue asked for.
|
||||
2. Estimate the expected diff size (e.g., "add 3 env vars" = ~5-10 lines
|
||||
in compose + ~5 lines in scripts).
|
||||
3. If the actual diff in ANY single file exceeds 3x the expected scope,
|
||||
flag it: "this file changed N lines but the issue scope suggests ~M."
|
||||
|
||||
For infrastructure files (compose, Dockerfiles, CI), scope violations are
|
||||
**blocking**: REQUEST_CHANGES and ask the author to split out-of-scope
|
||||
changes into a separate PR or justify them in the description.
|
||||
|
||||
For non-infrastructure files, scope violations are advisory: leave a
|
||||
non-blocking COMMENT noting the scope creep.
|
||||
|
||||
## 4. Vault item quality (conditional)
|
||||
|
||||
If the PR adds or modifies vault item files (`vault/pending/*.md` in the ops repo), apply these
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue