2 changed files with 181 additions and 0 deletions
123
docs/investigation-685-reviewer-approved-destructive-compose.md
Normal file
123
docs/investigation-685-reviewer-approved-destructive-compose.md
Normal file
|
|
@ -0,0 +1,123 @@
|
||||||
|
# Investigation: Reviewer approved destructive compose rewrite in PR #683
|
||||||
|
|
||||||
|
**Issue**: #685
|
||||||
|
**Date**: 2026-04-11
|
||||||
|
**PR under investigation**: #683 (fix: config: gardener=1h, architect=9m, planner=11m)
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
The reviewer agent approved PR #683 in ~1 minute without flagging that it
|
||||||
|
contained a destructive rewrite of `docker-compose.yml` — dropping named
|
||||||
|
volumes, bind mounts, env vars, restart policy, and security options. Five
|
||||||
|
structural gaps in the review pipeline allowed this to pass.
|
||||||
|
|
||||||
|
## Root causes
|
||||||
|
|
||||||
|
### 1. No infrastructure-file-specific review checklist
|
||||||
|
|
||||||
|
The review formula (`formulas/review-pr.toml`) has a generic review checklist
|
||||||
|
(bugs, security, imports, architecture, bash specifics, dead code). It has
|
||||||
|
**no special handling for infrastructure files** — `docker-compose.yml`,
|
||||||
|
`Dockerfile`, CI configs, or `entrypoint.sh` are reviewed with the same
|
||||||
|
checklist as application code.
|
||||||
|
|
||||||
|
Infrastructure files have a different failure mode: a single dropped line
|
||||||
|
(a volume mount, an env var, a restart policy) can break a running deployment
|
||||||
|
without any syntax error or linting failure. The generic checklist doesn't
|
||||||
|
prompt the reviewer to check for these regressions.
|
||||||
|
|
||||||
|
**Fix applied**: Added step 3c "Infrastructure file review" to
|
||||||
|
`formulas/review-pr.toml` with a compose-specific checklist covering named
|
||||||
|
volumes, bind mounts, env vars, restart policy, and security options.
|
||||||
|
|
||||||
|
### 2. No scope discipline
|
||||||
|
|
||||||
|
Issue #682 asked for ~3 env var changes + `PLANNER_INTERVAL` plumbing — roughly
|
||||||
|
10-15 lines across 3-4 files. PR #683's diff rewrote the entire compose service
|
||||||
|
block (~50+ lines changed in `docker-compose.yml` alone).
|
||||||
|
|
||||||
|
The review formula **does not instruct the reviewer to compare diff size against
|
||||||
|
issue scope**. A scope-aware reviewer would flag: "this PR changes more lines
|
||||||
|
than the issue scope warrants — request justification for out-of-scope changes."
|
||||||
|
|
||||||
|
**Fix applied**: Added step 3d "Scope discipline" to `formulas/review-pr.toml`
|
||||||
|
requiring the reviewer to compare actual changes against stated issue scope and
|
||||||
|
flag out-of-scope modifications to infrastructure files.
|
||||||
|
|
||||||
|
### 3. Lessons-learned bias toward approval
|
||||||
|
|
||||||
|
The reviewer's `.profile/knowledge/lessons-learned.md` contains multiple entries
|
||||||
|
that systematically bias toward approval:
|
||||||
|
|
||||||
|
- "Approval means 'ready to ship,' not 'perfect.'"
|
||||||
|
- "'Different from how I'd write it' is not a blocker."
|
||||||
|
- "Reserve request_changes for genuinely blocking concerns."
|
||||||
|
|
||||||
|
These lessons are well-intentioned (they prevent nit-picking and false blocks)
|
||||||
|
but they create a blind spot: the reviewer suppresses its instinct to flag
|
||||||
|
suspicious-looking changes because the lessons tell it not to block on
|
||||||
|
"taste-based" concerns. A compose service block rewrite *looks* like a style
|
||||||
|
preference ("the dev reorganized the file") but is actually a correctness
|
||||||
|
regression.
|
||||||
|
|
||||||
|
**Recommendation**: The lessons-learned are not wrong — they should stay. But
|
||||||
|
the review formula now explicitly carves out infrastructure files from the
|
||||||
|
"bias toward APPROVE" guidance, making it clear that dropped infra
|
||||||
|
configuration is a blocking concern, not a style preference.
|
||||||
|
|
||||||
|
### 4. No ground-truth for infrastructure files
|
||||||
|
|
||||||
|
The reviewer only sees the diff. It has no way to compare against the running
|
||||||
|
container's actual volume/env config. When dev-qwen rewrote a 30-line service
|
||||||
|
block from scratch, the reviewer saw a 30-line addition and a 30-line deletion
|
||||||
|
with no reference point.
|
||||||
|
|
||||||
|
**Recommendation (future work)**: Maintain a `docker/expected-compose-config.yml`
|
||||||
|
or have the reviewer fetch `docker compose config` output as ground truth when
|
||||||
|
reviewing compose changes. This would let the reviewer diff the proposed config
|
||||||
|
against the known-good config.
|
||||||
|
|
||||||
|
### 5. Structural analysis blind spot
|
||||||
|
|
||||||
|
`lib/build-graph.py` tracks changes to files in `formulas/`, agent directories
|
||||||
|
(`dev/`, `review/`, etc.), and `evidence/`. It does **not track infrastructure
|
||||||
|
files** (`docker-compose.yml`, `docker/`, `.woodpecker/`). Changes to these
|
||||||
|
files produce no alerts in the graph report — the reviewer gets no
|
||||||
|
"affected objectives" signal for infrastructure changes.
|
||||||
|
|
||||||
|
**Recommendation (future work)**: Add infrastructure file tracking to
|
||||||
|
`build-graph.py` so that compose/Dockerfile/CI changes surface in the
|
||||||
|
structural analysis.
|
||||||
|
|
||||||
|
### 6. Model and time budget
|
||||||
|
|
||||||
|
Reviews use Sonnet (`CLAUDE_MODEL="sonnet"` at `review-pr.sh:229`) with a
|
||||||
|
15-minute timeout. The PR #683 review completed in ~1 minute. Sonnet is
|
||||||
|
optimized for speed, which is appropriate for most code reviews, but
|
||||||
|
infrastructure changes benefit from the deeper reasoning of a more capable
|
||||||
|
model.
|
||||||
|
|
||||||
|
**Recommendation (future work)**: Consider escalating to a more capable model
|
||||||
|
when the diff includes infrastructure files (compose, Dockerfiles, CI configs).
|
||||||
|
|
||||||
|
## Changes made
|
||||||
|
|
||||||
|
1. **`formulas/review-pr.toml`** — Added two new review steps:
|
||||||
|
- **Step 3c: Infrastructure file review** — When the diff touches
|
||||||
|
`docker-compose.yml`, `Dockerfile*`, `.woodpecker/`, or `docker/`,
|
||||||
|
requires checking for dropped volumes, bind mounts, env vars, restart
|
||||||
|
policy, security options, and network config. Instructs the reviewer to
|
||||||
|
read the full file (not just the diff) and compare against the base branch.
|
||||||
|
- **Step 3d: Scope discipline** — Requires comparing the actual diff
|
||||||
|
footprint against the stated issue scope. Flags out-of-scope rewrites of
|
||||||
|
infrastructure files as blocking concerns.
|
||||||
|
|
||||||
|
## What would have caught this
|
||||||
|
|
||||||
|
With the changes above, the reviewer would have:
|
||||||
|
|
||||||
|
1. Seen step 3c trigger for `docker-compose.yml` changes
|
||||||
|
2. Read the full compose file and compared against the base branch
|
||||||
|
3. Noticed the dropped named volumes, bind mounts, env vars, restart policy
|
||||||
|
4. Seen step 3d flag that a 3-env-var issue produced a 50+ line compose rewrite
|
||||||
|
5. Issued REQUEST_CHANGES citing specific dropped configuration
|
||||||
|
|
@ -80,6 +80,64 @@ For each BEHAVIORAL change in the diff (not pure bug fixes or formatting):
|
||||||
This check is SKIPPED for pure bug fixes where the intended behavior is
|
This check is SKIPPED for pure bug fixes where the intended behavior is
|
||||||
unchanged (the code was wrong, not the documentation).
|
unchanged (the code was wrong, not the documentation).
|
||||||
|
|
||||||
|
## 3c. Infrastructure file review (conditional)
|
||||||
|
|
||||||
|
If the diff touches ANY of these files, apply this additional checklist:
|
||||||
|
- `docker-compose.yml` or `docker-compose.*.yml`
|
||||||
|
- `Dockerfile` or `docker/*`
|
||||||
|
- `.woodpecker/` CI configs
|
||||||
|
- `docker/agents/entrypoint.sh`
|
||||||
|
|
||||||
|
Infrastructure files have a different failure mode from application code:
|
||||||
|
a single dropped line (a volume mount, an env var, a restart policy) can
|
||||||
|
break a running deployment with no syntax error. Treat dropped
|
||||||
|
infrastructure configuration as a **blocking defect**, not a style choice.
|
||||||
|
|
||||||
|
### For docker-compose.yml changes:
|
||||||
|
|
||||||
|
1. **Read the full file** in the PR branch — do not rely only on the diff.
|
||||||
|
2. Run `git diff <base>..HEAD -- docker-compose.yml` to see the complete
|
||||||
|
change, not just the truncated diff.
|
||||||
|
3. Check that NONE of the following were dropped without explicit
|
||||||
|
justification in the PR description:
|
||||||
|
- Named volumes (e.g. `agent-data`, `project-repos`)
|
||||||
|
- Bind mounts (especially for config, secrets, SSH keys, shared dirs)
|
||||||
|
- Environment variables (compare the full `environment:` block against
|
||||||
|
the base branch)
|
||||||
|
- `restart:` policy (should be `unless-stopped` for production services)
|
||||||
|
- `security_opt:` settings
|
||||||
|
- Network configuration
|
||||||
|
- Resource limits / deploy constraints
|
||||||
|
4. If ANY production configuration was dropped and the PR description does
|
||||||
|
not explain why, **REQUEST_CHANGES**. List each dropped item explicitly.
|
||||||
|
|
||||||
|
### For Dockerfile / entrypoint changes:
|
||||||
|
|
||||||
|
1. Check that base image, installed packages, and runtime deps are preserved.
|
||||||
|
2. Verify that entrypoint/CMD changes don't break the container startup.
|
||||||
|
|
||||||
|
### For CI config changes:
|
||||||
|
|
||||||
|
1. Check that pipeline steps aren't silently removed.
|
||||||
|
2. Verify that secret references still match available secrets.
|
||||||
|
|
||||||
|
## 3d. Scope discipline
|
||||||
|
|
||||||
|
Compare the actual diff footprint against the stated issue scope:
|
||||||
|
|
||||||
|
1. Read the PR title and description to identify what the issue asked for.
|
||||||
|
2. Estimate the expected diff size (e.g., "add 3 env vars" = ~5-10 lines
|
||||||
|
in compose + ~5 lines in scripts).
|
||||||
|
3. If the actual diff in ANY single file exceeds 3x the expected scope,
|
||||||
|
flag it: "this file changed N lines but the issue scope suggests ~M."
|
||||||
|
|
||||||
|
For infrastructure files (compose, Dockerfiles, CI), scope violations are
|
||||||
|
**blocking**: REQUEST_CHANGES and ask the author to split out-of-scope
|
||||||
|
changes into a separate PR or justify them in the description.
|
||||||
|
|
||||||
|
For non-infrastructure files, scope violations are advisory: leave a
|
||||||
|
non-blocking COMMENT noting the scope creep.
|
||||||
|
|
||||||
## 4. Vault item quality (conditional)
|
## 4. Vault item quality (conditional)
|
||||||
|
|
||||||
If the PR adds or modifies vault item files (`vault/pending/*.md` in the ops repo), apply these
|
If the PR adds or modifies vault item files (`vault/pending/*.md` in the ops repo), apply these
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue