diff --git a/docs/investigation-685-reviewer-approved-destructive-compose.md b/docs/investigation-685-reviewer-approved-destructive-compose.md new file mode 100644 index 0000000..d8f9d2f --- /dev/null +++ b/docs/investigation-685-reviewer-approved-destructive-compose.md @@ -0,0 +1,123 @@ +# Investigation: Reviewer approved destructive compose rewrite in PR #683 + +**Issue**: #685 +**Date**: 2026-04-11 +**PR under investigation**: #683 (fix: config: gardener=1h, architect=9m, planner=11m) + +## Summary + +The reviewer agent approved PR #683 in ~1 minute without flagging that it +contained a destructive rewrite of `docker-compose.yml` — dropping named +volumes, bind mounts, env vars, restart policy, and security options. Six +structural gaps in the review pipeline allowed this to pass. + +## Root causes + +### 1. No infrastructure-file-specific review checklist + +The review formula (`formulas/review-pr.toml`) has a generic review checklist +(bugs, security, imports, architecture, bash specifics, dead code). It has +**no special handling for infrastructure files** — `docker-compose.yml`, +`Dockerfile`, CI configs, or `entrypoint.sh` are reviewed with the same +checklist as application code. + +Infrastructure files have a different failure mode: a single dropped line +(a volume mount, an env var, a restart policy) can break a running deployment +without any syntax error or linting failure. The generic checklist doesn't +prompt the reviewer to check for these regressions. + +**Fix applied**: Added step 3c "Infrastructure file review" to +`formulas/review-pr.toml` with a compose-specific checklist covering named +volumes, bind mounts, env vars, restart policy, and security options. + +### 2. No scope discipline + +Issue #682 asked for ~3 env var changes + `PLANNER_INTERVAL` plumbing — roughly +10-15 lines across 3-4 files. PR #683's diff rewrote the entire compose service +block (~50+ lines changed in `docker-compose.yml` alone). + +The review formula **does not instruct the reviewer to compare diff size against +issue scope**. A scope-aware reviewer would flag: "this PR changes more lines +than the issue scope warrants — request justification for out-of-scope changes." + +**Fix applied**: Added step 3d "Scope discipline" to `formulas/review-pr.toml` +requiring the reviewer to compare actual changes against stated issue scope and +flag out-of-scope modifications to infrastructure files. + +### 3. Lessons-learned bias toward approval + +The reviewer's `.profile/knowledge/lessons-learned.md` contains multiple entries +that systematically bias toward approval: + +- "Approval means 'ready to ship,' not 'perfect.'" +- "'Different from how I'd write it' is not a blocker." +- "Reserve request_changes for genuinely blocking concerns." + +These lessons are well-intentioned (they prevent nit-picking and false blocks) +but they create a blind spot: the reviewer suppresses its instinct to flag +suspicious-looking changes because the lessons tell it not to block on +"taste-based" concerns. A compose service block rewrite *looks* like a style +preference ("the dev reorganized the file") but is actually a correctness +regression. + +**Recommendation**: The lessons-learned are not wrong — they should stay. But +the review formula now explicitly carves out infrastructure files from the +"bias toward APPROVE" guidance, making it clear that dropped infra +configuration is a blocking concern, not a style preference. + +### 4. No ground-truth for infrastructure files + +The reviewer only sees the diff. It has no way to compare against the running +container's actual volume/env config. When dev-qwen rewrote a 30-line service +block from scratch, the reviewer saw a 30-line addition and a 30-line deletion +with no reference point. + +**Recommendation (future work)**: Maintain a `docker/expected-compose-config.yml` +or have the reviewer fetch `docker compose config` output as ground truth when +reviewing compose changes. This would let the reviewer diff the proposed config +against the known-good config. + +### 5. Structural analysis blind spot + +`lib/build-graph.py` tracks changes to files in `formulas/`, agent directories +(`dev/`, `review/`, etc.), and `evidence/`. It does **not track infrastructure +files** (`docker-compose.yml`, `docker/`, `.woodpecker/`). Changes to these +files produce no alerts in the graph report — the reviewer gets no +"affected objectives" signal for infrastructure changes. + +**Recommendation (future work)**: Add infrastructure file tracking to +`build-graph.py` so that compose/Dockerfile/CI changes surface in the +structural analysis. + +### 6. Model and time budget + +Reviews use Sonnet (`CLAUDE_MODEL="sonnet"` at `review-pr.sh:229`) with a +15-minute timeout. The PR #683 review completed in ~1 minute. Sonnet is +optimized for speed, which is appropriate for most code reviews, but +infrastructure changes benefit from the deeper reasoning of a more capable +model. + +**Recommendation (future work)**: Consider escalating to a more capable model +when the diff includes infrastructure files (compose, Dockerfiles, CI configs). + +## Changes made + +1. **`formulas/review-pr.toml`** — Added two new review steps: + - **Step 3c: Infrastructure file review** — When the diff touches + `docker-compose.yml`, `Dockerfile*`, `.woodpecker/`, or `docker/`, + requires checking for dropped volumes, bind mounts, env vars, restart + policy, security options, and network config. Instructs the reviewer to + read the full file (not just the diff) and compare against the base branch. + - **Step 3d: Scope discipline** — Requires comparing the actual diff + footprint against the stated issue scope. Flags out-of-scope rewrites of + infrastructure files as blocking concerns. + +## What would have caught this + +With the changes above, the reviewer would have: + +1. Seen step 3c trigger for `docker-compose.yml` changes +2. Read the full compose file and compared against the base branch +3. Noticed the dropped named volumes, bind mounts, env vars, restart policy +4. Seen step 3d flag that a 3-env-var issue produced a 50+ line compose rewrite +5. Issued REQUEST_CHANGES citing specific dropped configuration diff --git a/formulas/review-pr.toml b/formulas/review-pr.toml index e522552..fe62a89 100644 --- a/formulas/review-pr.toml +++ b/formulas/review-pr.toml @@ -80,6 +80,64 @@ For each BEHAVIORAL change in the diff (not pure bug fixes or formatting): This check is SKIPPED for pure bug fixes where the intended behavior is unchanged (the code was wrong, not the documentation). +## 3c. Infrastructure file review (conditional) + +If the diff touches ANY of these files, apply this additional checklist: +- `docker-compose.yml` or `docker-compose.*.yml` +- `Dockerfile` or `docker/*` +- `.woodpecker/` CI configs +- `docker/agents/entrypoint.sh` + +Infrastructure files have a different failure mode from application code: +a single dropped line (a volume mount, an env var, a restart policy) can +break a running deployment with no syntax error. Treat dropped +infrastructure configuration as a **blocking defect**, not a style choice. + +### For docker-compose.yml changes: + +1. **Read the full file** in the PR branch — do not rely only on the diff. +2. Run `git diff ..HEAD -- docker-compose.yml` to see the complete + change, not just the truncated diff. +3. Check that NONE of the following were dropped without explicit + justification in the PR description: + - Named volumes (e.g. `agent-data`, `project-repos`) + - Bind mounts (especially for config, secrets, SSH keys, shared dirs) + - Environment variables (compare the full `environment:` block against + the base branch) + - `restart:` policy (should be `unless-stopped` for production services) + - `security_opt:` settings + - Network configuration + - Resource limits / deploy constraints +4. If ANY production configuration was dropped and the PR description does + not explain why, **REQUEST_CHANGES**. List each dropped item explicitly. + +### For Dockerfile / entrypoint changes: + +1. Check that base image, installed packages, and runtime deps are preserved. +2. Verify that entrypoint/CMD changes don't break the container startup. + +### For CI config changes: + +1. Check that pipeline steps aren't silently removed. +2. Verify that secret references still match available secrets. + +## 3d. Scope discipline + +Compare the actual diff footprint against the stated issue scope: + +1. Read the PR title and description to identify what the issue asked for. +2. Estimate the expected diff size (e.g., "add 3 env vars" = ~5-10 lines + in compose + ~5 lines in scripts). +3. If the actual diff in ANY single file exceeds 3x the expected scope, + flag it: "this file changed N lines but the issue scope suggests ~M." + +For infrastructure files (compose, Dockerfiles, CI), scope violations are +**blocking**: REQUEST_CHANGES and ask the author to split out-of-scope +changes into a separate PR or justify them in the description. + +For non-infrastructure files, scope violations are advisory: leave a +non-blocking COMMENT noting the scope creep. + ## 4. Vault item quality (conditional) If the PR adds or modifies vault item files (`vault/pending/*.md` in the ops repo), apply these @@ -219,9 +277,11 @@ for actual problems (bugs, security issues, broken functionality, missing required behavior). Use DISCUSS sparingly. Note: The bias toward APPROVE applies to code correctness and style decisions. -It does NOT apply to documentation consistency (step 3b) or tech-debt filing -(step 7) — those are separate concerns that should be handled regardless of -the change's correctness. +It does NOT apply to documentation consistency (step 3b), infrastructure file +findings (step 3c), or tech-debt filing (step 7) — those are separate concerns +that should be handled regardless of the change's correctness. In particular, +dropped production configuration (volumes, bind mounts, env vars, restart +policy) is a blocking defect, not a style preference. ## 9. Output