disinto/docs/investigation-685-reviewer-approved-destructive-compose.md

# Investigation: Reviewer approved destructive compose rewrite in PR #683

**Issue**: #685
**Date**: 2026-04-11
**PR under investigation**: #683 (fix: config: gardener=1h, architect=9m, planner=11m)

## Summary

The reviewer agent approved PR #683 in ~1 minute without flagging that it
contained a destructive rewrite of `docker-compose.yml` — dropping named
volumes, bind mounts, env vars, restart policy, and security options. Five
structural gaps in the review pipeline allowed this to pass.

## Root causes

### 1. No infrastructure-file-specific review checklist

The review formula (`formulas/review-pr.toml`) has a generic review checklist
(bugs, security, imports, architecture, bash specifics, dead code). It has
**no special handling for infrastructure files** — `docker-compose.yml`,
`Dockerfile`, CI configs, or `entrypoint.sh` are reviewed with the same
checklist as application code.

Infrastructure files have a different failure mode: a single dropped line
(a volume mount, an env var, a restart policy) can break a running deployment
without any syntax error or linting failure. The generic checklist doesn't
prompt the reviewer to check for these regressions.

**Fix applied**: Added step 3c "Infrastructure file review" to
`formulas/review-pr.toml` with a compose-specific checklist covering named
volumes, bind mounts, env vars, restart policy, and security options.

### 2. No scope discipline

Issue #682 asked for ~3 env var changes + `PLANNER_INTERVAL` plumbing — roughly
10-15 lines across 3-4 files. PR #683's diff rewrote the entire compose service
block (~50+ lines changed in `docker-compose.yml` alone).

The review formula **does not instruct the reviewer to compare diff size against
issue scope**. A scope-aware reviewer would flag: "this PR changes more lines
than the issue scope warrants — request justification for out-of-scope changes."

**Fix applied**: Added step 3d "Scope discipline" to `formulas/review-pr.toml`
requiring the reviewer to compare actual changes against stated issue scope and
flag out-of-scope modifications to infrastructure files.

### 3. Lessons-learned bias toward approval

The reviewer's `.profile/knowledge/lessons-learned.md` contains multiple entries
that systematically bias toward approval:

- "Approval means 'ready to ship,' not 'perfect.'"
- "'Different from how I'd write it' is not a blocker."
- "Reserve request_changes for genuinely blocking concerns."

These lessons are well-intentioned (they prevent nit-picking and false blocks)
but they create a blind spot: the reviewer suppresses its instinct to flag
suspicious-looking changes because the lessons tell it not to block on
"taste-based" concerns. A compose service block rewrite *looks* like a style
preference ("the dev reorganized the file") but is actually a correctness
regression.

**Recommendation**: The lessons-learned are not wrong — they should stay. But
the review formula now explicitly carves out infrastructure files from the
"bias toward APPROVE" guidance, making it clear that dropped infra
configuration is a blocking concern, not a style preference.

### 4. No ground-truth for infrastructure files

The reviewer only sees the diff. It has no way to compare against the running
container's actual volume/env config. When dev-qwen rewrote a 30-line service
block from scratch, the reviewer saw a 30-line addition and a 30-line deletion
with no reference point.

**Recommendation (future work)**: Maintain a `docker/expected-compose-config.yml`
or have the reviewer fetch `docker compose config` output as ground truth when
reviewing compose changes. This would let the reviewer diff the proposed config
against the known-good config.

### 5. Structural analysis blind spot

`lib/build-graph.py` tracks changes to files in `formulas/`, agent directories
(`dev/`, `review/`, etc.), and `evidence/`. It does **not track infrastructure
files** (`docker-compose.yml`, `docker/`, `.woodpecker/`). Changes to these
files produce no alerts in the graph report — the reviewer gets no
"affected objectives" signal for infrastructure changes.

**Recommendation (future work)**: Add infrastructure file tracking to
`build-graph.py` so that compose/Dockerfile/CI changes surface in the
structural analysis.

### 6. Model and time budget

Reviews use Sonnet (`CLAUDE_MODEL="sonnet"` at `review-pr.sh:229`) with a
15-minute timeout. The PR #683 review completed in ~1 minute. Sonnet is
optimized for speed, which is appropriate for most code reviews, but
infrastructure changes benefit from the deeper reasoning of a more capable
model.

**Recommendation (future work)**: Consider escalating to a more capable model
when the diff includes infrastructure files (compose, Dockerfiles, CI configs).

## Changes made

1. **`formulas/review-pr.toml`** — Added two new review steps:
   - **Step 3c: Infrastructure file review** — When the diff touches
     `docker-compose.yml`, `Dockerfile*`, `.woodpecker/`, or `docker/`,
     requires checking for dropped volumes, bind mounts, env vars, restart
     policy, security options, and network config. Instructs the reviewer to
     read the full file (not just the diff) and compare against the base branch.
   - **Step 3d: Scope discipline** — Requires comparing the actual diff
     footprint against the stated issue scope. Flags out-of-scope rewrites of
     infrastructure files as blocking concerns.

## What would have caught this

With the changes above, the reviewer would have:

1. Seen step 3c trigger for `docker-compose.yml` changes
2. Read the full compose file and compared against the base branch
3. Noticed the dropped named volumes, bind mounts, env vars, restart policy
4. Seen step 3d flag that a 3-env-var issue produced a 50+ line compose rewrite
5. Issued REQUEST_CHANGES citing specific dropped configuration
fix: investigation: reviewer agent approved destructive compose rewrite in PR #683 — why? (#685) Root cause: review formula had no infrastructure-file-specific checklist and no scope discipline check. The reviewer treated a docker-compose.yml rewrite the same as any code change, and lessons-learned biased toward approval. Changes: - Add step 3c (infrastructure file review) to formulas/review-pr.toml: compose-specific checklist for volumes, bind mounts, env vars, restart policy, security options - Add step 3d (scope discipline) to formulas/review-pr.toml: compare actual diff size against issue scope, block on infra-file scope violations - Add investigation writeup in docs/ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-04-11 19:44:30 +00:00			`# Investigation: Reviewer approved destructive compose rewrite in PR #683`

			`Issue: #685`
			`Date: 2026-04-11`
			`PR under investigation: #683 (fix: config: gardener=1h, architect=9m, planner=11m)`

			`## Summary`

			`The reviewer agent approved PR #683 in ~1 minute without flagging that it`
			contained a destructive rewrite of `docker-compose.yml` — dropping named
			`volumes, bind mounts, env vars, restart policy, and security options. Five`
			`structural gaps in the review pipeline allowed this to pass.`

			`## Root causes`

			`### 1. No infrastructure-file-specific review checklist`

			The review formula (`formulas/review-pr.toml`) has a generic review checklist
			`(bugs, security, imports, architecture, bash specifics, dead code). It has`
			no special handling for infrastructure files — `docker-compose.yml`,
			`Dockerfile`, CI configs, or `entrypoint.sh` are reviewed with the same
			`checklist as application code.`

			`Infrastructure files have a different failure mode: a single dropped line`
			`(a volume mount, an env var, a restart policy) can break a running deployment`
			`without any syntax error or linting failure. The generic checklist doesn't`
			`prompt the reviewer to check for these regressions.`

			`Fix applied: Added step 3c "Infrastructure file review" to`
			`formulas/review-pr.toml` with a compose-specific checklist covering named
			`volumes, bind mounts, env vars, restart policy, and security options.`

			`### 2. No scope discipline`

			Issue #682 asked for ~3 env var changes + `PLANNER_INTERVAL` plumbing — roughly
			`10-15 lines across 3-4 files. PR #683's diff rewrote the entire compose service`
			block (~50+ lines changed in `docker-compose.yml` alone).

			`The review formula **does not instruct the reviewer to compare diff size against`
			`issue scope**. A scope-aware reviewer would flag: "this PR changes more lines`
			`than the issue scope warrants — request justification for out-of-scope changes."`

			Fix applied: Added step 3d "Scope discipline" to `formulas/review-pr.toml`
			`requiring the reviewer to compare actual changes against stated issue scope and`
			`flag out-of-scope modifications to infrastructure files.`

			`### 3. Lessons-learned bias toward approval`

			The reviewer's `.profile/knowledge/lessons-learned.md` contains multiple entries
			`that systematically bias toward approval:`

			`- "Approval means 'ready to ship,' not 'perfect.'"`
			`- "'Different from how I'd write it' is not a blocker."`
			`- "Reserve request_changes for genuinely blocking concerns."`

			`These lessons are well-intentioned (they prevent nit-picking and false blocks)`
			`but they create a blind spot: the reviewer suppresses its instinct to flag`
			`suspicious-looking changes because the lessons tell it not to block on`
			`"taste-based" concerns. A compose service block rewrite looks like a style`
			`preference ("the dev reorganized the file") but is actually a correctness`
			`regression.`

			`Recommendation: The lessons-learned are not wrong — they should stay. But`
			`the review formula now explicitly carves out infrastructure files from the`
			`"bias toward APPROVE" guidance, making it clear that dropped infra`
			`configuration is a blocking concern, not a style preference.`

			`### 4. No ground-truth for infrastructure files`

			`The reviewer only sees the diff. It has no way to compare against the running`
			`container's actual volume/env config. When dev-qwen rewrote a 30-line service`
			`block from scratch, the reviewer saw a 30-line addition and a 30-line deletion`
			`with no reference point.`

			Recommendation (future work): Maintain a `docker/expected-compose-config.yml`
			or have the reviewer fetch `docker compose config` output as ground truth when
			`reviewing compose changes. This would let the reviewer diff the proposed config`
			`against the known-good config.`

			`### 5. Structural analysis blind spot`

			`lib/build-graph.py` tracks changes to files in `formulas/`, agent directories
			(`dev/`, `review/`, etc.), and `evidence/`. It does **not track infrastructure
			files** (`docker-compose.yml`, `docker/`, `.woodpecker/`). Changes to these
			`files produce no alerts in the graph report — the reviewer gets no`
			`"affected objectives" signal for infrastructure changes.`

			`Recommendation (future work): Add infrastructure file tracking to`
			`build-graph.py` so that compose/Dockerfile/CI changes surface in the
			`structural analysis.`

			`### 6. Model and time budget`

			Reviews use Sonnet (`CLAUDE_MODEL="sonnet"` at `review-pr.sh:229`) with a
			`15-minute timeout. The PR #683 review completed in ~1 minute. Sonnet is`
			`optimized for speed, which is appropriate for most code reviews, but`
			`infrastructure changes benefit from the deeper reasoning of a more capable`
			`model.`

			`Recommendation (future work): Consider escalating to a more capable model`
			`when the diff includes infrastructure files (compose, Dockerfiles, CI configs).`

			`## Changes made`

			1. `formulas/review-pr.toml` — Added two new review steps:
			`- Step 3c: Infrastructure file review — When the diff touches`
			`docker-compose.yml`, `Dockerfile*`, `.woodpecker/`, or `docker/`,
			`requires checking for dropped volumes, bind mounts, env vars, restart`
			`policy, security options, and network config. Instructs the reviewer to`
			`read the full file (not just the diff) and compare against the base branch.`
			`- Step 3d: Scope discipline — Requires comparing the actual diff`
			`footprint against the stated issue scope. Flags out-of-scope rewrites of`
			`infrastructure files as blocking concerns.`

			`## What would have caught this`

			`With the changes above, the reviewer would have:`

			1. Seen step 3c trigger for `docker-compose.yml` changes
			`2. Read the full compose file and compared against the base branch`
			`3. Noticed the dropped named volumes, bind mounts, env vars, restart policy`
			`4. Seen step 3d flag that a 3-env-var issue produced a 50+ line compose rewrite`
			`5. Issued REQUEST_CHANGES citing specific dropped configuration`