fix: reproduce-agent formula — primary goal is reproduction, not root cause #320

Closed
opened 2026-04-06 19:34:57 +00:00 by dev-bot · 1 comment
Collaborator

Problem

The reproduce-agent formula currently treats reproduction and root cause analysis as equal goals. In practice, the agent spends all its turns investigating instead of confirming the bug first. When it hits the turn limit, we do not even know if the bug is real.

Reframe

The reproduce-agent has one primary job and one secondary, minor job:

Primary: Can the bug be reproduced?

This is the first exit gate. The agent must reach a verdict here (reproduced, cannot reproduce, or inconclusive) before doing anything else.

  1. Read the issue, understand the claimed behavior
  2. Navigate the app via Playwright MCP, follow the reported steps
  3. Observe: does the symptom match the report?
  4. Take screenshots as evidence
  5. Conclude: reproduced or cannot reproduce

If cannot reproduce → label bug-report + rejected, post findings, done.
If inconclusive (timeout, env issues) → label bug-report + blocked, post what was tried, done.

Secondary (minor): Is the cause obvious?

Only after reproduction is confirmed. Quick check only — do not go deep.

  1. Check container logs (docker compose logs) for stack traces or error messages
  2. Check browser console output captured during reproduction
  3. If the cause jumps out (wrong address, missing config, parse error) → note it

If obvious cause → label bug-report + in-progress, create backlog issue with cause, done.
If not obvious → label bug-report + in-triage, post reproduction evidence + logs examined. Triage-agent takes over.
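The quick check on logs could look something like the sketch below. It is only an illustration: `suspicious_lines` and the marker list are hypothetical and not part of the formula, and the input is assumed to be the text already captured from `docker compose logs` or the browser console.

```python
import re
from typing import List

# Assumption: illustrative markers only; the real formula may look for others.
ERROR_MARKERS = re.compile(r"traceback|exception|error|panic", re.IGNORECASE)


def suspicious_lines(log_text: str, limit: int = 20) -> List[str]:
    """Return up to `limit` log lines that contain an error marker."""
    hits = [line for line in log_text.splitlines() if ERROR_MARKERS.search(line)]
    # Cap the result so the check stays quick and the posted evidence stays short.
    return hits[:limit]
```

If the returned lines name the cause outright (for example a missing config key), that is the "jumps out" case; an empty or ambiguous result means the issue goes to `in-triage` rather than prompting a deeper investigation.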

Exit gates

The formula must enforce this order:

1. Can I reproduce it?  →  NO  → rejected / blocked → EXIT
                        →  YES → continue
2. Is the cause obvious? → YES → in-progress + backlog issue → EXIT
                         → NO  → in-triage → EXIT

The agent should spend at most 60% of its turn budget on step 1, reserving 40% for step 2 if reproduction succeeds. If step 1 uses all turns, that is fine — the answer is blocked.
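The gate order and the 60/40 turn split can be sketched as follows. This is a minimal illustration under stated assumptions, not the formula itself: `Reproduction`, `split_turns`, and `decide_labels` are hypothetical names, and rounding the 60% share down is an assumption the issue does not specify.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Reproduction(Enum):
    REPRODUCED = "reproduced"
    CANNOT_REPRODUCE = "cannot reproduce"
    INCONCLUSIVE = "inconclusive"


def split_turns(total_turns: int) -> Tuple[int, int]:
    """Return (step 1 cap, step 2 reserve) for a 60/40 split."""
    step1 = total_turns * 60 // 100  # assumption: round the 60% share down
    return step1, total_turns - step1


def decide_labels(result: Reproduction, cause_obvious: Optional[bool] = None) -> List[str]:
    """Apply the gates in order: reproduction first, cause second."""
    # Gate 1: anything short of a confirmed reproduction exits here.
    if result is Reproduction.CANNOT_REPRODUCE:
        return ["bug-report", "rejected"]
    if result is Reproduction.INCONCLUSIVE:
        return ["bug-report", "blocked"]
    # Gate 2: only reached once reproduction is confirmed.
    if cause_obvious:
        return ["bug-report", "in-progress"]
    return ["bug-report", "in-triage"]
```

With a budget of 50 turns, `split_turns(50)` gives a cap of 30 turns for step 1 and 20 in reserve for step 2. Note that `cause_obvious` is ignored unless the result is `REPRODUCED`, which is the ordering the gates are meant to enforce.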

Label combinations

| Outcome | Labels applied | Next actor |
|---------|----------------|------------|
| Reproduced + obvious cause | bug-report + in-progress | Dev-agent |
| Reproduced + cause unclear | bug-report + in-triage | Triage-agent |
| Cannot reproduce | bug-report + rejected | Human review |
| Inconclusive (timeout/error) | bug-report + blocked | Gardener retries or human |
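Read in the other direction, the table above tells a downstream process who should pick up an issue given its labels. A possible lookup is sketched here; `NEXT_ACTOR` and `next_actor` are hypothetical names, and treating zero or multiple matching combinations as ambiguous is an assumption rather than documented behavior.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Label pair -> next actor, transcribed from the table above.
NEXT_ACTOR: Dict[FrozenSet[str], str] = {
    frozenset({"bug-report", "in-progress"}): "Dev-agent",
    frozenset({"bug-report", "in-triage"}): "Triage-agent",
    frozenset({"bug-report", "rejected"}): "Human review",
    frozenset({"bug-report", "blocked"}): "Gardener retries or human",
}


def next_actor(labels: Iterable[str]) -> Optional[str]:
    """Return the next actor for an issue's labels, or None if ambiguous."""
    present = set(labels)
    matches = [actor for combo, actor in NEXT_ACTOR.items() if combo <= present]
    # Assumption: exactly one combination must match; anything else is ambiguous.
    return matches[0] if len(matches) == 1 else None
```

Unrelated labels on the issue are ignored, so `next_actor(["bug-report", "in-triage", "priority"])` still resolves to the Triage-agent, while an issue carrying both `blocked` and `rejected` resolves to `None`.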

Files

  • formulas/reproduce.toml — rewrite with primary/secondary structure and exit gates
  • docker/reproduce/entrypoint-reproduce.sh — update label logic to use combinations
dev-bot added the backlog label 2026-04-06 19:34:57 +00:00
dev-qwen self-assigned this 2026-04-06 20:52:15 +00:00
dev-qwen added in-progress and removed backlog labels 2026-04-06 20:52:16 +00:00
dev-bot added blocked and removed in-progress labels 2026-04-06 20:54:02 +00:00
Author
Collaborator

Stale in-progress issue detected

| Field | Value |
|---|---|
| Detection reason | no_active_session_no_open_pr |
| Timestamp | 2026-04-06T20:54:02Z |

Status: This issue was labeled in-progress but no active tmux session exists.
Action required: A maintainer should triage this issue.

dev-qwen removed their assignment 2026-04-06 21:07:45 +00:00
Reference: disinto-admin/disinto#320