feat: extend edge container with Playwright and docker compose for bug reproduction #256

Closed
opened 2026-04-05 19:29:05 +00:00 by dev-bot · 4 comments
Collaborator

What

Add a reproduce-agent that runs as a sidecar container with docker socket access, host networking, and Playwright MCP, to automatically reproduce bug reports and do a quick log-based root cause check.

Design decisions (resolved)

| Question | Decision |
|---|---|
| Container approach | Sidecar — new container, edge stays as Caddy |
| Network | network_mode: host — so localhost reaches harb stack ports, matches Woodpecker agent pattern |
| Poll trigger | Dispatcher loop — fold reproduce check into dispatcher 60s poll |
| Playwright MCP | Bundled in image — npm install at build time |
| Claude CLI | Mount from host — same bind-mount pattern as agents container |
| Stack target | Formula declares it — stack_script field in formula TOML (e.g. scripts/dev.sh) OR use existing staging environment. Agent chooses based on formula config |

Scope

1. Reproduce

  • Navigate the app via Playwright MCP, follow repro steps from the issue
  • Confirm the symptom matches the report, take screenshots
  • If cannot reproduce → label cannot-reproduce, post findings, done

2. Quick log analysis

  • If reproduced, check container logs (docker compose logs), browser console, network errors
  • Look for obvious causes: stack traces, error messages, wrong addresses, missing config
  • If root cause immediately apparent → post on issue, create backlog issue, label reproduced
  • If inconclusive → document all steps taken, logs examined, observations. Label needs-triage for the triage agent (#258)
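
The branching in the two scope steps reduces to a small mapping from outcome to label. A minimal sketch, assuming a helper named `reproduce_label` and "yes"/"no" outcome strings (both hypothetical, not part of the issue):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the reproduce-agent's outcome to the label
# named in the scope above. Function and argument names are illustrative.
reproduce_label() {
  local reproduced="$1"   # "yes" or "no"
  local cause_found="$2"  # "yes" or "no"; ignored when not reproduced
  if [ "$reproduced" != "yes" ]; then
    echo "cannot-reproduce"
  elif [ "$cause_found" = "yes" ]; then
    echo "reproduced"
  else
    echo "needs-triage"
  fi
}
```

For example, `reproduce_label yes no` prints `needs-triage`, which hands the issue to the triage agent (#258).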

What the reproduce-agent does NOT do

  • Deep codebase analysis or multi-layer root cause decomposition (triage agent #258)
  • Modify code, change log levels, or re-run with different config
  • Fix anything — dev-agent picks up backlog issues

Architecture

Gardener labels issue `bug-report`
  → Dispatcher (60s poll) sees `bug-report` without `reproduced`/`cannot-reproduce`
  → Dispatcher starts sidecar: docker run --network host disinto-reproduce ...
  → Sidecar acquires stack lock (lib/stack-lock.sh)
  → If formula has stack_script: run it (e.g. scripts/dev.sh restart --full)
  → If formula uses staging: connect to existing staging environment
  → Sidecar waits for healthy
  → Claude + Playwright MCP navigates app, follows repro steps
  → If reproduced: check docker compose logs, look for obvious cause
  →   Cause found → create backlog issue, label `reproduced`
  →   Inconclusive → document steps + logs, label `needs-triage`
  → If not reproduced → label `cannot-reproduce`
  → Post findings + screenshots on issue
  → Release stack lock
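
The dispatcher's selection rule in the flow above (has `bug-report`, lacks `reproduced` and `cannot-reproduce`) could be sketched as a pure label check. The function name `needs_reproduce` is an assumption, and fetching the labels from the forge API is not shown:

```shell
#!/usr/bin/env bash
# Hypothetical filter for the dispatcher poll: succeeds (exit 0) when an
# issue's labels include bug-report and neither terminal label is present.
# Labels are passed as separate arguments.
needs_reproduce() {
  local has_bug=1 label
  for label in "$@"; do
    case "$label" in
      bug-report) has_bug=0 ;;
      reproduced|cannot-reproduce) return 1 ;;
    esac
  done
  return "$has_bug"
}
```

Read literally, this rule would match an issue labeled `needs-triage` again on the next poll. Whether that label should also exclude an issue is not stated in the issue, so the sketch leaves it out.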

Container spec

reproduce:
  image: disinto-reproduce:latest
  network_mode: host
  profiles: ["reproduce"]
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - agent-data:/home/agent/data
    - project-repos:/home/agent/repos
    - $HOME/.claude:/home/agent/.claude
    - <claude-cli-binary>:/usr/local/bin/claude:ro
    - $HOME/.ssh:/home/agent/.ssh:ro
  env_file:
    - .env
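
Because the sidecar depends on two bind mounts from the host (the docker socket and the Claude CLI binary), an entrypoint could fail fast when either is missing. A sketch only: the `preflight` function is hypothetical, and the default paths are the container-side paths from the spec above:

```shell
#!/usr/bin/env bash
# Hypothetical preflight for entrypoint-reproduce.sh: verify that the
# bind mounts from the container spec are actually present.
preflight() {
  local sock="${1:-/var/run/docker.sock}"
  local claude_bin="${2:-/usr/local/bin/claude}"
  local status=0
  # -S is true only for an existing socket file.
  [ -S "$sock" ] || { echo "missing docker socket: $sock" >&2; status=1; }
  # -x is true only for an existing executable file.
  [ -x "$claude_bin" ] || { echo "missing claude CLI: $claude_bin" >&2; status=1; }
  return "$status"
}
```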

Dockerfile (docker/reproduce/Dockerfile):

FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
    bash curl git jq docker.io docker-compose-plugin \
    nodejs npm chromium \
    && npm install -g @anthropic-ai/mcp-playwright \
    && rm -rf /var/lib/apt/lists/*
RUN useradd -m -u 1000 -s /bin/bash agent
WORKDIR /home/agent

Formula spec

formulas/reproduce.toml declares:

name = "reproduce"
stack_script = "scripts/dev.sh restart --full"  # or omit to use staging
tools = ["playwright"]
timeout_minutes = 15
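
To illustrate how the agent might choose between running a stack script and using staging, the sketch below reads `stack_script` from the formula file. It is a line-based match with `sed`, not a real TOML parser, and the function names `formula_stack_script` and `stack_target` are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical reader for the formula's stack_script field.
# Handles only the simple form: stack_script = "value"  # optional comment
formula_stack_script() {
  local formula="$1"
  sed -n 's/^[[:space:]]*stack_script[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' \
    "$formula" | head -n 1
}

# Print "script:<command>" when the formula declares a stack script,
# otherwise "staging" (the field is omitted).
stack_target() {
  local script
  script="$(formula_stack_script "$1")"
  if [ -n "$script" ]; then
    echo "script:$script"
  else
    echo "staging"
  fi
}
```

With the formula shown above, `stack_target formulas/reproduce.toml` would print `script:scripts/dev.sh restart --full`.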

Dependencies

  • #253 — bug-report and needs-triage labels created during init
  • #252 — gardener labels issues as bug-report
  • #255 — stack lock protocol
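
The stack lock protocol itself is defined in #255 and is not shown in this issue. Purely as an illustration of the acquire/release bracket the sidecar needs, here is a generic `mkdir`-based lock. It is not the contents of lib/stack-lock.sh, and both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Generic directory lock, NOT the #255 protocol. mkdir is atomic:
# it fails if the directory already exists, so only one caller wins.
acquire_stack_lock() {
  local lock_dir="$1"
  mkdir "$lock_dir" 2>/dev/null
}

release_stack_lock() {
  local lock_dir="$1"
  rmdir "$lock_dir"
}
```

In an entrypoint, the release would typically be registered with `trap` right after a successful acquire so the lock is dropped even when the run fails.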

Files

  • New: docker/reproduce/Dockerfile
  • New: formulas/reproduce.toml
  • New: docker/reproduce/entrypoint-reproduce.sh
  • Modify: docker/edge/dispatcher.sh — add reproduce dispatch
  • Modify: docker-compose.yml template — add reproduce service
dev-bot added the vision label 2026-04-05 19:29:05 +00:00
Author
Collaborator

Design note: sibling containers + network access

The sidecar has /var/run/docker.sock mounted, so docker compose commands go to the host daemon. Harb stack containers start as siblings on the host, not children — this is correct.

However, scripts/dev.sh health checks use localhost URLs (e.g. http://localhost:5173). From inside the sidecar container, localhost is the sidecar itself, not the host where harb containers bind.

Resolution: sidecar should use network_mode: host so its localhost is the host network. This also means Playwright can reach the webapp at localhost:5173 directly. The sidecar is already privileged (docker socket), so network_mode: host does not change the security posture.

reproduce:
  image: disinto-reproduce:latest
  network_mode: host
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - project-repos:/home/agent/repos
    ...
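
With host networking in place, the "wait for healthy" step can poll the same `localhost` URL that scripts/dev.sh checks. A minimal sketch, assuming a generic retry helper named `wait_until` (hypothetical) that takes a timeout, a poll interval, and the probe command:

```shell
#!/usr/bin/env bash
# Hypothetical retry helper: run a probe command until it succeeds or
# the timeout (in seconds) elapses. Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  # SECONDS is bash's built-in counter of seconds since shell start.
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
```

A health wait could then look like `wait_until 120 2 curl -fsS -o /dev/null http://localhost:5173`. The 120-second timeout and 2-second interval are illustrative values, not taken from the issue.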
disinto-admin added the backlog label 2026-04-06 06:25:00 +00:00
disinto-admin removed the vision label 2026-04-06 06:27:44 +00:00
dev-qwen self-assigned this 2026-04-06 07:18:51 +00:00
dev-qwen added in-progress and removed backlog labels 2026-04-06 07:18:51 +00:00
dev-bot added blocked and removed in-progress labels 2026-04-06 07:19:01 +00:00
Author
Collaborator

Stale in-progress issue detected

| Field | Value |
|---|---|
| Detection reason | no_active_session_no_open_pr |
| Timestamp | 2026-04-06T07:19:01Z |

Status: This issue was labeled in-progress but no active tmux session exists.
Action required: A maintainer should triage this issue.
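
The detection reason suggests a check on two conditions: no tmux session for the issue and no open pull request. A sketch of that decision, assuming a function named `stale_reason` (hypothetical) that receives both facts as arguments:

```shell
#!/usr/bin/env bash
# Hypothetical stale check: print the detection reason when an
# in-progress issue has neither an active session nor an open PR.
# Prints nothing when the issue is not considered stale.
stale_reason() {
  local session_active="$1"  # "yes" if a tmux session for the issue exists
  local open_prs="$2"        # count of open PRs for the issue
  if [ "$session_active" != "yes" ] && [ "$open_prs" -eq 0 ]; then
    echo "no_active_session_no_open_pr"
  fi
}
```

The first argument could be derived from `tmux has-session -t <name>`. The session naming scheme is not given in this thread, so it is left as a placeholder.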

Collaborator

Blocked — issue #256

| Field | Value |
|---|---|
| Exit reason | no_push |
| Timestamp | 2026-04-06T07:22:03Z |
Diagnostic output
Claude did not push branch fix/issue-256
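
A `no_push` exit means the expected branch never reached the remote. One way to check that condition is `git ls-remote --exit-code`, which returns a non-zero status when no matching ref exists. The helper name `branch_pushed` is hypothetical, and this is a sketch rather than the bot's actual check:

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the branch exists on the remote.
# With --exit-code, git ls-remote fails when no matching ref is found.
branch_pushed() {
  local remote="$1" branch="$2"
  git ls-remote --exit-code --heads "$remote" "refs/heads/$branch" >/dev/null 2>&1
}
```

For this issue the call would be `branch_pushed origin fix/issue-256`, assuming the remote is named `origin`.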
dev-bot added backlog and removed blocked labels 2026-04-06 07:34:54 +00:00
dev-qwen was unassigned by dev-bot 2026-04-06 07:34:54 +00:00
dev-bot self-assigned this 2026-04-06 07:39:03 +00:00
dev-bot added in-progress and removed backlog labels 2026-04-06 07:39:03 +00:00
Author
Collaborator

Blocked — issue #256

| Field | Value |
|---|---|
| Exit reason | no_push |
| Timestamp | 2026-04-06T07:39:06Z |
Diagnostic output
Claude did not push branch fix/issue-256
dev-bot added blocked and removed in-progress labels 2026-04-06 07:39:07 +00:00
dev-bot removed their assignment 2026-04-06 07:51:41 +00:00
Reference: disinto-admin/disinto#256