# Sprint: edge-subpath-chat
## Vision issues
- #623 — vision: subpath routing + Forgejo-OAuth-gated Claude chat inside the edge container
## What this enables
After this sprint, an operator running `disinto edge register` gets a single URL — `<project>.disinto.ai` — with Forgejo at `/forge/`, Woodpecker CI at `/ci/`, a staging preview at `/staging/`, and an OAuth-gated Claude Code chat at `/chat/`, all under one wildcard cert and one bootstrap password. The factory talks back to its operator through a chat window that sits next to the forge, CI, and live preview it is driving.
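The URL layout above works as a prefix route table. Below is a minimal Python sketch of how a request path resolves against it, including the `/` to `/forge/` redirect. The upstream labels (`forgejo`, `woodpecker`, `staging`, `chat`) are illustrative assumptions. In the real stack this matching happens in the Caddy handlers emitted by `lib/generators.sh`, not in Python.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Subpath prefix -> upstream label. Labels are illustrative, not the
# compose service names.
ROUTES = {
    "/forge/": "forgejo",
    "/ci/": "woodpecker",
    "/staging/": "staging",
    "/chat/": "chat",
}


def public_url(project: str, prefix: str) -> str:
    """Public URL of one subpath on the project's edge domain."""
    if prefix not in ROUTES:
        raise KeyError(f"unknown subpath: {prefix}")
    return f"https://{project}.disinto.ai{prefix}"


def resolve(url: str) -> Tuple[str, str]:
    """Return (action, target) for a request URL.

    action is "redirect" (target = Location), "proxy" (target = upstream
    label) or "not_found" (target = the unmatched path).
    """
    path = urlsplit(url).path or "/"
    if path == "/":
        return ("redirect", "/forge/")
    for prefix, upstream in ROUTES.items():
        # Like a "/forge/*" matcher: the trailing slash is part of the prefix.
        if path.startswith(prefix):
            return ("proxy", upstream)
    return ("not_found", path)
```

Because each prefix keeps its trailing slash, `/forgery` is not routed to Forgejo. A bare `/forge` also falls through. Edge cases like these are what an end-to-end subpath smoke test should cover.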
## What exists today
The majority of this vision is already implemented across issues #704–#711:
- **Subpath routing**: Caddyfile generator produces `/forge/*`, `/ci/*`, `/staging/*`, `/chat/*` handlers (`lib/generators.sh:780–822`). Forgejo `ROOT_URL` and Woodpecker `WOODPECKER_HOST` are set to subpath values when `EDGE_TUNNEL_FQDN` is present (`bin/disinto:842–847`).
- **Chat container**: Full OAuth flow via Forgejo, HttpOnly session cookies, forward_auth defense-in-depth with `FORWARD_AUTH_SECRET`, per-user rate limiting (hourly/daily/token caps), conversation history in NDJSON (`docker/chat/server.py`).
- **Sandbox hardening**: Read-only rootfs, `cap_drop: ALL`, `no-new-privileges`, `pids_limit: 128`, `mem_limit: 512m`, no Docker socket. Verification script at `tools/edge-control/verify-chat-sandbox.sh`.
- **Edge control plane**: Tunnel registration, port allocation, Caddy admin API routing, wildcard `*.disinto.ai` cert via DNS-01 (`tools/edge-control/`).
- **Dependencies #620/#621/#622**: Admin password prompt, edge control plane, and reverse tunnel — all implemented and merged.
- **Subdomain fallback plan**: Fully documented at `docs/edge-routing-fallback.md` with pivot criteria.
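The forward_auth layer described above can be modeled as a small decision function. The edge's auth subrequest must carry the shared `FORWARD_AUTH_SECRET`, and the browser's `disinto_chat_session` cookie must map to a live session. This Python sketch assumes a hypothetical `X-Forward-Auth-Secret` header and an in-memory session store. It is not the code in `docker/chat/server.py`.

```python
import hmac
import time
from http.cookies import SimpleCookie
from typing import Mapping, Optional

SESSION_COOKIE = "disinto_chat_session"  # namespaced chat cookie
SECRET_HEADER = "X-Forward-Auth-Secret"  # hypothetical header name


def forward_auth_status(
    headers: Mapping[str, str],
    secret: str,
    sessions: Mapping[str, float],
    now: Optional[float] = None,
) -> int:
    """Status for a forward_auth subrequest: 200 allows, 401 rejects.

    sessions is a hypothetical session-id -> expiry (unix seconds) store.
    """
    now = time.time() if now is None else now
    # An empty secret would match an empty header, so refuse outright.
    if not secret:
        return 401
    presented = headers.get(SECRET_HEADER, "")
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(presented.encode(), secret.encode()):
        return 401
    jar = SimpleCookie()
    jar.load(headers.get("Cookie", ""))
    morsel = jar.get(SESSION_COOKIE)
    if morsel is None:
        return 401
    expiry = sessions.get(morsel.value)
    if expiry is None or expiry <= now:
        return 401
    return 200
```

Every failure mode returns 401. This matches the expectation that unauthenticated `/chat/*` requests are rejected with 401, and it does not tell the caller which check failed.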
## Complexity
- ~6 files touched across 3 subsystems (Caddy routing, chat backend, compose generation)
- Estimated 4 sub-issues
- ~90% glue code (wiring existing pieces), ~10% greenfield (WebSocket streaming, end-to-end smoke test)
## Risks
- **Forgejo/Woodpecker subpath breakage**: Neither service is battle-tested under subpaths in this stack. Redirect loops, OAuth callback mismatches, or asset 404s are plausible. Mitigation: the fallback plan (`docs/edge-routing-fallback.md`) is already documented and estimated at under one day to pivot.
- **Cookie/CSRF collision**: Forgejo and chat share the same origin — cookie names or CSRF tokens could collide. Mitigation: chat uses a namespaced cookie (`disinto_chat_session`) and a separate OAuth app.
- **Streaming latency**: One-shot `claude --print` blocks until completion. Long responses leave the operator staring at a spinner. Not a correctness risk, but a UX risk that WebSocket streaming would fix.
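Fixing the streaming-latency risk means forwarding output as the process produces it, instead of waiting for it to exit. Below is a minimal asyncio sketch. `send` stands in for whatever WebSocket text-frame sender the chat backend ends up using. The command is left generic, since this document does not specify which `claude` flags produce incremental output.

```python
import asyncio
import codecs
import sys
from typing import Awaitable, Callable, Sequence


async def stream_output(
    argv: Sequence[str],
    send: Callable[[str], Awaitable[None]],
    chunk_size: int = 256,
) -> int:
    """Run argv, forwarding stdout to send() as text while it is produced.

    Returns the process exit code once stdout is exhausted.
    """
    proc = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE
    )
    assert proc.stdout is not None
    # A read boundary can split a multi-byte UTF-8 character, so decode
    # incrementally rather than calling bytes.decode() on each chunk.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        chunk = await proc.stdout.read(chunk_size)
        if not chunk:  # b"" means EOF
            break
        text = decoder.decode(chunk)
        if text:
            await send(text)
    tail = decoder.decode(b"", final=True)
    if tail:
        await send(tail)
    return await proc.wait()


async def _demo() -> None:
    async def show(text: str) -> None:
        print(text, end="", flush=True)

    code = await stream_output(
        [sys.executable, "-c", "print('hello'); print('world')"], show
    )
    print(f"exit code {code}")


if __name__ == "__main__":
    asyncio.run(_demo())
```

The incremental decoder is the important detail here. If each chunk were decoded on its own, a chunk ending partway through a multi-byte character would raise `UnicodeDecodeError`.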
## Cost — new infra to maintain
- **No new services** — all containers already exist in the compose stack
- **No new scheduled tasks or formulas** — chat is a passive request handler
- **One new smoke test** (CI) — end-to-end subpath routing verification
- **Ongoing**: monitoring Forgejo/Woodpecker upstream for subpath regressions on upgrades
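The new smoke test could start as a table of expected responses, checked without following redirects or sending cookies. In the Python sketch below, the paths come from the smoke-test acceptance criteria. The exact status codes are assumptions to adjust against the real stack: 302 for the root redirect, 200 for the anonymous `/forge/`, `/ci/`, and `/staging/` pages, and 401 for `/chat/`.

```python
import urllib.error
import urllib.request
from typing import Callable, List, NamedTuple, Optional


class Response(NamedTuple):
    status: int
    location: Optional[str] = None


# (path, expected status, expected Location or None); statuses are assumptions.
EXPECTATIONS = [
    ("/", 302, "/forge/"),
    ("/forge/", 200, None),
    ("/ci/", 200, None),
    ("/staging/", 200, None),
    ("/chat/", 401, None),
]


def run_smoke(fetch: Callable[[str], Response]) -> List[str]:
    """Return failure messages; an empty list means every check passed."""
    failures = []
    for path, status, location in EXPECTATIONS:
        resp = fetch(path)
        if resp.status != status:
            failures.append(f"{path}: expected {status}, got {resp.status}")
        elif location is not None and resp.location != location:
            failures.append(
                f"{path}: expected Location {location}, got {resp.location}"
            )
    return failures


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # surface the 3xx as an HTTPError instead of following it


def http_fetch(base: str) -> Callable[[str], Response]:
    """Build a cookie-less, non-redirecting GET fetcher for a live edge."""
    opener = urllib.request.build_opener(_NoRedirect)

    def fetch(path: str) -> Response:
        try:
            with opener.open(base.rstrip("/") + path, timeout=10) as r:
                return Response(r.status, r.headers.get("Location"))
        except urllib.error.HTTPError as e:
            return Response(e.code, e.headers.get("Location"))

    return fetch
```

Against a live edge, this would be called as `run_smoke(http_fetch("https://<project>.disinto.ai"))`. Injecting `fetch` lets the checking logic be unit-tested without a network.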
## Recommendation
Worth it. The vision is ~80% implemented. The remaining work is integration hardening (confirming subpath routing works end-to-end with real Forgejo/Woodpecker) and one UX improvement (WebSocket streaming). The risk is low because a documented fallback to per-service subdomains exists. Ship this sprint to close the loop on the edge experience.
## Sub-issues
<!-- filer:begin -->
- id: subpath-routing-smoke-test
  title: "vision(#623): end-to-end subpath routing smoke test for Forgejo + Woodpecker + chat"
  labels: [backlog]
  depends_on: []
  body: |
    ## Goal
    Verify that Forgejo, Woodpecker, and chat all function correctly when served
    under /forge/, /ci/, and /chat/ subpaths on a single domain. Catch redirect
    loops, OAuth callback failures, and asset 404s before they hit production.
    ## Acceptance criteria
    - [ ] Forgejo login at /forge/ completes without redirect loops
    - [ ] Forgejo OAuth callback for Woodpecker succeeds under subpath
    - [ ] Woodpecker dashboard loads all assets at /ci/ (no 404s on JS/CSS)
    - [ ] Chat OAuth login flow works at /chat/login
    - [ ] Forward_auth on /chat/* rejects unauthenticated requests with 401
    - [ ] Staging content loads at /staging/
    - [ ] Root / redirects to /forge/
    - [ ] CI pipeline added to .woodpecker/ to run this test on edge-related changes
- id: websocket-streaming-chat
  title: "vision(#623): WebSocket streaming for chat UI to replace one-shot claude --print"
  labels: [backlog]
  depends_on: [subpath-routing-smoke-test]
  body: |
    ## Goal
    Replace the blocking one-shot claude --print invocation in the chat backend with
    a WebSocket connection that streams tokens to the UI as they arrive.
    ## Acceptance criteria
    - [ ] /chat/ws endpoint accepts WebSocket upgrade with valid session cookie
    - [ ] /chat/ws rejects upgrade if session cookie is missing or expired
    - [ ] Chat backend streams claude output over WebSocket as text frames
    - [ ] UI renders tokens incrementally as they arrive
    - [ ] Rate limiting still enforced on WebSocket messages
    - [ ] Caddy proxies WebSocket upgrade correctly through /chat/ws with forward_auth
- id: chat-working-dir-scoping
  title: "vision(#623): scope Claude chat working directory to project staging checkout"
  labels: [backlog]
  depends_on: [subpath-routing-smoke-test]
  body: |
    ## Goal
    Give the chat container's Claude session read-write access to the project working
    tree so the operator can inspect, explain, or modify code. Access is scoped to that
    tree only, with no access to factory internals, secrets, or the Docker socket.
    ## Acceptance criteria
    - [ ] Chat container mounts the project working tree read-write at a dedicated workspace path
    - [ ] Claude invocation in server.py sets cwd to the workspace directory
    - [ ] Claude permission mode is acceptEdits (not bypassPermissions)
    - [ ] verify-chat-sandbox.sh updated to assert workspace mount exists
    - [ ] Compose generator adds the workspace volume conditionally
- id: subpath-fallback-automation
  title: "vision(#623): automate subdomain fallback pivot if subpath routing fails"
  labels: [backlog]
  depends_on: [subpath-routing-smoke-test]
  body: |
    ## Goal
    If the smoke test reveals unfixable subpath issues, automate the pivot to
    per-service subdomains so the switch is a single config change.
    ## Acceptance criteria
    - [ ] generators.sh _generate_caddyfile_impl accepts EDGE_ROUTING_MODE env var
    - [ ] In subdomain mode, Caddyfile emits four host blocks per edge-routing-fallback.md
    - [ ] register.sh registers additional subdomain routes when EDGE_ROUTING_MODE=subdomain
    - [ ] OAuth redirect URIs in ci-setup.sh respect routing mode
    - [ ] .env template documents EDGE_ROUTING_MODE with a comment referencing the fallback doc
<!-- filer:end -->
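For the fallback sub-issue, the routing mode mainly changes how each service's public base URL is derived. Caddy host blocks, tunnel routes, and OAuth redirect URIs all follow from that. The Python sketch below is illustrative only and is not a port of `generators.sh` or `ci-setup.sh`. The `<project>-<service>.disinto.ai` host naming is a hypothetical placeholder for whatever `docs/edge-routing-fallback.md` specifies. The Woodpecker callback path `/authorize` is an assumption to check against the deployed Woodpecker version.

```python
import os
from typing import Dict, Optional

SERVICES = ("forge", "ci", "staging", "chat")


def base_urls(project: str, mode: Optional[str] = None) -> Dict[str, str]:
    """Map each service to its public base URL (always ending in '/').

    mode falls back to EDGE_ROUTING_MODE, then to "subpath".
    """
    mode = (mode or os.environ.get("EDGE_ROUTING_MODE") or "subpath").lower()
    if mode == "subpath":
        return {s: f"https://{project}.disinto.ai/{s}/" for s in SERVICES}
    if mode == "subdomain":
        # Hypothetical naming; one label deep so *.disinto.ai still covers it.
        return {s: f"https://{project}-{s}.disinto.ai/" for s in SERVICES}
    raise ValueError(f"unknown EDGE_ROUTING_MODE: {mode!r}")


def woodpecker_redirect_uri(project: str, mode: Optional[str] = None) -> str:
    """OAuth callback URI to register in Forgejo for Woodpecker."""
    # Assumes Woodpecker's Forgejo callback lives at <ci base>/authorize.
    return base_urls(project, mode)["ci"] + "authorize"
```

A wildcard certificate matches exactly one DNS label. A single-label scheme like the one above therefore keeps every subdomain under the existing `*.disinto.ai` certificate. A nested scheme such as `forge.<project>.disinto.ai` would not be covered.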