Sprint: edge-subpath-chat

Vision issues

  • #623 — vision: subpath routing + Forgejo-OAuth-gated Claude chat inside the edge container

What this enables

After this sprint, an operator running disinto edge register gets a single URL — <project>.disinto.ai — with Forgejo at /forge/, Woodpecker CI at /ci/, a staging preview at /staging/, and an OAuth-gated Claude Code chat at /chat/, all under one wildcard cert and one bootstrap password. The factory talks back to its operator through a chat window that sits next to the forge, CI, and live preview it is driving.

What exists today

The majority of this vision is already implemented across issues #704-#711:

  • Subpath routing: Caddyfile generator produces /forge/*, /ci/*, /staging/*, /chat/* handlers (lib/generators.sh:780-822). Forgejo ROOT_URL and Woodpecker WOODPECKER_HOST are set to subpath values when EDGE_TUNNEL_FQDN is present (bin/disinto:842-847).
  • Chat container: Full OAuth flow via Forgejo, HttpOnly session cookies, forward_auth defense-in-depth with FORWARD_AUTH_SECRET, per-user rate limiting (hourly/daily/token caps), conversation history in NDJSON (docker/chat/server.py).
  • Sandbox hardening: Read-only rootfs, cap_drop: ALL, no-new-privileges, pids_limit: 128, mem_limit: 512m, no Docker socket. Verification script at tools/edge-control/verify-chat-sandbox.sh.
  • Edge control plane: Tunnel registration, port allocation, Caddy admin API routing, wildcard *.disinto.ai cert via DNS-01 (tools/edge-control/).
  • Dependencies #620/#621/#622: Admin password prompt, edge control plane, and reverse tunnel — all implemented and merged.
  • Subdomain fallback plan: Fully documented at docs/edge-routing-fallback.md with pivot criteria.
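
The sandbox settings listed above lend themselves to a mechanical check. The following is a minimal sketch, not the contents of tools/edge-control/verify-chat-sandbox.sh: it assumes the chat service definition has already been parsed from the compose file into a dict, and the function name `sandbox_violations` is hypothetical.

```python
from typing import Any

DOCKER_SOCKET = "/var/run/docker.sock"


def sandbox_violations(service: dict[str, Any]) -> list[str]:
    """Return the hardening settings that a compose service definition lacks.

    `service` is assumed to be the parsed mapping for the chat service.
    The expected values mirror the list in this document.
    """
    problems: list[str] = []
    if service.get("read_only") is not True:
        problems.append("rootfs is not read-only")
    if "ALL" not in service.get("cap_drop", []):
        problems.append("cap_drop does not include ALL")
    accepted = ("no-new-privileges", "no-new-privileges:true")
    if not any(opt in accepted for opt in service.get("security_opt", [])):
        problems.append("no-new-privileges is not set")
    if service.get("pids_limit") != 128:
        problems.append("pids_limit is not 128")
    if service.get("mem_limit") != "512m":
        problems.append("mem_limit is not 512m")
    if any(DOCKER_SOCKET in str(vol) for vol in service.get("volumes", [])):
        problems.append("Docker socket is mounted")
    return problems
```

An empty return value means every listed setting is present. A check like this only inspects configuration; it does not prove the running container honours it.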

Complexity

  • ~6 files touched across 3 subsystems (Caddy routing, chat backend, compose generation)
  • Estimated 4 sub-issues
  • ~90% glue code (wiring existing pieces), ~10% greenfield (WebSocket streaming, end-to-end smoke test)

Risks

  • Forgejo/Woodpecker subpath breakage: Neither service is battle-tested under subpaths in this stack. Redirect loops, OAuth callback mismatches, or asset 404s are plausible. Mitigation: the fallback plan (docs/edge-routing-fallback.md) is already documented and estimated at under one day to pivot.
  • Cookie/CSRF collision: Forgejo and chat share the same origin — cookie names or CSRF tokens could collide. Mitigation: chat uses a namespaced cookie (disinto_chat_session) and a separate OAuth app.
  • Streaming latency: One-shot claude --print blocks until completion. Long responses leave the operator staring at a spinner. Not a correctness risk, but a UX risk that WebSocket streaming would fix.
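
To illustrate the difference between blocking on a one-shot command and streaming its output, here is a minimal sketch that yields a subprocess's stdout line by line as it is produced. It is not the code in docker/chat/server.py. The helper name `stream_output` is hypothetical, and whether the claude CLI flushes its output incrementally when writing to a pipe is an assumption that would need checking.

```python
import subprocess
from typing import Iterator


def stream_output(argv: list[str]) -> Iterator[str]:
    """Yield each line of a command's stdout as soon as it is available.

    A WebSocket handler could forward every yielded line as a text frame
    instead of waiting for the process to exit.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered on the reading side
    )
    assert proc.stdout is not None
    try:
        for line in proc.stdout:
            yield line
    finally:
        # Runs on normal exhaustion and when the consumer stops early.
        proc.stdout.close()
        proc.wait()
```

The sketch streams whole lines, not tokens. Token-level streaming would depend on what the underlying command actually emits.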

Cost — new infra to maintain

  • No new services — all containers already exist in the compose stack
  • No new scheduled tasks or formulas — chat is a passive request handler
  • One new smoke test (CI) — end-to-end subpath routing verification
  • Ongoing: monitoring Forgejo/Woodpecker upstream for subpath regressions on upgrades
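
The core of such a smoke test can be sketched as a function over an injected fetcher, so the routing expectations can be exercised without a live edge. Everything here is an assumption for illustration: the name `check_routes`, the choice of `/chat/` as the unauthenticated probe path, and the expectation that `/forge/`, `/ci/`, and `/staging/` answer 200 to an anonymous request. OAuth flows and asset loading are out of scope for this sketch.

```python
from typing import Callable, Optional

# (status code, Location header or None); redirects must not be followed.
Response = tuple[int, Optional[str]]


def check_routes(fetch: Callable[[str], Response]) -> list[str]:
    """Return a list of human-readable failures; empty means all checks passed."""
    failures: list[str] = []

    # Root should redirect to the forge.
    status, location = fetch("/")
    if not (300 <= status < 400 and (location or "").endswith("/forge/")):
        failures.append(f"/ returned {status} {location!r}, expected redirect to /forge/")

    # Each proxied service should answer under its subpath.
    for path in ("/forge/", "/ci/", "/staging/"):
        status, _ = fetch(path)
        if status != 200:
            failures.append(f"{path} returned {status}, expected 200")

    # forward_auth should reject requests without a session.
    status, _ = fetch("/chat/")
    if status != 401:
        failures.append(f"/chat/ returned {status} without a session, expected 401")

    return failures
```

In CI, `fetch` would wrap an HTTP client pointed at the edge URL with redirect following disabled.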

Recommendation

Worth it. The vision is ~80% implemented. The remaining work is integration hardening (confirming subpath routing works end-to-end with real Forgejo/Woodpecker) and one UX improvement (WebSocket streaming). The risk is low because a documented fallback to per-service subdomains exists. Ship this sprint to close the loop on the edge experience.

Sub-issues

  • id: subpath-routing-smoke-test
    title: "vision(#623): end-to-end subpath routing smoke test for Forgejo + Woodpecker + chat"
    labels: [backlog]
    depends_on: []
    body: |

    Goal

    Verify that Forgejo, Woodpecker, and chat all function correctly when served under /forge/, /ci/, and /chat/ subpaths on a single domain. Catch redirect loops, OAuth callback failures, and asset 404s before they hit production.

    Acceptance criteria

    • Forgejo login at /forge/ completes without redirect loops
    • Forgejo OAuth callback for Woodpecker succeeds under subpath
    • Woodpecker dashboard loads all assets at /ci/ (no 404s on JS/CSS)
    • Chat OAuth login flow works at /chat/login
    • forward_auth on /chat/* rejects unauthenticated requests with 401
    • Staging content loads at /staging/
    • Root / redirects to /forge/
    • CI pipeline added to .woodpecker/ to run this test on edge-related changes
  • id: websocket-streaming-chat
    title: "vision(#623): WebSocket streaming for chat UI to replace one-shot claude --print"
    labels: [backlog]
    depends_on: [subpath-routing-smoke-test]
    body: |

    Goal

    Replace the blocking one-shot claude --print invocation in the chat backend with a WebSocket connection that streams tokens to the UI as they arrive.

    Acceptance criteria

    • /chat/ws endpoint accepts WebSocket upgrade with valid session cookie
    • /chat/ws rejects upgrade if session cookie is missing or expired
    • Chat backend streams claude output over WebSocket as text frames
    • UI renders tokens incrementally as they arrive
    • Rate limiting still enforced on WebSocket messages
    • Caddy proxies WebSocket upgrade correctly through /chat/ws with forward_auth
  • id: chat-working-dir-scoping
    title: "vision(#623): scope Claude chat working directory to project staging checkout"
    labels: [backlog]
    depends_on: [subpath-routing-smoke-test]
    body: |

    Goal

    Give the Claude session in the chat container read-write access to the project working tree so the operator can inspect, explain, or modify code. Access is scoped to that tree only, with no access to factory internals, secrets, or the Docker socket.

    Acceptance criteria

    • Chat container bind-mounts the project working tree as a named volume
    • Claude invocation in server.py sets cwd to the workspace directory
    • Claude permission mode is acceptEdits (not bypassPermissions)
    • verify-chat-sandbox.sh updated to assert workspace mount exists
    • Compose generator adds the workspace volume conditionally
  • id: subpath-fallback-automation
    title: "vision(#623): automate subdomain fallback pivot if subpath routing fails"
    labels: [backlog]
    depends_on: [subpath-routing-smoke-test]
    body: |

    Goal

    If the smoke test reveals unfixable subpath issues, automate the pivot to per-service subdomains so the switch is a single config change.

    Acceptance criteria

    • generators.sh _generate_caddyfile_impl accepts EDGE_ROUTING_MODE env var
    • In subdomain mode, Caddyfile emits four host blocks per edge-routing-fallback.md
    • register.sh registers additional subdomain routes when EDGE_ROUTING_MODE=subdomain
    • OAuth redirect URIs in ci-setup.sh respect routing mode
    • .env template documents EDGE_ROUTING_MODE with a comment referencing the fallback doc
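
The mode switch described in the last sub-issue can be sketched as follows. The real generator lives in a shell script (generators.sh), so this Python version is illustrative only. The upstream addresses, the `<service>.<fqdn>` hostname pattern, and the function name `generate_caddyfile` are all hypothetical; the actual host names are defined in docs/edge-routing-fallback.md. The sketch also omits forward_auth and the root redirect.

```python
# Hypothetical upstreams; the real values come from the compose stack.
SERVICES = {
    "forge": "forgejo:3000",
    "ci": "woodpecker:8000",
    "staging": "staging:80",
    "chat": "chat:8080",
}


def generate_caddyfile(fqdn: str, mode: str = "subpath") -> str:
    """Emit one site block with four subpath handlers, or four host blocks."""
    if mode == "subpath":
        lines = [f"{fqdn} {{"]
        for name, upstream in SERVICES.items():
            lines.append(f"    handle /{name}/* {{")
            lines.append(f"        reverse_proxy {upstream}")
            lines.append("    }")
        lines.append("}")
        return "\n".join(lines) + "\n"
    if mode == "subdomain":
        blocks = [
            f"{name}.{fqdn} {{\n    reverse_proxy {upstream}\n}}"
            for name, upstream in SERVICES.items()
        ]
        return "\n\n".join(blocks) + "\n"
    raise ValueError(f"unknown routing mode: {mode!r}")
```

A caller would pass the value of EDGE_ROUTING_MODE as `mode`. Defaulting to subpath when the variable is unset is an assumption of this sketch, and rejecting unknown values keeps a typo in `.env` from silently producing the wrong routing.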