fix: vision(#623): WebSocket streaming for chat UI to replace one-shot claude --print (#1026) #1076

Merged
dev-qwen merged 2 commits from fix/issue-1026 into main 2026-04-20 11:48:11 +00:00
Collaborator

Fixes #1026

Changes

Fixes #1026 ## Changes
dev-qwen added 1 commit 2026-04-20 11:09:31 +00:00
fix: vision(#623): WebSocket streaming for chat UI to replace one-shot claude --print (#1026)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/nomad-validate Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/pr/edge-subpath Pipeline was successful
ci/woodpecker/pr/nomad-validate Pipeline was successful
ci/woodpecker/pr/secret-scan Pipeline was successful
0daf26f6d9
Collaborator

AI Review

Summary

This PR adds WebSocket streaming to replace the one-shot claude --print pattern. The diff wires up a new /chat/ws Caddy route, a raw asyncio-over-TCP WebSocket implementation in server.py, and a JS-side WebSocket client. The approach has several critical defects that prevent it from working at all.


Findings

  • blocker docker/chat/ui/index.html:453-464 + server.py:568: Frontend-backend protocol mismatch — the feature cannot deliver tokens. The frontend calls tryWebSocketSend(message) which returns true and sendMessage() returns immediately (no HTTP POST is sent). The client sends the chat message as a JSON chat_request over the WebSocket. But the server's handle_connection() loop treats all OPCODE_TEXT frames as heartbeat acks and does pass. Claude is never invoked via this path, so the queue is never populated, so no tokens are streamed. The feature cannot work as written. The fix must either (a) make the server handle the chat_request WS frame and invoke Claude within the WS coroutine, or (b) have the frontend send an HTTP POST in parallel with the WS connection (matching the server's model where POST triggers Claude and WS receives tokens).

  • blocker server.py:1279 _WebSocketHandler.accept_connection(): Re-reads HTTP headers already consumed by BaseHTTPRequestHandler. do_GET() is called after the framework has already parsed the full HTTP request. accept_connection() then calls _read_line() byte-by-byte to reconstruct the GET line and headers — but those bytes are gone. The asyncio reader wrapping the raw socket will block indefinitely waiting for data that never arrives; the 101 handshake cannot complete. Fix: use self.headers (already populated by the framework) to validate and respond, bypassing accept_connection() entirely.

  • blocker server.py:1273,1293,1318 ws_writer = transport: asyncio.Transport used as asyncio.StreamWriter. create_connection() returns a (transport, protocol) pair; transport is an asyncio.BaseTransport which has write() but not drain() or wait_closed(). The ping keepalive at line 1293 (await ws_writer.drain()) and the cleanup at line 1318 (await ws_writer.wait_closed()) will both raise AttributeError at runtime. Fix: wrap with asyncio.StreamWriter via asyncio.open_connection(sock=sock) which returns a proper (reader, writer) pair.

  • medium server.py:437,447,454,460 _decode_frame: StreamReader.read(N) doesn't guarantee N bytes. asyncio.StreamReader.read(n) returns up to n bytes; frame parsing (2-byte header, extended lengths, mask key, payload) all use read(). Partial reads will silently corrupt frame parsing. Use readexactly(n) for all fixed-length reads.

  • medium server.py:1256 _websocket_queues[user]: Queue never removed on disconnect. When a connection closes (normally or with error), _websocket_queues[user] is overwritten on reconnect but never del``d. A user who connects, disconnects, then makes an HTTP POST chat request will have tokens pushed into a stale or new queue, but the queue consumer is gone. Over time this leaks memory. Fix: del _websocket_queues[user]in thefinallyblock ofhandle_ws()`.

  • low server.py:548 _send_close: Invalid WebSocket close code. b"\x03\x00" encodes the 16-bit big-endian integer 768. RFC 6455 §7.4.2 prohibits close codes 0–999. The normal-closure code is 1000 (struct.pack(">H", 1000) = b"\x03\xe8").

  • low server.py:543,556 _send_close / _send_pong: drain() called without await in sync methods. Both sync methods call self.writer.drain() without await, which returns a coroutine that is never scheduled. The close/pong frame may not be flushed before the connection is dropped.

Verdict

REQUEST_CHANGES — Frontend sends chat request exclusively via WebSocket, but server ignores all incoming WS text frames — Claude is never invoked; feature is broken end-to-end


Reviewed at 0daf26f | AGENTS.md

## AI Review <!-- reviewed: 0daf26f6d997e91e78ad157851d69bb3597ed8cc --> ### Summary This PR adds WebSocket streaming to replace the one-shot `claude --print` pattern. The diff wires up a new `/chat/ws` Caddy route, a raw asyncio-over-TCP WebSocket implementation in `server.py`, and a JS-side WebSocket client. The approach has several critical defects that prevent it from working at all. --- ### Findings - **blocker** `docker/chat/ui/index.html:453-464` + `server.py:568`: **Frontend-backend protocol mismatch — the feature cannot deliver tokens.** The frontend calls `tryWebSocketSend(message)` which returns `true` and `sendMessage()` returns immediately (no HTTP POST is sent). The client sends the chat message as a JSON `chat_request` over the WebSocket. But the server's `handle_connection()` loop treats all `OPCODE_TEXT` frames as heartbeat acks and does `pass`. Claude is never invoked via this path, so the queue is never populated, so no tokens are streamed. The feature cannot work as written. The fix must either (a) make the server handle the `chat_request` WS frame and invoke Claude within the WS coroutine, or (b) have the frontend send an HTTP POST in parallel with the WS connection (matching the server's model where POST triggers Claude and WS receives tokens). - **blocker** `server.py:1279` `_WebSocketHandler.accept_connection()`: **Re-reads HTTP headers already consumed by `BaseHTTPRequestHandler`.** `do_GET()` is called after the framework has already parsed the full HTTP request. `accept_connection()` then calls `_read_line()` byte-by-byte to reconstruct the `GET` line and headers — but those bytes are gone. The asyncio reader wrapping the raw socket will block indefinitely waiting for data that never arrives; the 101 handshake cannot complete. Fix: use `self.headers` (already populated by the framework) to validate and respond, bypassing `accept_connection()` entirely. - **blocker** `server.py:1273,1293,1318` `ws_writer = transport`: **`asyncio.Transport` used as `asyncio.StreamWriter`.** `create_connection()` returns a `(transport, protocol)` pair; `transport` is an `asyncio.BaseTransport` which has `write()` but **not** `drain()` or `wait_closed()`. The ping keepalive at line 1293 (`await ws_writer.drain()`) and the cleanup at line 1318 (`await ws_writer.wait_closed()`) will both raise `AttributeError` at runtime. Fix: wrap with `asyncio.StreamWriter` via `asyncio.open_connection(sock=sock)` which returns a proper `(reader, writer)` pair. - **medium** `server.py:437,447,454,460` `_decode_frame`: **`StreamReader.read(N)` doesn't guarantee N bytes.** `asyncio.StreamReader.read(n)` returns *up to* n bytes; frame parsing (2-byte header, extended lengths, mask key, payload) all use `read()`. Partial reads will silently corrupt frame parsing. Use `readexactly(n)` for all fixed-length reads. - **medium** `server.py:1256` `_websocket_queues[user]`: **Queue never removed on disconnect.** When a connection closes (normally or with error), `_websocket_queues[user]` is overwritten on reconnect but never `del``d. A user who connects, disconnects, then makes an HTTP POST chat request will have tokens pushed into a stale or new queue, but the queue consumer is gone. Over time this leaks memory. Fix: `del _websocket_queues[user]` in the `finally` block of `handle_ws()`. - **low** `server.py:548` `_send_close`: **Invalid WebSocket close code.** `b"\x03\x00"` encodes the 16-bit big-endian integer 768. RFC 6455 §7.4.2 prohibits close codes 0–999. The normal-closure code is 1000 (`struct.pack(">H", 1000)` = `b"\x03\xe8"`). - **low** `server.py:543,556` `_send_close` / `_send_pong`: **`drain()` called without `await` in sync methods.** Both sync methods call `self.writer.drain()` without `await`, which returns a coroutine that is never scheduled. The close/pong frame may not be flushed before the connection is dropped. ### Verdict **REQUEST_CHANGES** — Frontend sends chat request exclusively via WebSocket, but server ignores all incoming WS text frames — Claude is never invoked; feature is broken end-to-end --- *Reviewed at `0daf26f` | [AGENTS.md](AGENTS.md)*
review-bot requested changes 2026-04-20 11:19:38 +00:00
Dismissed
review-bot left a comment
Collaborator

AI Review: REQUEST_CHANGES — Frontend sends chat request exclusively via WebSocket, but server ignores all incoming WS text frames — Claude is never invoked; feature is broken end-to-end

AI Review: **REQUEST_CHANGES** — Frontend sends chat request exclusively via WebSocket, but server ignores all incoming WS text frames — Claude is never invoked; feature is broken end-to-end
dev-qwen force-pushed fix/issue-1026 from 0daf26f6d9 to 01f7d061bc 2026-04-20 11:36:35 +00:00 Compare
Collaborator

AI Re-review (round 2)

Previous Findings

  • blocker Frontend-backend protocol mismatch (Claude never invoked via WS) → FIXED: _handle_chat_request added to _WebSocketHandler; handle_connection now dispatches chat_request frames to it directly.
  • blocker accept_connection re-read HTTP headers from consumed socket → FIXED: signature changed to take sec_websocket_key/sec_websocket_protocol as params; handle_websocket_upgrade passes self.headers values.
  • blocker asyncio.Transport used as StreamWriter (no drain()/wait_closed()) → FIXED: replaced with asyncio.open_connection(sock=sock) returning a proper (reader, writer) pair.
  • medium read(N) used instead of readexactly(N) in frame parsing → FIXED: all fixed-length reads converted to readexactly.
  • medium Queue never deleted on disconnect → FIXED: del _websocket_queues[self.user] in handle_connection finally block.
  • low Invalid WebSocket close code (768) → FIXED: now struct.pack(">H", 1000).
  • low drain() called without await in sync _send_close/_send_pongFIXED: both methods are now async with await self.writer.drain().

New Issues

  • medium server.py:574 _handle_chat_request: blocking proc.stdout.readline in asyncio coroutine. The send_stream task (1-second ping keepalive) cannot execute while readline is blocked waiting for Claude output. On responses longer than the proxy keepalive window (Caddy default: 120s) the WS connection may be dropped mid-stream. Tokens still flow because each await writer.drain() yields briefly, but the periodic ping is suppressed for the full generation duration. Fix: await asyncio.get_event_loop().run_in_executor(None, proc.stdout.readline).

  • medium server.py:1297-1307 + server.py:513-515: WS path bypasses per-message rate limiting and skips conversation history. _check_rate_limit/_record_request are called once at connection establishment — each subsequent chat_request frame on the same connection is uncounted. _write_message and _record_tokens are also not called from _handle_chat_request, so token quotas are not enforced and conversation history is not saved for WS-originated messages. The HTTP POST path does all three. Recommend tracking as tech-debt.

  • low server.py:1207: proc.stdout.getvalue() fallback is dead code. subprocess.PIPE returns a BufferedReader which never has .getvalue(); the condition is always False. The old response = raw_output was also a no-op fallback, so behavior is unchanged — but the comment implies it does something it doesn't.

Verdict

APPROVE — All 7 previous findings fixed; new mediums are design gaps in the WS path, not regressions from the fix


Reviewed at 01f7d06 | Previous: 0daf26f | AGENTS.md

## AI Re-review (round 2) <!-- reviewed: 01f7d061bc9a74e25b94362a5b95721d70ad93df --> ### Previous Findings - **blocker** Frontend-backend protocol mismatch (Claude never invoked via WS) → **FIXED**: `_handle_chat_request` added to `_WebSocketHandler`; `handle_connection` now dispatches `chat_request` frames to it directly. - **blocker** `accept_connection` re-read HTTP headers from consumed socket → **FIXED**: signature changed to take `sec_websocket_key`/`sec_websocket_protocol` as params; `handle_websocket_upgrade` passes `self.headers` values. - **blocker** `asyncio.Transport` used as `StreamWriter` (no `drain()`/`wait_closed()`) → **FIXED**: replaced with `asyncio.open_connection(sock=sock)` returning a proper `(reader, writer)` pair. - **medium** `read(N)` used instead of `readexactly(N)` in frame parsing → **FIXED**: all fixed-length reads converted to `readexactly`. - **medium** Queue never deleted on disconnect → **FIXED**: `del _websocket_queues[self.user]` in `handle_connection` `finally` block. - **low** Invalid WebSocket close code (768) → **FIXED**: now `struct.pack(">H", 1000)`. - **low** `drain()` called without `await` in sync `_send_close`/`_send_pong` → **FIXED**: both methods are now `async` with `await self.writer.drain()`. ### New Issues - **medium** `server.py:574` `_handle_chat_request`: blocking `proc.stdout.readline` in asyncio coroutine. The `send_stream` task (1-second ping keepalive) cannot execute while `readline` is blocked waiting for Claude output. On responses longer than the proxy keepalive window (Caddy default: 120s) the WS connection may be dropped mid-stream. Tokens still flow because each `await writer.drain()` yields briefly, but the periodic ping is suppressed for the full generation duration. Fix: `await asyncio.get_event_loop().run_in_executor(None, proc.stdout.readline)`. - **medium** `server.py:1297-1307` + `server.py:513-515`: WS path bypasses per-message rate limiting and skips conversation history. `_check_rate_limit`/`_record_request` are called once at connection establishment — each subsequent `chat_request` frame on the same connection is uncounted. `_write_message` and `_record_tokens` are also not called from `_handle_chat_request`, so token quotas are not enforced and conversation history is not saved for WS-originated messages. The HTTP POST path does all three. Recommend tracking as tech-debt. - **low** `server.py:1207`: `proc.stdout.getvalue()` fallback is dead code. `subprocess.PIPE` returns a `BufferedReader` which never has `.getvalue()`; the condition is always `False`. The old `response = raw_output` was also a no-op fallback, so behavior is unchanged — but the comment implies it does something it doesn't. ### Verdict **APPROVE** — All 7 previous findings fixed; new mediums are design gaps in the WS path, not regressions from the fix --- *Reviewed at `01f7d06` | Previous: `0daf26f` | [AGENTS.md](AGENTS.md)*
review-bot approved these changes 2026-04-20 11:47:32 +00:00
review-bot left a comment
Collaborator

AI Re-review (round 2): APPROVE — All 7 previous findings fixed; new mediums are design gaps in the WS path, not regressions from the fix

AI Re-review (round 2): **APPROVE** — All 7 previous findings fixed; new mediums are design gaps in the WS path, not regressions from the fix
dev-qwen merged commit abca547dcc into main 2026-04-20 11:48:11 +00:00
dev-qwen deleted branch fix/issue-1026 2026-04-20 11:48:11 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: disinto-admin/disinto#1076
No description provided.