feat: drop chat rate-limiting — remove per-user hour/day request caps and token cap (reverts #711) #1084

Closed

opened 2026-04-20 15:18:27 +00:00 by disinto-admin · 0 comments

Goal

Remove all rate-limiting logic from the chat server. Reverts the hour/day request caps and the daily token cap added in #711.

Rationale (from operator): chat is single-operator, not a public service. The rate limits were added defensively in case of prompt-injection leading to Claude loops, but that concern is better addressed by simply ending the chat session. The rate-limit code adds complexity and false failure modes (e.g. overnight reflection loops getting capped) without catching the real risks.

The change

In `docker/chat/server.py`

Delete:

- `_check_rate_limit(user)` (around line 220)
- `_record_request(user)` (around line 262)
- `_record_tokens(user, tokens)` (around line 267)
- Any module-level state these functions use (look for dicts keyed by user that store timestamps or counters)
- Call sites in `handle_chat(user)`: search for `_check_rate_limit`, `_record_request`, and `_record_tokens`, then delete the conditional blocks and the increment calls

Keep the existing `_parse_stream_json` and conversation-history logic; both are independent of rate limiting.
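The deletions above can be checked mechanically. The following is a minimal sketch, not part of the repository: `audit_source` and `REMOVED_NAMES` are hypothetical names introduced here. It parses a Python source string with the standard `ast` module and reports any of the three helper names that are still defined or referenced, plus imports that no longer appear to be used (for example `time`, if only the rate-limit code needed it). It assumes a plain module without `__all__` re-exports or star imports, which it does not analyse.

```python
import ast

# Helper names this issue asks to delete from docker/chat/server.py.
REMOVED_NAMES = {"_check_rate_limit", "_record_request", "_record_tokens"}


def audit_source(source: str) -> dict:
    """Report leftover rate-limit helpers and imports that are never referenced.

    Hypothetical helper for illustration; not part of the chat server.
    """
    tree = ast.parse(source)

    imported = set()     # names bound by import statements
    referenced = set()   # every bare identifier that is read or called
    defined = set()      # function names defined anywhere in the module

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds "os"; "import x as y" binds "y".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports are not analysed
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            referenced.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            defined.add(node.name)

    return {
        "leftover": sorted(REMOVED_NAMES & (referenced | defined)),
        "unused_imports": sorted(imported - referenced),
    }
```

To run it against the real file, read the source first, for example `audit_source(open("docker/chat/server.py").read())`. Both lists should come back empty once the change is complete. The unused-import result is only a hint: a name used solely inside a string or via `getattr` would be misreported, so confirm by hand before deleting an import.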

In `docker-compose.yml`

Delete the three env vars from the `chat:` service (or `edge:` if #NNNN "chat-in-edge" has already landed):

- `CHAT_MAX_REQUESTS_PER_HOUR`
- `CHAT_MAX_REQUESTS_PER_DAY`
- `CHAT_MAX_TOKENS_PER_DAY`

In `.env.example` (if present)

Remove the same three entries.

Acceptance criteria

- [ ] `grep -r 'CHAT_MAX_' docker/chat/ docker-compose.yml .env.example 2>/dev/null` returns no matches
- [ ] `grep -r '_check_rate_limit\|_record_request\|_record_tokens' docker/chat/` returns no matches
- [ ] Chat responds to POST `/chat` without rate-limit failure paths regardless of request volume (smoke test: send 100 messages in a tight loop; all return 200 or a Claude error, none return 429)
- [ ] `docker/chat/server.py` has no references to per-user rate-limit state
- [ ] `shellcheck` clean on any shell touched (none expected; this is Python + YAML only)
- [ ] No broken imports in `server.py` after the deletion (e.g. `time` may be unused if only the rate-limit code used it; verify)
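The smoke test in the criteria above could be scripted roughly as follows. This is a sketch under stated assumptions: `post_chat` and `smoke_test` are hypothetical names, the JSON body with a `message` field is a guess at the request format, and any authentication the server requires is not handled. It sends messages back to back, tallies the HTTP status codes, and fails if any response is a 429.

```python
import json
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def post_chat(url: str, message: str) -> int:
    """POST one message and return the HTTP status code.

    ASSUMPTION: the endpoint accepts a JSON body with a "message" field.
    Adjust the payload and headers to match the real server.
    """
    body = json.dumps({"message": message}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=120) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise HTTPError; the status code is still what we want.
        return err.code


def smoke_test(send: Callable[[str], int], count: int = 100) -> Counter:
    """Send `count` messages in a tight loop and tally the status codes.

    Raises AssertionError if any request came back as 429 (rate-limited).
    """
    statuses = Counter(send(f"smoke test message {i}") for i in range(count))
    if statuses[429]:
        raise AssertionError(
            f"{statuses[429]} of {count} requests were rate-limited (429)"
        )
    return statuses
```

A run against a local instance might look like `smoke_test(lambda m: post_chat("http://localhost:8080/chat", m))`, where the host and port are placeholders. Passing the sender as a callable keeps the loop independent of transport details, so it can be exercised with a stub before pointing it at a live server.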

Related

- #711: original rate-limit implementation. Close as "superseded by #NNNN" once this lands.
- Companion issue: move chat into edge container (#NNNN). Both simplifications come from the same architectural pass.
