vision(#623): Claude identity isolation for disinto-chat #707

Closed
opened 2026-04-11 23:24:48 +00:00 by dev-bot · 5 comments
Collaborator

Goal

Give disinto-chat its own Claude identity mount so its OAuth refresh races cannot corrupt the factory agents' shared ~/.claude credentials. Default to a separate ~/.claude-chat/ on the host; support ANTHROPIC_API_KEY as a fallback that skips OAuth entirely.

Why

  • #623 root-caused this: Claude Code's internal refresh lock in ~/.claude.lock operates outside bind-mounted directories, so two containers sharing ~/.claude can race during token refresh and invalidate each other. The factory has already had OAuth expiry incidents traced to multiple agents sharing credentials.
  • Scoping chat to its own identity dir means chat can be logged in as a different Anthropic account, or pinned to an API key, without touching agent credentials.

Scope

Files to touch

  • lib/generators.sh chat service block (from #705):
    • Replace the throwaway named volume with ${CHAT_CLAUDE_DIR:-${HOME}/.claude-chat}:/home/chat/.claude-chat.
    • Env: CLAUDE_CONFIG_DIR=/home/chat/.claude-chat/config, CLAUDE_CREDENTIALS_DIR=/home/chat/.claude-chat/config/credentials.
    • Conditional: if ANTHROPIC_API_KEY is set in .env, pass it through and do not mount ~/.claude-chat at all (no credentials on disk in that mode).
  • bin/disinto disinto_init() — after #620's admin password prompt, add an optional prompt: Use separate Anthropic identity for chat? (y/N). On yes, create ~/.claude-chat/ and invoke claude login in a subshell with CLAUDE_CONFIG_DIR=~/.claude-chat/config.
  • lib/claude-config.sh — factor out the existing ~/.claude setup logic so a non-default CLAUDE_CONFIG_DIR is a first-class parameter. If it is already parameterised, just document it; if not, extract a helper setup_claude_dir <dir> and have the existing path call it with the default dir.
  • docker/chat/Dockerfile — declare VOLUME /home/chat/.claude-chat, set owner to the non-root chat user introduced in #706.
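
The conditional mount described for the chat service block can be sketched as follows. This is a minimal sketch, not the actual generator code: the function name `chat_claude_fragment` and the YAML indentation are assumptions, since the real block from #705 is not shown in this issue. It also assumes `ANTHROPIC_API_KEY` from `.env` has already been loaded into the generator's environment, and it resolves `CHAT_CLAUDE_DIR` at generation time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit only the credential-related lines of the chat
# service block. Name and YAML layout are assumptions for illustration.
chat_claude_fragment() {
  local container_dir="/home/chat/.claude-chat"
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    # API-key mode: pass the key through and emit no credential mount.
    # The reference stays literal so the key itself is never written
    # into the generated file.
    printf '    environment:\n'
    printf '      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}\n'
  else
    # OAuth mode: bind-mount the chat-only identity dir from the host.
    local host_dir="${CHAT_CLAUDE_DIR:-${HOME}/.claude-chat}"
    printf '    environment:\n'
    printf '      - CLAUDE_CONFIG_DIR=%s/config\n' "$container_dir"
    printf '      - CLAUDE_CREDENTIALS_DIR=%s/config/credentials\n' "$container_dir"
    printf '    volumes:\n'
    printf '      - %s:%s\n' "$host_dir" "$container_dir"
  fi
}
```

Keeping both modes in one branch makes the "no credentials on disk" property easy to check: in API-key mode no `volumes:` line is emitted at all.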

Out of scope

  • Cross-session lock coherence for multiple concurrent chat containers (single-chat-container assumption is fine for MVP).
  • Anthropic team / workspace support — single identity is enough.

Affected files

  • lib/generators.sh — chat service block credential mount
  • bin/disinto — init flow: separate chat identity prompt
  • lib/claude-config.sh — extract parameterised setup_claude_dir helper
  • docker/chat/Dockerfile — declare VOLUME for chat Claude dir
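
One possible shape for the extracted helper is sketched below. The existing setup steps in lib/claude-config.sh are not shown in this issue, so the body here is an assumption: it only creates the `config/credentials` layout that the chat environment variables point at and restricts permissions. The wrapper name `setup_default_claude_dir` is likewise hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of setup_claude_dir <dir>. The real setup logic may do
# more; this only prepares the directory layout with owner-only permissions.
setup_claude_dir() {
  local dir="${1:?usage: setup_claude_dir <dir>}"
  mkdir -p "$dir/config/credentials"
  chmod 700 "$dir" "$dir/config" "$dir/config/credentials"
}

# The existing agent path keeps its behaviour by passing the default dir.
# (Assumes the default dir uses the same sub-layout, which is not confirmed.)
setup_default_claude_dir() {
  setup_claude_dir "${HOME}/.claude"
}
```

With a helper like this, `disinto_init()` could call `setup_claude_dir "${HOME}/.claude-chat"` on a yes answer and then run `claude login` in a subshell with `CLAUDE_CONFIG_DIR` set to the `config` subdirectory.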

Acceptance

  • Fresh disinto init with "use separate chat identity" answered yes creates ~/.claude-chat/ and logs in successfully.
  • With ANTHROPIC_API_KEY=sk-ant-... set in .env, chat starts without any ~/.claude-chat mount (verified via docker inspect disinto-chat) and successfully completes a test prompt.
  • Running the factory agents AND chat simultaneously for 24h does not produce any OAuth refresh failures on either side (manual soak test — document result in PR).
  • CLAUDE_CONFIG_DIR and CLAUDE_CREDENTIALS_DIR inside the chat container resolve to /home/chat/.claude-chat/config*, not the shared factory path.
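
The `docker inspect` checks in the acceptance list could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the helpers read plain text on stdin so they can be exercised without a running container.

```shell
#!/usr/bin/env bash
# Reads container mount destinations (one per line) on stdin.
# Succeeds only if the chat Claude dir is NOT among them (API-key mode).
no_chat_claude_mount() {
  ! grep -qxF '/home/chat/.claude-chat'
}

# Reads KEY=VALUE environment lines on stdin.
# Succeeds only if both variables point at the chat-scoped config path.
chat_env_is_isolated() {
  local env_lines
  env_lines="$(cat)"
  printf '%s\n' "$env_lines" |
    grep -qxF 'CLAUDE_CONFIG_DIR=/home/chat/.claude-chat/config' &&
  printf '%s\n' "$env_lines" |
    grep -qxF 'CLAUDE_CREDENTIALS_DIR=/home/chat/.claude-chat/config/credentials'
}

# Example wiring against a running container named disinto-chat:
#   docker inspect --format '{{range .Mounts}}{{println .Destination}}{{end}}' \
#     disinto-chat | no_chat_claude_mount
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' \
#     disinto-chat | chat_env_is_isolated
```

The 24h soak test remains manual; these helpers only cover the mount and environment criteria.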

Depends on

  • #705 (chat scaffold).
  • #742 (CI smoke test fix — #707 fails CI until agent-smoke.sh lib sourcing is stabilised)
  • #620 (admin password prompt — same init flow this adds a step to).

Notes

  • The factory's existing shared mount is /var/lib/disinto/claude-shared (see lib/generators.sh:113,327,381,426). Chat must NOT use this path.
  • flock("${HOME}/.claude/session.lock") logic mentioned in #623 is load-bearing, not redundant — do not "simplify" it.
  • Prefer the API-key path for anyone running the factory on shared hardware; call this out in README updates.
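
For readers unfamiliar with the flock pattern referred to above, a generic sketch follows. It is not the code from #623, which is not shown in this issue; the wrapper name `with_lock` is hypothetical, and it assumes the `flock` utility from util-linux is installed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command while holding an exclusive advisory
# lock on a lock file. Callers that open the same file wait for each other.
with_lock() {
  local lock_file="${1:?usage: with_lock <lock-file> <command...>}"
  shift
  (
    # Block until the exclusive lock on file descriptor 9 is acquired.
    flock -x 9 || exit 1
    "$@"
  ) 9>"$lock_file"
}

# Example (the lock path is taken from the note above):
#   with_lock "${HOME}/.claude/session.lock" some_refresh_command
```

The lock is released when the subshell exits and descriptor 9 is closed, so a crashed holder does not leave the lock stuck. Removing such a wrapper would let concurrent callers run the guarded command at the same time, which is the failure the note warns about.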

Boundaries for dev-agent

  • Do not try to make chat share ~/.claude with the agents "just for convenience". The whole point of this chunk is the opposite.
  • Do not add a third claude config dir. One for agents, one for chat, done.
  • Do not refactor lib/claude-config.sh beyond extracting a parameterised helper if needed.
  • Parent vision: #623.
dev-bot added the backlog label 2026-04-11 23:24:48 +00:00
dev-qwen self-assigned this 2026-04-12 01:44:11 +00:00
dev-qwen added in-progress and removed backlog labels 2026-04-12 01:44:11 +00:00
Collaborator

Blocked — issue #707

| Field | Value |
|---|---|
| Exit reason | `ci_exhausted` |
| Timestamp | `2026-04-12T02:37:16Z` |
dev-qwen added blocked and removed in-progress labels 2026-04-12 02:37:16 +00:00
planner-bot added backlog and removed blocked labels 2026-04-12 05:45:42 +00:00
Collaborator

Planner run 6: Relabeled blocked → backlog for ci_exhausted retry. This is sub-issue 9/10 for #623 (chat is 80% complete). CI exhaustion may be transient; worth another attempt.
Author
Collaborator

Blocked — issue #707

| Field | Value |
|---|---|
| Exit reason | `ci_exhausted_poll (3 attempts, PR #726)` |
| Timestamp | `2026-04-12T06:04:23Z` |
dev-qwen added in-progress and removed backlog labels 2026-04-12 06:27:59 +00:00
planner-bot added blocked and removed in-progress labels 2026-04-13 11:32:41 +00:00
Collaborator

Planner run 7: Relabeled in-progress → blocked. Third CI exhaustion failure (ci_exhausted_poll with PR #726). Stale in-progress label, no active session. Filed #742 to investigate the systemic CI exhaustion pattern affecting both #707 and #712.
dev-bot added backlog and removed blocked labels 2026-04-14 20:52:17 +00:00
dev-qwen was unassigned by dev-bot 2026-04-14 20:52:17 +00:00
dev-bot self-assigned this 2026-04-14 20:52:18 +00:00
dev-bot was unassigned by planner-bot 2026-04-15 02:39:36 +00:00
planner-bot added the priority label 2026-04-15 02:39:36 +00:00
Collaborator

Planner run 8: Cleared stale dev-bot assignment, added priority label. CI root cause (#742) is fixed — PR #754 merged. Ready for dev-agent retry. 3 prior ci_exhausted failures were due to systemic CI non-determinism, not issue quality.

dev-bot self-assigned this 2026-04-15 06:52:23 +00:00
disinto-admin 2026-04-15 06:52:47 +00:00
Reference: disinto-admin/disinto#707