docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
# Claude Code OAuth Concurrency Model
|
|
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
## Problem statement
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
The factory runs multiple concurrent Claude Code processes across
|
|
|
|
|
containers. OAuth access tokens are short-lived; refresh tokens rotate
|
|
|
|
|
on each use. If two processes POST the same refresh token to Anthropic's
|
|
|
|
|
token endpoint simultaneously, only one wins — the other gets
|
|
|
|
|
`invalid_grant` and the operator is forced to re-login.
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
Claude Code already serializes OAuth refreshes internally using
|
|
|
|
|
`proper-lockfile` (`src/utils/auth.ts:1485-1491`):
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
```typescript
|
|
|
|
|
release = await lockfile.lock(claudeDir)
|
|
|
|
|
```
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
`proper-lockfile` creates a lockfile via an atomic `mkdir(${path}.lock)`
|
|
|
|
|
call — a cross-process primitive that works across any number of
|
|
|
|
|
processes on the same filesystem. The problem was never the lock
|
|
|
|
|
implementation; it was that our old per-container bind-mount layout
|
|
|
|
|
(`~/.claude` mounted but `/home/agent/` container-local) caused each
|
|
|
|
|
container to compute a different lockfile path, so the locks never
|
|
|
|
|
coordinated.
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
## The fix: shared `CLAUDE_CONFIG_DIR`
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
`CLAUDE_CONFIG_DIR` is an officially supported env var in Claude Code
|
|
|
|
|
(`src/utils/envUtils.ts`). It controls where Claude resolves its config
|
|
|
|
|
directory instead of the default `~/.claude`.
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
By setting `CLAUDE_CONFIG_DIR` to a path on a shared bind mount, every
|
|
|
|
|
container computes the **same** lockfile location. `proper-lockfile`'s
|
|
|
|
|
atomic `mkdir(${CLAUDE_CONFIG_DIR}.lock)` then gives free cross-container
|
|
|
|
|
serialization — no external wrapper needed.
|
|
|
|
|
|
|
|
|
|
## Current layout
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
```
|
|
|
|
|
Host filesystem:
|
|
|
|
|
/var/lib/disinto/claude-shared/ ← CLAUDE_SHARED_DIR
|
|
|
|
|
└── config/ ← CLAUDE_CONFIG_DIR
|
|
|
|
|
├── credentials.json
|
|
|
|
|
├── settings.json
|
|
|
|
|
└── ...
|
|
|
|
|
|
|
|
|
|
Inside every container:
|
|
|
|
|
Same absolute path: /var/lib/disinto/claude-shared/config
|
|
|
|
|
Env: CLAUDE_CONFIG_DIR=/var/lib/disinto/claude-shared/config
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
```
|
|
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
The shared directory is mounted at the **same absolute path** inside
|
|
|
|
|
every container, so `proper-lockfile` resolves an identical lock path
|
|
|
|
|
everywhere.
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
### Where these values are defined
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
| What | Where |
|
|
|
|
|
|------|-------|
|
|
|
|
|
| Defaults for `CLAUDE_SHARED_DIR`, `CLAUDE_CONFIG_DIR` | `lib/env.sh:138-140` |
|
|
|
|
|
| `.env` documentation | `.env.example:92-99` |
|
|
|
|
|
| Container mounts + env passthrough (edge dispatcher) | `docker/edge/dispatcher.sh:446-448` (and analogous blocks for reproduce, triage, verify) |
|
|
|
|
|
| Auth detection using `CLAUDE_CONFIG_DIR` | `docker/agents/entrypoint.sh:101-102` |
|
|
|
|
|
| Bootstrap / migration during `disinto init` | `lib/claude-config.sh:setup_claude_config_dir()`, `bin/disinto:952-962` |
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
## Migration for existing dev boxes
|
|
|
|
|
|
|
|
|
|
For operators upgrading from the old `~/.claude` bind-mount layout,
|
|
|
|
|
`disinto init` handles the migration interactively (or with `--yes`).
|
|
|
|
|
The manual equivalent is:
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
```bash
|
|
|
|
|
# 1. Stop the factory
|
|
|
|
|
disinto down
|
|
|
|
|
|
|
|
|
|
# 2. Create the shared directory
|
|
|
|
|
mkdir -p /var/lib/disinto/claude-shared
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
# 3. Move existing config
|
|
|
|
|
mv "$HOME/.claude" /var/lib/disinto/claude-shared/config
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
# 4. Create a back-compat symlink so host-side claude still works
|
|
|
|
|
ln -sfn /var/lib/disinto/claude-shared/config "$HOME/.claude"
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
# 5. Export the env var (add to shell rc for persistence)
|
|
|
|
|
export CLAUDE_CONFIG_DIR=/var/lib/disinto/claude-shared/config
|
|
|
|
|
|
|
|
|
|
# 6. Start the factory
|
|
|
|
|
disinto up
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
```
|
|
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
## Verification
|
|
|
|
|
|
|
|
|
|
Watch for these analytics events during concurrent agent runs:
|
|
|
|
|
|
|
|
|
|
| Event | Meaning |
|
|
|
|
|
|-------|---------|
|
|
|
|
|
| `tengu_oauth_token_refresh_lock_acquiring` | A process is attempting to acquire the refresh lock |
|
|
|
|
|
| `tengu_oauth_token_refresh_lock_acquired` | Lock acquired; refresh proceeding |
|
|
|
|
|
| `tengu_oauth_token_refresh_lock_retry` | Lock is held by another process; retrying |
|
|
|
|
|
| `tengu_oauth_token_refresh_lock_race_resolved` | Contention detected and resolved normally |
|
|
|
|
|
| `tengu_oauth_token_refresh_lock_retry_limit_reached` | Lock acquisition failed after all retries |
|
|
|
|
|
|
|
|
|
|
**Healthy:** `_race_resolved` appearing during contention windows — this
|
|
|
|
|
means multiple processes tried to refresh simultaneously and the lock
|
|
|
|
|
correctly serialized them.
|
|
|
|
|
|
|
|
|
|
**Bad:** `_lock_retry_limit_reached` — indicates the lock is stuck or
|
|
|
|
|
the shared mount is not working. Verify that `CLAUDE_CONFIG_DIR` resolves
|
|
|
|
|
to the same path in all containers and that the filesystem supports
|
|
|
|
|
`mkdir` atomicity (any POSIX filesystem does).
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
## The deferred external `flock` wrapper
|
|
|
|
|
|
|
|
|
|
`lib/agent-sdk.sh:139,144` still wraps every `claude` invocation in an
|
|
|
|
|
external `flock` on `${HOME}/.claude/session.lock`:
|
|
|
|
|
|
|
|
|
|
```bash
|
|
|
|
|
local lock_file="${HOME}/.claude/session.lock"
|
|
|
|
|
...
|
|
|
|
|
output=$(cd "$run_dir" && ( flock -w 600 9 || exit 1;
|
|
|
|
|
claude_run_with_watchdog claude "${args[@]}" ) 9>"$lock_file" ...)
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
```
|
|
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
With the `CLAUDE_CONFIG_DIR` fix in place, this external lock is
|
|
|
|
|
**redundant but harmless** — `proper-lockfile` serializes the refresh
|
|
|
|
|
internally, and `flock` serializes the entire invocation externally.
|
|
|
|
|
The external flock remains as a defense-in-depth measure; removal is
|
|
|
|
|
tracked as a separate vision-tier issue.
|
docs: document Claude Code OAuth concurrency model and external flock rationale (#637)
## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: http://forgejo:3000/disinto-admin/disinto/pulls/637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
2026-04-10 18:01:18 +00:00
|
|
|
|
|
|
|
|
## See also
|
|
|
|
|
|
2026-04-10 21:16:18 +00:00
|
|
|
- `lib/env.sh:138-140` — `CLAUDE_SHARED_DIR` / `CLAUDE_CONFIG_DIR` defaults
|
|
|
|
|
- `lib/claude-config.sh` — migration helper used by `disinto init`
|
|
|
|
|
- `lib/agent-sdk.sh:139,144` — the external `flock` wrapper (deferred removal)
|
|
|
|
|
- `docker/agents/entrypoint.sh:101-102` — `CLAUDE_CONFIG_DIR` auth detection
|
|
|
|
|
- `.env.example:92-99` — operator-facing documentation of the env vars
|
|
|
|
|
- Issue #623 — chat container auth strategy
|