## Summary
Adds `docs/CLAUDE-AUTH-CONCURRENCY.md` documenting why the external `flock` on `${HOME}/.claude/session.lock` in `lib/agent-sdk.sh` is load-bearing rather than belt-and-suspenders, and provides a decision matrix for adding new containers that run Claude Code.
Pure docs change. No code touched.
## Why
The factory runs N+1 concurrent Claude Code processes across containers (`disinto-agents` plus every transient container spawned by `docker/edge/dispatcher.sh`), all sharing `~/.claude` via bind mount. The historical "agents losing auth, frequent re-logins" issue that motivated the original `session.lock` flock is the OAuth refresh race — and the flock is the only thing currently protecting against it.
A reasonable assumption when looking at Claude Code is that its internal `proper-lockfile.lock(claudeDir)` (in `src/utils/auth.ts:1491` of the leaked TS source) handles the refresh race, making the external flock redundant. **It does not**, in our specific bind-mount layout. Empirically verified:
- `proper-lockfile` defaults to `<target>.lock` as a sibling file when no `lockfilePath` is given
- For `claudeDir = /home/agent/.claude`, the lock lands at `/home/agent/.claude.lock`
- `/home/agent/` is **not** bind-mounted in our setup — it is the container's local overlay filesystem
- Each container creates its own private `.claude.lock`, none shared
- Cross-container OAuth refresh race is therefore unprotected by Claude Code's internal lock
The external flock works because the lock file path `${HOME}/.claude/session.lock` is **inside** the bind-mounted directory, so all containers see the same inode.
This came up during design discussion of the chat container in #623, where the temptation was to mount the existing `~/.claude` and skip the external flock for interactive responsiveness. The doc captures the analysis so future implementers don't take that shortcut.
## Changes
- New file: `docs/CLAUDE-AUTH-CONCURRENCY.md` (~135 lines): rationale, empirical evidence, decision matrix for new containers, pointer to the upstream fix
- `lib/AGENTS.md`: one-line **Concurrency** addendum to the `lib/agent-sdk.sh` row pointing at the new doc
## Test plan
- [ ] Markdown renders correctly in Forgejo
- [ ] Relative link from `lib/AGENTS.md` to `docs/CLAUDE-AUTH-CONCURRENCY.md` resolves (`../docs/CLAUDE-AUTH-CONCURRENCY.md`)
- [ ] Code references in the doc still match the current state of `lib/agent-sdk.sh:139,144` and `docker/agents/entrypoint.sh:119-125`
## Refs
- #623 — chat container, the issue this analysis was driven by; #623 has a comment with the same analysis pointing back here once merged
Co-authored-by: Claude <noreply@anthropic.com>
Reviewed-on: #637
Co-authored-by: dev-bot <dev-bot@disinto.local>
Co-committed-by: dev-bot <dev-bot@disinto.local>
Extract branch-wait retry logic into _bp_wait_for_branch helper with
exponential backoff (10 attempts, 2s base, capped at 10s per wait,
~70s worst-case). Replaces the 3-attempt/2s-fixed loops in all three
setup functions. Upgrade caller warnings in bin/disinto to ERROR.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix off-by-one in mock admin/users/{username}/repos path extraction
(parts[4] was 'users', not the username — should be parts[5])
- Change _install_cron_impl to return 1 instead of exit 1 when crontab
is missing, so cron failure doesn't abort entire init
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This fixes the issue where agents-llama containers were using the main
FORGE_TOKEN (dev-bot) instead of dedicated credentials for the llama bot user.
Changes:
- forge-setup.sh: Added generation of FORGE_TOKEN_LLAMA and FORGE_PASS_LLAMA
for local-model bot users (dev-qwen, dev-qwen-nightly). These are created
as Forgejo users with their own API tokens and passwords for git push.
- generators.sh: Updated agents-llama service to use FORGE_TOKEN_LLAMA and
FORGE_PASS_LLAMA instead of falling back to dev-bot's credentials.
Fixed escaping to defer variable resolution to docker-compose runtime.
- docker-compose.yml: Updated to use FORGE_TOKEN_LLAMA and FORGE_PASS_LLAMA
(renamed from FORGE_TOKEN_DEVQWEN for consistency).
- .env.example: Added documentation for all per-bot tokens and passwords.
- projects/disinto.toml.example: Documented the auto-credential generation.
When a project TOML configures [agents.llama] with forge_user = dev-qwen:
1. disinto init creates the dev-qwen Forgejo user
2. Generates FORGE_TOKEN_LLAMA and FORGE_PASS_LLAMA
3. Adds dev-qwen as write collaborator on the project repo
4. The agents-llama container uses these credentials for all Forgejo API calls
This ensures issues and PRs created by the llama agent are correctly
attributed to dev-qwen instead of dev-bot.
formulas/release.sh still uses API tokens for mirror pushes. Add mounts
alongside secrets rather than replacing them, so both the .sh (token) and
.toml (SSH) formula paths work.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The placeholder substitution ran before local-model services were appended,
leaving a literal CLAUDE_BIN_PLACEHOLDER in the compose file. Reorder so
the sed pass runs after all services (including local-model) are in place.
Also adds the `g` flag to the sed substitution so it replaces all occurrences,
and aligns the .ssh volume mount escaping with the other ${HOME} references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
--poll-interval was incorrectly written as compact_pct in the project TOML,
misconfiguring CLAUDE_AUTOCOMPACT_PCT_OVERRIDE instead of polling behavior.
Now compact_pct is hardcoded to 60 (the correct default) and poll_interval
is a separate TOML field emitted as POLL_INTERVAL in the compose service.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 60-second grace period to stale in-progress detection in dev-poll.sh.
When a poller sees an in-progress issue with no assignee/PR/lock, it now
checks the timeline API for when the label was added. If <60s ago, it
skips stale detection to allow the claiming agent time to finish its
assign+label sequence.
Also documents the intentional assign-before-label ordering in
issue_claim() that minimizes the race window.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>