fix: agent-sdk.sh agent_run has no session lock — concurrent claude -p crashes #261
Reference: disinto-admin/disinto#261
Problem
`lib/agent-sdk.sh` `agent_run()` calls `claude -p` directly (line 51) without any locking. When two agents run concurrently in the same container (e.g. dev-poll at :04 and review-poll at :07), two `claude -p` processes share the same `~/.claude/` config directory. This causes one session to crash with empty stdout (0 bytes of JSON output).
Observed: the dev agent working on #239 died at 20:08 with empty output. The review agent started reviewing PR #257 at 20:07. Their sessions overlapped for 1 minute. The dev session's JSONL ends with `last-prompt` (no result entry): the CLI process died mid-execution.
This happened 3 times on the same issue, each time when a review session overlapped.
Root cause
The session lock (`~/.claude/session.lock` via flock) exists in `lib/agent-session.sh` (the old tmux-based path) but was never added to `lib/agent-sdk.sh` (the current `claude -p` path). All agents migrated to the SDK path, so no agent acquires the lock.
Fix
Wrap both `claude -p` invocations in `agent_run()` with flock. The flock timeout (600s) should be longer than any reasonable review session so the dev agent waits rather than failing.
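The wrapping could look roughly like the sketch below. It assumes `flock(1)` from util-linux is available in the container. The helper name `run_locked` and the variables `SESSION_LOCK` and `LOCK_TIMEOUT` are hypothetical and not taken from `lib/agent-sdk.sh`; only the lock path and the 600s timeout come from this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper: serialize a command behind the shared session lock.
# SESSION_LOCK and LOCK_TIMEOUT are illustrative names; the default path and
# the 600s default are the values named in this issue.
SESSION_LOCK="${SESSION_LOCK:-$HOME/.claude/session.lock}"
LOCK_TIMEOUT="${LOCK_TIMEOUT:-600}"

run_locked() {
  # Make sure the directory holding the lock file exists.
  mkdir -p "$(dirname "$SESSION_LOCK")"
  # flock -w N waits up to N seconds for an exclusive lock on the file,
  # runs the command, and releases the lock when the command exits.
  # If the wait times out, flock exits non-zero without running the command.
  flock -w "$LOCK_TIMEOUT" "$SESSION_LOCK" "$@"
}

# Inside agent_run(), each invocation might then become something like:
#   output=$(run_locked claude -p "$prompt")
```

With this shape, a second agent that starts while the first holds the lock blocks for up to `LOCK_TIMEOUT` seconds instead of starting a concurrent `claude -p` process. The caller should still check the exit status, since a timeout surfaces as a non-zero return from `flock`.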
Affected files
Acceptance criteria
- Only one `claude -p` process runs at a time per container
- Lock file: `~/.claude/session.lock` (same path as the old tmux lock for consistency)