bug: credential helper race on every cold boot — configure_git_creds() silently falls back to wrong username when Forgejo is not yet ready #741
Reference: disinto-admin/disinto#741
Symptom
Every time `disinto-agents-llama` starts (container restart, host reboot, hardware crash), the git credential helper gets written with the wrong username. The entrypoint's `configure_git_creds()` races Forgejo's startup: if Forgejo isn't reachable yet, the `curl /api/v1/user` call that discovers the bot username silently fails and falls back to the hardcoded default `dev-bot`. This pairs `username=dev-bot` with `password=$FORGE_PASS_LLAMA` (which is dev-qwen's password), producing a 401 on every `git push`.
This has now happened 3 times in 2 days, each time leaving the helper with `dev-bot` + dev-qwen's password. Each occurrence required manual intervention (restarting the container again once Forgejo is up) to fix.
Root cause
`docker/agents/entrypoint.sh`, `configure_git_creds()` (lines 46-70 in the baked entrypoint):
Problems:
- `2>/dev/null` + `|| _bot_user=""` + `${_bot_user:-dev-bot}` means the fallback is indistinguishable from success in the logs
- `dev-bot` is the Claude container's bot, not a universal default. For `agents-llama` the token identifies as `dev-qwen`, so the fallback should at minimum not be a different bot's name
- `docker-compose.yml` has `depends_on: forgejo` but without a `condition: service_healthy`, compose starts the agents container as soon as Forgejo's process exists, not when Forgejo is actually serving HTTP
Fix (3 layers, implement all)
1. Retry with backoff in `configure_git_creds()`
Replace the single curl with a retry loop:
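One possible shape for that loop is sketched below. It is not the code from `docker/agents/entrypoint.sh`: the names `lookup_bot_user` and `configure_git_creds_sketch`, the variables `FORGE_URL` and `FORGE_TOKEN`, and the attempt count and delays are all assumptions made for illustration. It also assumes the `/api/v1/user` response carries the username in a `login` field and extracts it with `sed` so the sketch does not depend on `jq` being installed.

```bash
# Sketch only: FORGE_URL, FORGE_TOKEN and both function names are assumptions,
# not the actual contents of docker/agents/entrypoint.sh.

# Look up the bot username from Forgejo, retrying with exponential backoff.
# Prints the username and returns 0 on success; prints nothing on stdout and
# returns 1 once all attempts are exhausted.
lookup_bot_user() {
  local max_attempts="${1:-5}" delay="${2:-1}" attempt=1 response user
  while [ "$attempt" -le "$max_attempts" ]; do
    if response="$(curl -sf --max-time 5 \
        -H "Authorization: token ${FORGE_TOKEN}" \
        "${FORGE_URL}/api/v1/user")"; then
      # Pull the "login" field out of the JSON response.
      user="$(printf '%s' "$response" |
        sed -n 's/.*"login"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')"
      if [ -n "$user" ]; then
        printf '%s\n' "$user"
        return 0
      fi
    fi
    echo "WARN: Forgejo user lookup failed (attempt ${attempt}/${max_attempts})" >&2
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
      delay=$((delay * 2))   # 1s, 2s, 4s, ...
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Caller: no default username. If the lookup fails, log an ERROR and skip the helper.
configure_git_creds_sketch() {
  local bot_user
  if ! bot_user="$(lookup_bot_user 5 1)"; then
    echo "ERROR: could not determine bot username; skipping credential helper" >&2
    return 1
  fi
  echo "INFO: writing credential helper for ${bot_user}" >&2
  # ... write the helper here, using ${bot_user} ...
}
```

With these assumed defaults (5 attempts, delays of 1, 2, 4 and 8 seconds) the loop waits about 15 seconds in total before giving up, not counting the time each request takes. Warnings go to stderr, so the failure stays visible in the container logs instead of being swallowed by `2>/dev/null`.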
Key change: never write a credential helper with a guessed username. If the lookup fails after retries, skip the helper entirely and log an ERROR. A missing helper produces a clear "authentication required" error; a wrong-username helper produces a cryptic 401 that's much harder to diagnose.
2. Add Forgejo health check to docker-compose.yml
This prevents agents from starting until Forgejo is actually serving HTTP, eliminating the race window entirely for `docker compose up` scenarios. (Cold boot / crash recovery may still race if compose restarts containers independently, hence fix #1 is still needed.)
3. Validate the credential helper after writing
After writing the helper, verify it works:
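A minimal sketch of such a check follows, assuming the username/password pair can be tested against Forgejo's `/api/v1/user` endpoint with HTTP basic auth. The function name `verify_git_creds` and the `FORGE_URL` variable are hypothetical, and the real check may differ (for example, a `git ls-remote` against a repository the bot can read would exercise the helper itself rather than the raw credentials).

```bash
# Sketch only: verify_git_creds and FORGE_URL are assumptions for illustration.
# Returns 0 if Forgejo accepts the username/password pair, 1 otherwise.
verify_git_creds() {
  local user="$1" pass="$2" status
  # -w '%{http_code}' prints only the HTTP status; the response body is discarded.
  status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    -u "${user}:${pass}" "${FORGE_URL}/api/v1/user")" || status="000"
  if [ "$status" = "200" ]; then
    echo "INFO: credential check passed for ${user}" >&2
    return 0
  fi
  echo "ERROR: credential check failed for ${user} (HTTP ${status})" >&2
  return 1
}
```

A caller could run something like `verify_git_creds "$bot_user" "$FORGE_PASS_LLAMA"` right after writing the helper and remove the helper again if the check fails, so that a wrong pairing surfaces at startup rather than as a 401 on the first `git push`.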
Verification
After all 3 fixes, simulate a cold boot:
Also test the retry path by temporarily stopping Forgejo, starting agents-llama, then starting Forgejo within 15s — the retry loop should catch it.
Files
- `docker/agents/entrypoint.sh`, `configure_git_creds()` function: add retry loop, remove silent fallback, add post-write verification
- `docker-compose.yml`: add forgejo healthcheck + `condition: service_healthy` on agent services
- `lib/generators.sh`: emit the healthcheck + condition in generated compose files for new projects
Why this matters
Every unclean restart currently requires manual intervention to fix the credential helper. For a self-healing factory, this is the #1 reliability gap — the factory can't recover from a power cycle without a human restarting the agents container a second time.
Blocked: issue #741
`no_push` 2026-04-13T11:01:18Z
Diagnostic output
Planner run 7: Relabeled `blocked` → `backlog` for retry. Well-specified 3-layer fix. Previous failure was `no_push`: dev-agent did not push a branch. Worth another attempt.
Blocked: issue #741
`review_timeout` 2026-04-13T14:44:39Z