bug: generator emits invalid env var name FORGE_BOT_USER_<service>^^ when service name contains hyphen #852

Open
opened 2026-04-16 10:13:52 +00:00 by dev-bot · 0 comments
Collaborator

Problem

A hyphen in an [agents.X] TOML section name produces invalid shell identifiers in two places — one silently drops the env var, the other crashes the container.

Call site 1: lib/generators.sh (silent drop)

FORGE_BOT_USER_${service_name^^}: "${forge_user}"

For section [agents.dev-qwen2] this emits FORGE_BOT_USER_DEV-QWEN2, which is not a valid shell identifier. Docker Compose silently drops it; the container never sees the var.
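
A minimal sketch of why the expansion above goes wrong, assuming service_name holds the raw section suffix (the variable value here is a stand-in, not copied from the generator). It only shows that ${var^^} uppercases letters and leaves the hyphen in place:

```bash
#!/usr/bin/env bash
# Stand-in for the value the generator would derive from [agents.dev-qwen2].
service_name="dev-qwen2"

# ${var^^} (bash 4+) only uppercases; the hyphen passes through unchanged.
emitted_key="FORGE_BOT_USER_${service_name^^}"
echo "$emitted_key"

# A shell identifier must match [A-Za-z_][A-Za-z0-9_]*.
if [[ "$emitted_key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
  key_status="valid"
else
  key_status="invalid"
fi
echo "$key_status"
```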

Call site 2: lib/load-project.sh (crashes dev-poll)

while IFS='=' read -r _key _val; do
  [ -z "$_key" ] && continue
  export "$_key=$_val"
done <<< "$_AGENT_VARS"

_AGENT_VARS is Python-generated: the emitter uses f"AGENT_{name.upper()}_BASE_URL={...}" on section names read via tomllib, so a section [agents.dev-qwen2] produces AGENT_DEV-QWEN2_BASE_URL=.... Bash rejects the export with "not a valid identifier". Because entrypoint runs under set -euo pipefail, the non-zero return propagates up, wait "${FAST_PIDS[@]}" fails, entrypoint exits, and the container enters a crash-loop (iteration 1 → exit → restart → iteration 1 → ...).

Confirmed today on disinto-agents-llama after [agents.dev-qwen2] landed in the project TOML — container restarted every ~3 seconds for several minutes; commenting out the block restored stability.
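
The export failure described above can be reproduced in isolation. This is a sketch under assumptions: the URL is a made-up placeholder, and the loop runs in a subshell so the demo script itself survives the set -e exit.

```bash
#!/usr/bin/env bash
# Sample input mirroring the emitted line; the URL is a placeholder.
_AGENT_VARS='AGENT_DEV-QWEN2_BASE_URL=http://example.invalid:8080'

# Run the original loop in a subshell under set -e so the failure
# terminates the subshell, not this demo script.
(
  set -euo pipefail
  while IFS='=' read -r _key _val; do
    [ -z "$_key" ] && continue
    export "$_key=$_val"   # fails: the key contains a hyphen
  done <<< "$_AGENT_VARS"
) 2>/dev/null
loop_status=$?

# A non-zero status here is what propagates up to the entrypoint.
echo "loop exit status: $loop_status"
```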

Severity

Not cosmetic — a single hired agent with a hyphen in its section name can take down every other TOML-driven sidecar that shares the projects/*.toml. dev-qwen2 breaking dev-qwen today is exactly that.

Repro

  1. disinto hire-an-agent dev-qwen2 dev --local-model <url> --model <name> (TOML block added: [agents.dev-qwen2]).
  2. grep FORGE_BOT_USER_ /home/johba/disinto/docker-compose.yml → observe FORGE_BOT_USER_DEV-QWEN2 (invalid identifier).
  3. docker exec disinto-agents-dev-qwen2 env | grep FORGE_BOT_USER → empty.
  4. On any TOML-driven sidecar that reads the same projects/disinto.toml (e.g. disinto-agents-llama with #855 fix applied), docker logs shows:
    /home/agent/repos/_factory/lib/load-project.sh: line 151: export: `AGENT_DEV-QWEN2_BASE_URL=...': not a valid identifier
    
    followed by container exit and restart.

Fix

Normalize the section name the same way in both emitters. The canonical rule (already used for FORGE_TOKEN_<USER> derivation, see #847):

# dashes → underscores, uppercase
safe_upper=$(echo "$service_name" | tr '[:lower:]-' '[:upper:]_')

Apply at:

  • lib/generators.sh, in _generate_local_model_services: replace ${service_name^^} with ${safe_upper} where it flows into env var names (FORGE_BOT_USER_*, any other FORGE_*_<NAME> keys).
  • lib/load-project.sh — the Python block generating AGENT_<NAME>_* lines: replace name.upper() with name.upper().replace("-", "_") so the emitted keys are always valid shell identifiers.
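
A minimal sketch of the shell side of this rule, wrapped in a function. The name normalize_env_suffix is hypothetical and does not come from the repo; the tr expression is the one quoted above.

```bash
#!/usr/bin/env bash
# normalize_env_suffix is a hypothetical helper name used for illustration.
normalize_env_suffix() {
  # Same rule as above: lowercase to uppercase, '-' to '_'.
  printf '%s' "$1" | tr '[:lower:]-' '[:upper:]_'
}

hyphenated=$(normalize_env_suffix "dev-qwen2")
single_word=$(normalize_env_suffix "llama")

echo "FORGE_BOT_USER_${hyphenated}"
echo "FORGE_BOT_USER_${single_word}"
```

Names without a hyphen come out the same as with ${service_name^^}, which is what the "existing single-word section names unchanged" requirement relies on.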

Secondary (defensive): the export loop in load-project.sh should tolerate or pre-validate invalid identifiers rather than letting them tank set -e. Even after fixing the emission, a hand-edited TOML could reintroduce the issue; fail gracefully with a warning instead of crashing.
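
One possible shape for the defensive loop, as a sketch rather than the actual patch. The sample input, the skipped counter, and the warning wording are all assumptions made for illustration; the loop structure follows the existing one in load-project.sh.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sample input: one valid key and one invalid key (values are placeholders).
_AGENT_VARS='AGENT_LLAMA_BASE_URL=http://example.invalid:8081
AGENT_DEV-QWEN2_BASE_URL=http://example.invalid:8080'

skipped=0
while IFS='=' read -r _key _val; do
  [ -z "$_key" ] && continue
  # Pre-validate so an invalid name never reaches export under set -e.
  if [[ ! "$_key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
    echo "warning: skipping invalid env var name: $_key" >&2
    skipped=$((skipped + 1))
    continue
  fi
  export "$_key=$_val"
done <<< "$_AGENT_VARS"

echo "skipped: $skipped"
```

Validating before the export, instead of appending || true to it, keeps the warning explicit and avoids masking unrelated export failures.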

Acceptance

  • Section [agents.dev-qwen2] produces env key FORGE_BOT_USER_DEV_QWEN2 (underscore) in generated compose.
  • load-project.sh on the same TOML emits AGENT_DEV_QWEN2_BASE_URL=... (underscore); export succeeds.
  • No "not a valid identifier" errors in docker logs disinto-agents-* for any hired agent with hyphens.
  • Existing single-word section names (e.g. llama) unchanged.
  • load-project.sh tolerates (warn-and-skip) invalid identifiers instead of crashing, as a defence against hand-edits.

Affected files

  • lib/generators.sh: _generate_local_model_services
  • lib/load-project.sh — Python AGENT_* emission + shell export loop

Context

Caught while bringing up dev-qwen2 as a second parallel llama dev agent. Originally thought cosmetic (compose drops the bad var), but the load-project.sh call site turns it into a hard crash that took dev-qwen offline. See #855 for the adjacent FACTORY_REPO/volume-mount gap that exposed this bug (before #855, the poll loop never ran, so load-project was never called, so the crash stayed hidden).

Related: #847 (env-var naming rule origin), #855 (silent-zombie mode that hid this bug).

dev-bot added the backlog label 2026-04-16 10:13:52 +00:00
dev-bot self-assigned this 2026-04-16 13:17:27 +00:00
dev-bot added in-progress and removed backlog labels 2026-04-16 13:17:27 +00:00
Reference: disinto-admin/disinto#852