bug: edge-control add_route targets non-existent Caddy server edge — registration succeeds in registry but traffic never routes #789

Closed
opened 2026-04-15 16:20:57 +00:00 by dev-bot · 0 comments

Symptom

tools/edge-control/lib/caddy.sh line ~53 POSTs new routes to the Caddy admin API at:

POST ${CADDY_ADMIN_URL}/config/apps/http/servers/edge/routes

The path segment .../servers/edge/... assumes a Caddy server named edge exists in the running config. But tools/edge-control/install.sh lines 229-240 emit a Caddyfile with no named server — just a global block and a bare :80, :443 {} site. At runtime Caddy auto-generates server names (conventionally srv0, srv1, …). There's no edge.

Consequence: allocate_port (lib/ports.sh) writes the project to registry.json successfully → ssh disinto-register@edge "register …" returns a clean JSON with a port and fqdn. The operator's .env gets EDGE_TUNNEL_PORT set. The edge container's autossh opens the reverse tunnel. But the Caddy admin POST returns 404 or silently writes to a server that doesn't handle HTTPS, and nothing ever reaches the tunnel.

The failure is quiet: curl -X POST /config/apps/http/servers/edge/routes hits an unknown path, and the Caddy admin API answers with error JSON. But add_route uses 2>&1 on the curl call and only fails on a non-zero exit, and curl exits 0 on an HTTP 4xx unless --fail is passed, so the error body is never inspected. register.sh therefore prints Added route: … and the shell exits 0.
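To make the quiet failure concrete, here is a minimal sketch of the pattern described above. It is a hypothetical reconstruction, not the actual contents of lib/caddy.sh: fake_curl stands in for a curl call that receives an HTTP error, add_route_unchecked is an invented name, and the error body is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical reconstruction of the failure mode; NOT the real lib/caddy.sh.

# Stand-in for curl hitting an admin-API path that does not exist: it prints
# an error body and still exits 0, as curl does for HTTP errors without --fail.
fake_curl() {
  printf '%s\n' '{"error":"placeholder error body"}'
  return 0
}

add_route_unchecked() {
  local fqdn=$1 output
  # Only the exit status is examined, so an HTTP 4xx looks like success.
  if ! output=$(fake_curl 2>&1); then
    echo "Error: failed to add route: $output" >&2
    return 1
  fi
  echo "Added route: $fqdn"
}

add_route_unchecked "foo.disinto.ai"
echo "exit status: $?"
```

Running the sketch prints the Added route: line followed by exit status: 0, even though the stand-in returned an error body.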

This would only surface after a real install on a real box, and the current deployment didn't run edge-control at all — so the bug is latent but would bite on first use.

Fix

Two complementary options for the server name (Parts 1 and 2), plus stricter error handling (Part 3). Land all of them for defense-in-depth:

Part 1 — name the server in the emitted Caddyfile

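In install.sh, the Caddyfile heredoc should declare the server explicitly. Caddyfile site blocks don't name servers directly, so we either go through the JSON config approach (write a JSON config file, not a Caddyfile) or use the named-matcher + routes machinery. Simplest: convert the emitted config to JSON. (Recent Caddy releases also document a name sub-option in the global servers block, which may be enough if the installed version supports it; this needs checking before relying on it.)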

Alternative that stays Caddyfile-native: drop the fixed name edge in add_route and instead discover the server name at runtime. See Part 2.
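As a rough illustration of the JSON route in Part 1, the sketch below emits a config in which the HTTP server is keyed as edge under apps.http.servers, which is where Caddy's JSON config takes server names from. The function name emit_edge_config, the single :443 listen address, and the empty route list are assumptions for illustration; the real install would carry whatever the current Caddyfile configures.

```bash
#!/usr/bin/env bash
# Sketch only: emit a Caddy JSON config whose HTTP server is explicitly
# named "edge". emit_edge_config, the listen address, and the empty
# route list are illustrative assumptions, not the project's config.
emit_edge_config() {
  cat <<'JSON'
{
  "apps": {
    "http": {
      "servers": {
        "edge": {
          "listen": [":443"],
          "routes": []
        }
      }
    }
  }
}
JSON
}

emit_edge_config
```

With a config shaped like this, the existing /config/apps/http/servers/edge/routes path would resolve, since the server key in the JSON is the name the admin API uses.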

Part 2 — add_route should discover the server name dynamically

Rather than hard-coding /servers/edge/, lib/caddy.sh should:

# Runs inside add_route (local is only valid in a function).
# Get the first (and typically only) server that listens on :80/:443
local server_name
server_name=$(curl -sS "${CADDY_ADMIN_URL}/config/apps/http/servers" \
  | jq -r 'to_entries | map(select(any(.value.listen[]?; test(":(80|443)$")))) | .[0].key // empty')

if [ -z "$server_name" ]; then
  echo "Error: could not find a Caddy server listening on :80/:443" >&2
  return 1
fi

curl -s -X POST "${CADDY_ADMIN_URL}/config/apps/http/servers/${server_name}/routes" …

And symmetrically in remove_route which hardcodes the same path.
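Since add_route and remove_route both need the same lookup, the filter could live in one shared helper. The sketch below assumes a hypothetical helper named pick_edge_server that reads the servers map on stdin, and it exercises the filter against made-up sample data rather than a live admin API.

```bash
#!/usr/bin/env bash
# Sketch: pick the server name out of a /config/apps/http/servers response.
# pick_edge_server is a hypothetical helper name; it reads JSON on stdin
# and prints the first server with a listener ending in :80 or :443.
pick_edge_server() {
  jq -r 'to_entries
         | map(select(any(.value.listen[]?; test(":(80|443)$"))))
         | .[0].key // empty'
}

# Made-up sample shaped like Caddy's servers map, including a server on an
# unrelated port that the filter should skip.
sample='{"metrics":{"listen":[":8080"]},"srv0":{"listen":[":443"]},"srv1":{"listen":[":80"]}}'

if command -v jq >/dev/null 2>&1; then
  printf '%s' "$sample" | pick_edge_server
fi
```

For this sample the helper prints srv0. If the emitted Caddyfile ends up producing separate servers for :80 and :443, the first match is not guaranteed to be the one that terminates HTTPS, so the pattern may need narrowing to :443 once the real config has been inspected.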

Part 3 — fail loudly on admin-API errors

Both add_route and remove_route currently only catch curl's exit code. HTTP 4xx/5xx with an error body reads as success. Check the HTTP status code:

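# Capture the body plus, on a final line, the HTTP status code.
# A transport failure (curl exits non-zero, status 000) must also count as an error.
response=$(curl -sS -w '\n%{http_code}' -X POST "${CADDY_ADMIN_URL}/config/apps/http/servers/${server_name}/routes" \
  -H "Content-Type: application/json" -d "$route_config") || {
  echo "Error: could not reach the Caddy admin API" >&2
  return 1
}
status=$(printf '%s\n' "$response" | tail -n1)
body=$(printf '%s\n' "$response" | sed '$d')
if [ "$status" -ge 400 ]; then
  echo "Error: Caddy admin API returned $status: $body" >&2
  return 1
fi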

Same pattern in remove_route and reload_caddy.
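One way to avoid repeating the status check three times is a small shared wrapper. The sketch below is an assumption about how that could look, not existing code: caddy_api is a hypothetical name, and the default CADDY_ADMIN_URL of http://localhost:2019 is Caddy's stock admin address, used here only as a fallback.

```bash
#!/usr/bin/env bash
# Sketch of a shared wrapper; caddy_api is a hypothetical helper name.
# Usage: caddy_api METHOD PATH [JSON_BODY]
: "${CADDY_ADMIN_URL:=http://localhost:2019}"   # assumed fallback

caddy_api() {
  local method=$1 path=$2 data=${3:-}
  local response status body
  local -a args=(-sS -w '\n%{http_code}' -X "$method" "${CADDY_ADMIN_URL}${path}")
  if [ -n "$data" ]; then
    args+=(-H "Content-Type: application/json" -d "$data")
  fi

  # A non-zero curl exit means the request never completed.
  response=$(curl "${args[@]}") || {
    echo "Error: could not reach the Caddy admin API at ${CADDY_ADMIN_URL}" >&2
    return 1
  }

  status=${response##*$'\n'}   # last line: HTTP status code
  body=${response%$'\n'*}      # everything before it: response body

  if [ "$status" -ge 400 ]; then
    echo "Error: Caddy admin API returned $status: $body" >&2
    return 1
  fi
  printf '%s\n' "$body"
}
```

With a wrapper like this, add_route would reduce to something along the lines of caddy_api POST "/config/apps/http/servers/${server_name}/routes" "$route_config" || return 1, and remove_route and reload_caddy could follow the same shape.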

Affected files

  • tools/edge-control/install.sh: emit a Caddyfile that produces a server named edge (via config-JSON or a named-block workaround), OR leave the Caddyfile minimal and rely on Part 2
  • tools/edge-control/lib/caddy.sh: add_route and remove_route use dynamic server-name discovery; all three admin-API helpers check HTTP status

Acceptance criteria

  • On a fresh install, ssh disinto-register@edge "register foo ssh-ed25519 AAAAC3..." actually produces a working Caddy route that serves foo.disinto.ai → 127.0.0.1:<port> (end-to-end test: register, open tunnel on the registered port with nc -l <port>, curl the fqdn, see the nc output)
  • If the admin API returns a 4xx/5xx on route injection, the register command exits non-zero with a readable error message (not a silent Added route: + success exit)
  • remove_route handles the same error paths
  • CI green

Blocks

This blocks any real use of edge-control. #788 (landing page preservation) doesn't help if registered projects never actually route.

dev-bot added the backlog, priority, bug-report labels 2026-04-15 16:20:57 +00:00
dev-bot self-assigned this 2026-04-15 16:22:44 +00:00
dev-bot added in-progress and removed backlog labels 2026-04-15 16:22:44 +00:00
dev-bot removed their assignment 2026-04-15 16:37:20 +00:00
Reference: disinto-admin/disinto#789