disinto-ops/sprints/website-observability-wire-up.md

Sprint: website observability wire-up

Vision issues

  • #426 — Website observability — make disinto.ai an observable addressable

What this enables

After this sprint, the factory can read engagement data from disinto.ai. The planner will have daily evidence files in evidence/engagement/ answering: how many people visited, where they came from, and which pages they viewed. Observables will exist, and the prerequisites for two milestones unlock:

  • Adoption: "Landing page communicating value proposition" (evidence confirms it works)
  • Ship (Fold 2): "Engagement measurement baked into deploy pipelines" (verify-observable step becomes non-advisory)

What exists today

The design and most of the code are already done:

  • site/collect-engagement.sh — Complete. Parses Caddy's JSON access log, computes unique visitors / page views / top referrers, writes dated JSON evidence to $OPS_REPO_ROOT/evidence/engagement/YYYY-MM-DD.json.
  • formulas/run-publish-site.toml verify-observable step — Complete. Checks Caddy log activity, script presence, and evidence recency on every deploy.
  • docs/EVIDENCE-ARCHITECTURE.md — Documents the full pipeline: Caddy logs → collect-engagement → evidence/engagement/
  • docs/OBSERVABLE-DEPLOY.md — Documents the observable deploy pattern.
  • docker/edge/Dockerfile — Caddy edge container exists for the factory.
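collect-engagement.sh's internals are not reproduced here, but the kind of aggregation it performs over Caddy's JSON access log can be sketched with jq. The field selectors below assume Caddy v2's JSON access-log encoder; the real script may differ:

```shell
# Hypothetical sketch of the aggregation collect-engagement.sh performs.
# Field names assume Caddy v2's JSON access-log encoder; the actual
# script may use different selectors.

# Unique visitors: distinct client IPs in the log.
unique_visitors() {
    jq -r '.request.remote_ip' "$1" | sort -u | wc -l
}

# Page views: total GET requests.
page_views() {
    jq -r 'select(.request.method == "GET") | .request.uri' "$1" | wc -l
}

# Top referrers: most frequent non-empty Referer headers.
top_referrers() {
    jq -r '.request.headers.Referer[0] // empty' "$1" \
        | sort | uniq -c | sort -rn | head -5
}
```

Each function takes a path to a local log copy, which is why the design forks below care about where that copy comes from.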

What's missing is the wiring: connecting the factory to the remote Caddy host where disinto.ai runs.

Complexity

  • Files touched: 4-6, depending on fork choices
  • Subsystems: vault dispatch, SSH access, log collection, ops repo evidence
  • Sub-issues: 3-4
  • Gluecode ratio: ~80% gluecode, ~20% greenfield (the container/formula is new)

Risks

  • Production Caddy is on a separate host from the factory — all collection must go over SSH.
  • Log format mismatch: collect-engagement.sh assumes Caddy's structured JSON format. If the production Caddy uses default Combined Log Format, the script will produce empty reports silently.
  • SSH key scope: the key used for collection should be purpose-limited to avoid granting broad access.
  • Evidence commit: the container must commit evidence to the ops repo via Forgejo API (not git push over SSH) to keep the secret surface minimal.
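The format-mismatch risk can be closed with a cheap guard at the top of the script. A minimal sketch (the exact check is hypothetical; the real guard may inspect more than the first line):

```shell
# Hypothetical guard: fail loudly if the input is not Caddy's JSON
# access-log format (e.g. the host writes Combined Log Format instead).
assert_caddy_json() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        "{"*) return 0 ;;
        *)
            echo "error: $1 does not look like Caddy JSON access logs" >&2
            echo "hint: set 'format json' on the log directive in the Caddyfile" >&2
            return 1
            ;;
    esac
}
```

This turns the silent-empty-report failure mode into an immediate, actionable error.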

Cost — new infra to maintain

  • One vault action formula (formulas/collect-engagement.toml or extension of existing formula)
  • One SSH key on the Caddy host's authorized_keys
  • Daily evidence files in ops repo (evidence/engagement/*.json) — ~1KB/file
  • No new long-running services or agents

Recommendation

Worth it. The human-directed architecture (dispatchable container with SSH) is cleaner than running cron directly on the production host — it keeps all factory logic inside the factory and treats the Caddy host as a dumb data source.

Design forks

Q1: What does the container fetch from the Caddy host?

Context: collect-engagement.sh already parses Caddy JSON access logs into evidence JSON. The question is where that parsing happens.

  • (A) Fetch raw log, process locally: Container SSHs in, copies today's access log segment (e.g. rsync or scp), then runs collect-engagement.sh inside the container against the local copy. The Caddy host needs zero disinto code installed.
  • (B) Run script remotely: Container SSHs in and executes collect-engagement.sh on the Caddy host. Requires the script (or a minimal version) to be deployed on the host. Output piped back.
  • (C) Pull Caddy metrics API: Container opens an SSH tunnel to Caddy's admin API (port 2019) and pulls request metrics directly. No log file parsing — but Caddy's metrics endpoint is less rich than full access log analysis (no referrers, no per-page breakdown).

Architect recommends (A): keeps the Caddy host dumb, all logic in the factory container, and collect-engagement.sh runs unchanged.
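Under option (A), the container-side logic could look like the following sketch. Host variable, key path, and log path are illustrative assumptions, not settled values; the fetch uses `ssh ... cat` so it stays compatible with a `command=`-restricted key:

```shell
# Hypothetical option-(A) flow: fetch the raw log, process it locally.
# CADDY_HOST, the key path, and the remote log path are placeholder
# assumptions.
fetch_and_collect() {
    host="${CADDY_HOST:?set CADDY_HOST}"
    workdir=$(mktemp -d)

    # Copy today's access log segment; the Caddy host runs no disinto code.
    ssh -i /run/secrets/caddy_ssh_key "$host" \
        cat /var/log/caddy/access.log > "$workdir/access.log"

    # Run the existing parser unchanged against the local copy.
    ./site/collect-engagement.sh "$workdir/access.log"

    rm -rf "$workdir"
}
```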

Q2: How is the daily collection triggered?

Context: Other factory agents (supervisor, planner, gardener) run on direct cron via *-run.sh. Vault actions go through the PR approval workflow. The collection is a recurring low-risk read-only operation.

  • (A) Direct cron in edge container: Add a cron entry to the edge container entrypoint, like supervisor/planner. Simple, no vault overhead. Runs daily without approval.
  • (B) Vault action with auto-dispatch: Create a recurring vault action TOML. If PR #12 (blast-radius tiers) lands, low-tier actions auto-execute. If not, each run needs admin approval — too heavy for daily collection.
  • (C) Supervisor-triggered: Supervisor detects stale evidence (no evidence/engagement/ file for today) and dispatches collection. Reactive rather than scheduled.

Architect recommends (A): this is a read-only data collection, same risk profile as supervisor health checks. Vault gating a daily log fetch adds friction without security benefit.
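Under option (A), the trigger is a single line in the edge container's crontab, in the same style as the other agents. Time of day, paths, and script name are illustrative:

```shell
# Hypothetical crontab entry in the edge container: collect engagement
# evidence once a day, shortly after midnight UTC. Paths are placeholders.
10 0 * * * /opt/disinto/site/collect-engagement-run.sh >> /var/log/collect-engagement.log 2>&1
```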

Q3: How is the SSH key provisioned for the collection container?

Context: The vault dispatcher supports mounts: ["ssh"] which mounts ~/.ssh read-only into the container. The edge container already has SSH infrastructure for reverse tunnels (disinto-tunnel user, autossh).

  • (A) Factory operator's SSH keys (mounts: ["ssh"]): Reuse the existing SSH keys on the factory host. Simple, but grants the container access to all hosts the operator can reach.
  • (B) Dedicated purpose-limited key: Generate a new SSH keypair, install the public key on the Caddy host with command= restriction (only allows cat /var/log/caddy/access.log or similar). Private key stored as CADDY_SSH_KEY in .env.vault.enc. Minimal blast radius.
  • (C) Edge tunnel reverse path: Instead of the factory SSHing out to Caddy, have the Caddy host push logs in via the existing reverse tunnel infrastructure. Inverts the connection direction but requires a log-push agent on the Caddy host.

Architect recommends (B): purpose-limited key with command= restriction on the Caddy host gives least-privilege access. The factory never gets a shell on production.
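Option (B) amounts to two one-time steps, sketched below. Key name and log path are illustrative assumptions:

```shell
# On the factory host: generate a dedicated keypair for log collection.
ssh-keygen -t ed25519 -N "" -C "disinto-engagement-collect" -f caddy_collect_key

# On the Caddy host, in the collection user's ~/.ssh/authorized_keys:
# pin the key to a single command and strip every other capability.
command="cat /var/log/caddy/access.log",no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty ssh-ed25519 AAAA... disinto-engagement-collect
```

With the `command=` restriction in place, sshd ignores whatever the client asks to run and executes only the pinned `cat`, so the factory can fetch the log but never gets a shell.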

Proposed sub-issues

  1. collect-engagement formula + container script: Create formulas/collect-engagement.toml with steps: SSH into Caddy host using dedicated key → fetch today's access log segment → run collect-engagement.sh on local copy → commit evidence JSON to ops repo via Forgejo API. Add cron entry to edge container.
  2. Format-detection guard in collect-engagement.sh: Add a check at script start that verifies the input file is Caddy JSON format (not Combined Log Format). Fail loudly with actionable error if format is wrong.
  3. evidence/engagement/ directory + ops-setup wiring: Ensure lib/ops-setup.sh creates the evidence directory. Register the engagement cron schedule in factory setup docs.
  4. Document Caddy host SSH setup: Rent-a-human instructions for: generate keypair, install public key with command= restriction on Caddy host, add private key to .env.vault.enc.
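The Forgejo-API commit in sub-issue 1 can be sketched as a single authenticated request. Forgejo inherits Gitea's contents API; the instance URL, token variable, and owner/repo path below are illustrative assumptions:

```shell
# Hypothetical sketch: commit a dated evidence file to the ops repo via
# the Forgejo contents API instead of git push over SSH.
# FORGEJO_URL, FORGEJO_TOKEN, and the repo path are placeholder assumptions.
commit_evidence() {
    file="$1"                       # e.g. evidence/engagement/<date>.json
    repo_path="disinto/disinto-ops" # illustrative owner/repo
    curl -sS -X POST \
        -H "Authorization: token $FORGEJO_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{\"message\": \"evidence: engagement $(basename "$file")\", \"content\": \"$(base64 -w0 < "$file")\"}" \
        "$FORGEJO_URL/api/v1/repos/$repo_path/contents/$file"
}
```

Note that in the Gitea-style API, POST creates a new file; updating an existing path instead requires a PUT carrying the file's current SHA. Since evidence files are dated, each day's commit should be a fresh create.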

If Q1=B (remote execution):

Sub-issues 2-4 remain the same. Sub-issue 1 changes: container SSHs in and runs the script remotely, requiring script deployment on the Caddy host (additional manual step).

If Q2=B (vault-gated):

Sub-issue 1 changes: instead of cron, create a vault action TOML template and document the daily dispatch. Depends on PR #12 (blast-radius tiers) for auto-approval.

If Q3=A (operator SSH keys):

Sub-issue 4 is simplified (no dedicated key generation), but blast radius is wider.

If Q3=C (reverse tunnel):

Sub-issue 1 changes significantly: instead of SSH-out, configure a log-push cron on the Caddy host that sends logs through the reverse tunnel. More infrastructure on the Caddy host side.