# Sprint: website observability wire-up

## Vision issues
- #426 — Website observability — make disinto.ai an observable addressable

## What this enables
After this sprint, the factory can read engagement data from disinto.ai. The planner
will have daily evidence files in `evidence/engagement/` to answer three questions:
how many people visited, where they came from, and which pages they viewed.
That gives disinto.ai its first observables and unlocks the prerequisites for two milestones:
- Adoption: "Landing page communicating value proposition" (evidence confirms it works)
- Ship (Fold 2): "Engagement measurement baked into deploy pipelines" (verify-observable step becomes non-advisory)

## What exists today

The design and most of the code are already done:

- `site/collect-engagement.sh` — Complete. Parses Caddy's JSON access log, computes unique visitors / page views / top referrers, writes dated JSON evidence to `$OPS_REPO_ROOT/evidence/engagement/YYYY-MM-DD.json`.
- `formulas/run-publish-site.toml` verify-observable step — Complete. Checks Caddy log activity, script presence, and evidence recency on every deploy.
- `docs/EVIDENCE-ARCHITECTURE.md` — Documents the full pipeline: Caddy logs → collect-engagement → evidence/engagement/
- `docs/OBSERVABLE-DEPLOY.md` — Documents the observable deploy pattern.
- `docker/edge/Dockerfile` — Caddy edge container exists for the factory.
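
To make the three metrics concrete, here is a minimal Python sketch of the kind of aggregation `site/collect-engagement.sh` is described as doing. It is not the script itself. The field names (`request.remote_ip`, `request.uri`, `request.headers.Referer`) are the ones Caddy v2 writes in its JSON access log. The output keys and the "one visitor per distinct IP" rule are assumptions made for illustration.

```python
import json
from collections import Counter


def summarize(lines):
    """Aggregate Caddy JSON access-log lines into engagement counts.

    Assumption: a "unique visitor" is a distinct request.remote_ip.
    The real script may define it differently.
    """
    visitors = set()
    pages = Counter()
    referrers = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        request = entry.get("request") if isinstance(entry, dict) else None
        if not isinstance(request, dict):
            continue  # not an access-log entry
        ip = request.get("remote_ip")
        if ip:
            visitors.add(ip)
        pages[request.get("uri", "")] += 1
        # Caddy logs each header as a list of values.
        for ref in request.get("headers", {}).get("Referer", []):
            referrers[ref] += 1
    return {
        "unique_visitors": len(visitors),
        "page_views": sum(pages.values()),
        "top_pages": pages.most_common(5),
        "top_referrers": referrers.most_common(5),
    }
```

Skipping unparseable lines keeps the sketch short, but it is also exactly how a format mismatch turns into a silently empty report.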

What's missing is the wiring: connecting the factory to the remote Caddy host where
disinto.ai runs.

## Complexity

- Files touched: 4-6 depending on fork choices
- Subsystems: vault dispatch, SSH access, log collection, ops repo evidence
- Sub-issues: 3-4
- Gluecode ratio: ~80% gluecode, ~20% greenfield (the container/formula is new)

## Risks

- Production Caddy is on a separate host from the factory — all collection must go over SSH.
- Log format mismatch: `collect-engagement.sh` assumes Caddy's structured JSON format. If the production Caddy writes Combined Log Format instead, the script will silently produce empty reports.
- SSH key scope: the key used for collection should be purpose-limited to avoid granting broad access.
- Evidence commit: the container must commit evidence to the ops repo via Forgejo API (not git push over SSH) to keep the secret surface minimal.
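
The evidence-commit risk can be illustrated with a short Python sketch that builds the HTTP request without sending it. It uses the repository contents endpoint that Forgejo inherits from Gitea: `POST /api/v1/repos/{owner}/{repo}/contents/{filepath}` with base64-encoded content. The base URL, owner, repo name, token, and commit message format are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_evidence_request(base_url, owner, repo, day, evidence, token):
    """Build (but do not send) a request that creates one evidence file.

    All arguments are caller-supplied; none of the values are taken from
    the real factory configuration.
    """
    path = f"evidence/engagement/{day}.json"
    payload = {
        # The contents API expects the file body base64-encoded.
        "content": base64.b64encode(
            json.dumps(evidence, indent=2).encode("utf-8")
        ).decode("ascii"),
        "message": f"evidence: engagement {day}",  # hypothetical message format
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/repos/{owner}/{repo}/contents/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

`POST` creates a new file. Overwriting an existing day's file would need a `PUT` that also carries the current file's `sha`. Either way the container needs only an API token scoped to the ops repo, with no git SSH credentials.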

## Cost — new infra to maintain

- One vault action formula (`formulas/collect-engagement.toml` or extension of existing formula)
- One SSH key in the Caddy host's `authorized_keys`
- Daily evidence files in the ops repo (`evidence/engagement/*.json`), roughly 1 KB per file
- No new long-running services or agents

## Recommendation

Worth it. The human-directed architecture (dispatchable container with SSH) is
cleaner than running cron directly on the production host — it keeps all factory
logic inside the factory and treats the Caddy host as a dumb data source.

## Design forks

### Q1: What does the container fetch from the Caddy host?

*Context: `collect-engagement.sh` already parses Caddy JSON access logs into evidence JSON. The question is where that parsing happens.*

- **(A) Fetch raw log, process locally**: Container SSHs in, copies today's access log segment (e.g. `rsync` or `scp`), then runs `collect-engagement.sh` inside the container against the local copy. The Caddy host needs zero disinto code installed.
- **(B) Run script remotely**: Container SSHs in and executes `collect-engagement.sh` on the Caddy host. Requires the script (or a minimal version) to be deployed on the host. Output piped back.
- **(C) Pull Caddy metrics API**: Container opens an SSH tunnel to Caddy's admin API (port 2019) and pulls request metrics directly. No log file parsing — but Caddy's metrics endpoint is less rich than full access log analysis (no referrers, no per-page breakdown).

*Architect recommends (A): keeps the Caddy host dumb, all logic in the factory container, and `collect-engagement.sh` runs unchanged.*
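
Option (A) can be sketched in Python in two parts: the `ssh` command line the container would run, and a filter that keeps only today's entries from the fetched log. The host name and key path are hypothetical. The filter assumes Caddy's default `ts` field (Unix seconds) and treats a "day" as a UTC calendar day, which is also an assumption.

```python
import json
from datetime import date, datetime, timezone


def fetch_command(host, key_path, log_path="/var/log/caddy/access.log"):
    """Return the argv for streaming the remote log to stdout over SSH."""
    return [
        "ssh",
        "-i", key_path,
        "-o", "BatchMode=yes",       # never prompt for a password
        "-o", "IdentitiesOnly=yes",  # offer only the given key
        host,
        f"cat {log_path}",
    ]


def lines_for_day(lines, day):
    """Yield the log lines whose `ts` falls on `day` (UTC)."""
    for line in lines:
        try:
            ts = json.loads(line)["ts"]
            when = datetime.fromtimestamp(float(ts), tz=timezone.utc).date()
        except (ValueError, KeyError, TypeError, OverflowError, OSError):
            continue  # not JSON, or no usable timestamp
        if when == day:
            yield line
```

One way to use it: pass the argv to `subprocess.run(..., capture_output=True, text=True, check=True)`, feed `stdout.splitlines()` through `lines_for_day`, and write the result to a local file for `collect-engagement.sh`.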

### Q2: How is the daily collection triggered?

*Context: Other factory agents (supervisor, planner, gardener) run on direct cron via `*-run.sh`. Vault actions go through the PR approval workflow. The collection is a recurring low-risk read-only operation.*

- **(A) Direct cron in edge container**: Add a cron entry to the edge container entrypoint, like supervisor/planner. Simple, no vault overhead. Runs daily without approval.
- **(B) Vault action with auto-dispatch**: Create a recurring vault action TOML. If PR #12 (blast-radius tiers) lands, low-tier actions auto-execute. If not, each run needs admin approval — too heavy for daily collection.
- **(C) Supervisor-triggered**: Supervisor detects stale evidence (no `evidence/engagement/` file for today) and dispatches collection. Reactive rather than scheduled.

*Architect recommends (A): this is a read-only data collection, same risk profile as supervisor health checks. Vault gating a daily log fetch adds friction without security benefit.*
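
Whichever trigger is chosen, the same freshness check is useful: option (C) would dispatch on it, and under option (A) it can act as a cheap guard. The Python sketch below assumes only the `evidence/engagement/YYYY-MM-DD.json` layout described earlier. The function names are hypothetical, and treating a zero-byte file as stale is an illustrative choice.

```python
from datetime import date
from pathlib import Path


def evidence_path(ops_root, day):
    """Return the expected evidence file for `day` under the ops repo root."""
    return Path(ops_root) / "evidence" / "engagement" / f"{day.isoformat()}.json"


def evidence_is_stale(ops_root, day):
    """True when the evidence file for `day` is missing or empty."""
    path = evidence_path(ops_root, day)
    return not path.is_file() or path.stat().st_size == 0
```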

### Q3: How is the SSH key provisioned for the collection container?

*Context: The vault dispatcher supports `mounts: ["ssh"]` which mounts `~/.ssh` read-only into the container. The edge container already has SSH infrastructure for reverse tunnels (`disinto-tunnel` user, `autossh`).*

- **(A) Factory operator's SSH keys** (`mounts: ["ssh"]`): Reuse the existing SSH keys on the factory host. Simple, but grants the container access to all hosts the operator can reach.
- **(B) Dedicated purpose-limited key**: Generate a new SSH keypair, install the public key on the Caddy host with `command=` restriction (only allows `cat /var/log/caddy/access.log` or similar). Private key stored as `CADDY_SSH_KEY` in `.env.vault.enc`. Minimal blast radius.
- **(C) Edge tunnel reverse path**: Instead of the factory SSHing *out* to Caddy, have the Caddy host push logs *in* via the existing reverse tunnel infrastructure. Inverts the connection direction but requires a log-push agent on the Caddy host.

*Architect recommends (B): purpose-limited key with `command=` restriction on the Caddy host gives least-privilege access. The factory never gets a shell on production.*
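
For option (B), the restriction lives in the `authorized_keys` line on the Caddy host. The Python sketch below generates such a line from a public key. `command=`, `no-port-forwarding`, `no-agent-forwarding`, `no-X11-forwarding`, and `no-pty` are standard OpenSSH `authorized_keys` options. The log path is the example from the option text, and the key material in the usage note is a placeholder.

```python
def authorized_keys_entry(public_key, log_path="/var/log/caddy/access.log"):
    """Format an authorized_keys line that only permits reading one log file.

    With command= set, sshd runs the forced command and ignores whatever
    command the client asks for.
    """
    if '"' in log_path:
        raise ValueError("log_path must not contain double quotes")
    options = [
        f'command="cat {log_path}"',
        "no-port-forwarding",
        "no-agent-forwarding",
        "no-X11-forwarding",
        "no-pty",
    ]
    return f"{','.join(options)} {public_key.strip()}"
```

For example, `authorized_keys_entry("ssh-ed25519 AAAA... collector")` returns a single line starting with `command="cat /var/log/caddy/access.log",no-port-forwarding,...`, ready to be appended on the Caddy host.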

## Proposed sub-issues

### If Q1=A, Q2=A, Q3=B (recommended path):

1. **`collect-engagement` formula + container script**: Create `formulas/collect-engagement.toml` with steps: SSH into Caddy host using dedicated key → fetch today's access log segment → run `collect-engagement.sh` on local copy → commit evidence JSON to ops repo via Forgejo API. Add cron entry to edge container.
2. **Format-detection guard in `collect-engagement.sh`**: Add a check at script start that verifies the input file is Caddy JSON format (not Combined Log Format). Fail loudly with actionable error if format is wrong.
3. **`evidence/engagement/` directory + ops-setup wiring**: Ensure `lib/ops-setup.sh` creates the evidence directory. Register the engagement cron schedule in factory setup docs.
4. **Document Caddy host SSH setup**: Rent-a-human instructions for: generate keypair, install public key with `command=` restriction on Caddy host, add private key to `.env.vault.enc`.
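
The format-detection guard in sub-issue 2 could look like the following, written here in Python rather than in the script's own shell. It samples the first non-empty lines and raises as soon as one is not a JSON object with a `request` key. The exception name, the sample size, and the error wording are hypothetical.

```python
import json


class LogFormatError(ValueError):
    """Raised when the input does not look like a Caddy JSON access log."""


def check_caddy_json(lines, sample=20):
    """Validate up to `sample` non-empty lines; return how many were checked."""
    checked = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            raise LogFormatError(
                f"line {number} is not JSON; is Caddy writing Combined Log "
                "Format? Switch the access log to JSON output."
            ) from None
        if not isinstance(entry, dict) or "request" not in entry:
            raise LogFormatError(
                f"line {number} is JSON but has no 'request' object; "
                "this does not look like a Caddy access log."
            )
        checked += 1
        if checked >= sample:
            break
    return checked
```

Returning the count instead of raising on empty input lets the caller decide whether a day with zero requests is an error.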

### If Q1=B (remote execution):
Sub-issues 2-4 remain the same. Sub-issue 1 changes: container SSHs in and runs the script remotely, requiring script deployment on the Caddy host (additional manual step).

### If Q2=B (vault-gated):
Sub-issue 1 changes: instead of cron, create a vault action TOML template and document the daily dispatch. Depends on PR #12 (blast-radius tiers) for auto-approval.

### If Q3=A (operator SSH keys):
Sub-issue 4 is simplified (no dedicated key generation), but blast radius is wider.

### If Q3=C (reverse tunnel):
Sub-issue 1 changes significantly: instead of SSH-out, configure a log-push cron on the Caddy host that sends logs through the reverse tunnel. More infrastructure on the Caddy host side.