diff --git a/docs/EVIDENCE-ARCHITECTURE.md b/docs/EVIDENCE-ARCHITECTURE.md index b98042a..f181978 100644 --- a/docs/EVIDENCE-ARCHITECTURE.md +++ b/docs/EVIDENCE-ARCHITECTURE.md @@ -29,11 +29,13 @@ Different domains have different platforms: | Protocol | Ponder / GraphQL | On-chain state, trades, positions | **Partial** — Live (not yet wired to evidence) | | Infrastructure | DigitalOcean / system stats | CPU, RAM, disk, containers | **Planned** — Supervisor monitors, no evidence output yet | | User experience | Playwright personas | Conversion, friction, journey completion | **Partial** — Scripts exist (`run-usertest.sh`), no evidence output yet | +| Engagement | Caddy access logs | Visitors, referral sources, page paths | **Implemented** — `site/collect-engagement.sh` | | Funnel | Analytics (future) | Bounce rate, conversion, retention | **Planned** — Not started | Agents won't need to understand each platform. **Processes act as adapters** — they will read a platform's API and write structured evidence to git. ``` +[Caddy logs] ──→ collect-engagement process ──→ evidence/engagement/YYYY-MM-DD.json [Google Analytics] ──→ measure-funnel process ──→ evidence/funnel/YYYY-MM-DD.json [Ponder GraphQL] ──→ measure-protocol process ──→ evidence/protocol/YYYY-MM-DD.json [System stats] ──→ measure-resources process ──→ evidence/resources/YYYY-MM-DD.json @@ -56,6 +58,7 @@ Produce evidence without modifying the project under test. Some sense processes | `run-user-test` | UX quality across 5 personas | Playwright + docker stack | Spawns Docker stack (containers + volumes + networks); requires Docker daemon; leaves ephemeral state until torn down | **Implemented** — `run-usertest.sh` exists (harb #978) | | `measure-resources` | Infra state (CPU, RAM, disk, containers) | System / DigitalOcean API | Read-only API calls. Safe to run anytime | **Planned** | | `measure-protocol` | On-chain health (floor, reserves, volume) | Ponder GraphQL | Read-only API calls. 
Safe to run anytime | **Planned** | +| `collect-engagement` | Visitor engagement (visitors, referrers, pages) | Caddy access logs | Read-only log parsing. Safe to run anytime | **Implemented** — `site/collect-engagement.sh` (disinto #718) | | `measure-funnel` | User conversion and retention | Analytics API | Read-only API calls. Safe to run anytime | **Planned** | ### Mutation processes (create change) @@ -86,6 +89,7 @@ The planner won't need to know this loop exists as a rule. It will emerge from e ``` evidence/ + engagement/ # Visitor counts, referrers, page paths (from Caddy logs) evolution/ # Run params, generation stats, best fitness, champion red-team/ # Per-attack results, floor held/broken, ETH extracted holdout/ # Per-scenario pass/fail, gate decision diff --git a/docs/OBSERVABLE-DEPLOY.md b/docs/OBSERVABLE-DEPLOY.md new file mode 100644 index 0000000..3f5e618 --- /dev/null +++ b/docs/OBSERVABLE-DEPLOY.md @@ -0,0 +1,88 @@ +# Observable Deploy Pattern + +> Every addressable is born observable. It's not shipped until it's measured. +> — VISION.md + +## The pattern + +Every deploy formula must verify that the deployed artifact has a **return path** +before marking the deploy complete. An addressable without measurement is Fold 2 +without Fold 3 — shipped but not learned from. + +## How it works + +Deploy formulas add a final `verify-observable` step that checks: + +1. **Measurement infrastructure exists** — the mechanism that captures engagement + data is present and active (log file, analytics endpoint, event stream). +2. **Collection script is in place** — a process exists to transform raw signals + into structured evidence in `evidence//YYYY-MM-DD.json`. +3. **Evidence has been collected** — at least one report exists (or a note that + the first collection is pending). + +The step is advisory, not blocking — a deploy succeeds even if measurement +isn't yet active. But the output makes the gap visible to the planner, which +will file issues to close it. 
+ ## Artifact types and their return paths + +| Artifact type | Addressable | Measurement source | Evidence path | +|---------------|-------------|-------------------|---------------| +| Static site | URL (disinto.ai) | Caddy access logs | `evidence/engagement/` | +| npm package | Registry name | Download counts API | `evidence/package/` | +| Smart contract | Contract address | On-chain events (Ponder) | `evidence/protocol/` | +| Docker image | Registry tag | Pull counts API | `evidence/container/` | + +## Adding observable verification to a deploy formula + +Add a `[[steps]]` block after the deploy verification step: + +```toml +[[steps]] +id = "verify-observable" +title = "Verify engagement measurement is active" +description = """ +Check that measurement infrastructure is active for this artifact. + +1. Verify the data source exists and is recent +2. Verify the collection script is present +3. Report latest evidence if available + +Observable status summary: + addressable=<addressable> + measurement=<source> + evidence=<path> + consumer=planner (gap analysis) +""" +needs = ["verify"] +``` + +## Evidence format + +Each evidence file is dated JSON committed to `evidence/<domain>/YYYY-MM-DD.json`. +The planner reads these during gap analysis. The predictor challenges staleness. + +Minimum fields for engagement evidence: + +```json +{ + "date": "2026-03-26", + "period_hours": 24, + "unique_visitors": 42, + "page_views": 156, + "top_pages": [{"path": "/", "views": 89}], + "top_referrers": [{"source": "news.ycombinator.com", "visits": 12}] +} +``` + +## The loop + +``` +deploy formula + → verify-observable step confirms measurement is active + → collect-engagement.sh (cron) parses logs → evidence/engagement/ + → planner reads evidence → identifies gaps → creates issues + → dev-agent implements → deploy formula runs again +``` + +This is the bridge from Fold 2 (ship) to Fold 3 (learn).
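Before the planner consumes a report, the minimum-field contract above can be spot-checked. A sketch using plain grep (no jq dependency); `require_fields` is a hypothetical helper, not part of the factory, and a stricter check would parse the JSON properly:

```bash
#!/usr/bin/env bash
# Sketch: confirm an engagement report carries the minimum fields listed above.
# require_fields is illustrative; grep -q only checks the key name appears.
require_fields() {
  local file="$1"; shift
  local missing=0 field
  for field in "$@"; do
    grep -q "\"$field\"" "$file" || { echo "missing: $field"; missing=1; }
  done
  return "$missing"
}
```

Usage: `require_fields evidence/engagement/2026-03-26.json date period_hours unique_visitors page_views top_pages top_referrers` exits nonzero if any field is absent.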
diff --git a/evidence/engagement/.gitkeep b/evidence/engagement/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/formulas/run-publish-site.toml b/formulas/run-publish-site.toml index 8c57a85..cd3624f 100644 --- a/formulas/run-publish-site.toml +++ b/formulas/run-publish-site.toml @@ -178,3 +178,51 @@ Clean up temp files: rm -f /tmp/publish-site-deploy-sha /tmp/publish-site-deploy-target """ needs = ["prune-old-deploys"] + +[[steps]] +id = "verify-observable" +title = "Verify engagement measurement is active" +description = """ +Every deploy must confirm that the addressable has a return path (observable). +This is the bridge from Ship (Fold 2) to Learn (Fold 3). + +Check 1 — Caddy access log exists and is being written: + CADDY_LOG="${CADDY_ACCESS_LOG:-/var/log/caddy/access.log}" + if [ ! -f "$CADDY_LOG" ]; then + echo "WARNING: Caddy access log not found at $CADDY_LOG" + echo "Engagement measurement is NOT active — set CADDY_ACCESS_LOG if the path differs." + else + AGE_MIN=$(( ($(date +%s) - $(stat -c %Y "$CADDY_LOG" 2>/dev/null || echo 0)) / 60 )) + if [ "$AGE_MIN" -gt 60 ]; then + echo "WARNING: Caddy access log is ${AGE_MIN} minutes old — may not be active" + else + echo "OK: Caddy access log is active (last written ${AGE_MIN}m ago)" + fi + fi + +Check 2 — collect-engagement.sh is present in the repo: + FACTORY_ROOT="${FACTORY_ROOT:-/home/debian/dark-factory}" + if [ -x "$FACTORY_ROOT/site/collect-engagement.sh" ]; then + echo "OK: collect-engagement.sh is present and executable" + else + echo "WARNING: collect-engagement.sh not found or not executable" + fi + +Check 3 — engagement evidence has been collected at least once: + EVIDENCE_DIR="$FACTORY_ROOT/evidence/engagement" + LATEST=$(ls -1t "$EVIDENCE_DIR"/*.json 2>/dev/null | head -1 || true) + if [ -n "$LATEST" ]; then + echo "OK: Latest engagement report: $LATEST" + jq -r '" visitors=\(.unique_visitors) pages=\(.page_views) referrals=\(.referred_visitors)"' "$LATEST" 2>/dev/null || true + 
else + echo "NOTE: No engagement reports yet — run: bash site/collect-engagement.sh" + echo "The first report will appear after the cron job runs (daily at 23:55 UTC)." + fi + +Summary: + echo "" + echo "Observable status: addressable=disinto.ai measurement=caddy-access-logs" + echo "Evidence path: evidence/engagement/YYYY-MM-DD.json" + echo "Consumer: planner reads evidence/engagement/ during gap analysis" +""" +needs = ["verify"] diff --git a/site/collect-engagement.sh b/site/collect-engagement.sh new file mode 100644 index 0000000..c5975c6 --- /dev/null +++ b/site/collect-engagement.sh @@ -0,0 +1,192 @@ +#!/usr/bin/env bash +# ============================================================================= +# collect-engagement.sh — Parse Caddy access logs into engagement evidence +# +# Reads Caddy's structured JSON access log, extracts visitor engagement data +# for the last 24 hours, and writes a dated JSON report to evidence/engagement/. +# +# The planner consumes these reports to close the build→ship→learn loop: +# an addressable (disinto.ai) becomes observable when engagement data flows back. 
+# +# Usage: +# bash site/collect-engagement.sh +# +# Cron: 55 23 * * * cd /home/debian/dark-factory && bash site/collect-engagement.sh +# ============================================================================= +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +FACTORY_ROOT="$(dirname "$SCRIPT_DIR")" + +# shellcheck source=../lib/env.sh +source "$FACTORY_ROOT/lib/env.sh" + +LOGFILE="${FACTORY_ROOT}/site/collect-engagement.log" +log() { + printf '[%s] collect-engagement: %s\n' \ + "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*" >> "$LOGFILE" +} + +# ── Configuration ──────────────────────────────────────────────────────────── + +# Caddy structured access log (JSON lines) +CADDY_LOG="${CADDY_ACCESS_LOG:-/var/log/caddy/access.log}" + +# Evidence output directory (committed to git) +EVIDENCE_DIR="${FACTORY_ROOT}/evidence/engagement" + +# Report date — defaults to today +REPORT_DATE=$(date -u +%Y-%m-%d) + +# Cutoff: only process entries from the last 24 hours +CUTOFF_TS=$(date -u -d '24 hours ago' +%s 2>/dev/null \ + || date -u -v-24H +%s 2>/dev/null \ + || echo 0) + +# ── Preflight checks ──────────────────────────────────────────────────────── + +if [ ! -f "$CADDY_LOG" ]; then + log "ERROR: Caddy access log not found at ${CADDY_LOG}" + echo "ERROR: Caddy access log not found at ${CADDY_LOG}" >&2 + echo "Set CADDY_ACCESS_LOG to the correct path." >&2 + exit 1 +fi + +if ! command -v jq &>/dev/null; then + log "ERROR: jq is required but not installed" + exit 1 +fi + +mkdir -p "$EVIDENCE_DIR" + +# ── Parse access log ──────────────────────────────────────────────────────── + +log "Parsing ${CADDY_LOG} for entries since $(date -u -d "@${CUTOFF_TS}" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || echo "${CUTOFF_TS}")" + +# Extract relevant fields from Caddy JSON log lines. 
+# Caddy v2 structured log format: +# ts (float epoch), request.uri, request.remote_ip, request.headers.Referer, +# request.headers.User-Agent, status, size, duration +# +# Filter to last 24h, exclude assets/bots, produce a clean JSONL stream. +PARSED=$(jq -c --argjson cutoff "$CUTOFF_TS" ' + select(.ts >= $cutoff) + | select(.request.uri != null) + | { + ts: .ts, + ip: (.request.remote_ip // .request.remote_addr // "unknown" + | split(":")[0]), + uri: .request.uri, + status: .status, + size: .size, + duration: .duration, + referer: (.request.headers.Referer[0] // .request.headers.referer[0] + // "direct"), + ua: (.request.headers["User-Agent"][0] + // .request.headers["user-agent"][0] // "unknown") + } +' "$CADDY_LOG" 2>/dev/null || echo "") + +if [ -z "$PARSED" ]; then + log "No entries found in the last 24 hours" + jq -nc \ + --arg date "$REPORT_DATE" \ + --arg source "$CADDY_LOG" \ + '{ + date: $date, + source: $source, + period_hours: 24, + total_requests: 0, + unique_visitors: 0, + page_views: 0, + top_pages: [], + top_referrers: [], + note: "no entries in period" + }' > "${EVIDENCE_DIR}/${REPORT_DATE}.json" + log "Empty report written to ${EVIDENCE_DIR}/${REPORT_DATE}.json" + exit 0 +fi + +# ── Compute engagement metrics ────────────────────────────────────────────── + +# Filter out static assets and known bots for page-view metrics +PAGES=$(printf '%s\n' "$PARSED" | jq -c ' + select( + (.uri | test("\\.(css|js|png|jpg|jpeg|webp|ico|svg|woff2?|ttf|map)$") | not) + and (.ua | test("bot|crawler|spider|slurp|Googlebot|Bingbot|YandexBot"; "i") | not) + and (.status >= 200 and .status < 400) + ) +') + +TOTAL_REQUESTS=$(printf '%s\n' "$PARSED" | wc -l | tr -d ' ') +PAGE_VIEWS=$(printf '%s\n' "$PAGES" | grep -c . 
|| true) +UNIQUE_VISITORS=$(printf '%s\n' "$PAGES" | jq -r '.ip' | sort -u | wc -l | tr -d ' ') + +# Top pages by hit count +TOP_PAGES=$(printf '%s\n' "$PAGES" | jq -r '.uri' \ + | sort | uniq -c | sort -rn | head -10 \ + | awk '{printf "{\"path\":\"%s\",\"views\":%d}\n", $2, $1}' \ + | jq -sc '.') + +# Top referrers (exclude direct/self). The grep guards keep pipefail from +# failing the pipeline when a filter selects nothing; jq -sc '.' already +# emits [] on empty input, so no echo fallback is needed. +TOP_REFERRERS=$(printf '%s\n' "$PAGES" | jq -r '.referer' \ + | { grep -v '^direct$' || true; } \ + | { grep -v '^-$' || true; } \ + | { grep -v 'disinto\.ai' || true; } \ + | sort | uniq -c | sort -rn | head -10 \ + | awk '{printf "{\"source\":\"%s\",\"visits\":%d}\n", $2, $1}' \ + | jq -sc '.') + +# Unique visitors who came from external referrers +REFERRED_VISITORS=$(printf '%s\n' "$PAGES" | jq -r 'select(.referer != "direct" and .referer != "-" and (.referer | test("disinto\\.ai") | not)) | .ip' \ + | sort -u | wc -l | tr -d ' ') + +# Response time stats (p50, p95, p99 in seconds; Caddy logs duration as a float of seconds) +RESPONSE_TIMES=$(printf '%s\n' "$PAGES" | jq -r '.duration // 0' | sort -n) +RT_COUNT=$(printf '%s\n' "$RESPONSE_TIMES" | wc -l | tr -d ' ') +if [ "$RT_COUNT" -gt 0 ]; then + P50_IDX=$(( (RT_COUNT * 50 + 99) / 100 )) + P95_IDX=$(( (RT_COUNT * 95 + 99) / 100 )) + P99_IDX=$(( (RT_COUNT * 99 + 99) / 100 )) + P50=$(printf '%s\n' "$RESPONSE_TIMES" | sed -n "${P50_IDX}p") + P95=$(printf '%s\n' "$RESPONSE_TIMES" | sed -n "${P95_IDX}p") + P99=$(printf '%s\n' "$RESPONSE_TIMES" | sed -n "${P99_IDX}p") +else + P50=0; P95=0; P99=0 +fi + +# ── Write evidence ──────────────────────────────────────────────────────── + +OUTPUT="${EVIDENCE_DIR}/${REPORT_DATE}.json" + +jq -nc \ + --arg date "$REPORT_DATE" \ + --arg source "$CADDY_LOG" \ + --argjson total_requests "$TOTAL_REQUESTS" \ + --argjson page_views "$PAGE_VIEWS" \ + --argjson unique_visitors "$UNIQUE_VISITORS" \ + --argjson referred_visitors "$REFERRED_VISITORS" \ + --argjson top_pages "$TOP_PAGES" \ + --argjson top_referrers "$TOP_REFERRERS" \ + --argjson p50 "${P50:-0}" \ + --argjson p95 "${P95:-0}" \ + --argjson p99
"${P99:-0}" \ + '{ + date: $date, + source: $source, + period_hours: 24, + total_requests: $total_requests, + page_views: $page_views, + unique_visitors: $unique_visitors, + referred_visitors: $referred_visitors, + top_pages: $top_pages, + top_referrers: $top_referrers, + response_time: { + p50_seconds: $p50, + p95_seconds: $p95, + p99_seconds: $p99 + } + }' > "$OUTPUT" + +log "Engagement report written to ${OUTPUT}: ${UNIQUE_VISITORS} visitors, ${PAGE_VIEWS} page views" +echo "Engagement report: ${UNIQUE_VISITORS} unique visitors, ${PAGE_VIEWS} page views → ${OUTPUT}"