feat: observable addressables — engagement measurement for deployed artifacts (#718)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
openhands 2026-03-26 11:57:19 +00:00
parent 4c438b7c59
commit 192fc39198
5 changed files with 332 additions and 0 deletions


@ -29,11 +29,13 @@ Different domains have different platforms:
| Protocol | Ponder / GraphQL | On-chain state, trades, positions | **Partial** — Live (not yet wired to evidence) |
| Infrastructure | DigitalOcean / system stats | CPU, RAM, disk, containers | **Planned** — Supervisor monitors, no evidence output yet |
| User experience | Playwright personas | Conversion, friction, journey completion | **Partial** — Scripts exist (`run-usertest.sh`), no evidence output yet |
| Engagement | Caddy access logs | Visitors, referral sources, page paths | **Implemented** — `site/collect-engagement.sh` |
| Funnel | Analytics (future) | Bounce rate, conversion, retention | **Planned** — Not started |
Agents won't need to understand each platform. **Processes act as adapters** — they will read a platform's API and write structured evidence to git.
```
[Caddy logs] ──→ collect-engagement process ──→ evidence/engagement/YYYY-MM-DD.json
[Google Analytics] ──→ measure-funnel process ──→ evidence/funnel/YYYY-MM-DD.json
[Ponder GraphQL] ──→ measure-protocol process ──→ evidence/protocol/YYYY-MM-DD.json
[System stats] ──→ measure-resources process ──→ evidence/resources/YYYY-MM-DD.json
```
@ -56,6 +58,7 @@ Produce evidence without modifying the project under test. Some sense processes
| `run-user-test` | UX quality across 5 personas | Playwright + docker stack | Spawns Docker stack (containers + volumes + networks); requires Docker daemon; leaves ephemeral state until torn down | **Implemented** — `run-usertest.sh` exists (harb #978) |
| `measure-resources` | Infra state (CPU, RAM, disk, containers) | System / DigitalOcean API | Read-only API calls. Safe to run anytime | **Planned** |
| `measure-protocol` | On-chain health (floor, reserves, volume) | Ponder GraphQL | Read-only API calls. Safe to run anytime | **Planned** |
| `collect-engagement` | Visitor engagement (visitors, referrers, pages) | Caddy access logs | Read-only log parsing. Safe to run anytime | **Implemented** — `site/collect-engagement.sh` (disinto #718) |
| `measure-funnel` | User conversion and retention | Analytics API | Read-only API calls. Safe to run anytime | **Planned** |
### Mutation processes (create change)
@ -86,6 +89,7 @@ The planner won't need to know this loop exists as a rule. It will emerge from e
```
evidence/
engagement/ # Visitor counts, referrers, page paths (from Caddy logs)
evolution/ # Run params, generation stats, best fitness, champion
red-team/ # Per-attack results, floor held/broken, ETH extracted
  holdout/      # Per-scenario pass/fail, gate decision
```

docs/OBSERVABLE-DEPLOY.md Normal file

@ -0,0 +1,88 @@
# Observable Deploy Pattern
> Every addressable is born observable. It's not shipped until it's measured.
> — VISION.md
## The pattern
Every deploy formula must verify that the deployed artifact has a **return path**
before marking the deploy complete. An addressable without measurement is Fold 2
without Fold 3 — shipped but not learned from.
## How it works
Deploy formulas add a final `verify-observable` step that checks:
1. **Measurement infrastructure exists** — the mechanism that captures engagement
data is present and active (log file, analytics endpoint, event stream).
2. **Collection script is in place** — a process exists to transform raw signals
into structured evidence in `evidence/<domain>/YYYY-MM-DD.json`.
3. **Evidence has been collected** — at least one report exists (or a note that
the first collection is pending).
The step is advisory, not blocking — a deploy succeeds even if measurement
isn't yet active. But the output makes the gap visible to the planner, which
will file issues to close it.
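The advisory semantics can be sketched in a few lines of shell. This is an illustrative sketch, not the repo's implementation; the function name and paths are hypothetical. The check reports a missing return path but always returns success, so the deploy step never blocks on it:

```shell
#!/usr/bin/env bash
# Illustrative sketch of an advisory observability check.
# Warns when the measurement source is missing, but always
# returns 0 so the deploy still succeeds.
check_observable() {
  local source_path="$1"
  if [ ! -e "$source_path" ]; then
    echo "WARNING: measurement source ${source_path} missing; addressable is not observable"
  else
    echo "OK: measurement source ${source_path} present"
  fi
  return 0  # advisory, never blocking
}
```

The warning text, not the exit status, is what surfaces the gap to the planner.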
## Artifact types and their return paths
| Artifact type | Addressable | Measurement source | Evidence path |
|---------------|-------------|-------------------|---------------|
| Static site | URL (disinto.ai) | Caddy access logs | `evidence/engagement/` |
| npm package | Registry name | Download counts API | `evidence/package/` |
| Smart contract | Contract address | On-chain events (Ponder) | `evidence/protocol/` |
| Docker image | Registry tag | Pull counts API | `evidence/container/` |
## Adding observable verification to a deploy formula
Add a `[[steps]]` block after the deploy verification step:
```toml
[[steps]]
id = "verify-observable"
title = "Verify engagement measurement is active"
description = """
Check that measurement infrastructure is active for this artifact.
1. Verify the data source exists and is recent
2. Verify the collection script is present
3. Report latest evidence if available
Observable status summary:
addressable=<what was deployed>
measurement=<data source>
evidence=<path to evidence directory>
consumer=planner (gap analysis)
"""
needs = ["verify"]
```
## Evidence format
Each evidence file is dated JSON committed to `evidence/<domain>/YYYY-MM-DD.json`.
The planner reads these during gap analysis. The predictor challenges staleness.
Minimum fields for engagement evidence:
```json
{
"date": "2026-03-26",
"period_hours": 24,
"unique_visitors": 42,
"page_views": 156,
"top_pages": [{"path": "/", "views": 89}],
"top_referrers": [{"source": "news.ycombinator.com", "visits": 12}]
}
```
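The minimum-field contract above can be checked mechanically. A minimal sketch, assuming `jq` is available; the helper name is illustrative, not part of the repo:

```shell
#!/usr/bin/env bash
# Illustrative helper: verify an engagement evidence file carries
# the minimum fields listed above. Not part of the repo.
validate_engagement() {
  jq -e '
    has("date") and has("period_hours") and has("unique_visitors")
    and has("page_views") and has("top_pages") and has("top_referrers")
  ' "$1" > /dev/null
}
```

`jq -e` derives its exit status from the boolean result, so the function returns nonzero when any field is absent.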
## The loop
```
deploy formula
→ verify-observable step confirms measurement is active
→ collect-engagement.sh (cron) parses logs → evidence/engagement/
→ planner reads evidence → identifies gaps → creates issues
→ dev-agent implements → deploy formula runs again
```
This is the bridge from Fold 2 (ship) to Fold 3 (learn).

@ -178,3 +178,51 @@ Clean up temp files:
rm -f /tmp/publish-site-deploy-sha /tmp/publish-site-deploy-target
"""
needs = ["prune-old-deploys"]
[[steps]]
id = "verify-observable"
title = "Verify engagement measurement is active"
description = """
Every deploy must confirm that the addressable has a return path (observable).
This is the bridge from Ship (Fold 2) to Learn (Fold 3).
Check 1: Caddy access log exists and is being written:
CADDY_LOG="${CADDY_ACCESS_LOG:-/var/log/caddy/access.log}"
if [ ! -f "$CADDY_LOG" ]; then
echo "WARNING: Caddy access log not found at $CADDY_LOG"
echo "Engagement measurement is NOT active — set CADDY_ACCESS_LOG if the path differs."
else
AGE_MIN=$(( ($(date +%s) - $(stat -c %Y "$CADDY_LOG" 2>/dev/null || stat -f %m "$CADDY_LOG" 2>/dev/null || echo 0)) / 60 ))
if [ "$AGE_MIN" -gt 60 ]; then
echo "WARNING: Caddy access log is ${AGE_MIN} minutes old — may not be active"
else
echo "OK: Caddy access log is active (last written ${AGE_MIN}m ago)"
fi
fi
Check 2: collect-engagement.sh is present in the repo:
FACTORY_ROOT="${FACTORY_ROOT:-/home/debian/dark-factory}"
if [ -x "$FACTORY_ROOT/site/collect-engagement.sh" ]; then
echo "OK: collect-engagement.sh is present and executable"
else
echo "WARNING: collect-engagement.sh not found or not executable"
fi
Check 3: engagement evidence has been collected at least once:
EVIDENCE_DIR="$FACTORY_ROOT/evidence/engagement"
LATEST=$(ls -1t "$EVIDENCE_DIR"/*.json 2>/dev/null | head -1 || true)
if [ -n "$LATEST" ]; then
echo "OK: Latest engagement report: $LATEST"
jq -r '" visitors=\(.unique_visitors) pages=\(.page_views) referrals=\(.referred_visitors)"' "$LATEST" 2>/dev/null || true
else
echo "NOTE: No engagement reports yet — run: bash site/collect-engagement.sh"
echo "The first report will appear after the cron job runs (daily at 23:55 UTC)."
fi
Summary:
echo ""
echo "Observable status: addressable=disinto.ai measurement=caddy-access-logs"
echo "Evidence path: evidence/engagement/YYYY-MM-DD.json"
echo "Consumer: planner reads evidence/engagement/ during gap analysis"
"""
needs = ["verify"]

site/collect-engagement.sh Normal file

@ -0,0 +1,192 @@
#!/usr/bin/env bash
# =============================================================================
# collect-engagement.sh — Parse Caddy access logs into engagement evidence
#
# Reads Caddy's structured JSON access log, extracts visitor engagement data
# for the last 24 hours, and writes a dated JSON report to evidence/engagement/.
#
# The planner consumes these reports to close the build→ship→learn loop:
# an addressable (disinto.ai) becomes observable when engagement data flows back.
#
# Usage:
# bash site/collect-engagement.sh
#
# Cron: 55 23 * * * cd /home/debian/dark-factory && bash site/collect-engagement.sh
# =============================================================================
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
FACTORY_ROOT="$(dirname "$SCRIPT_DIR")"
# shellcheck source=../lib/env.sh
source "$FACTORY_ROOT/lib/env.sh"
LOGFILE="${FACTORY_ROOT}/site/collect-engagement.log"
log() {
printf '[%s] collect-engagement: %s\n' \
"$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*" >> "$LOGFILE"
}
# ── Configuration ────────────────────────────────────────────────────────────
# Caddy structured access log (JSON lines)
CADDY_LOG="${CADDY_ACCESS_LOG:-/var/log/caddy/access.log}"
# Evidence output directory (committed to git)
EVIDENCE_DIR="${FACTORY_ROOT}/evidence/engagement"
# Report date — defaults to today
REPORT_DATE=$(date -u +%Y-%m-%d)
# Cutoff: only process entries from the last 24 hours
CUTOFF_TS=$(date -u -d '24 hours ago' +%s 2>/dev/null \
|| date -u -v-24H +%s 2>/dev/null \
|| echo 0)
# ── Preflight checks ────────────────────────────────────────────────────────
if [ ! -f "$CADDY_LOG" ]; then
log "ERROR: Caddy access log not found at ${CADDY_LOG}"
echo "ERROR: Caddy access log not found at ${CADDY_LOG}" >&2
echo "Set CADDY_ACCESS_LOG to the correct path." >&2
exit 1
fi
if ! command -v jq &>/dev/null; then
log "ERROR: jq is required but not installed"
exit 1
fi
mkdir -p "$EVIDENCE_DIR"
# ── Parse access log ────────────────────────────────────────────────────────
log "Parsing ${CADDY_LOG} for entries since $(date -u -d "@${CUTOFF_TS}" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || echo "${CUTOFF_TS}")"
# Extract relevant fields from Caddy JSON log lines.
# Caddy v2 structured log format:
# ts (float epoch), request.uri, request.remote_ip, request.headers.Referer,
# request.headers.User-Agent, status, size, duration
#
# Filter to last 24h, exclude assets/bots, produce a clean JSONL stream.
PARSED=$(jq -c --argjson cutoff "$CUTOFF_TS" '
select(.ts >= $cutoff)
| select(.request.uri != null)
| {
ts: .ts,
ip: (.request.remote_ip // .request.remote_addr // "unknown"
| split(":")[0]),
uri: .request.uri,
status: .status,
size: .size,
duration: .duration,
referer: (.request.headers.Referer[0] // .request.headers.referer[0]
// "direct"),
ua: (.request.headers["User-Agent"][0]
// .request.headers["user-agent"][0] // "unknown")
}
' "$CADDY_LOG" 2>/dev/null || echo "")
if [ -z "$PARSED" ]; then
log "No entries found in the last 24 hours"
jq -nc \
--arg date "$REPORT_DATE" \
--arg source "$CADDY_LOG" \
'{
date: $date,
source: $source,
period_hours: 24,
total_requests: 0,
unique_visitors: 0,
page_views: 0,
top_pages: [],
top_referrers: [],
note: "no entries in period"
}' > "${EVIDENCE_DIR}/${REPORT_DATE}.json"
log "Empty report written to ${EVIDENCE_DIR}/${REPORT_DATE}.json"
exit 0
fi
# ── Compute engagement metrics ──────────────────────────────────────────────
# Filter out static assets and known bots for page-view metrics
PAGES=$(printf '%s\n' "$PARSED" | jq -c '
select(
(.uri | test("\\.(css|js|png|jpg|jpeg|webp|ico|svg|woff2?|ttf|map)$") | not)
and (.ua | test("bot|crawler|spider|slurp|Googlebot|Bingbot|YandexBot"; "i") | not)
and (.status >= 200 and .status < 400)
)
')
TOTAL_REQUESTS=$(printf '%s\n' "$PARSED" | wc -l | tr -d ' ')
# grep -c already prints 0 on no match; "|| true" (not "|| echo 0") avoids
# a second "0" when pipefail propagates grep's nonzero exit status
PAGE_VIEWS=$(printf '%s\n' "$PAGES" | grep -c . || true)
UNIQUE_VISITORS=$(printf '%s\n' "$PAGES" | jq -r '.ip' | sort -u | wc -l | tr -d ' ')
# Top pages by hit count
TOP_PAGES=$(printf '%s\n' "$PAGES" | jq -r '.uri' \
| sort | uniq -c | sort -rn | head -10 \
| awk '{printf "{\"path\":\"%s\",\"views\":%d}\n", $2, $1}' \
| jq -sc '.')
# Top referrers (exclude direct/self). Filter in jq rather than grep -v:
# under pipefail a failing grep would trigger "|| echo '[]'" even though
# jq already emitted "[]", yielding invalid JSON ("[]\n[]").
TOP_REFERRERS=$(printf '%s\n' "$PAGES" \
  | jq -r 'select(.referer != "direct" and .referer != "-"
                  and (.referer | test("disinto\\.ai") | not)) | .referer' \
  | sort | uniq -c | sort -rn | head -10 \
  | awk '{printf "{\"source\":\"%s\",\"visits\":%d}\n", $2, $1}' \
  | jq -sc '.')
# Unique visitors who came from external referrers
REFERRED_VISITORS=$(printf '%s\n' "$PAGES" | jq -r 'select(.referer != "direct" and .referer != "-" and (.referer | test("disinto\\.ai") | not)) | .ip' \
| sort -u | wc -l | tr -d ' ')
# Response time stats (p50, p95, p99 in ms)
RESPONSE_TIMES=$(printf '%s\n' "$PAGES" | jq -r '.duration // 0' | sort -n)
RT_COUNT=$(printf '%s\n' "$RESPONSE_TIMES" | wc -l | tr -d ' ')
if [ "$RT_COUNT" -gt 0 ]; then
P50_IDX=$(( (RT_COUNT * 50 + 99) / 100 ))
P95_IDX=$(( (RT_COUNT * 95 + 99) / 100 ))
P99_IDX=$(( (RT_COUNT * 99 + 99) / 100 ))
P50=$(printf '%s\n' "$RESPONSE_TIMES" | sed -n "${P50_IDX}p")
P95=$(printf '%s\n' "$RESPONSE_TIMES" | sed -n "${P95_IDX}p")
P99=$(printf '%s\n' "$RESPONSE_TIMES" | sed -n "${P99_IDX}p")
else
P50=0; P95=0; P99=0
fi
# ── Write evidence ──────────────────────────────────────────────────────────
OUTPUT="${EVIDENCE_DIR}/${REPORT_DATE}.json"
jq -nc \
--arg date "$REPORT_DATE" \
--arg source "$CADDY_LOG" \
--argjson total_requests "$TOTAL_REQUESTS" \
--argjson page_views "$PAGE_VIEWS" \
--argjson unique_visitors "$UNIQUE_VISITORS" \
--argjson referred_visitors "$REFERRED_VISITORS" \
--argjson top_pages "$TOP_PAGES" \
--argjson top_referrers "$TOP_REFERRERS" \
--argjson p50 "${P50:-0}" \
--argjson p95 "${P95:-0}" \
--argjson p99 "${P99:-0}" \
'{
date: $date,
source: $source,
period_hours: 24,
total_requests: $total_requests,
page_views: $page_views,
unique_visitors: $unique_visitors,
referred_visitors: $referred_visitors,
top_pages: $top_pages,
top_referrers: $top_referrers,
response_time: {
p50_seconds: $p50,
p95_seconds: $p95,
p99_seconds: $p99
}
}' > "$OUTPUT"
log "Engagement report written to ${OUTPUT}: ${UNIQUE_VISITORS} visitors, ${PAGE_VIEWS} page views"
echo "Engagement report: ${UNIQUE_VISITORS} unique visitors, ${PAGE_VIEWS} page views → ${OUTPUT}"