fix: feat: observable addressables — engagement measurement for deployed artifacts (#718)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
openhands 2026-03-26 11:57:19 +00:00
parent 4c438b7c59
commit 192fc39198
5 changed files with 332 additions and 0 deletions


@ -29,11 +29,13 @@ Different domains have different platforms:
| Protocol | Ponder / GraphQL | On-chain state, trades, positions | **Partial** — Live (not yet wired to evidence) |
| Infrastructure | DigitalOcean / system stats | CPU, RAM, disk, containers | **Planned** — Supervisor monitors, no evidence output yet |
| User experience | Playwright personas | Conversion, friction, journey completion | **Partial** — Scripts exist (`run-usertest.sh`), no evidence output yet |
| Engagement | Caddy access logs | Visitors, referral sources, page paths | **Implemented** — `site/collect-engagement.sh` |
| Funnel | Analytics (future) | Bounce rate, conversion, retention | **Planned** — Not started |
Agents won't need to understand each platform. **Processes act as adapters** — they will read a platform's API and write structured evidence to git.
```
[Caddy logs] ──→ collect-engagement process ──→ evidence/engagement/YYYY-MM-DD.json
[Google Analytics] ──→ measure-funnel process ──→ evidence/funnel/YYYY-MM-DD.json
[Ponder GraphQL] ──→ measure-protocol process ──→ evidence/protocol/YYYY-MM-DD.json
[System stats] ──→ measure-resources process ──→ evidence/resources/YYYY-MM-DD.json
@ -56,6 +58,7 @@ Produce evidence without modifying the project under test. Some sense processes
| `run-user-test` | UX quality across 5 personas | Playwright + docker stack | Spawns Docker stack (containers + volumes + networks); requires Docker daemon; leaves ephemeral state until torn down | **Implemented** — `run-usertest.sh` exists (harb #978) |
| `measure-resources` | Infra state (CPU, RAM, disk, containers) | System / DigitalOcean API | Read-only API calls. Safe to run anytime | **Planned** |
| `measure-protocol` | On-chain health (floor, reserves, volume) | Ponder GraphQL | Read-only API calls. Safe to run anytime | **Planned** |
| `collect-engagement` | Visitor engagement (visitors, referrers, pages) | Caddy access logs | Read-only log parsing. Safe to run anytime | **Implemented** — `site/collect-engagement.sh` (disinto #718) |
| `measure-funnel` | User conversion and retention | Analytics API | Read-only API calls. Safe to run anytime | **Planned** |
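To make the adapter idea concrete, here is a hedged sketch of what a `collect-engagement`-style sense process could look like in Python. The Caddy log field names (`request`, `remote_ip`, `uri`, the `Referer` header list) follow Caddy v2's JSON log encoder but are assumptions here; this is not the contents of `site/collect-engagement.sh`.

```python
# Hypothetical sketch: parse Caddy JSON access-log lines into the
# dated engagement-evidence shape. Field names are assumptions based
# on Caddy v2's JSON encoder, not taken from the real script.
import json
from collections import Counter

def summarize(log_lines, date):
    visitors, paths, referrers = set(), Counter(), Counter()
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (startup messages etc.)
        req = entry.get("request", {})
        visitors.add(req.get("remote_ip", "unknown"))
        paths[req.get("uri", "/")] += 1
        for ref in req.get("headers", {}).get("Referer", []):
            referrers[ref] += 1
    return {
        "date": date,
        "period_hours": 24,
        "unique_visitors": len(visitors),
        "page_views": sum(paths.values()),
        "top_pages": [{"path": p, "views": v}
                      for p, v in paths.most_common(5)],
        "top_referrers": [{"source": s, "visits": v}
                          for s, v in referrers.most_common(5)],
    }
```

The output dict is what would be written to `evidence/engagement/YYYY-MM-DD.json`; read-only with respect to the project under test, as the table above requires.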
### Mutation processes (create change)
@ -86,6 +89,7 @@ The planner won't need to know this loop exists as a rule. It will emerge from e
```
evidence/
engagement/ # Visitor counts, referrers, page paths (from Caddy logs)
evolution/ # Run params, generation stats, best fitness, champion
red-team/ # Per-attack results, floor held/broken, ETH extracted
holdout/ # Per-scenario pass/fail, gate decision

docs/OBSERVABLE-DEPLOY.md Normal file

@ -0,0 +1,88 @@
# Observable Deploy Pattern
> Every addressable is born observable. It's not shipped until it's measured.
> — VISION.md
## The pattern
Every deploy formula must verify that the deployed artifact has a **return path**
before marking the deploy complete. An addressable without measurement is Fold 2
without Fold 3 — shipped but not learned from.
## How it works
Deploy formulas add a final `verify-observable` step that checks:
1. **Measurement infrastructure exists** — the mechanism that captures engagement
data is present and active (log file, analytics endpoint, event stream).
2. **Collection script is in place** — a process exists to transform raw signals
into structured evidence in `evidence/<domain>/YYYY-MM-DD.json`.
3. **Evidence has been collected** — at least one report exists (or a note that
the first collection is pending).
The step is advisory, not blocking — a deploy succeeds even if measurement
isn't yet active. But the output makes the gap visible to the planner, which
will file issues to close it.
## Artifact types and their return paths
| Artifact type | Addressable | Measurement source | Evidence path |
|---------------|-------------|-------------------|---------------|
| Static site | URL (disinto.ai) | Caddy access logs | `evidence/engagement/` |
| npm package | Registry name | Download counts API | `evidence/package/` |
| Smart contract | Contract address | On-chain events (Ponder) | `evidence/protocol/` |
| Docker image | Registry tag | Pull counts API | `evidence/container/` |
## Adding observable verification to a deploy formula
Add a `[[steps]]` block after the deploy verification step:
```toml
[[steps]]
id = "verify-observable"
title = "Verify engagement measurement is active"
description = """
Check that measurement infrastructure is active for this artifact.
1. Verify the data source exists and is recent
2. Verify the collection script is present
3. Report latest evidence if available
Observable status summary:
addressable=<what was deployed>
measurement=<data source>
evidence=<path to evidence directory>
consumer=planner (gap analysis)
"""
needs = ["verify"]
```
## Evidence format
Each evidence file is dated JSON committed to `evidence/<domain>/YYYY-MM-DD.json`.
The planner reads these during gap analysis. The predictor challenges staleness.
Minimum fields for engagement evidence:
```json
{
"date": "2026-03-26",
"period_hours": 24,
"unique_visitors": 42,
"page_views": 156,
"top_pages": [{"path": "/", "views": 89}],
"top_referrers": [{"source": "news.ycombinator.com", "visits": 12}]
}
```
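A minimal sketch of how a consumer such as the planner might validate an evidence file before trusting it. The required keys mirror the minimum fields above; the 7-day staleness threshold is an assumption for illustration:

```python
# Hypothetical evidence check: required keys come from the minimum
# fields above; the 7-day staleness window is an assumed threshold,
# not a documented rule.
import json
from datetime import date, timedelta

REQUIRED = {"date", "period_hours", "unique_visitors", "page_views",
            "top_pages", "top_referrers"}

def check_evidence(raw, today):
    doc = json.loads(raw)
    missing = REQUIRED - doc.keys()
    if missing:
        return f"invalid: missing {sorted(missing)}"
    if today - date.fromisoformat(doc["date"]) > timedelta(days=7):
        return "stale"
    return "ok"
```

"Stale" rather than "invalid" matters here: stale evidence is exactly the condition the predictor is described as challenging.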
## The loop
```
deploy formula
→ verify-observable step confirms measurement is active
→ collect-engagement.sh (cron) parses logs → evidence/engagement/
→ planner reads evidence → identifies gaps → creates issues
→ dev-agent implements → deploy formula runs again
```
This is the bridge from Fold 2 (ship) to Fold 3 (learn).