# Evidence Architecture — Roadmap

> **Status: Partially Implemented** — This document describes the target evidence architecture. Items marked **Implemented** exist in the codebase; items marked **Partial** have upstream scripts but no evidence output yet; all others are **Planned**. See AGENTS.md for the current operational state.

Disinto is purpose-built for one loop: **build software, launch it, improve it, reach market fit.**

This document describes how autonomous agents will sense the world, produce evidence, and use that evidence to make decisions — from "which issue to work on next" to "is this ready to deploy."

## The Loop

```
build → measure → evidence
  good enough?
    no  → improve → build again
    yes → deploy → measure in-market → evidence
      still good?
        no  → improve → build again
        yes → expand
```

Every decision in this loop will be driven by evidence, not intuition. The planner will read structured evidence across all dimensions, identify the weakest one, and focus there.

## Evidence as Integration Layer

Different domains have different platforms:

| Domain | Platform | What it tracks | Status |
|--------|----------|----------------|--------|
| Code | forge | Issues, PRs, reviews | **Implemented** — Live |
| CI/CD | Woodpecker | Build/test results | **Implemented** — Live |
| Protocol | Ponder / GraphQL | On-chain state, trades, positions | **Partial** — Live (not yet wired to evidence) |
| Infrastructure | DigitalOcean / system stats | CPU, RAM, disk, containers | **Planned** — Supervisor monitors, no evidence output yet |
| User experience | Playwright personas | Conversion, friction, journey completion | **Partial** — Scripts exist (`run-usertest.sh`), no evidence output yet |
| Engagement | Caddy access logs | Visitors, referral sources, page paths | **Implemented** — `site/collect-engagement.sh` |
| Funnel | Analytics (future) | Bounce rate, conversion, retention | **Planned** — Not started |

Agents won't need to understand each platform.
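Concretely, each adapter will end by writing one dated JSON file under `evidence/<domain>/`. A minimal sketch of that convention — the `write_evidence` helper and the payload fields are illustrative assumptions, not an actual API or schema:

```python
import json
import tempfile
from datetime import date
from pathlib import Path

def write_evidence(root: Path, domain: str, payload: dict) -> Path:
    """Write one dated JSON evidence file under <root>/<domain>/ and return its path."""
    out = root / domain / f"{date.today().isoformat()}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(payload, indent=2, sort_keys=True))
    return out

# Demo against a throwaway directory (the real tree would live in git).
root = Path(tempfile.mkdtemp()) / "evidence"
path = write_evidence(root, "engagement", {"visitors": 42, "referrers": {"example.com": 7}})
record = json.loads(path.read_text())
```

Because the filename is just the date, a second run on the same day overwrites rather than duplicates — one snapshot per domain per day.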
**Processes act as adapters** — they will read a platform's API and write structured evidence to git.

```
[Caddy logs]       ──→ collect-engagement process ──→ evidence/engagement/YYYY-MM-DD.json
[Google Analytics] ──→ measure-funnel process     ──→ evidence/funnel/YYYY-MM-DD.json
[Ponder GraphQL]   ──→ measure-protocol process   ──→ evidence/protocol/YYYY-MM-DD.json
[System stats]     ──→ measure-resources process  ──→ evidence/resources/YYYY-MM-DD.json
[Playwright]       ──→ run-user-test process      ──→ evidence/user-test/YYYY-MM-DD.json
```

The planner will read `evidence/` — not Analytics, not Ponder, not DigitalOcean. Evidence is the normalized interface between the world and decisions.

> **Terminology note — "process" vs "formula":** In this document, "process" means a self-contained measurement or mutation pipeline that reads an external platform and writes structured evidence to git. This is distinct from disinto's "formulas" (`formulas/*.toml`), which are TOML issue templates that guide agents through multi-step operational work (see `AGENTS.md` § Directory layout). Processes produce evidence; formulas orchestrate agent tasks.

## Process Types

### Sense processes

Produce evidence without modifying the project under test. Some sense processes are pure reads (API calls, system stats); others — `run-holdout` and `run-user-test` — spawn a Docker stack (containers, volumes, networks) that requires the Docker daemon and leaves ephemeral state on the host until explicitly torn down. These are **not** safe to treat as no-op: they consume resources and mutate host-level Docker state.
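As a concrete illustration of the pure-read end of the spectrum, a hypothetical `measure-resources` body can be tiny — no subprocesses, no Docker, safe to run anytime. Field names are assumptions, and a real run would write the snapshot to `evidence/resources/YYYY-MM-DD.json`:

```python
import json
import os
import shutil
from datetime import datetime, timezone

def snapshot_resources() -> dict:
    """Read-only snapshot of local machine state; touches nothing."""
    usage = shutil.disk_usage("/")
    return {
        "measured_at": datetime.now(timezone.utc).isoformat(),
        "cpu_count": os.cpu_count(),
        "load_avg_1m": os.getloadavg()[0],  # Unix-only
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
    }

evidence = snapshot_resources()
print(json.dumps(evidence, indent=2))
```

Contrast with `run-holdout` / `run-user-test`, where the measurement itself requires standing up and tearing down a full Docker stack.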
| Process | Measures | Platform | Resource profile | Status |
|---------|----------|----------|------------------|--------|
| `run-holdout` | Code quality against blind scenarios | Playwright + docker stack | Spawns Docker stack (containers + volumes + networks); requires Docker daemon; leaves ephemeral state until torn down | **Implemented** — `evaluate.sh` exists (harb #977) |
| `run-user-test` | UX quality across 5 personas | Playwright + docker stack | Spawns Docker stack (containers + volumes + networks); requires Docker daemon; leaves ephemeral state until torn down | **Implemented** — `run-usertest.sh` exists (harb #978) |
| `measure-resources` | Infra state (CPU, RAM, disk, containers) | System / DigitalOcean API | Read-only API calls; safe to run anytime | **Planned** |
| `measure-protocol` | On-chain health (floor, reserves, volume) | Ponder GraphQL | Read-only API calls; safe to run anytime | **Planned** |
| `collect-engagement` | Visitor engagement (visitors, referrers, pages) | Caddy access logs | Read-only log parsing; safe to run anytime | **Implemented** — `site/collect-engagement.sh` (disinto #718) |
| `measure-funnel` | User conversion and retention | Analytics API | Read-only API calls; safe to run anytime | **Planned** |

### Mutation processes (create change)

Mutation processes will produce new artifacts and consume significant resources. Results are delivered via PR.

| Process | Produces | Consumes | Status |
|---------|----------|----------|--------|
| `run-evolution` | Better optimizer candidates (`.push3` programs) | CPU-heavy: transpile + compile + deploy + attack per candidate | **Implemented** — `evolve.sh` exists (harb #975) |
| `run-red-team` | Evidence (floor held?) + new attack vectors | CPU + RAM for revm evaluation | **Implemented** — `red-team.sh` exists (harb #976) |

### Feedback loops

Mutation processes will feed each other:

```
red-team discovers attack
  → new vector added to attacks/ via PR
  → evolution scores candidates against harder attacks
  → better optimizers survive
  → red-team runs again against improved candidates
```

The planner won't need to know this loop exists as a rule. It will emerge from evidence: "new attack vectors landed since last evolution run → evolution scores are stale → run evolution."

## Evidence Directory

> **Not yet created.** See harb #973 for the implementation issue.

```
evidence/
  engagement/   # Visitor counts, referrers, page paths (from Caddy logs)
  evolution/    # Run params, generation stats, best fitness, champion
  red-team/     # Per-attack results, floor held/broken, ETH extracted
  holdout/      # Per-scenario pass/fail, gate decision
  user-test/    # Per-persona reports, friction points
  resources/    # CPU, RAM, disk, container state
  protocol/     # On-chain metrics from Ponder
  funnel/       # Analytics conversion data (future)
```

Each file will be dated JSON. Machine-readable. Git history will show trends. The planner will diff against previous runs to detect improvement or regression.

## Delivery Pattern

Every process will follow the same delivery contract:

1. **Evidence** (metrics/reports) → committed to `evidence/` on main
2. **Artifacts** (code changes, new attack vectors, evolved programs) → PR
3. **Summary** → issue comment with key metrics and link to evidence file

## Evidence-Gated Deployment

Deployment will not be a human decision or a calendar event.
It will be the natural consequence of all evidence dimensions being green:

- **Holdout:** 90% of scenarios pass
- **Red-team:** Floor holds on all known attacks
- **User-test:** All personas complete the journey, newcomers convert
- **Evolution:** Champion fitness above threshold
- **Protocol metrics:** ETH reserve growing, floor ratcheting up
- **Funnel:** Bounce rate below target, conversion above target

When all dimensions pass their thresholds, deployment becomes the obvious next action. Until then, the planner will know **which dimension is weakest** and focus resources there.

## Resource Allocation

The planner will optimize resource allocation across all processes. When the box is idle, it will find the highest-value use of compute based on evidence staleness and current gaps.

Pure-read sense processes (API queries, system stats) are cheap — run them freely to keep evidence fresh. Docker-based sense processes (`run-holdout`, `run-user-test`) are heavier: they spin up full stacks and should be scheduled when the box has capacity. Mutation processes are expensive — run them when evidence justifies the cost.

The planner will read evidence recency and decide:

- "Red-team results are from before the VWAP fix → re-run"
- "User-tests haven't run since February → stale"
- "Evolution scored against 4 attacks but we now have 6 → outdated"
- "Box is idle, no CI running → good time for evolution"

No schedules. No hardcoded rules. The planner's judgment, informed by evidence.

## What Disinto Is Not

Disinto is not a general-purpose company operating system. It does not model arbitrary resources or business processes. It is finely tuned for one thing:

**money → software product → customer contact → knowledge → product improvement → market fit → more money.**

Every agent, process, and evidence type serves this loop.
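Returning to the deployment gate: the all-dimensions-green rule, and "focus on the weakest dimension" when any threshold fails, can be sketched as follows. Dimension names, score normalization, and thresholds here are illustrative assumptions, not the project's actual gate:

```python
# Hypothetical per-dimension thresholds, each score normalized to [0, 1].
THRESHOLDS = {
    "holdout": 0.90,    # fraction of blind scenarios passing
    "red_team": 1.00,   # fraction of known attacks on which the floor held
    "user_test": 1.00,  # fraction of personas completing the journey
}

def gate(scores: dict[str, float]) -> str:
    """Return 'deploy' if every dimension clears its threshold, else the weakest one."""
    gaps = {dim: scores[dim] - bar for dim, bar in THRESHOLDS.items()}
    if all(gap >= 0 for gap in gaps.values()):
        return "deploy"
    return min(gaps, key=gaps.get)  # dimension furthest below its threshold

decision = gate({"holdout": 0.95, "red_team": 1.0, "user_test": 0.8})
```

With these inputs the gate does not open; it instead names `user_test` as the dimension to pour resources into, which is exactly the planner behavior described above.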
## Related Issues

- harb #973 — Evidence directory structure
- harb #974 — Red-team attack vector auto-promotion
- harb #975 — `run-evolution` process
- harb #976 — `run-red-team` process
- harb #977 — `run-holdout` process
- harb #978 — `run-user-test` process
- disinto #139 — Action agent (process executor)
- disinto #140 — Prediction agent (evidence reader)
- disinto #142 — Planner triages predictions