> **Status: Partially Implemented** — This document describes the target evidence architecture. Items marked **Implemented** exist in the codebase; items marked **Partial** have upstream scripts but no evidence output yet; all others are **Planned**. See AGENTS.md for the current operational state.
Disinto is purpose-built for one loop: **build software, launch it, improve it, reach market fit.**
This document describes how autonomous agents will sense the world, produce evidence, and use that evidence to make decisions — from "which issue to work on next" to "is this ready to deploy."
## The Loop
```
build → measure → evidence good enough?
no → improve → build again
yes → deploy → measure in-market → evidence still good?
no → improve → build again
yes → expand
```
Every decision in this loop will be driven by evidence, not intuition. The planner will read structured evidence across all dimensions, identify the weakest one, and focus there.
The planner will read `$OPS_REPO_ROOT/evidence/` — not Analytics, not Ponder, not DigitalOcean. Evidence is the normalized interface between the world and decisions.
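As a sketch of what "evidence as the interface" could look like: assuming a hypothetical layout of `evidence/<dimension>/<date>.json` where each file carries a normalized 0–1 `score` field (both the layout and the field name are illustrative, not defined by this document), the planner's "find the weakest dimension" step reduces to a few lines:

```python
import json
from pathlib import Path

# Assumed layout: evidence/<dimension>/<date>.json, each file carrying
# a normalized 0-1 "score" field. Both are illustrative assumptions.
def latest_scores(evidence_root: str) -> dict[str, float]:
    scores = {}
    for dim_dir in sorted(Path(evidence_root).iterdir()):
        if not dim_dir.is_dir():
            continue
        files = sorted(dim_dir.glob("*.json"))  # dated names sort chronologically
        if files:
            scores[dim_dir.name] = json.loads(files[-1].read_text())["score"]
    return scores

def weakest_dimension(scores: dict[str, float]) -> str:
    # The planner focuses wherever evidence is weakest.
    return min(scores, key=scores.get)
```

The point is not the specific fields but the shape: the planner never talks to Analytics, Ponder, or DigitalOcean directly; it only reads files.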
> **Terminology note — "process" vs "formula":** In this document, "process" means a self-contained measurement or mutation pipeline that reads an external platform and writes structured evidence to git. This is distinct from disinto's "formulas" (`formulas/*.toml`), which are TOML issue templates that guide agents through multi-step operational work (see `AGENTS.md` § Directory layout). Processes produce evidence; formulas orchestrate agent tasks.
Sense processes produce evidence without modifying the project under test. Some are pure reads (API calls, system stats); others — `run-holdout` and `run-user-test` — spawn a Docker stack (containers, volumes, networks) that requires the Docker daemon and leaves ephemeral state on the host until explicitly torn down. These are **not** safe to treat as no-ops: they consume resources and mutate host-level Docker state.
| Process | Measures | Mechanism | Side effects | Status |
| --- | --- | --- | --- | --- |
| `run-holdout` | Code quality against blind scenarios | Playwright + docker stack | Spawns Docker stack (containers + volumes + networks); requires Docker daemon; leaves ephemeral state until torn down | **Implemented** — `evaluate.sh` exists (harb #977) |
| `run-user-test` | UX quality across 5 personas | Playwright + docker stack | Spawns Docker stack (containers + volumes + networks); requires Docker daemon; leaves ephemeral state until torn down | **Implemented** — `run-usertest.sh` exists (harb #978) |
| `measure-resources` | Infra state (CPU, RAM, disk, containers) | System / DigitalOcean API | Read-only API calls. Safe to run anytime | **Planned** |
| `measure-protocol` | On-chain health (floor, reserves, volume) | Ponder GraphQL | Read-only API calls. Safe to run anytime | **Planned** |
```
red-team discovers attack → new vector added to attacks/ via PR
→ evolution scores candidates against harder attacks
→ better optimizers survive
→ red-team runs again against improved candidates
```
The planner won't need to know this loop exists as a rule. It will emerge from evidence: "new attack vectors landed since last evolution run → evolution scores are stale → run evolution."
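One way this emergent rule could be expressed, assuming attack vectors live as files under an `attacks/` directory and evolution evidence as dated JSON files (hypothetical paths, mirroring the loop above): evolution is stale whenever any attack file is newer than the latest evolution run.

```python
from pathlib import Path

# Hypothetical staleness rule: if any attack-vector file is newer than
# the latest evolution evidence file, evolution scores are stale.
def evolution_stale(attacks_dir: str, evolution_dir: str) -> bool:
    attacks = list(Path(attacks_dir).glob("*"))
    runs = sorted(Path(evolution_dir).glob("*.json"))
    if not runs:
        return bool(attacks)  # attacks exist but evolution never ran
    last_run = runs[-1].stat().st_mtime
    return any(a.stat().st_mtime > last_run for a in attacks)
```

No scheduler encodes "run evolution after red-team"; the dependency falls out of comparing timestamps in the evidence.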
## Evidence Directory
> **Not yet created.** See harb #973 for the implementation issue.
```
evolution/ # Run params, generation stats, best fitness, champion
red-team/ # Per-attack results, floor held/broken, ETH extracted
holdout/ # Per-scenario pass/fail, gate decision
user-test/ # Per-persona reports, friction points
resources/ # CPU, RAM, disk, container state
protocol/ # On-chain metrics from Ponder
funnel/ # Analytics conversion data (future)
```
Each file will be dated, machine-readable JSON. Git history will show trends, and the planner will diff against previous runs to detect improvement or regression.
## Delivery Pattern
Every process will follow the same delivery contract:
1. **Evidence** (metrics/reports) → committed to `evidence/` on main
2. **Artifacts** (code changes, new attack vectors, evolved programs) → PR
3. **Summary** → issue comment with key metrics and link to evidence file
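Step 1 of the contract is mostly a naming convention. A minimal helper, under the assumption that evidence files are named by ISO date under `evidence/<process>/` (the function and naming scheme are illustrative, not part of the spec):

```python
import datetime
import json
from pathlib import Path

# Hypothetical helper for step 1: write a dated evidence file under
# evidence/<process>/, ready to be committed on main.
def write_evidence(root: str, process: str, payload: dict) -> Path:
    day = datetime.date.today().isoformat()          # e.g. 2025-01-31
    out = Path(root) / process / f"{day}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(payload, indent=2, sort_keys=True))
    return out
```

Keeping keys sorted and output indented makes the git diffs between runs readable, which is what the planner (and humans) consume.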
## Evidence-Gated Deployment
Deployment will not be a human decision or a calendar event. It will be the natural consequence of all evidence dimensions being green:
- **Holdout:** 90% of scenarios pass
- **Red-team:** Floor holds on all known attacks
- **User-test:** All personas complete the journey; newcomers convert
- **Evolution:** Champion fitness above threshold
- **Protocol metrics:** ETH reserve growing, floor ratcheting up
When all dimensions pass their thresholds, deployment becomes the obvious next action. Until then, the planner will know **which dimension is weakest** and focus resources there.
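The gate can be sketched as a pure function over flattened evidence. The metric names and the evolution-fitness bar below are illustrative assumptions (the document specifies only the holdout threshold and the qualitative conditions):

```python
# Illustrative thresholds mirroring the gate above. Metric names and
# the evolution-fitness bar are assumptions, not spec.
THRESHOLDS = {
    "holdout_pass_rate": 0.90,    # 90% of scenarios pass
    "red_team_floor_held": 1.0,   # floor holds on all known attacks
    "user_test_completion": 1.0,  # all personas complete the journey
    "evolution_fitness": 0.75,    # hypothetical champion-fitness bar
    "reserve_growth": 0.0,        # ETH reserve not shrinking
}

def gate(evidence: dict[str, float]) -> tuple[bool, list[str]]:
    # Missing evidence counts as failing: no data, no deploy.
    failing = [k for k, t in THRESHOLDS.items() if evidence.get(k, 0.0) < t]
    return (not failing, failing)
```

The second return value is the planner's work queue: when the gate is closed, it lists exactly which dimensions need attention.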
## Resource Allocation
The planner will optimize resource allocation across all processes. When the box is idle, it will find the highest-value use of compute based on evidence staleness and current gaps.
Pure-read sense processes (API queries, system stats) are cheap — run them freely to keep evidence fresh. Docker-based sense processes (`run-holdout`, `run-user-test`) are heavier: they spin up full stacks and should be scheduled when the box has capacity.