Merge pull request 'fix: docs: per-agent AGENTS.md files still describe cron triggers (#547)' (#567) from fix/issue-547 into main

2026-04-10 08:42:02 +00:00 · 2026-04-10 08:42:02 +00:00 · da4d9077dd
commit da4d9077dd
parent 84ab6ef0a8 6a6d2d0774
7 changed files with 96 additions and 64 deletions
--- a/architect/AGENTS.md
+++ b/architect/AGENTS.md
@ -67,7 +67,7 @@ resumes the session using `--resume session_id` to preserve codebase context.
 ## Execution

 Run via `architect/architect-run.sh`, which:
- Acquires a cron lock and checks available memory
+- Acquires a poll-loop lock (via `acquire_lock`) and checks available memory
 - Cleans up per-issue scratch files from previous runs (`/tmp/architect-{project}-scratch-*.md`)
 - Sources shared libraries (env.sh, formula-session.sh)
 - Uses FORGE_ARCHITECT_TOKEN for authentication
@ -88,12 +88,10 @@ Run via `architect/architect-run.sh`, which:
 - Bash creates the PR with pitch content and posts ACCEPT/REJECT footer comment
 - Branch names use issue number (architect/sprint-vision-{issue_number}) to avoid collisions

-## Cron
+## Schedule

-Suggested cron entry (every 6 hours):
-```cron
-0 */6 * * * cd /path/to/disinto && bash architect/architect-run.sh
-```
+The architect runs every 6 hours as part of the polling loop in
+`docker/agents/entrypoint.sh` (iteration math at line 196-208).

 ## State

--- a/dev/AGENTS.md
+++ b/dev/AGENTS.md
@ -4,17 +4,36 @@
 **Role**: Implement issues autonomously — write code, push branches, address
 CI failures and review feedback.

-**Trigger**: `dev-poll.sh` runs every 10 min via cron. Sources `lib/guard.sh` and
-calls `check_active dev` first — skips if `$FACTORY_ROOT/state/.dev-active` is
-absent. Then performs a direct-merge scan (approved + CI green PRs — including
-chore/gardener PRs without issue numbers), then checks the agent lock and scans
-for ready issues using a two-tier priority queue: (1) `priority`+`backlog` issues
-first (FIFO within tier), then (2) plain `backlog` issues (FIFO). Orphaned
-in-progress issues are also picked up. The direct-merge scan runs before the lock
-check so approved PRs get merged even while a dev-agent session is active.
+**Trigger**: `dev-poll.sh` is invoked by the polling loop in `docker/agents/entrypoint.sh`
+every 5 minutes (iteration math at line 171-175). Sources `lib/guard.sh` and calls
+`check_active dev` first — skips if `$FACTORY_ROOT/state/.dev-active` is absent. Then
+performs a direct-merge scan (approved + CI green PRs — including chore/gardener PRs
+without issue numbers), then checks the agent lock and scans for ready issues using a
+two-tier priority queue: (1) `priority`+`backlog` issues first (FIFO within tier), then
+(2) plain `backlog` issues (FIFO). Orphaned in-progress issues are also picked up. The
+direct-merge scan runs before the lock check so approved PRs get merged even while a
+dev-agent session is active.

 **Key files**:
- `dev/dev-poll.sh` — Cron scheduler: finds next ready issue, handles merge/rebase of approved PRs, tracks CI fix attempts. `BOT_USER` is resolved once at startup via the Forge `/user` API and cached for all assignee checks. Formula guard skips issues labeled `formula`, `prediction/dismissed`, or `prediction/unreviewed`. **Race prevention**: checks issue assignee before claiming — skips if assigned to a different bot user. **Stale branch abandonment**: closes PRs and deletes branches that are behind `$PRIMARY_BRANCH` (restarts poll cycle for a fresh start). **Stale in-progress recovery**: on each poll cycle, scans for issues labeled `in-progress`. If the issue has a `vision` label, sets `BLOCKED_BY_INPROGRESS=true` and skips further stale checks (vision issues are managed by the architect). If the issue is assigned to `$BOT_USER` (this agent), checks for pending review feedback first — if an open PR has `REQUEST_CHANGES`, spawns the dev-agent to address it before setting `BLOCKED_BY_INPROGRESS=true`; otherwise just sets blocked. If assigned to another agent, logs and falls through (does not block). If no assignee, no open PR, and no agent lock file — removes `in-progress`, adds `blocked` with a human-triage comment. **Per-agent open-PR gate**: before starting new work, filters open waiting PRs to only those assigned to this agent (`$BOT_USER`). Other agents' PRs do not block this agent's pipeline (#358, #369). **Pre-lock merge scan own-PRs only**: the direct-merge scan only merges PRs whose linked issue is assigned to this agent — skips PRs owned by other bot users (#374).
+- `dev/dev-poll.sh` — Polling loop participant: finds next ready issue, handles merge/rebase
+of approved PRs, tracks CI fix attempts. Invoked by `docker/agents/entrypoint.sh` every 5
+minutes. `BOT_USER` is resolved once at startup via the Forge `/user` API and cached for
+all assignee checks. Formula guard skips issues labeled `formula`, `prediction/dismissed`,
+or `prediction/unreviewed`. **Race prevention**: checks issue assignee before claiming —
+skips if assigned to a different bot user. **Stale branch abandonment**: closes PRs and
+deletes branches that are behind `$PRIMARY_BRANCH` (restarts poll cycle for a fresh start).
+**Stale in-progress recovery**: on each poll cycle, scans for issues labeled `in-progress`.
+If the issue has a `vision` label, sets `BLOCKED_BY_INPROGRESS=true` and skips further
+stale checks (vision issues are managed by the architect). If the issue is assigned to
+`$BOT_USER` (this agent), checks for pending review feedback first — if an open PR has
+`REQUEST_CHANGES`, spawns the dev-agent to address it before setting `BLOCKED_BY_INPROGRESS=true`;
+otherwise just sets blocked. If assigned to another agent, logs and falls through (does not
+block). If no assignee, no open PR, and no agent lock file — removes `in-progress`, adds
+`blocked` with a human-triage comment. **Per-agent open-PR gate**: before starting new work,
+filters open waiting PRs to only those assigned to this agent (`$BOT_USER`). Other agents'
+PRs do not block this agent's pipeline (#358, #369). **Pre-lock merge scan own-PRs only**:
+the direct-merge scan only merges PRs whose linked issue is assigned to this agent — skips
+PRs owned by other bot users (#374).
 - `dev/dev-agent.sh` — Orchestrator: claims issue, creates worktree + tmux session with interactive `claude`, monitors phase file, injects CI results and review feedback, merges on approval
 - `dev/phase-test.sh` — Integration test for the phase protocol

@ -32,9 +51,9 @@ check so approved PRs get merged even while a dev-agent session is active.

 **Crash recovery**: on `PHASE:crashed` or non-zero exit, the worktree is **preserved** (not destroyed) for debugging. Location logged. Supervisor housekeeping removes stale crashed worktrees older than 24h.

-**Lifecycle**: dev-poll.sh (`check_active dev`) → dev-agent.sh → tmux session → phase file
-drives CI/review loop → merge + `mirror_push()` → close issue. On respawn after
-`PHASE:escalate`, the stale phase file is cleared first so the session starts
-clean; the reinject prompt tells Claude not to re-escalate for the same reason.
-On respawn for any active PR, the prompt explicitly tells Claude the PR already
-exists and not to create a new one via API.
+**Lifecycle**: dev-poll.sh (invoked by polling loop, `check_active dev`) → dev-agent.sh →
+tmux session → phase file drives CI/review loop → merge + `mirror_push()` → close issue.
+On respawn after `PHASE:escalate`, the stale phase file is cleared first so the session
+starts clean; the reinject prompt tells Claude not to re-escalate for the same reason.
+On respawn for any active PR, the prompt explicitly tells Claude the PR already exists
+and not to create a new one via API.
--- a/gardener/AGENTS.md
+++ b/gardener/AGENTS.md
@ -7,18 +7,18 @@ the quality gate: strips the `backlog` label from issues that lack acceptance
 criteria checkboxes (`- [ ]`) or an `## Affected files` section. Invokes
 Claude to fix what it can; files vault items for what it cannot.

-**Trigger**: `gardener-run.sh` runs 4x/day via cron. Sources `lib/guard.sh` and
-calls `check_active gardener` first — skips if `$FACTORY_ROOT/state/.gardener-active`
-is absent. **Early-exit optimization**: if no issues, PRs, or repo files have
-changed since the last run (checked via Forgejo API and `git diff`), the model
-is not invoked — the run exits immediately (no tmux session, no tokens consumed).
-Otherwise, creates a tmux session with `claude --model sonnet`, injects
-`formulas/run-gardener.toml` as context, monitors the phase file, and cleans up
-on completion or timeout (2h max session). No action issues — the gardener runs
-directly from cron like the planner, predictor, and supervisor.
+**Trigger**: `gardener-run.sh` is invoked by the polling loop in `docker/agents/entrypoint.sh`
+every 6 hours (iteration math at line 182-194). Sources `lib/guard.sh` and calls
+`check_active gardener` first — skips if `$FACTORY_ROOT/state/.gardener-active` is absent.
+**Early-exit optimization**: if no issues, PRs, or repo files have changed since the last
+run (checked via Forgejo API and `git diff`), the model is not invoked — the run exits
+immediately (no tmux session, no tokens consumed). Otherwise, creates a tmux session with
+`claude --model sonnet`, injects `formulas/run-gardener.toml` as context, monitors the
+phase file, and cleans up on completion or timeout (2h max session). No action issues —
+the gardener runs as part of the polling loop alongside the planner, predictor, and supervisor.

 **Key files**:
- `gardener/gardener-run.sh` — Cron wrapper + orchestrator: lock, memory guard,
+- `gardener/gardener-run.sh` — Polling loop participant + orchestrator: lock, memory guard,
  sources disinto project config, creates tmux session, injects formula prompt,
  monitors phase file via custom `_gardener_on_phase_change` callback (passed to
  `run_formula_and_monitor`). Stays alive through CI/review/merge cycle after
@ -35,8 +35,8 @@ directly from cron like the planner, predictor, and supervisor.
 - `FORGE_TOKEN`, `FORGE_GARDENER_TOKEN` (falls back to FORGE_TOKEN), `FORGE_REPO`, `FORGE_API`, `PROJECT_NAME`, `PROJECT_REPO_ROOT`
 - `PRIMARY_BRANCH`, `CLAUDE_MODEL` (set to sonnet by gardener-run.sh)

-**Lifecycle**: gardener-run.sh (cron 0,6,12,18) → `check_active gardener` → lock + memory guard →
-load formula + context → create tmux session →
+**Lifecycle**: gardener-run.sh (invoked by polling loop every 6h, `check_active gardener`) →
+lock + memory guard → load formula + context → create tmux session →
 Claude grooms backlog (writes proposed actions to manifest), bundles dust,
 updates AGENTS.md, commits manifest + docs to PR →
 `PHASE:awaiting_ci` (stays alive) → CI pass → `PHASE:awaiting_review` →
--- a/planner/AGENTS.md
+++ b/planner/AGENTS.md
@ -2,7 +2,7 @@
 # Planner Agent

 **Role**: Strategic planning using a Prerequisite Tree (Theory of Constraints),
-executed directly from cron via tmux + Claude.
+invoked by the polling loop in `docker/agents/entrypoint.sh` every 12 hours (iteration math at line 210-222) via tmux + Claude.
 Phase 0 (preflight): pull latest code, load persistent memory and prerequisite
 tree from `$OPS_REPO_ROOT/knowledge/planner-memory.md` and `$OPS_REPO_ROOT/prerequisites.md`. Also reads
 all available formulas: factory formulas (`$FACTORY_ROOT/formulas/*.toml`) and
@ -41,16 +41,16 @@ AGENTS.md maintenance is handled by the Gardener.
 prerequisite tree, memory, vault state) live under `$OPS_REPO_ROOT/`.
 Each project manages its own planner state in a separate ops repo.

-**Trigger**: `planner-run.sh` runs daily via cron (accepts an optional project
-TOML argument, defaults to `projects/disinto.toml`). Sources `lib/guard.sh` and
-calls `check_active planner` first — skips if `$FACTORY_ROOT/state/.planner-active`
-is absent. Then creates a tmux session with `claude --model opus`, injects
-`formulas/run-planner.toml` as context, monitors the phase file, and cleans up
-on completion or timeout. No action issues — the planner is a nervous system
-component, not work.
+**Trigger**: `planner-run.sh` is invoked by the polling loop in `docker/agents/entrypoint.sh`
+every 12 hours (iteration math at line 210-222). Accepts an optional project TOML argument,
+defaults to `projects/disinto.toml`. Sources `lib/guard.sh` and calls `check_active planner`
+first — skips if `$FACTORY_ROOT/state/.planner-active` is absent. Then creates a tmux session
+with `claude --model opus`, injects `formulas/run-planner.toml` as context, monitors the
+phase file, and cleans up on completion or timeout. No action issues — the planner is a
+nervous system component, not work.

 **Key files**:
- `planner/planner-run.sh` — Cron wrapper + orchestrator: lock, memory guard,
+- `planner/planner-run.sh` — Polling loop participant + orchestrator: lock, memory guard,
  sources disinto project config, builds structural analysis via `lib/formula-session.sh:build_graph_section()`,
  creates tmux session, injects formula prompt, monitors phase file, handles crash recovery, cleans up
 - `formulas/run-planner.toml` — Execution spec: six steps (preflight,
--- a/predictor/AGENTS.md
+++ b/predictor/AGENTS.md
@ -22,14 +22,15 @@ exploit counts as 2 (prediction + action dispatch). The predictor MUST NOT
 emit feature work — only observations challenging claims, exposing gaps,
 and surfacing risks.

-**Trigger**: `predictor-run.sh` runs daily at 06:00 UTC via cron (1h before
-the planner at 07:00). Sources `lib/guard.sh` and calls `check_active predictor`
-first — skips if `$FACTORY_ROOT/state/.predictor-active` is absent. Also guarded
-by PID lock (`/tmp/predictor-run.lock`) and memory check (skips if available
-RAM < 2000 MB).
+**Trigger**: `predictor-run.sh` is invoked by the polling loop in `docker/agents/entrypoint.sh`
+every 24 hours (iteration math at line 224-236). Sources `lib/guard.sh` and calls
+`check_active predictor` first — skips if `$FACTORY_ROOT/state/.predictor-active` is absent.
+Also guarded by PID lock (`/tmp/predictor-run.lock`) and memory check (skips if available
+RAM < 2000 MB). Note: the 24h cadence is iteration-based, not anchored to 06:00 UTC —
+drifts on container restart.

 **Key files**:
- `predictor/predictor-run.sh` — Cron wrapper + orchestrator: active-state guard,
+- `predictor/predictor-run.sh` — Polling loop participant + orchestrator: active-state guard,
  lock, memory guard, sources disinto project config, builds structural analysis
  via `lib/formula-session.sh:build_graph_section()` (full-project scan — results
  included in prompt as `## Structural analysis`; failures non-fatal), builds
@ -44,7 +45,7 @@ RAM < 2000 MB).
 - `FORGE_TOKEN`, `FORGE_PREDICTOR_TOKEN` (falls back to FORGE_TOKEN), `FORGE_REPO`, `FORGE_API`, `PROJECT_NAME`, `PROJECT_REPO_ROOT`, `OPS_REPO_ROOT`
 - `PRIMARY_BRANCH`, `CLAUDE_MODEL` (set to sonnet by predictor-run.sh)

-**Lifecycle**: predictor-run.sh (daily 06:00 cron) → lock + memory guard →
+**Lifecycle**: predictor-run.sh (invoked by polling loop every 24h) → lock + memory guard →
 load formula + context (AGENTS.md, VISION.md from code repo; RESOURCES.md, prerequisites.md from ops repo)
 → create tmux session → Claude fetches prediction history (open + closed) →
 reviews track record (actioned/dismissed/watching) → finds weaknesses
--- a/review/AGENTS.md
+++ b/review/AGENTS.md
@ -4,13 +4,26 @@
 **Role**: AI-powered PR review — post structured findings and formal
 approve/request-changes verdicts to forge.

-**Trigger**: `review-poll.sh` runs every 10 min via cron. It scans open PRs
-whose CI has passed and that lack a review for the current HEAD SHA, then
-spawns `review-pr.sh <pr-number>`.
+**Trigger**: `review-poll.sh` is invoked by the polling loop in `docker/agents/entrypoint.sh`
+every 5 minutes (iteration math at line 163-167). It scans open PRs whose CI has passed and
+that lack a review for the current HEAD SHA, then spawns `review-pr.sh <pr-number>`.

 **Key files**:
- `review/review-poll.sh` — Cron scheduler: finds unreviewed PRs with passing CI. Sources `lib/guard.sh` and calls `check_active reviewer` — skips if `$FACTORY_ROOT/state/.reviewer-active` is absent. **Circuit breaker**: counts existing `<!-- review-error: <sha> -->` comments; skips a PR if ≥3 consecutive errors for the same HEAD SHA (prevents flooding on repeated review failures).
- `review/review-pr.sh` — Creates/reuses a tmux session (`review-{project}-{pr}`), injects PR diff, waits for Claude to write structured JSON output, posts markdown review + formal forge review, auto-creates follow-up issues for pre-existing tech debt. **cd at startup**: changes to `$PROJECT_REPO_ROOT` early in the script — before any git commands — because the factory root is not a git repo after image rebuild (#408). Calls `resolve_forge_remote()` at startup to determine the correct git remote name (avoids hardcoded 'origin'). Before starting the session, runs `lib/build-graph.py --changed-files <PR files>` and appends the JSON structural analysis (affected objectives, orphaned prerequisites, thin evidence) to the review prompt. Graph failures are non-fatal — review proceeds without it.
+- `review/review-poll.sh` — Polling loop participant: finds unreviewed PRs with passing CI.
+Invoked by `docker/agents/entrypoint.sh` every 5 minutes. Sources `lib/guard.sh` and calls
+`check_active reviewer` — skips if `$FACTORY_ROOT/state/.reviewer-active` is absent.
+**Circuit breaker**: counts existing `<!-- review-error: <sha> -->` comments; skips a PR
+if ≥3 consecutive errors for the same HEAD SHA (prevents flooding on repeated review failures).
+- `review/review-pr.sh` — Polling loop participant: Creates/reuses a tmux session
+(`review-{project}-{pr}`), injects PR diff, waits for Claude to write structured JSON output,
+posts markdown review + formal forge review, auto-creates follow-up issues for pre-existing
+tech debt. **cd at startup**: changes to `$PROJECT_REPO_ROOT` early in the script — before
+any git commands — because the factory root is not a git repo after image rebuild (#408).
+Calls `resolve_forge_remote()` at startup to determine the correct git remote name (avoids
+hardcoded 'origin'). Before starting the session, runs `lib/build-graph.py --changed-files
+<PR files>` and appends the JSON structural analysis (affected objectives, orphaned
+prerequisites, thin evidence) to the review prompt. Graph failures are non-fatal — review
+proceeds without it.

 **Environment variables consumed**:
 - `FORGE_TOKEN` — Dev-agent token (must not be the same account as FORGE_REVIEW_TOKEN)
--- a/supervisor/AGENTS.md
+++ b/supervisor/AGENTS.md
@ -7,15 +7,16 @@ then runs an interactive Claude session (sonnet) that assesses health, auto-fixe
 issues, and writes a daily journal. When blocked on external
 resources or human decisions, files vault items instead of escalating directly.

-**Trigger**: `supervisor-run.sh` runs every 20 min via cron. Sources `lib/guard.sh`
-and calls `check_active supervisor` first — skips if
-`$FACTORY_ROOT/state/.supervisor-active` is absent. Then runs `claude -p`
-via `agent-sdk.sh`, injects `formulas/run-supervisor.toml` with
-pre-collected metrics as context, and cleans up on completion or timeout (20 min max session).
-No action issues — the supervisor runs directly from cron like the planner and predictor.
+**Trigger**: `supervisor-run.sh` is invoked by the polling loop in `docker/edge/entrypoint-edge.sh`
+every 20 minutes (line 50-53). Sources `lib/guard.sh` and calls `check_active supervisor` first
+— skips if `$FACTORY_ROOT/state/.supervisor-active` is absent. Then runs `claude -p` via
+`agent-sdk.sh`, injects `formulas/run-supervisor.toml` with pre-collected metrics as context,
+and cleans up on completion or timeout (20 min max session). Note: the supervisor runs in the
+**edge container** (`entrypoint-edge.sh`), not the agent container — this distinction matters
+for operators debugging the factory.

 **Key files**:
- `supervisor/supervisor-run.sh` — Cron wrapper + orchestrator: lock, memory guard,
+- `supervisor/supervisor-run.sh` — Polling loop participant + orchestrator: lock, memory guard,
  runs preflight.sh, sources disinto project config, runs claude -p via agent-sdk.sh,
  injects formula prompt with metrics, handles crash recovery
 - `supervisor/preflight.sh` — Data collection: system resources (RAM, disk, swap,
@ -46,6 +47,6 @@ P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping).
 - Files vault items locally to `$PROJECT_REPO_ROOT/vault/pending/`
 - Logs a WARNING message at startup indicating degraded mode

-**Lifecycle**: supervisor-run.sh (cron */20) → lock + memory guard → run
-preflight.sh (collect metrics) → load formula + context → run claude -p via agent-sdk.sh
-→ Claude assesses health, auto-fixes, writes journal → `PHASE:done`.
+**Lifecycle**: supervisor-run.sh (invoked by polling loop every 20min, `check_active supervisor`)
+→ lock + memory guard → run preflight.sh (collect metrics) → load formula + context → run
+claude -p via agent-sdk.sh → Claude assesses health, auto-fixes, writes journal → `PHASE:done`.