From eb3327d2c9ada06fc3587d1857bb73c2f544bf57 Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 11 Apr 2026 20:45:04 +0000 Subject: [PATCH] chore: gardener housekeeping 2026-04-11 --- AGENTS.md | 7 +++--- architect/AGENTS.md | 2 +- dev/AGENTS.md | 2 +- gardener/AGENTS.md | 2 +- gardener/pending-actions.json | 43 ++++++++--------------------------- lib/AGENTS.md | 3 ++- planner/AGENTS.md | 2 +- predictor/AGENTS.md | 2 +- review/AGENTS.md | 2 +- supervisor/AGENTS.md | 2 +- 10 files changed, 21 insertions(+), 46 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 48aea6b..e9f12e1 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,4 +1,4 @@ - + # Disinto — Agent Instructions ## What this repo is @@ -36,7 +36,7 @@ disinto/ (code repo) │ validate.sh — vault item validator │ examples/ — example vault action TOMLs (promote, publish, release, webhook-call) ├── lib/ env.sh, agent-sdk.sh, ci-helpers.sh, ci-debug.sh, load-project.sh, parse-deps.sh, guard.sh, mirrors.sh, pr-lifecycle.sh, issue-lifecycle.sh, worktree.sh, formula-session.sh, stack-lock.sh, forge-setup.sh, forge-push.sh, ops-setup.sh, ci-setup.sh, generators.sh, hire-agent.sh, release.sh, build-graph.py, -│ branch-protection.sh, secret-scan.sh, tea-helpers.sh, vault.sh, ci-log-reader.py +│ branch-protection.sh, secret-scan.sh, tea-helpers.sh, vault.sh, ci-log-reader.py, git-creds.sh │ hooks/ — Claude Code session hooks (on-compact-reinject, on-idle-stop, on-phase-change, on-pretooluse-guard, on-session-end, on-stop-failure) ├── projects/ *.toml.example — templates; *.toml — local per-box config (gitignored) ├── formulas/ Issue templates (TOML specs for multi-step agent tasks) @@ -197,5 +197,4 @@ at each phase boundary by writing to a phase file (e.g. Key phases: `PHASE:awaiting_ci` → `PHASE:awaiting_review` → `PHASE:done`. Also: `PHASE:escalate` (needs human input), `PHASE:failed`. -See [docs/PHASE-PROTOCOL.md](docs/PHASE-PROTOCOL.md) for the complete spec -including the orchestrator reaction matrix, sequence diagram, and crash recovery. +See [docs/PHASE-PROTOCOL.md](docs/PHASE-PROTOCOL.md) for the complete spec, orchestrator reaction matrix, sequence diagram, and crash recovery. diff --git a/architect/AGENTS.md b/architect/AGENTS.md index f5fddf4..57fa3dd 100644 --- a/architect/AGENTS.md +++ b/architect/AGENTS.md @@ -1,4 +1,4 @@ - + # Architect — Agent Instructions ## What this agent is diff --git a/dev/AGENTS.md b/dev/AGENTS.md index f5dce0e..97d1a10 100644 --- a/dev/AGENTS.md +++ b/dev/AGENTS.md @@ -1,4 +1,4 @@ - + # Dev Agent **Role**: Implement issues autonomously — write code, push branches, address diff --git a/gardener/AGENTS.md b/gardener/AGENTS.md index f2e15a3..9d3104b 100644 --- a/gardener/AGENTS.md +++ b/gardener/AGENTS.md @@ -1,4 +1,4 @@ - + # Gardener Agent **Role**: Backlog grooming — detect duplicate issues, missing acceptance diff --git a/gardener/pending-actions.json b/gardener/pending-actions.json index f66edcd..a4b254b 100644 --- a/gardener/pending-actions.json +++ b/gardener/pending-actions.json @@ -1,47 +1,22 @@ [ - { - "action": "close", - "issue": 419, - "reason": "Vision goal complete — all sub-issues #437-#454 closed, vault blast-radius redesign delivered" - }, - { - "action": "close", - "issue": 494, - "reason": "Resolved by PRs #502 and #503 (both merged) — repo_root workaround removed, container paths derived at runtime" - }, - { - "action": "close", - "issue": 477, - "reason": "Obsolete — #379 (while-true loop) was deployed on 2026-04-08; env.sh container guard is now correct behavior, no revert needed" - }, { "action": "edit_body", - "issue": 498, - "body": "Flagged by AI reviewer in PR #496.\n\n## Problem\n\n`has_responses_to_process` is only set to `true` inside the `open_arch_prs >= 3` gate in `architect/architect-run.sh` (line 543). When fewer than 3 architect PRs are open, ACCEPT/REJECT responses on existing PRs are never processed — the response-processing block at line 687 defaults to `false` and is skipped entirely.\n\nThis means that if a user ACCEPTs or REJECTs a pitch while the open PR count is below 3, the architect agent will never handle the response.\n\n## Fix\n\nSet `has_responses_to_process` (or an equivalent guard) unconditionally by scanning open PRs for ACCEPT/REJECT responses, not only when the 3-PR cap is hit.\n\n---\n*Auto-created from AI review*\n\n## Acceptance criteria\n\n- [ ] `has_responses_to_process` is computed by scanning open architect PRs for ACCEPT/REJECT responses regardless of `open_arch_prs` count\n- [ ] When a user posts ACCEPT or REJECT on an architect PR and open PR count < 3, the response is processed in the same run\n- [ ] Existing behavior when `open_arch_prs >= 3` is unchanged\n- [ ] ShellCheck passes on modified files\n\n## Affected files\n\n- `architect/architect-run.sh` (lines ~543 and ~687 — response-processing gate)" + "issue": 690, + "body": "## Symptom\n\nEvery architect run since the polling-loop hotfix at 2026-04-11 19:35 UTC has fired correctly but produced **0 sprint pitches**. From `/home/agent/data/logs/architect/architect.log` for the 20:23 run:\n\n```\n[2026-04-11T20:23:27Z] architect: Generating pitch for vision issue #623\n[2026-04-11T20:23:27Z] architect: agent_run: starting (resume=(new), dir=/home/agent/repos/disinto)\nError: Input must be provided either through stdin or as a prompt argument when using --print\n[2026-04-11T20:23:29Z] architect: agent_run: empty output (claude may have crashed or failed, exit code: 0)\n[2026-04-11T20:23:29Z] architect: WARNING: empty pitch generated for vision issue #623\n[2026-04-11T20:23:29Z] architect: WARNING: failed to generate pitch for vision issue #623\n...\n[2026-04-11T20:23:31Z] architect: Generated 0 sprint pitch(es)\n```\n\n3 vision issues processed (#647, #623, #557), all failed identically. The wrapper still exits 0 and writes a journal entry, so the polling loop thinks it's healthy — but no architect work is actually happening.\n\n## Root cause\n\n`architect/architect-run.sh:519` calls `agent_run` with the **old positional-after-flags** signature:\n\n```bash\npitch_output=$(agent_run -p \"$pitch_prompt\" --output-format json --dangerously-skip-permissions --max-turns 200 ${CLAUDE_MODEL:+--model \"$CLAUDE_MODEL\"} 2>>\"$LOGFILE\") || true\n```\n\nBut `lib/agent-sdk.sh:123-156` was rewritten so `agent_run` now adds `-p`, `--output-format`, `--max-turns`, `--dangerously-skip-permissions`, and `--model` *itself*, and expects only `[--resume X] [--worktree Y] PROMPT`:\n\n```bash\nagent_run() {\n local resume_id=\"\" worktree_dir=\"\"\n while [[ \"${1:-}\" == --* ]]; do\n case \"$1\" in\n --resume) shift; resume_id=\"${1:-}\"; shift ;;\n --worktree) shift; worktree_dir=\"${1:-}\"; shift ;;\n *) shift ;;\n esac\n done\n local prompt=\"${1:-}\"\n\n local -a args=(-p \"$prompt\" --output-format json --dangerously-skip-permissions --max-turns 200)\n ...\n}\n```\n\nWhen architect calls `agent_run -p \"$pitch_prompt\" --output-format json …`:\n\n- `$1` = `-p` (does NOT match `--*` — pattern requires two dashes), so the option-stripping loop exits immediately\n- `local prompt=\"${1:-}\"` → `prompt=\"-p\"`\n- `args=(-p \"-p\" --output-format json …)` is passed to `claude`\n- `claude` sees `-p` repeated and effectively no prompt → `Error: Input must be provided either through stdin or as a prompt argument when using --print`\n\nThe bug has been latent since `agent_run` was rewritten. It only became visible today because before the cadence change to `ARCHITECT_INTERVAL=540`, architect ran every 6h and most operators didn't notice the empty output. With the new ~45-min cadence (9-iteration aliasing of 540s against 300s POLL_INTERVAL), the failure became obvious within an hour.\n\n`dev-agent.sh:894` already uses the correct new signature for comparison:\n\n```bash\nagent_run \"${RESUME_ARGS[@]}\" --worktree \"$WORKTREE\" \"$PROMPT_FOR_MODE\"\n```\n\n## Fix\n\nChange `architect/architect-run.sh:519` from:\n\n```bash\npitch_output=$(agent_run -p \"$pitch_prompt\" --output-format json --dangerously-skip-permissions --max-turns 200 ${CLAUDE_MODEL:+--model \"$CLAUDE_MODEL\"} 2>>\"$LOGFILE\") || true\n```\n\nto:\n\n```bash\npitch_output=$(agent_run \"$pitch_prompt\" 2>>\"$LOGFILE\") || true\n```\n\n`agent_run` already injects `--output-format json`, `--dangerously-skip-permissions`, `--max-turns 200`, and `--model \"$CLAUDE_MODEL\"` via its internal `args` array, so they don't need to be passed at the call site.\n\nSearch the rest of the tree for the same anti-pattern — likely candidates: `gardener/gardener-run.sh`, `planner/planner-run.sh`, `predictor/predictor-run.sh`, `vault/`, `supervisor/`. Any `agent_run -p …` callsites should be updated.\n\n```sh\ngrep -rn 'agent_run -p\\|agent_run --output-format' .\n```\n\n## Bonus bug\n\nThe architect wrapper's exit-status handling is wrong. Even with all pitches empty, `architect-run.sh` still exits 0 and writes a journal entry. It should at minimum:\n- Log a top-level WARNING / fail the run when `Generated 0 sprint pitch(es)` happens *and* there were vision issues to process\n- Propagate non-zero exit so the polling loop's eventual health checks can detect it\n\nThat's a smaller follow-up but worth filing separately if confirmed in the other slow agents too (planner runs in ~5 min, gardener in ~10 min — we'll see if they hit the same `--print` bug).\n\n## Impact\n\nArchitect has been silently producing no output since the `agent_run` rewrite — possibly weeks. Vision issues have not been getting pitches generated, so the planner has nothing to schedule from the architect's side, so the dev-poll backlog never gets refreshed from the vision pipeline. The factory's \"self-development\" loop has been broken at the architect → planner handoff for an unknown amount of time.\n\n## Verification after fix\n\nAfter the one-line change merges and is deployed:\n\n```sh\ndocker exec disinto-agents bash -c \"cd /home/agent/repos/_factory && DISINTO_AGENTS=architect bash architect/architect-run.sh projects/disinto.toml\"\ntail -50 /home/agent/data/logs/architect/architect.log\n```\n\nExpect: at least one `Generated N sprint pitch(es)` with N > 0, no `Error: Input must be provided` lines, and a non-empty `pitch_output` JSON for at least one vision issue.\n\n## Files\n\n- `architect/architect-run.sh` — line 519, the broken call\n- (search) other slow agents likely affected by the same drift\n\n## Acceptance criteria\n- [ ] `architect/architect-run.sh` call to `agent_run` updated to new signature (prompt-only, no flags)\n- [ ] All `agent_run -p …` / `agent_run --output-format` call sites in the repo found and updated\n- [ ] Architect produces at least one sprint pitch when vision issues are open\n- [ ] CI green\n\n## Affected files\n- `architect/architect-run.sh` — line ~519, broken `agent_run` call site\n- `gardener/gardener-run.sh` — possible same drift\n- `planner/planner-run.sh` — possible same drift\n- `predictor/predictor-run.sh` — possible same drift\n- `supervisor/supervisor-run.sh` — possible same drift\n" }, { - "action": "add_label", - "issue": 498, + "action": "remove_label", + "issue": 647, "label": "backlog" }, - { - "action": "edit_body", - "issue": 499, - "body": "Flagged by AI reviewer in PR #496.\n\n## Problem\n\nIn `architect/architect-run.sh` line 203, the `has_open_subissues` function compares `.number` (a JSON integer) against `$vid` (a bash string via `--arg`). In jq, `42 != \"42\"` evaluates to true (different types are never equal), so the self-exclusion filter never fires. In practice this is low-risk since vision issues don't contain 'Decomposed from #N' in their own bodies, but the self-exclusion logic is silently broken.\n\n## Fix\n\nCast the string to a number in jq: `select(.number != ($vid | tonumber))`\n\n---\n*Auto-created from AI review*\n\n## Acceptance criteria\n\n- [ ] `has_open_subissues` self-exclusion filter correctly excludes the vision issue itself using `($vid | tonumber)` cast\n- [ ] A vision issue does not appear in its own subissue list\n- [ ] ShellCheck passes on modified files\n\n## Affected files\n\n- `architect/architect-run.sh` (line ~203 — `has_open_subissues` jq filter)" - }, { "action": "add_label", - "issue": 499, - "label": "backlog" + "issue": 647, + "label": "vision" }, { - "action": "edit_body", - "issue": 471, - "body": "## Bug description\n\nWhen dev-bot picks a backlog issue and launches dev-agent.sh, a second dev-poll instance (dev-qwen) can race ahead and mark the issue as stale/blocked before dev-agent.sh finishes claiming it.\n\n## Reproduction\n\nObserved on issues #443 and #445 (2026-04-08):\n\n**#443 timeline:**\n- `20:39:03` — dev-bot removes `backlog`, adds `in-progress` (via dev-poll backlog pickup)\n- `20:39:04` — dev-qwen removes `in-progress`, adds `blocked` with reason `no_assignee_no_open_pr_no_lock`\n- `20:40:11` — dev-bot pushes commit (dev-agent was actually working the whole time)\n- `20:44:02` — PR merged, issue closed\n\n**#445 timeline:**\n- `20:54:03` — dev-bot adds `in-progress`\n- `20:54:06` — dev-qwen marks `blocked` (3 seconds later)\n- `20:55:13` — dev-bot pushes commit\n- `21:09:03` — PR merged, issue closed\n\nIn both cases, the work completed successfully despite being labeled blocked.\n\n## Root cause\n\n`issue_claim()` in `lib/issue-lifecycle.sh` performs three sequential API calls:\n1. PATCH assignee\n2. POST in-progress label\n3. DELETE backlog label\n\nMeanwhile, dev-poll on another agent (dev-qwen) runs its orphan scan, sees the issue labeled `in-progress` but with no assignee set yet (assign PATCH hasn't landed or was read stale), no open PR, and no lock file. It concludes the issue is stale and relabels to `blocked`.\n\nThe race window is ~1-3 seconds between in-progress being set and the assignee being visible to other pollers.\n\n## Impact\n\n- Issues get spuriously labeled `blocked` with a misleading stale diagnostic comment\n- dev-agent continues working anyway (it already has the issue number), so the blocked label is just noise\n- But it could confuse the gardener or humans reading the issue timeline\n- If another dev-poll instance picks up the blocked issue for recovery before the original agent finishes, it could cause duplicate work\n\n## Possible fixes\n\n1. **Assign before labeling**: In `issue_claim()`, set the assignee first, then add in-progress. This way, by the time in-progress is visible, the assignee is already set.\n2. **Grace period in stale detection**: Skip issues whose in-progress label was added less than N seconds ago (check label event timestamp via timeline API).\n3. **Lock file before label**: Write the agent lock file (`/tmp/dev-impl-summary-...`) at the start of dev-agent.sh before calling `issue_claim()`, so the stale detector sees the lock.\n4. **Atomic claim check**: dev-poll should re-check assignee after a short delay before declaring stale, to allow for API propagation.\n\n## Acceptance criteria\n\n- [ ] Stale detection in dev-poll does not mark an issue as blocked within the first 60 seconds of the in-progress label being applied\n- [ ] `issue_claim()` assigns the issue before adding the in-progress label (or equivalent fix is implemented)\n- [ ] No spurious `blocked` labels appear on issues that are actively being worked (verified by log inspection or integration test)\n- [ ] ShellCheck passes on modified files\n\n## Affected files\n\n- `lib/issue-lifecycle.sh` — `issue_claim()` function (assignee + label ordering)\n- `dev/dev-poll.sh` — orphan/stale detection logic" - }, - { - "action": "add_label", - "issue": 471, - "label": "backlog" + "action": "comment", + "issue": 647, + "body": "Quality gate: stripped `backlog` label — this issue is missing required backlog sections (`## Acceptance criteria` with checkboxes and `## Affected files`). The body also explicitly states this is a vision issue, not backlog. Relabeled as `vision`. When the work is ready for implementation, please use the issue templates at `.forgejo/ISSUE_TEMPLATE/` to add the required sections before re-adding `backlog`." } ] diff --git a/lib/AGENTS.md b/lib/AGENTS.md index f69fea9..50fa9d9 100644 --- a/lib/AGENTS.md +++ b/lib/AGENTS.md @@ -1,4 +1,4 @@ - + # Shared Helpers (`lib/`) All agents source `lib/env.sh` as their first action. Additional helpers are @@ -27,6 +27,7 @@ sourced as needed. | `lib/agent-sdk.sh` | `agent_run([--resume SESSION_ID] [--worktree DIR] PROMPT)` — one-shot `claude -p` invocation with session persistence. Saves session ID to `SID_FILE`, reads it back on resume. `agent_recover_session()` — restore previous session ID from `SID_FILE` on startup. **Nudge guard**: skips nudge injection if the worktree is clean and no push is expected, preventing spurious re-invocations. Callers must define `SID_FILE`, `LOGFILE`, and `log()` before sourcing. **Concurrency**: every `claude` invocation is wrapped in `flock -w 600` on `${HOME}/.claude/session.lock` to serialize OAuth refresh across containers — see [`docs/CLAUDE-AUTH-CONCURRENCY.md`](../docs/CLAUDE-AUTH-CONCURRENCY.md) for why this is load-bearing and when a new container should bypass it. | formula-driven agents (dev-agent, planner-run, predictor-run, gardener-run) | | `lib/forge-setup.sh` | `setup_forge()` — Forgejo instance provisioning: creates admin user, bot accounts, org, repos (code + ops), configures webhooks, sets repo topics. Extracted from `bin/disinto`. Requires `FORGE_URL`, `FORGE_TOKEN`, `FACTORY_ROOT`. **Password storage (#361)**: after creating each bot account, stores its password in `.env` as `FORGE__PASS` (e.g. `FORGE_PASS`, `FORGE_REVIEW_PASS`, etc.) for use by `forge-push.sh`. | bin/disinto (init) | | `lib/forge-push.sh` | `push_to_forge()` — pushes a local clone to the Forgejo remote and verifies the push. `_assert_forge_push_globals()` validates required env vars before use. Requires `FORGE_URL`, `FORGE_PASS`, `FACTORY_ROOT`, `PRIMARY_BRANCH`. **Auth**: uses `FORGE_PASS` (bot password) for git HTTP push — Forgejo 11.x rejects API tokens for `git push` (#361). | bin/disinto (init) | +| `lib/git-creds.sh` | Shared git credential helper configuration. `configure_git_creds([HOME_DIR] [RUN_AS_CMD])` — writes a static credential helper script and configures git globally to use password-based HTTP auth (Forgejo 11.x rejects API tokens for `git push`, #361). `repair_baked_cred_urls([--as RUN_AS_CMD] DIR ...)` — rewrites any git remote URLs that have credentials baked in to use clean URLs instead; uses `safe.directory` bypass for root-owned repos (#671). Requires `FORGE_PASS`, `FORGE_URL`, `FORGE_TOKEN`. | entrypoints (agents, edge) | | `lib/ops-setup.sh` | `setup_ops_repo()` — creates ops repo on Forgejo if it doesn't exist, configures bot collaborators, clones/initializes ops repo locally, seeds directory structure (vault, knowledge, evidence, sprints). Evidence subdirectories seeded: engagement/, red-team/, holdout/, evolution/, user-test/. Also seeds sprints/ for architect output. Exports `_ACTUAL_OPS_SLUG`. `migrate_ops_repo(ops_root, [primary_branch])` — idempotent migration helper that seeds missing directories and .gitkeep files on existing ops repos (pre-#407 deployments). | bin/disinto (init) | | `lib/ci-setup.sh` | `_install_cron_impl()` — installs crontab entries for bare-metal deployments (compose mode uses polling loop instead). `_create_woodpecker_oauth_impl()` — creates OAuth2 app on Forgejo for Woodpecker. `_generate_woodpecker_token_impl()` — auto-generates WOODPECKER_TOKEN via OAuth2 flow. `_activate_woodpecker_repo_impl()` — activates repo in Woodpecker. All gated by `_load_ci_context()` which validates required env vars. | bin/disinto (init) | | `lib/generators.sh` | Template generation for `disinto init`: `generate_compose()` — docker-compose.yml (uses `codeberg.org/forgejo/forgejo:11.0` tag; adds `security_opt: [apparmor:unconfined]` to all services for rootless container compatibility), `generate_caddyfile()` — Caddyfile, `generate_staging_index()` — staging index, `generate_deploy_pipelines()` — Woodpecker deployment pipeline configs. Requires `FACTORY_ROOT`, `PROJECT_NAME`, `PRIMARY_BRANCH`. | bin/disinto (init) | diff --git a/planner/AGENTS.md b/planner/AGENTS.md index b668cb4..e428fa9 100644 --- a/planner/AGENTS.md +++ b/planner/AGENTS.md @@ -1,4 +1,4 @@ - + # Planner Agent **Role**: Strategic planning using a Prerequisite Tree (Theory of Constraints), diff --git a/predictor/AGENTS.md b/predictor/AGENTS.md index 2eb76c1..e475ed0 100644 --- a/predictor/AGENTS.md +++ b/predictor/AGENTS.md @@ -1,4 +1,4 @@ - + # Predictor Agent **Role**: Abstract adversary (the "goblin"). Runs a 2-step formula diff --git a/review/AGENTS.md b/review/AGENTS.md index 28d7209..c03fd52 100644 --- a/review/AGENTS.md +++ b/review/AGENTS.md @@ -1,4 +1,4 @@ - + # Review Agent **Role**: AI-powered PR review — post structured findings and formal diff --git a/supervisor/AGENTS.md b/supervisor/AGENTS.md index a33d762..ad53d13 100644 --- a/supervisor/AGENTS.md +++ b/supervisor/AGENTS.md @@ -1,4 +1,4 @@ - + # Supervisor Agent **Role**: Health monitoring and auto-remediation, executed as a formula-driven