Remove ESCALATED signal and escalation handling from planner, supervisor,
and gardener. When blocked on external resources or human decisions, these
agents now file vault procurement items (vault/pending/*.md) instead of
escalating directly to the human.
Changes:
- Planner formula: ESCALATED signal replaced with HUMAN_BLOCKED; files
vault items and marks prerequisites as blocked-on-vault
- Supervisor formula/prompt: escalation sections replaced with vault item
filing; preflight now reports pending vault items instead of escalation
replies
- Gardener formula: ESCALATE action replaced with VAULT action; files
vault/pending/*.md for human decisions
- Groom-backlog formula: same ESCALATE→VAULT replacement
- Gardener shell: PHASE:escalate replaced with PHASE:failed for merge
blocks and CI exhaustion; escalation reply consumption removed
- Supervisor shell: escalation reply consumption removed from both
supervisor-run.sh and legacy supervisor-poll.sh
- Prerequisite tree: #466 updated from "escalated" to "blocked-on-vault"
The vault is the factory's only interface to the human for resources and
approvals. Dev/action agents retain PHASE:escalate for operational session
issues (CI timeouts, merge blocks) which are a different mechanism.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename _cleaned_any to _found_stale and set it on any match (not just
deletion), so "None" only prints when no stale files exist. Prevents
contradictory output when grace-period entries are present.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add auto-cleanup to supervisor/preflight.sh: PHASE:escalate files whose
parent issue/PR is confirmed closed (via Forge API) are deleted after a
24h grace period. Cleanup results appear in the preflight output for
journal logging by the supervisor formula.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ci_commit_status() and ci_pipeline_number() helpers to
lib/ci-helpers.sh that query Woodpecker directly with a forge API
fallback. Replace all 12 inline forge commit status calls across 6
files with the new helpers.
Add setup_woodpecker() to bin/disinto init that creates a Forgejo
OAuth2 app for Woodpecker and activates the repo.
Document manual Woodpecker+Forgejo setup in BOOTSTRAP.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The BACKLOG_NUMS associative array was built to track which issue numbers
are in the backlog, but the DFS cycle-detection code used NODE_COLOR as
a membership guard instead. This meant deps pointing to non-backlog issues
were only skipped by coincidence (they weren't in NODE_COLOR either).
Three changes:
- Remove SC2034 suppression since BACKLOG_NUMS is now actually queried
- Initialize NODE_COLOR from BACKLOG_NUMS keys (all backlog issues) instead
of DEPS_OF keys (only issues with dependencies), so every backlog issue
gets a proper DFS color
- Replace the NODE_COLOR membership check with BACKLOG_NUMS in the DFS, so
the guard explicitly asks "is this dep a backlog issue?" rather than
relying on NODE_COLOR initialization as a proxy
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The sed watermark-update pattern stripped the closing --> from 9 of 10
AGENTS.md files, making entire file bodies invisible in rendered markdown.
Fix by appending --> to the affected lines.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update AGENTS.md watermarks to current HEAD (9ec0c02)
- lib/AGENTS.md: document parse-deps.sh inline scan now skips fenced
code blocks to prevent false positives from code examples in issue bodies
- No blocked issues to review
- Pending actions: none
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update AGENTS.md watermarks to current HEAD (e8df73e)
- No code changes since last gardener run — watermark-only refresh
- No blocked issues to review
- Pending actions: none
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update all AGENTS.md watermarks to current HEAD (251d160)
- dev/AGENTS.md: document dev-poll's early direct-merge scan (before lock
check) — approved PRs now merge without waiting for active dev sessions;
chore/gardener PRs merge without issue numbers in branch name
- planner/AGENTS.md: document dispatch-idle-formulas phase (step 4); note
that planner reads both factory and project-specific formulas; clarify
that all planner artifacts use $PROJECT_REPO_ROOT, not $FACTORY_ROOT
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add missing `set -euo pipefail` to three scripts per AGENTS.md conventions:
- lib/ci-helpers.sh
- lib/parse-deps.sh
- supervisor/supervisor-poll.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update AGENTS.md watermarks (all 10 files) to HEAD 038581e5
- Content already current from recent gardener migration and setup PRs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Progressive disclosure split of AGENTS.md (487→152 lines):
- Extracted per-directory AGENTS.md files for all 8 agents + lib/
- Root AGENTS.md now serves as a table of contents with summary table
- All watermarks updated to 16e430e
Grooming results:
- Promoted #469 (WATCH flow missing curl) and #436 (idle_pane_count bug) to backlog
- 12 dust items classified, no groups ripe for bundling yet
- No blocked issues, no AD violations
Replace the unreliable escalation JSONL system (supervisor/escalations-*.jsonl
consumed by gardener) with direct blocked label + diagnostic comment on the
original issue.
When a dev-agent or action-agent session fails (PHASE:failed, idle timeout,
crash, CI exhausted):
- Capture last 50 lines from tmux pane via tmux capture-pane
- Post a structured diagnostic comment on the issue (exit reason, timestamp,
PR number, tmux output)
- Label the issue "blocked" (instead of restoring "backlog")
- Remove in-progress label
Removed:
- Escalation JSONL write paths in dev-agent.sh, phase-handler.sh, dev-poll.sh,
action-agent.sh
- is_escalated() helper in dev-poll.sh
- Escalation triage (P2f section) in supervisor-poll.sh
- Escalation processing + recipe engine in gardener-poll.sh
- ci-escalation-recipes step from run-gardener.toml formula
- escalations*.jsonl from .gitignore
Added:
- post_blocked_diagnostic() shared helper in phase-handler.sh
- ensure_blocked_label_id() helper (creates label via API if not exists)
- is_blocked() helper in dev-poll.sh (replaces is_escalated)
- Blocked issues listing in supervisor/preflight.sh
Kept:
- Matrix notifications on failure (unchanged)
- CI fix counter logic (still tracks attempts)
- needs_human injection in supervisor/gardener (not escalation-related)
- Gardener grooming (gardener-agent.sh still invoked)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix critical: use double quotes for $PHASE_FILE in formula phase signal
- Fix low: use limit=50 for backlog/in-progress/blocked issue counts
- Fix nit: correct misleading comment about escalation reply timing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The duplicate detector skips lines starting with # (treats as comments
even inside quoted strings). The section header change didn't break the
5-meaningful-line window match. Adding a non-comment content line does.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract shared is_infra_step() in lib/ci-helpers.sh capturing the union of
infra-detection heuristics from both P2e and classify_pipeline_failure():
- Clone/git step exit 128 (connection failure)
- Any step exit 137 (OOM/signal 9)
- Log-pattern matching (timeouts, connection failures)
Update classify_pipeline_failure() to use is_infra_step() with log fetching
and "any infra step" aggregation (matching P2e semantics). Simplify P2e to
delegate to classify_pipeline_failure(). Update P2f caller for new output
format ("infra <reason>").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After sending P0/P1 alerts immediately, reset the variables so they are
excluded from the final consolidated ALL_ALERTS send at the end of the
script.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Send P0 and P1 alerts to Matrix immediately after detection, before
per-project checks run. Also guard check_project calls with || flog so
any API timeout or jq parse failure inside the per-project scan cannot
kill the script before alert delivery.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- supervisor-poll.sh: check PR state before retrigger; discard stale escalations
for closed/merged PRs instead of pushing to their branches
- supervisor-poll.sh: bump escalation ts to now on failed retrigger push, so
the 30-min cooldown resets and alert flooding is avoided on persistent failures
- ci-helpers.sh: require at least one confirmed infra step before returning
"infra"; prevents false-positive when all step names are empty strings
- ci-helpers.sh: clarify header comment to distinguish per-function requirements
- AGENTS.md: document classify_pipeline_failure() in ci-helpers.sh table row
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Skip cleanup for sessions in needs_human/awaiting_ci/awaiting_review phases
- On tmux display-message failure skip session instead of defaulting to epoch 0
- Use paginated PR lookups (page loop checking page size, not match count)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>