fix: refactor: replace escalation JSONL with blocked label + diagnostic comment (#352)
Replace the unreliable escalation JSONL system (supervisor/escalations-*.jsonl consumed by gardener) with direct blocked label + diagnostic comment on the original issue. When a dev-agent or action-agent session fails (PHASE:failed, idle timeout, crash, CI exhausted): - Capture last 50 lines from tmux pane via tmux capture-pane - Post a structured diagnostic comment on the issue (exit reason, timestamp, PR number, tmux output) - Label the issue "blocked" (instead of restoring "backlog") - Remove in-progress label Removed: - Escalation JSONL write paths in dev-agent.sh, phase-handler.sh, dev-poll.sh, action-agent.sh - is_escalated() helper in dev-poll.sh - Escalation triage (P2f section) in supervisor-poll.sh - Escalation processing + recipe engine in gardener-poll.sh - ci-escalation-recipes step from run-gardener.toml formula - escalations*.jsonl from .gitignore Added: - post_blocked_diagnostic() shared helper in phase-handler.sh - ensure_blocked_label_id() helper (creates label via API if not exists) - is_blocked() helper in dev-poll.sh (replaces is_escalated) - Blocked issues listing in supervisor/preflight.sh Kept: - Matrix notifications on failure (unchanged) - CI fix counter logic (still tracks attempts) - needs_human injection in supervisor/gardener (not escalation-related) - Gardener grooming (gardener-agent.sh still invoked) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
0109f0b0c3
commit
61c44d31b1
10 changed files with 181 additions and 990 deletions
|
|
@ -749,18 +749,15 @@ case "${_MONITOR_LOOP_EXIT:-}" in
|
|||
idle_timeout|idle_prompt)
|
||||
if [ "${_MONITOR_LOOP_EXIT:-}" = "idle_prompt" ]; then
|
||||
notify_ctx \
|
||||
"session finished without phase signal — killed. Escalating to gardener." \
|
||||
"session finished without phase signal — killed. Escalating to gardener.${PR_NUMBER:+ PR <a href='${CODEBERG_WEB}/pulls/${PR_NUMBER}'>#${PR_NUMBER}</a>}"
|
||||
"session finished without phase signal — killed. Marking blocked." \
|
||||
"session finished without phase signal — killed. Marking blocked.${PR_NUMBER:+ PR <a href='${CODEBERG_WEB}/pulls/${PR_NUMBER}'>#${PR_NUMBER}</a>}"
|
||||
else
|
||||
notify_ctx \
|
||||
"session idle for 2h — killed. Escalating to gardener." \
|
||||
"session idle for 2h — killed. Escalating to gardener.${PR_NUMBER:+ PR <a href='${CODEBERG_WEB}/pulls/${PR_NUMBER}'>#${PR_NUMBER}</a>}"
|
||||
"session idle for 2h — killed. Marking blocked." \
|
||||
"session idle for 2h — killed. Marking blocked.${PR_NUMBER:+ PR <a href='${CODEBERG_WEB}/pulls/${PR_NUMBER}'>#${PR_NUMBER}</a>}"
|
||||
fi
|
||||
# Escalate: write to project-suffixed escalation file so gardener picks it up
|
||||
echo "{\"issue\":${ISSUE},\"pr\":${PR_NUMBER:-0},\"reason\":\"${_MONITOR_LOOP_EXIT:-idle_timeout}\",\"ts\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}" \
|
||||
>> "${FACTORY_ROOT}/supervisor/escalations-${PROJECT_NAME}.jsonl"
|
||||
# Restore labels: remove in-progress, add backlog
|
||||
restore_to_backlog
|
||||
# Post diagnostic comment + label issue blocked (replaces escalation JSONL)
|
||||
post_blocked_diagnostic "${_MONITOR_LOOP_EXIT:-idle_timeout}"
|
||||
if [ -n "${PR_NUMBER:-}" ]; then
|
||||
log "keeping worktree (PR #${PR_NUMBER} still open)"
|
||||
else
|
||||
|
|
@ -773,8 +770,8 @@ case "${_MONITOR_LOOP_EXIT:-}" in
|
|||
;;
|
||||
crashed)
|
||||
# Belt-and-suspenders: _on_phase_change(PHASE:crashed) handles primary
|
||||
# cleanup (escalation, notification, labels, worktree, files).
|
||||
restore_to_backlog
|
||||
# cleanup (diagnostic comment, blocked label, worktree, files).
|
||||
post_blocked_diagnostic "crashed"
|
||||
;;
|
||||
done)
|
||||
# Belt-and-suspenders: callback in phase-handler.sh handles primary cleanup,
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue