From 77cb4c46438aefd003eb3e552eb4337ca9677ac2 Mon Sep 17 00:00:00 2001
From: johba
Date: Sun, 15 Mar 2026 18:06:25 +0100
Subject: [PATCH] refactor: rename factory/ → supervisor/, factory-poll → supervisor-poll
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The supervisor agent was confusingly named "factory" (same as the project).
Rename directory, script, log, lock, status, and escalation files.
Update all references across scripts and docs.

FACTORY_ROOT env var unchanged (refers to project root, not agent).

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 BOOTSTRAP.md                                  | 14 +++---
 README.md                                     | 16 +++----
 dev/dev-agent.sh                              |  4 +-
 dev/dev-poll.sh                               |  4 +-
 lib/matrix_listener.sh                        |  8 ++--
 {factory => supervisor}/PROMPT.md             | 22 +++++-----
 {factory => supervisor}/best-practices/ci.md  |  0
 .../best-practices/codeberg.md                |  2 +-
 .../best-practices/dev-agent.md               |  4 +-
 .../best-practices/disk.md                    |  2 +-
 {factory => supervisor}/best-practices/git.md |  4 +-
 .../best-practices/memory.md                  |  0
 .../best-practices/review-agent.md            |  2 +-
 .../supervisor-poll.sh                        | 44 +++++++++----------
 {factory => supervisor}/update-prompt.sh      | 10 ++---
 15 files changed, 68 insertions(+), 68 deletions(-)
 rename {factory => supervisor}/PROMPT.md (69%)
 rename {factory => supervisor}/best-practices/ci.md (100%)
 rename {factory => supervisor}/best-practices/codeberg.md (93%)
 rename {factory => supervisor}/best-practices/dev-agent.md (80%)
 rename {factory => supervisor}/best-practices/disk.md (95%)
 rename {factory => supervisor}/best-practices/git.md (96%)
 rename {factory => supervisor}/best-practices/memory.md (100%)
 rename {factory => supervisor}/best-practices/review-agent.md (95%)
 rename factory/factory-poll.sh => supervisor/supervisor-poll.sh (91%)
 rename {factory => supervisor}/update-prompt.sh (73%)

diff --git a/BOOTSTRAP.md b/BOOTSTRAP.md index 0f19a0a..26b2079 100644 --- a/BOOTSTRAP.md +++ b/BOOTSTRAP.md @@
-55,7 +55,7 @@ CLAUDE_TIMEOUT=7200 # seconds per Claude invocation ### Required: CI pipeline -The repo needs at least one Woodpecker pipeline. Dark-factory monitors CI status to decide when a PR is ready for review and when it can merge. +The repo needs at least one Woodpecker pipeline. Disinto monitors CI status to decide when a PR is ready for review and when it can merge. ### Required: `CLAUDE.md` @@ -155,7 +155,7 @@ Add (adjust paths): FACTORY_ROOT=/home/you/disinto # Supervisor — health checks, auto-healing (every 10 min) -0,10,20,30,40,50 * * * * $FACTORY_ROOT/factory/factory-poll.sh +0,10,20,30,40,50 * * * * $FACTORY_ROOT/supervisor/supervisor-poll.sh # Review agent — find unreviewed PRs (every 10 min, offset +3) 3,13,23,33,43,53 * * * * $FACTORY_ROOT/review/review-poll.sh @@ -176,7 +176,7 @@ The 3-minute offsets prevent agents from competing for resources. ```bash # Should complete with "all clear" (no problems to fix) -bash factory/factory-poll.sh +bash supervisor/supervisor-poll.sh # Should list backlog issues (or "no backlog issues") bash dev/dev-poll.sh @@ -188,7 +188,7 @@ bash review/review-poll.sh Check logs after a few cycles: ```bash -tail -30 factory/factory.log +tail -30 supervisor/supervisor.log tail -30 dev/dev-agent.log tail -30 review/review.log ``` @@ -203,7 +203,7 @@ If you want real-time notifications and human-in-the-loop escalation: sudo cp lib/matrix_listener.service /etc/systemd/system/ sudo systemctl enable --now matrix_listener ``` -3. The factory and gardener will post status updates and escalation threads to the configured room. Reply in-thread to answer escalations. +3. The supervisor and gardener will post status updates and escalation threads to the configured room. Reply in-thread to answer escalations. 
## Lifecycle @@ -219,7 +219,7 @@ You write issues (with backlog label) → merge, close issue, clean up Meanwhile: - factory-poll monitors health, kills stale processes, manages resources + supervisor-poll monitors health, kills stale processes, manages resources gardener grooms backlog: closes duplicates, promotes tech-debt, escalates ambiguity planner rebuilds AGENTS.md from git history, gap-analyses against VISION.md ``` @@ -233,4 +233,4 @@ Meanwhile: | CI stuck | `bash lib/ci-debug.sh` — check Woodpecker. Rate-limited? (exit 128 = wait 15 min) | | Claude not found | `which claude` — must be in PATH. Check `lib/env.sh` adds `~/.local/bin`. | | Merge fails | Branch protection misconfigured? Review bot needs write access to the repo. | -| Memory issues | Factory auto-heals at <500 MB free. Check `factory/factory.log` for P0 alerts. | +| Memory issues | Supervisor auto-heals at <500 MB free. Check `supervisor/supervisor.log` for P0 alerts. | diff --git a/README.md b/README.md index a653bc8..5fbe6cb 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ Point it at a Codeberg repo with a Woodpecker CI pipeline and it will pick up is ## Architecture ``` -cron (*/10) ──→ factory-poll.sh ← supervisor (bash checks, zero tokens) +cron (*/10) ──→ supervisor-poll.sh ← supervisor (bash checks, zero tokens) ├── all clear? → exit 0 └── problem? 
→ claude -p (diagnose, fix, or escalate) @@ -33,9 +33,9 @@ all agents ──→ matrix_send() ← status updates, escalations, merge no **Required:** - [Claude CLI](https://docs.anthropic.com/en/docs/claude-cli) — `claude` in PATH, authenticated -- [Codeberg](https://codeberg.org/) account with an API token — the factory reads issues, opens PRs, posts comments, and merges via the Codeberg API +- [Codeberg](https://codeberg.org/) account with an API token — disinto reads issues, opens PRs, posts comments, and merges via the Codeberg API - A second Codeberg account for the review bot — reviews posted under a separate identity so the dev-agent doesn't review its own PRs (`REVIEW_BOT_TOKEN`) -- [Woodpecker CI](https://woodpecker-ci.org/) — local instance connected to your Codeberg repo; the factory monitors pipelines, retries failures, and queries the Woodpecker Postgres DB directly +- [Woodpecker CI](https://woodpecker-ci.org/) — local instance connected to your Codeberg repo; disinto monitors pipelines, retries failures, and queries the Woodpecker Postgres DB directly - PostgreSQL client (`psql`) — for Woodpecker DB queries (pipeline status, build counts) - `jq`, `curl`, `git` @@ -84,13 +84,13 @@ CLAUDE_TIMEOUT=7200 # max seconds per Claude invocation (default: 2h) # 3. Install cron (staggered to avoid overlap) crontab -e # Add: -# 0,10,20,30,40,50 * * * * /path/to/disinto/factory/factory-poll.sh +# 0,10,20,30,40,50 * * * * /path/to/disinto/supervisor/supervisor-poll.sh # 3,13,23,33,43,53 * * * * /path/to/disinto/review/review-poll.sh # 6,16,26,36,46,56 * * * * /path/to/disinto/dev/dev-poll.sh # 15 8 * * * /path/to/disinto/gardener/gardener-poll.sh # 4. 
Verify -bash factory/factory-poll.sh # should log "all clear" +bash supervisor/supervisor-poll.sh # should log "all clear" ``` ## Directory Structure @@ -113,8 +113,8 @@ disinto/ ├── gardener/ │ ├── gardener-poll.sh # Cron entry: backlog grooming │ └── best-practices.md # Gardener knowledge base -└── factory/ - ├── factory-poll.sh # Supervisor: health checks + claude -p +└── supervisor/ + ├── supervisor-poll.sh # Supervisor: health checks + claude -p ├── PROMPT.md # Supervisor's system prompt ├── update-prompt.sh # Self-learning: append to best-practices └── best-practices/ # Progressive disclosure knowledge base @@ -131,7 +131,7 @@ disinto/ | Agent | Trigger | Job | |-------|---------|-----| -| **Factory** (supervisor) | Every 10 min | Health checks (RAM, disk, CI, git). Calls Claude only when something is broken. Self-improving via `best-practices/`. | +| **Supervisor** | Every 10 min | Health checks (RAM, disk, CI, git). Calls Claude only when something is broken. Self-improving via `best-practices/`. | | **Dev** | Every 10 min | Picks up `backlog`-labeled issues, creates a branch, implements, opens a PR, monitors CI, responds to review, merges. | | **Review** | Every 10 min | Finds PRs without review, runs Claude-powered code review, approves or requests changes. | | **Gardener** | Daily | Grooms the issue backlog: detects duplicates, promotes `tech-debt` to `backlog`, closes stale issues, escalates ambiguous items. 
| diff --git a/dev/dev-agent.sh b/dev/dev-agent.sh index abc12bb..8debf62 100755 --- a/dev/dev-agent.sh +++ b/dev/dev-agent.sh @@ -1106,9 +1106,9 @@ while [ "$REVIEW_ROUND" -lt "$MAX_REVIEW_ROUNDS" ]; do CI_FIX_COUNT=$(( ${CI_FIX_COUNT:-0} + 1 )) if [ "$CI_FIX_COUNT" -gt 2 ]; then log "CI failure not recoverable after ${CI_FIX_COUNT} fix attempts" - # Escalate to supervisor — write marker for factory-poll.sh to pick up + # Escalate to supervisor — write marker for supervisor-poll.sh to pick up echo "{\"issue\":${ISSUE},\"pr\":${PR_NUMBER},\"reason\":\"ci_exhausted\",\"step\":\"${FAILED_STEP:-unknown}\",\"attempts\":${CI_FIX_COUNT},\"ts\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}" \ - >> "${FACTORY_ROOT}/factory/escalations.jsonl" + >> "${FACTORY_ROOT}/supervisor/escalations.jsonl" log "escalated to supervisor via escalations.jsonl" break fi diff --git a/dev/dev-poll.sh b/dev/dev-poll.sh index 8a7593f..a3c8fad 100755 --- a/dev/dev-poll.sh +++ b/dev/dev-poll.sh @@ -1,5 +1,5 @@ #!/usr/bin/env bash -# dev-poll.sh — Pull-based factory: find the next ready issue and start dev-agent +# dev-poll.sh — Pull-based scheduler: find the next ready issue and start dev-agent # # Pull system: issues labeled "backlog" are candidates. An issue is READY when # ALL its dependency issues are closed (and their PRs merged). @@ -104,7 +104,7 @@ dep_is_merged() { return 1 fi - # Issue closed = dep satisfied. The factory only closes issues after + # Issue closed = dep satisfied. The dev-agent only closes issues after # merging, so closed state is trustworthy. No need to hunt for the # specific PR — that was over-engineering that caused false negatives.
return 0 diff --git a/lib/matrix_listener.sh b/lib/matrix_listener.sh index 6c4666f..01d0748 100755 --- a/lib/matrix_listener.sh +++ b/lib/matrix_listener.sh @@ -1,11 +1,11 @@ #!/usr/bin/env bash # matrix_listener.sh — Long-poll Matrix sync daemon # -# Listens for replies in the factory Matrix room and dispatches them +# Listens for replies in the Matrix coordination room and dispatches them # to the appropriate agent via well-known files. # # Dispatch: -# Thread reply to [supervisor] message → /tmp/factory-escalation-reply +# Thread reply to [supervisor] message → /tmp/supervisor-escalation-reply # Thread reply to [gardener] message → /tmp/gardener-escalation-reply # # Run as systemd service (see matrix_listener.service) or manually: @@ -18,7 +18,7 @@ source "$(dirname "$0")/../lib/env.sh" SINCE_FILE="/tmp/matrix-listener-since" THREAD_MAP="${MATRIX_THREAD_MAP:-/tmp/matrix-thread-map}" -LOGFILE="${FACTORY_ROOT}/factory/matrix-listener.log" +LOGFILE="${FACTORY_ROOT}/supervisor/matrix-listener.log" SYNC_TIMEOUT=30000 # 30s long-poll BACKOFF=5 MAX_BACKOFF=60 @@ -133,7 +133,7 @@ while true; do case "$AGENT" in supervisor) - printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$SENDER" "$BODY" >> /tmp/factory-escalation-reply + printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$SENDER" "$BODY" >> /tmp/supervisor-escalation-reply # Acknowledge matrix_send "supervisor" "✓ received, will act on next poll" "$THREAD_ROOT" >/dev/null 2>&1 || true ;; diff --git a/factory/PROMPT.md b/supervisor/PROMPT.md similarity index 69% rename from factory/PROMPT.md rename to supervisor/PROMPT.md index fc86282..b2b1d02 100644 --- a/factory/PROMPT.md +++ b/supervisor/PROMPT.md @@ -1,7 +1,7 @@ -# Factory Supervisor +# Supervisor Agent -You are the factory supervisor for `$CODEBERG_REPO`. You were called because -`factory-poll.sh` detected an issue it couldn't auto-fix. +You are the supervisor agent for `$CODEBERG_REPO`. 
You were called because +`supervisor-poll.sh` detected an issue it couldn't auto-fix. ## Priority Order @@ -16,13 +16,13 @@ You are the factory supervisor for `$CODEBERG_REPO`. You were called because Fix the issue yourself. You have full shell access and `--dangerously-skip-permissions`. Before acting, read the relevant best-practices file: -- Memory issues → `cat ${FACTORY_ROOT}/factory/best-practices/memory.md` -- Disk issues → `cat ${FACTORY_ROOT}/factory/best-practices/disk.md` -- CI issues → `cat ${FACTORY_ROOT}/factory/best-practices/ci.md` -- Codeberg / rate limits → `cat ${FACTORY_ROOT}/factory/best-practices/codeberg.md` -- Dev-agent issues → `cat ${FACTORY_ROOT}/factory/best-practices/dev-agent.md` -- Review-agent issues → `cat ${FACTORY_ROOT}/factory/best-practices/review-agent.md` -- Git issues → `cat ${FACTORY_ROOT}/factory/best-practices/git.md` +- Memory issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/memory.md` +- Disk issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/disk.md` +- CI issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/ci.md` +- Codeberg / rate limits → `cat ${FACTORY_ROOT}/supervisor/best-practices/codeberg.md` +- Dev-agent issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/dev-agent.md` +- Review-agent issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/review-agent.md` +- Git issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/git.md` ## Credentials & API Access @@ -66,6 +66,6 @@ ESCALATE: If you discover something new, append it to the relevant best-practices file: ```bash -bash ${FACTORY_ROOT}/factory/update-prompt.sh "best-practices/.md" "### Lesson title +bash ${FACTORY_ROOT}/supervisor/update-prompt.sh "best-practices/.md" "### Lesson title Description of what you learned." 
``` diff --git a/factory/best-practices/ci.md b/supervisor/best-practices/ci.md similarity index 100% rename from factory/best-practices/ci.md rename to supervisor/best-practices/ci.md diff --git a/factory/best-practices/codeberg.md b/supervisor/best-practices/codeberg.md similarity index 93% rename from factory/best-practices/codeberg.md rename to supervisor/best-practices/codeberg.md index c8beaca..506330a 100644 --- a/factory/best-practices/codeberg.md +++ b/supervisor/best-practices/codeberg.md @@ -22,7 +22,7 @@ cd && git commit --allow-empty -m "ci: retrigger" --no-verify && git ``` ### Prevention -- The factory runs 3 agents staggered by 3 minutes. During heavy development, many PRs trigger CI simultaneously. +- The system runs 3 agents staggered by 3 minutes. During heavy development, many PRs trigger CI simultaneously. - One pipeline at a time is ideal on this VPS (resource + rate limit reasons). - If >3 pipelines are pending/running, do NOT create more work. diff --git a/factory/best-practices/dev-agent.md b/supervisor/best-practices/dev-agent.md similarity index 80% rename from factory/best-practices/dev-agent.md rename to supervisor/best-practices/dev-agent.md index dbd7939..ce0f472 100644 --- a/factory/best-practices/dev-agent.md +++ b/supervisor/best-practices/dev-agent.md @@ -44,12 +44,12 @@ DO NOT try to find the specific PR that closed an issue. This is over-engineerin - Codeberg shares issue/PR numbering — no guaranteed relationship - PRs don't always mention the issue number in title/body - Searching last N closed PRs misses older merges -- The factory itself closes issues after merging, so closed = merged +- The dev-agent closes issues after merging, so closed = merged The only check needed: `issue.state == "closed"`. ### False Positive: Status Unchanged Alert -The factory-poll alert 'status unchanged for Nmin' is a false positive for complex implementation tasks. 
The status is set to 'claude assessing + implementing' at the START of the `timeout 7200 claude -p ...` call and only updates after Claude finishes. Normal complex tasks (multi-file Solidity changes + forge test) take 45-90 minutes. To distinguish a false positive from a real stuck agent: check that the claude PID is alive (`ps -p `), consuming CPU (>0%), and has active threads (`pstree -p `). If the process is alive and using CPU, do NOT restart it — this wastes completed work. +The supervisor-poll alert 'status unchanged for Nmin' is a false positive for complex implementation tasks. The status is set to 'claude assessing + implementing' at the START of the `timeout 7200 claude -p ...` call and only updates after Claude finishes. Normal complex tasks (multi-file Solidity changes + forge test) take 45-90 minutes. To distinguish a false positive from a real stuck agent: check that the claude PID is alive (`ps -p `), consuming CPU (>0%), and has active threads (`pstree -p `). If the process is alive and using CPU, do NOT restart it — this wastes completed work. ### False Positive: 'Waiting for CI + Review' Alert The 'status unchanged for Nmin' alert is also a false positive when status is 'waiting for CI + review on PR #N (round R)'. This is an intentional sleep/poll loop — the agent is waiting for CI to pass and then for review-poll to post a review. CI can take 20–40 minutes; review follows. Do NOT restart the agent. Confirm by checking: (1) agent PID is alive, (2) CI commit status via `codeberg_api GET /commits//status`, (3) review-poll log shows it will pick up the PR on next cycle. 
diff --git a/factory/best-practices/disk.md b/supervisor/best-practices/disk.md similarity index 95% rename from factory/best-practices/disk.md rename to supervisor/best-practices/disk.md index 6b63484..2291cf2 100644 --- a/factory/best-practices/disk.md +++ b/supervisor/best-practices/disk.md @@ -2,7 +2,7 @@ ## Safe Fixes - Docker cleanup: `sudo docker system prune -f` (keeps images, removes stopped containers + dangling layers) -- Truncate factory logs >5MB: `truncate -s 0 ` +- Truncate agent logs >5MB: `truncate -s 0 ` - Remove stale worktrees: check `/tmp/${PROJECT_NAME}-worktree-*`, only if dev-agent not running on them - Woodpecker log_entries: `DELETE FROM log_entries WHERE id < (SELECT max(id) - 100000 FROM log_entries);` then `VACUUM;` - Node module caches in worktrees: `rm -rf /tmp/${PROJECT_NAME}-worktree-*/node_modules/` diff --git a/factory/best-practices/git.md b/supervisor/best-practices/git.md similarity index 96% rename from factory/best-practices/git.md rename to supervisor/best-practices/git.md index a48045c..6551d3a 100644 --- a/factory/best-practices/git.md +++ b/supervisor/best-practices/git.md @@ -34,7 +34,7 @@ ## Known Issues - Main repo MUST be on $PRIMARY_BRANCH at all times. Dev work happens in worktrees. -- Stale rebases (detached HEAD) break all worktree creation — silent factory stall. +- Stale rebases (detached HEAD) break all worktree creation — silent pipeline stall. - `git worktree add` fails if target directory exists (even empty). Remove first. - Many old branches exist locally (100+). Normal — don't bulk-delete. @@ -47,7 +47,7 @@ ## Lessons Learned - NEVER delete remote branches before confirming merge. Close PR, rebase locally, force-push if needed. -- Stale rebase caused 5h factory stall once (2026-03-11). Auto-heal added to dev-agent. +- Stale rebase caused 5h pipeline stall once (2026-03-11). Auto-heal added to dev-agent. - lint-staged hooks fail when `forge` not in PATH. Use `--no-verify` when committing from scripts.
### PR #608 Post-Mortem (2026-03-12/13) diff --git a/factory/best-practices/memory.md b/supervisor/best-practices/memory.md similarity index 100% rename from factory/best-practices/memory.md rename to supervisor/best-practices/memory.md diff --git a/factory/best-practices/review-agent.md b/supervisor/best-practices/review-agent.md similarity index 95% rename from factory/best-practices/review-agent.md rename to supervisor/best-practices/review-agent.md index c2a053e..32133df 100644 --- a/factory/best-practices/review-agent.md +++ b/supervisor/best-practices/review-agent.md @@ -19,7 +19,7 @@ - **Hallucinated findings** — bot may flag non-issues. This needs Clawy's judgment — escalate. ## Monitoring -- Unreviewed PRs with CI pass for >1h → factory-poll.sh auto-triggers review +- Unreviewed PRs with CI pass for >1h → supervisor-poll.sh auto-triggers review - Review errors should resolve on next poll cycle - If same PR fails review 3+ times → likely a prompt issue, escalate diff --git a/factory/factory-poll.sh b/supervisor/supervisor-poll.sh similarity index 91% rename from factory/factory-poll.sh rename to supervisor/supervisor-poll.sh index d83fc1a..a2441a1 100755 --- a/factory/factory-poll.sh +++ b/supervisor/supervisor-poll.sh @@ -1,20 +1,20 @@ #!/usr/bin/env bash -# factory-poll.sh — Factory supervisor: bash checks + claude -p for fixes +# supervisor-poll.sh — Supervisor agent: bash checks + claude -p for fixes # # Runs every 10min via cron. Does all health checks in bash (zero tokens). # Only invokes claude -p when auto-fix fails or issue is complex. 
# -# Cron: */10 * * * * /path/to/disinto/factory/factory-poll.sh +# Cron: */10 * * * * /path/to/disinto/supervisor/supervisor-poll.sh # -# Peek: cat /tmp/factory-status -# Log: tail -f /path/to/disinto/factory/factory.log +# Peek: cat /tmp/supervisor-status +# Log: tail -f /path/to/disinto/supervisor/supervisor.log source "$(dirname "$0")/../lib/env.sh" -LOGFILE="${FACTORY_ROOT}/factory/factory.log" -STATUSFILE="/tmp/factory-status" -LOCKFILE="/tmp/factory-poll.lock" -PROMPT_FILE="${FACTORY_ROOT}/factory/PROMPT.md" +LOGFILE="${FACTORY_ROOT}/supervisor/supervisor.log" +STATUSFILE="/tmp/supervisor-status" +LOCKFILE="/tmp/supervisor-poll.lock" +PROMPT_FILE="${FACTORY_ROOT}/supervisor/PROMPT.md" # Prevent overlapping runs if [ -f "$LOCKFILE" ]; then @@ -32,15 +32,15 @@ flog() { } status() { - printf '[%s] factory: %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*" > "$STATUSFILE" + printf '[%s] supervisor: %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*" > "$STATUSFILE" flog "$*" } # ── Check for escalation replies from Matrix ────────────────────────────── ESCALATION_REPLY="" -if [ -s /tmp/factory-escalation-reply ]; then - ESCALATION_REPLY=$(cat /tmp/factory-escalation-reply) - rm -f /tmp/factory-escalation-reply +if [ -s /tmp/supervisor-escalation-reply ]; then + ESCALATION_REPLY=$(cat /tmp/supervisor-escalation-reply) + rm -f /tmp/supervisor-escalation-reply flog "Got escalation reply: $(echo "$ESCALATION_REPLY" | head -1)" fi @@ -71,7 +71,7 @@ SWAP_USED_MB=$(free -m | awk '/Swap:/{print $3}') if [ "${AVAIL_MB:-9999}" -lt 500 ] || { [ "${SWAP_USED_MB:-0}" -gt 3000 ] && [ "${AVAIL_MB:-9999}" -lt 2000 ]; }; then flog "MEMORY CRISIS: avail=${AVAIL_MB}MB swap_used=${SWAP_USED_MB}MB — auto-fixing" - # Kill stale factory-spawned claude processes (>3h old) — skip interactive sessions + # Kill stale agent-spawned claude processes (>3h old) — skip interactive sessions STALE_CLAUDES=$(pgrep -f "claude -p" --older 10800 2>/dev/null || true) if [ -n "$STALE_CLAUDES" ]; then echo 
"$STALE_CLAUDES" | xargs kill 2>/dev/null || true @@ -113,7 +113,7 @@ if [ "${DISK_PERCENT:-0}" -gt 80 ]; then # Docker cleanup (safe — keeps images) sudo docker system prune -f >/dev/null 2>&1 && fixed "Docker prune" - # Truncate factory logs >10MB - for logfile in "${FACTORY_ROOT}"/{dev,review,factory}/*.log; do + # Truncate agent logs >10MB + for logfile in "${FACTORY_ROOT}"/{dev,review,supervisor}/*.log; do if [ -f "$logfile" ]; then SIZE_KB=$(du -k "$logfile" 2>/dev/null | cut -f1) @@ -159,7 +159,7 @@ fi # ============================================================================= # P2: FACTORY STOPPED — CI, dev-agent, git # ============================================================================= -status "P2: checking factory" +status "P2: checking pipeline" # CI stuck STUCK_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=${WOODPECKER_REPO_ID} AND status='running' AND EXTRACT(EPOCH FROM now() - to_timestamp(started)) > 1200;" 2>/dev/null | xargs || true) @@ -204,7 +204,7 @@ fi # ============================================================================= # P2b: FACTORY STALLED — backlog exists but no agent running # ============================================================================= -status "P2: checking factory stall" +status "P2: checking pipeline stall" BACKLOG_COUNT=$(codeberg_api GET "/issues?state=open&labels=backlog&type=issues&limit=1" 2>/dev/null | jq -r 'length' 2>/dev/null || echo "0") IN_PROGRESS=$(codeberg_api GET "/issues?state=open&labels=in-progress&type=issues&limit=1" 2>/dev/null | jq -r 'length' 2>/dev/null || echo "0") @@ -221,7 +221,7 @@ if [ "${BACKLOG_COUNT:-0}" -gt 0 ] && [ "${IN_PROGRESS:-0}" -eq 0 ]; then IDLE_MIN=$(( (NOW_EPOCH - LAST_LOG_EPOCH) / 60 )) if [ "$IDLE_MIN" -gt 20 ]; then - p2 "Factory stalled: ${BACKLOG_COUNT} backlog issue(s), no agent ran for ${IDLE_MIN}min" + p2 "Pipeline stalled: ${BACKLOG_COUNT} backlog issue(s), no agent ran for ${IDLE_MIN}min" fi fi @@ -277,7 +277,7 @@ done # P4: HOUSEKEEPING — stale processes #
============================================================================= # Check for dev-agent escalations -ESCALATION_FILE="${FACTORY_ROOT}/factory/escalations.jsonl" +ESCALATION_FILE="${FACTORY_ROOT}/supervisor/escalations.jsonl" if [ -s "$ESCALATION_FILE" ]; then ESCALATION_COUNT=$(wc -l < "$ESCALATION_FILE") p3 "Dev-agent escalated ${ESCALATION_COUNT} issue(s) — see ${ESCALATION_FILE}" @@ -285,7 +285,7 @@ fi status "P4: housekeeping" -# Stale factory-spawned claude processes (>3h, not caught by P0) — skip interactive sessions +# Stale agent-spawned claude processes (>3h, not caught by P0) — skip interactive sessions STALE_CLAUDES=$(pgrep -f "claude -p" --older 10800 2>/dev/null || true) if [ -n "$STALE_CLAUDES" ]; then echo "$STALE_CLAUDES" | xargs kill 2>/dev/null || true @@ -308,7 +308,7 @@ for wt in /tmp/${PROJECT_NAME}-worktree-* /tmp/${PROJECT_NAME}-review-*; do done git -C "$PROJECT_REPO_ROOT" worktree prune 2>/dev/null || true -# Rotate factory log if >5MB -for logfile in "${FACTORY_ROOT}"/{dev,review,factory}/*.log; do +# Rotate agent logs if >5MB +for logfile in "${FACTORY_ROOT}"/{dev,review,supervisor}/*.log; do if [ -f "$logfile" ]; then SIZE_KB=$(du -k "$logfile" 2>/dev/null | cut -f1) @@ -329,12 +329,12 @@ if [ -n "$ALL_ALERTS" ]; then ALERT_TEXT=$(echo -e "$ALL_ALERTS") # Notify Matrix - matrix_send "supervisor" "⚠️ Factory alerts: + matrix_send "supervisor" "⚠️ Supervisor alerts: ${ALERT_TEXT}" 2>/dev/null || true flog "Invoking claude -p for alerts" - CLAUDE_PROMPT="$(cat "$PROMPT_FILE" 2>/dev/null || echo "You are a factory supervisor. Fix the issue below.") + CLAUDE_PROMPT="$(cat "$PROMPT_FILE" 2>/dev/null || echo "You are a supervisor agent.
Fix the issue below.") ## Current Alerts ${ALERT_TEXT} diff --git a/factory/update-prompt.sh b/supervisor/update-prompt.sh similarity index 73% rename from factory/update-prompt.sh rename to supervisor/update-prompt.sh index 75beb21..3d38855 100755 --- a/factory/update-prompt.sh +++ b/supervisor/update-prompt.sh @@ -2,15 +2,15 @@ # update-prompt.sh — Append a lesson to a best-practices file # # Usage: -# ./factory/update-prompt.sh "best-practices/memory.md" "### Title\nBody text" -# ./factory/update-prompt.sh --from-file "best-practices/memory.md" /tmp/lesson.md +# ./supervisor/update-prompt.sh "best-practices/memory.md" "### Title\nBody text" +# ./supervisor/update-prompt.sh --from-file "best-practices/memory.md" /tmp/lesson.md # # Called by claude -p when it learns something during a fix. # Commits and pushes the update to the disinto repo. source "$(dirname "$0")/../lib/env.sh" -TARGET_FILE="${FACTORY_ROOT}/factory/$1" +TARGET_FILE="${FACTORY_ROOT}/supervisor/$1" shift if [ "$1" = "--from-file" ] && [ -f "$2" ]; then @@ -40,8 +40,8 @@ else fi cd "$FACTORY_ROOT" -git add "factory/$1" 2>/dev/null || git add "$TARGET_FILE" -git commit -m "factory: learned — $(echo "$LESSON" | head -1 | sed 's/^#* *//')" --no-verify 2>/dev/null +git add "supervisor/$1" 2>/dev/null || git add "$TARGET_FILE" +git commit -m "supervisor: learned — $(echo "$LESSON" | head -1 | sed 's/^#* *//')" --no-verify 2>/dev/null git push origin main 2>/dev/null log "Updated $(basename "$TARGET_FILE") with new lesson"
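The commit message states that all references were updated. After applying the patch, a recursive grep can back that up. This is a minimal sketch, assuming GNU grep; `find_stale_refs` is a hypothetical helper. The three patterns are assumptions about what a leftover looks like: `factory/` paths, `factory` inside a brace glob such as `{dev,review,factory}`, and the old `factory-poll` / `factory-status` / `factory-escalation` names. Matching is case-sensitive, so the intentionally kept `FACTORY_ROOT` is not reported.

```bash
#!/usr/bin/env bash
# Hypothetical check (not in disinto): list lines that still point at the
# old factory/ agent directory or factory-* runtime files.
# Assumes GNU grep. Case-sensitive on purpose: FACTORY_ROOT is kept.
set -euo pipefail

find_stale_refs() {
  local root="${1:-.}"
  # -r recurse, -n line numbers, -E extended regex, -I skip binary files.
  grep -rnEI --exclude-dir=.git \
    -e 'factory/' \
    -e '[{,]factory[,}]' \
    -e 'factory-(poll|status|escalation)' \
    "$root" || true  # grep exits 1 when nothing matches
}
```

Run `find_stale_refs .` from the repository root; empty output means none of the patterns matched. Prose that legitimately mentions a `factory/` path will also be reported, so review the hits by hand.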