refactor: rename factory/ → supervisor/, factory-poll → supervisor-poll
The supervisor agent was confusingly named "factory" (same as the project).
Rename directory, script, log, lock, status, and escalation files. Update
all references across scripts and docs.

FACTORY_ROOT env var unchanged (refers to project root, not agent).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 8d73c2f8f9
commit 77cb4c4643
15 changed files with 68 additions and 68 deletions
BOOTSTRAP.md (14 lines changed)
@@ -55,7 +55,7 @@ CLAUDE_TIMEOUT=7200 # seconds per Claude invocation
 
 ### Required: CI pipeline
 
-The repo needs at least one Woodpecker pipeline. Dark-factory monitors CI status to decide when a PR is ready for review and when it can merge.
+The repo needs at least one Woodpecker pipeline. Disinto monitors CI status to decide when a PR is ready for review and when it can merge.
 
 ### Required: `CLAUDE.md`
 
@@ -155,7 +155,7 @@ Add (adjust paths):
 FACTORY_ROOT=/home/you/disinto
 
 # Supervisor — health checks, auto-healing (every 10 min)
-0,10,20,30,40,50 * * * * $FACTORY_ROOT/factory/factory-poll.sh
+0,10,20,30,40,50 * * * * $FACTORY_ROOT/supervisor/supervisor-poll.sh
 
 # Review agent — find unreviewed PRs (every 10 min, offset +3)
 3,13,23,33,43,53 * * * * $FACTORY_ROOT/review/review-poll.sh
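An installed crontab is not touched by this commit, so an existing entry would still point at the old script path. A minimal sketch of rewriting it, assuming the entries look like the ones in this hunk; `rewrite_cron_paths` is a hypothetical helper and not part of the repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): read crontab lines on stdin
# and write them to stdout with the old supervisor script path renamed.
rewrite_cron_paths() {
  sed 's#/factory/factory-poll\.sh#/supervisor/supervisor-poll.sh#g'
}

# Usage (inspect the output before installing it):
#   crontab -l | rewrite_cron_paths            # preview
#   crontab -l | rewrite_cron_paths | crontab -
```

Only the script path is rewritten; `$FACTORY_ROOT` is left alone because the commit keeps that variable.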
@@ -176,7 +176,7 @@ The 3-minute offsets prevent agents from competing for resources.
 
 ```bash
 # Should complete with "all clear" (no problems to fix)
-bash factory/factory-poll.sh
+bash supervisor/supervisor-poll.sh
 
 # Should list backlog issues (or "no backlog issues")
 bash dev/dev-poll.sh
@@ -188,7 +188,7 @@ bash review/review-poll.sh
 Check logs after a few cycles:
 
 ```bash
-tail -30 factory/factory.log
+tail -30 supervisor/supervisor.log
 tail -30 dev/dev-agent.log
 tail -30 review/review.log
 ```
@@ -203,7 +203,7 @@ If you want real-time notifications and human-in-the-loop escalation:
 sudo cp lib/matrix_listener.service /etc/systemd/system/
 sudo systemctl enable --now matrix_listener
 ```
-3. The factory and gardener will post status updates and escalation threads to the configured room. Reply in-thread to answer escalations.
+3. The supervisor and gardener will post status updates and escalation threads to the configured room. Reply in-thread to answer escalations.
 
 ## Lifecycle
 
@@ -219,7 +219,7 @@ You write issues (with backlog label)
 → merge, close issue, clean up
 
 Meanwhile:
-factory-poll monitors health, kills stale processes, manages resources
+supervisor-poll monitors health, kills stale processes, manages resources
 gardener grooms backlog: closes duplicates, promotes tech-debt, escalates ambiguity
 planner rebuilds AGENTS.md from git history, gap-analyses against VISION.md
 ```
@@ -233,4 +233,4 @@ Meanwhile:
 | CI stuck | `bash lib/ci-debug.sh` — check Woodpecker. Rate-limited? (exit 128 = wait 15 min) |
 | Claude not found | `which claude` — must be in PATH. Check `lib/env.sh` adds `~/.local/bin`. |
 | Merge fails | Branch protection misconfigured? Review bot needs write access to the repo. |
-| Memory issues | Factory auto-heals at <500 MB free. Check `factory/factory.log` for P0 alerts. |
+| Memory issues | Supervisor auto-heals at <500 MB free. Check `supervisor/supervisor.log` for P0 alerts. |
README.md (16 lines changed)
@@ -9,7 +9,7 @@ Point it at a Codeberg repo with a Woodpecker CI pipeline and it will pick up is
 ## Architecture
 
 ```
-cron (*/10) ──→ factory-poll.sh ← supervisor (bash checks, zero tokens)
+cron (*/10) ──→ supervisor-poll.sh ← supervisor (bash checks, zero tokens)
 ├── all clear? → exit 0
 └── problem? → claude -p (diagnose, fix, or escalate)
 
@@ -33,9 +33,9 @@ all agents ──→ matrix_send() ← status updates, escalations, merge no
 **Required:**
 
 - [Claude CLI](https://docs.anthropic.com/en/docs/claude-cli) — `claude` in PATH, authenticated
-- [Codeberg](https://codeberg.org/) account with an API token — the factory reads issues, opens PRs, posts comments, and merges via the Codeberg API
+- [Codeberg](https://codeberg.org/) account with an API token — disinto reads issues, opens PRs, posts comments, and merges via the Codeberg API
 - A second Codeberg account for the review bot — reviews posted under a separate identity so the dev-agent doesn't review its own PRs (`REVIEW_BOT_TOKEN`)
-- [Woodpecker CI](https://woodpecker-ci.org/) — local instance connected to your Codeberg repo; the factory monitors pipelines, retries failures, and queries the Woodpecker Postgres DB directly
+- [Woodpecker CI](https://woodpecker-ci.org/) — local instance connected to your Codeberg repo; disinto monitors pipelines, retries failures, and queries the Woodpecker Postgres DB directly
 - PostgreSQL client (`psql`) — for Woodpecker DB queries (pipeline status, build counts)
 - `jq`, `curl`, `git`
 
@@ -84,13 +84,13 @@ CLAUDE_TIMEOUT=7200 # max seconds per Claude invocation (default: 2h)
 # 3. Install cron (staggered to avoid overlap)
 crontab -e
 # Add:
-# 0,10,20,30,40,50 * * * * /path/to/disinto/factory/factory-poll.sh
+# 0,10,20,30,40,50 * * * * /path/to/disinto/supervisor/supervisor-poll.sh
 # 3,13,23,33,43,53 * * * * /path/to/disinto/review/review-poll.sh
 # 6,16,26,36,46,56 * * * * /path/to/disinto/dev/dev-poll.sh
 # 15 8 * * * /path/to/disinto/gardener/gardener-poll.sh
 
 # 4. Verify
-bash factory/factory-poll.sh # should log "all clear"
+bash supervisor/supervisor-poll.sh # should log "all clear"
 ```
 
 ## Directory Structure
@@ -113,8 +113,8 @@ disinto/
 ├── gardener/
 │ ├── gardener-poll.sh # Cron entry: backlog grooming
 │ └── best-practices.md # Gardener knowledge base
-└── factory/
-├── factory-poll.sh # Supervisor: health checks + claude -p
+└── supervisor/
+├── supervisor-poll.sh # Supervisor: health checks + claude -p
 ├── PROMPT.md # Supervisor's system prompt
 ├── update-prompt.sh # Self-learning: append to best-practices
 └── best-practices/ # Progressive disclosure knowledge base
@@ -131,7 +131,7 @@ disinto/
 
 | Agent | Trigger | Job |
 |-------|---------|-----|
-| **Factory** (supervisor) | Every 10 min | Health checks (RAM, disk, CI, git). Calls Claude only when something is broken. Self-improving via `best-practices/`. |
+| **Supervisor** | Every 10 min | Health checks (RAM, disk, CI, git). Calls Claude only when something is broken. Self-improving via `best-practices/`. |
 | **Dev** | Every 10 min | Picks up `backlog`-labeled issues, creates a branch, implements, opens a PR, monitors CI, responds to review, merges. |
 | **Review** | Every 10 min | Finds PRs without review, runs Claude-powered code review, approves or requests changes. |
 | **Gardener** | Daily | Grooms the issue backlog: detects duplicates, promotes `tech-debt` to `backlog`, closes stale issues, escalates ambiguous items. |
@@ -1106,9 +1106,9 @@ while [ "$REVIEW_ROUND" -lt "$MAX_REVIEW_ROUNDS" ]; do
 CI_FIX_COUNT=$(( ${CI_FIX_COUNT:-0} + 1 ))
 if [ "$CI_FIX_COUNT" -gt 2 ]; then
 log "CI failure not recoverable after ${CI_FIX_COUNT} fix attempts"
-# Escalate to supervisor — write marker for factory-poll.sh to pick up
+# Escalate to supervisor — write marker for supervisor-poll.sh to pick up
 echo "{\"issue\":${ISSUE},\"pr\":${PR_NUMBER},\"reason\":\"ci_exhausted\",\"step\":\"${FAILED_STEP:-unknown}\",\"attempts\":${CI_FIX_COUNT},\"ts\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}" \
->> "${FACTORY_ROOT}/factory/escalations.jsonl"
+>> "${FACTORY_ROOT}/supervisor/escalations.jsonl"
 log "escalated to supervisor via escalations.jsonl"
 break
 fi
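Records appended to the old `factory/escalations.jsonl` before this change would no longer be read, since the supervisor now looks under `supervisor/`. A sketch of carrying pending records over, assuming the old file is untracked and may still exist on a deployed checkout; `migrate_escalations` is a hypothetical helper, not something this commit adds:

```shell
#!/usr/bin/env bash
# Hypothetical one-off migration (not part of this commit).
# $1 is the project root, i.e. the directory FACTORY_ROOT points at.
migrate_escalations() {
  local root="$1"
  local old="${root}/factory/escalations.jsonl"
  local new="${root}/supervisor/escalations.jsonl"

  # Nothing to do when the old file is missing or empty.
  [ -s "$old" ] || return 0

  mkdir -p "${root}/supervisor"
  # Append rather than overwrite so records already at the new path survive;
  # only delete the old file if the append succeeded.
  cat "$old" >> "$new" && rm -f "$old"
}

# Usage: migrate_escalations "$FACTORY_ROOT"
```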
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# dev-poll.sh — Pull-based factory: find the next ready issue and start dev-agent
+# dev-poll.sh — Pull-based scheduler: find the next ready issue and start dev-agent
 #
 # Pull system: issues labeled "backlog" are candidates. An issue is READY when
 # ALL its dependency issues are closed (and their PRs merged).
@@ -104,7 +104,7 @@ dep_is_merged() {
 return 1
 fi
 
-# Issue closed = dep satisfied. The factory only closes issues after
+# Issue closed = dep satisfied. The scheduler only closes issues after
 # merging, so closed state is trustworthy. No need to hunt for the
 # specific PR — that was over-engineering that caused false negatives.
 return 0
@@ -1,11 +1,11 @@
 #!/usr/bin/env bash
 # matrix_listener.sh — Long-poll Matrix sync daemon
 #
-# Listens for replies in the factory Matrix room and dispatches them
+# Listens for replies in the Matrix coordination room and dispatches them
 # to the appropriate agent via well-known files.
 #
 # Dispatch:
-# Thread reply to [supervisor] message → /tmp/factory-escalation-reply
+# Thread reply to [supervisor] message → /tmp/supervisor-escalation-reply
 # Thread reply to [gardener] message → /tmp/gardener-escalation-reply
 #
 # Run as systemd service (see matrix_listener.service) or manually:
@@ -18,7 +18,7 @@ source "$(dirname "$0")/../lib/env.sh"
 
 SINCE_FILE="/tmp/matrix-listener-since"
 THREAD_MAP="${MATRIX_THREAD_MAP:-/tmp/matrix-thread-map}"
-LOGFILE="${FACTORY_ROOT}/factory/matrix-listener.log"
+LOGFILE="${FACTORY_ROOT}/supervisor/matrix-listener.log"
 SYNC_TIMEOUT=30000 # 30s long-poll
 BACKOFF=5
 MAX_BACKOFF=60
@@ -133,7 +133,7 @@ while true; do
 
 case "$AGENT" in
 supervisor)
-printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$SENDER" "$BODY" >> /tmp/factory-escalation-reply
+printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$SENDER" "$BODY" >> /tmp/supervisor-escalation-reply
 # Acknowledge
 matrix_send "supervisor" "✓ received, will act on next poll" "$THREAD_ROOT" >/dev/null 2>&1 || true
 ;;
@@ -1,7 +1,7 @@
-# Factory Supervisor
+# Supervisor Agent
 
-You are the factory supervisor for `$CODEBERG_REPO`. You were called because
-`factory-poll.sh` detected an issue it couldn't auto-fix.
+You are the supervisor agent for `$CODEBERG_REPO`. You were called because
+`supervisor-poll.sh` detected an issue it couldn't auto-fix.
 
 ## Priority Order
 
@@ -16,13 +16,13 @@ You are the factory supervisor for `$CODEBERG_REPO`. You were called because
 Fix the issue yourself. You have full shell access and `--dangerously-skip-permissions`.
 
 Before acting, read the relevant best-practices file:
-- Memory issues → `cat ${FACTORY_ROOT}/factory/best-practices/memory.md`
-- Disk issues → `cat ${FACTORY_ROOT}/factory/best-practices/disk.md`
-- CI issues → `cat ${FACTORY_ROOT}/factory/best-practices/ci.md`
-- Codeberg / rate limits → `cat ${FACTORY_ROOT}/factory/best-practices/codeberg.md`
-- Dev-agent issues → `cat ${FACTORY_ROOT}/factory/best-practices/dev-agent.md`
-- Review-agent issues → `cat ${FACTORY_ROOT}/factory/best-practices/review-agent.md`
-- Git issues → `cat ${FACTORY_ROOT}/factory/best-practices/git.md`
+- Memory issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/memory.md`
+- Disk issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/disk.md`
+- CI issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/ci.md`
+- Codeberg / rate limits → `cat ${FACTORY_ROOT}/supervisor/best-practices/codeberg.md`
+- Dev-agent issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/dev-agent.md`
+- Review-agent issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/review-agent.md`
+- Git issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/git.md`
 
 ## Credentials & API Access
 
@@ -66,6 +66,6 @@ ESCALATE: <what's wrong>
 
 If you discover something new, append it to the relevant best-practices file:
 ```bash
-bash ${FACTORY_ROOT}/factory/update-prompt.sh "best-practices/<file>.md" "### Lesson title
+bash ${FACTORY_ROOT}/supervisor/update-prompt.sh "best-practices/<file>.md" "### Lesson title
 Description of what you learned."
 ```
@@ -22,7 +22,7 @@ cd <worktree> && git commit --allow-empty -m "ci: retrigger" --no-verify && git
 ```
 
 ### Prevention
-- The factory runs 3 agents staggered by 3 minutes. During heavy development, many PRs trigger CI simultaneously.
+- The system runs 3 agents staggered by 3 minutes. During heavy development, many PRs trigger CI simultaneously.
 - One pipeline at a time is ideal on this VPS (resource + rate limit reasons).
 - If >3 pipelines are pending/running, do NOT create more work.
 
@@ -44,12 +44,12 @@ DO NOT try to find the specific PR that closed an issue. This is over-engineerin
 - Codeberg shares issue/PR numbering — no guaranteed relationship
 - PRs don't always mention the issue number in title/body
 - Searching last N closed PRs misses older merges
-- The factory itself closes issues after merging, so closed = merged
+- The dev-agent closes issues after merging, so closed = merged
 
 The only check needed: `issue.state == "closed"`.
 
 ### False Positive: Status Unchanged Alert
-The factory-poll alert 'status unchanged for Nmin' is a false positive for complex implementation tasks. The status is set to 'claude assessing + implementing' at the START of the `timeout 7200 claude -p ...` call and only updates after Claude finishes. Normal complex tasks (multi-file Solidity changes + forge test) take 45-90 minutes. To distinguish a false positive from a real stuck agent: check that the claude PID is alive (`ps -p <PID>`), consuming CPU (>0%), and has active threads (`pstree -p <PID>`). If the process is alive and using CPU, do NOT restart it — this wastes completed work.
+The supervisor-poll alert 'status unchanged for Nmin' is a false positive for complex implementation tasks. The status is set to 'claude assessing + implementing' at the START of the `timeout 7200 claude -p ...` call and only updates after Claude finishes. Normal complex tasks (multi-file Solidity changes + forge test) take 45-90 minutes. To distinguish a false positive from a real stuck agent: check that the claude PID is alive (`ps -p <PID>`), consuming CPU (>0%), and has active threads (`pstree -p <PID>`). If the process is alive and using CPU, do NOT restart it — this wastes completed work.
 
 ### False Positive: 'Waiting for CI + Review' Alert
 The 'status unchanged for Nmin' alert is also a false positive when status is 'waiting for CI + review on PR #N (round R)'. This is an intentional sleep/poll loop — the agent is waiting for CI to pass and then for review-poll to post a review. CI can take 20–40 minutes; review follows. Do NOT restart the agent. Confirm by checking: (1) agent PID is alive, (2) CI commit status via `codeberg_api GET /commits/<sha>/status`, (3) review-poll log shows it will pick up the PR on next cycle.
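The liveness check described in the 'Status Unchanged' note can be sketched as a small function. This is an illustration only: `agent_looks_alive` is a hypothetical name, and it covers just the "PID alive" and "CPU share" parts, not the `pstree` thread inspection.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): report whether a PID is still running
# and, where ps supports it, its current CPU share.
agent_looks_alive() {
  local pid="$1" cpu
  # kill -0 sends no signal; it only tests that the process exists and that
  # we are allowed to signal it.
  kill -0 "$pid" 2>/dev/null || return 1
  # CPU share is informational; fall back to "unknown" if ps cannot report it.
  cpu=$(ps -p "$pid" -o %cpu= 2>/dev/null | tr -d ' ')
  printf 'pid %s alive, cpu=%s%%\n' "$pid" "${cpu:-unknown}"
}

# Usage: agent_looks_alive <PID> || echo "process gone, restart is safe to consider"
```

A non-zero CPU figure alongside a live PID is the signal the note treats as "do not restart"; a single `ps` sample can read 0.0 for a process that is merely waiting on I/O, so it is worth sampling more than once.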
@@ -2,7 +2,7 @@
 
 ## Safe Fixes
 - Docker cleanup: `sudo docker system prune -f` (keeps images, removes stopped containers + dangling layers)
-- Truncate factory logs >5MB: `truncate -s 0 <file>`
+- Truncate supervisor logs >5MB: `truncate -s 0 <file>`
 - Remove stale worktrees: check `/tmp/${PROJECT_NAME}-worktree-*`, only if dev-agent not running on them
 - Woodpecker log_entries: `DELETE FROM log_entries WHERE id < (SELECT max(id) - 100000 FROM log_entries);` then `VACUUM;`
 - Node module caches in worktrees: `rm -rf /tmp/${PROJECT_NAME}-worktree-*/node_modules/`
@@ -34,7 +34,7 @@
 
 ## Known Issues
 - Main repo MUST be on $PRIMARY_BRANCH at all times. Dev work happens in worktrees.
-- Stale rebases (detached HEAD) break all worktree creation — silent factory stall.
+- Stale rebases (detached HEAD) break all worktree creation — silent pipeline stall.
 - `git worktree add` fails if target directory exists (even empty). Remove first.
 - Many old branches exist locally (100+). Normal — don't bulk-delete.
 
@@ -47,7 +47,7 @@
 
 ## Lessons Learned
 - NEVER delete remote branches before confirming merge. Close PR, rebase locally, force-push if needed.
-- Stale rebase caused 5h factory stall once (2026-03-11). Auto-heal added to dev-agent.
+- Stale rebase caused 5h pipeline stall once (2026-03-11). Auto-heal added to dev-agent.
 - lint-staged hooks fail when `forge` not in PATH. Use `--no-verify` when committing from scripts.
 
 ### PR #608 Post-Mortem (2026-03-12/13)
@@ -19,7 +19,7 @@
 - **Hallucinated findings** — bot may flag non-issues. This needs Clawy's judgment — escalate.
 
 ## Monitoring
-- Unreviewed PRs with CI pass for >1h → factory-poll.sh auto-triggers review
+- Unreviewed PRs with CI pass for >1h → supervisor-poll.sh auto-triggers review
 - Review errors should resolve on next poll cycle
 - If same PR fails review 3+ times → likely a prompt issue, escalate
 
@@ -1,20 +1,20 @@
 #!/usr/bin/env bash
-# factory-poll.sh — Factory supervisor: bash checks + claude -p for fixes
+# supervisor-poll.sh — Supervisor agent: bash checks + claude -p for fixes
 #
 # Runs every 10min via cron. Does all health checks in bash (zero tokens).
 # Only invokes claude -p when auto-fix fails or issue is complex.
 #
-# Cron: */10 * * * * /path/to/disinto/factory/factory-poll.sh
+# Cron: */10 * * * * /path/to/disinto/supervisor/supervisor-poll.sh
 #
-# Peek: cat /tmp/factory-status
-# Log: tail -f /path/to/disinto/factory/factory.log
+# Peek: cat /tmp/supervisor-status
+# Log: tail -f /path/to/disinto/supervisor/supervisor.log
 
 source "$(dirname "$0")/../lib/env.sh"
 
-LOGFILE="${FACTORY_ROOT}/factory/factory.log"
-STATUSFILE="/tmp/factory-status"
-LOCKFILE="/tmp/factory-poll.lock"
-PROMPT_FILE="${FACTORY_ROOT}/factory/PROMPT.md"
+LOGFILE="${FACTORY_ROOT}/supervisor/supervisor.log"
+STATUSFILE="/tmp/supervisor-status"
+LOCKFILE="/tmp/supervisor-poll.lock"
+PROMPT_FILE="${FACTORY_ROOT}/supervisor/PROMPT.md"
 
 # Prevent overlapping runs
 if [ -f "$LOCKFILE" ]; then
@@ -32,15 +32,15 @@ flog() {
 }
 
 status() {
-printf '[%s] factory: %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*" > "$STATUSFILE"
+printf '[%s] supervisor: %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*" > "$STATUSFILE"
 flog "$*"
 }
 
 # ── Check for escalation replies from Matrix ──────────────────────────────
 ESCALATION_REPLY=""
-if [ -s /tmp/factory-escalation-reply ]; then
-ESCALATION_REPLY=$(cat /tmp/factory-escalation-reply)
-rm -f /tmp/factory-escalation-reply
+if [ -s /tmp/supervisor-escalation-reply ]; then
+ESCALATION_REPLY=$(cat /tmp/supervisor-escalation-reply)
+rm -f /tmp/supervisor-escalation-reply
 flog "Got escalation reply: $(echo "$ESCALATION_REPLY" | head -1)"
 fi
 
@@ -71,7 +71,7 @@ SWAP_USED_MB=$(free -m | awk '/Swap:/{print $3}')
 if [ "${AVAIL_MB:-9999}" -lt 500 ] || { [ "${SWAP_USED_MB:-0}" -gt 3000 ] && [ "${AVAIL_MB:-9999}" -lt 2000 ]; }; then
 flog "MEMORY CRISIS: avail=${AVAIL_MB}MB swap_used=${SWAP_USED_MB}MB — auto-fixing"
 
-# Kill stale factory-spawned claude processes (>3h old) — skip interactive sessions
+# Kill stale agent-spawned claude processes (>3h old) — skip interactive sessions
 STALE_CLAUDES=$(pgrep -f "claude -p" --older 10800 2>/dev/null || true)
 if [ -n "$STALE_CLAUDES" ]; then
 echo "$STALE_CLAUDES" | xargs kill 2>/dev/null || true
@@ -113,7 +113,7 @@ if [ "${DISK_PERCENT:-0}" -gt 80 ]; then
 # Docker cleanup (safe — keeps images)
 sudo docker system prune -f >/dev/null 2>&1 && fixed "Docker prune"
 
-# Truncate factory logs >10MB
+# Truncate supervisor logs >10MB
 for logfile in "${FACTORY_ROOT}"/{dev,review,factory}/*.log; do
 if [ -f "$logfile" ]; then
 SIZE_KB=$(du -k "$logfile" 2>/dev/null | cut -f1)
@@ -159,7 +159,7 @@ fi
 # =============================================================================
 # P2: FACTORY STOPPED — CI, dev-agent, git
 # =============================================================================
-status "P2: checking factory"
+status "P2: checking pipeline"
 
 # CI stuck
 STUCK_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=${WOODPECKER_REPO_ID} AND status='running' AND EXTRACT(EPOCH FROM now() - to_timestamp(started)) > 1200;" 2>/dev/null | xargs || true)
@@ -204,7 +204,7 @@ fi
 # =============================================================================
 # P2b: FACTORY STALLED — backlog exists but no agent running
 # =============================================================================
-status "P2: checking factory stall"
+status "P2: checking pipeline stall"
 
 BACKLOG_COUNT=$(codeberg_api GET "/issues?state=open&labels=backlog&type=issues&limit=1" 2>/dev/null | jq -r 'length' 2>/dev/null || echo "0")
 IN_PROGRESS=$(codeberg_api GET "/issues?state=open&labels=in-progress&type=issues&limit=1" 2>/dev/null | jq -r 'length' 2>/dev/null || echo "0")
@@ -221,7 +221,7 @@ if [ "${BACKLOG_COUNT:-0}" -gt 0 ] && [ "${IN_PROGRESS:-0}" -eq 0 ]; then
 IDLE_MIN=$(( (NOW_EPOCH - LAST_LOG_EPOCH) / 60 ))
 
 if [ "$IDLE_MIN" -gt 20 ]; then
-p2 "Factory stalled: ${BACKLOG_COUNT} backlog issue(s), no agent ran for ${IDLE_MIN}min"
+p2 "Pipeline stalled: ${BACKLOG_COUNT} backlog issue(s), no agent ran for ${IDLE_MIN}min"
 fi
 fi
 
@@ -277,7 +277,7 @@ done
 # P4: HOUSEKEEPING — stale processes
 # =============================================================================
 # Check for dev-agent escalations
-ESCALATION_FILE="${FACTORY_ROOT}/factory/escalations.jsonl"
+ESCALATION_FILE="${FACTORY_ROOT}/supervisor/escalations.jsonl"
 if [ -s "$ESCALATION_FILE" ]; then
 ESCALATION_COUNT=$(wc -l < "$ESCALATION_FILE")
 p3 "Dev-agent escalated ${ESCALATION_COUNT} issue(s) — see ${ESCALATION_FILE}"
@@ -285,7 +285,7 @@ fi
 
 status "P4: housekeeping"
 
-# Stale factory-spawned claude processes (>3h, not caught by P0) — skip interactive sessions
+# Stale agent-spawned claude processes (>3h, not caught by P0) — skip interactive sessions
 STALE_CLAUDES=$(pgrep -f "claude -p" --older 10800 2>/dev/null || true)
 if [ -n "$STALE_CLAUDES" ]; then
 echo "$STALE_CLAUDES" | xargs kill 2>/dev/null || true
@@ -308,7 +308,7 @@ for wt in /tmp/${PROJECT_NAME}-worktree-* /tmp/${PROJECT_NAME}-review-*; do
 done
 git -C "$PROJECT_REPO_ROOT" worktree prune 2>/dev/null || true
 
-# Rotate factory log if >5MB
+# Rotate supervisor log if >5MB
 for logfile in "${FACTORY_ROOT}"/{dev,review,factory}/*.log; do
 if [ -f "$logfile" ]; then
 SIZE_KB=$(du -k "$logfile" 2>/dev/null | cut -f1)
@@ -329,12 +329,12 @@ if [ -n "$ALL_ALERTS" ]; then
 ALERT_TEXT=$(echo -e "$ALL_ALERTS")
 
 # Notify Matrix
-matrix_send "supervisor" "⚠️ Factory alerts:
+matrix_send "supervisor" "⚠️ Supervisor alerts:
 ${ALERT_TEXT}" 2>/dev/null || true
 
 flog "Invoking claude -p for alerts"
 
-CLAUDE_PROMPT="$(cat "$PROMPT_FILE" 2>/dev/null || echo "You are a factory supervisor. Fix the issue below.")
+CLAUDE_PROMPT="$(cat "$PROMPT_FILE" 2>/dev/null || echo "You are a supervisor agent. Fix the issue below.")
 
 ## Current Alerts
 ${ALERT_TEXT}
@@ -2,15 +2,15 @@
 # update-prompt.sh — Append a lesson to a best-practices file
 #
 # Usage:
-# ./factory/update-prompt.sh "best-practices/memory.md" "### Title\nBody text"
-# ./factory/update-prompt.sh --from-file "best-practices/memory.md" /tmp/lesson.md
+# ./supervisor/update-prompt.sh "best-practices/memory.md" "### Title\nBody text"
+# ./supervisor/update-prompt.sh --from-file "best-practices/memory.md" /tmp/lesson.md
 #
 # Called by claude -p when it learns something during a fix.
 # Commits and pushes the update to the disinto repo.
 
 source "$(dirname "$0")/../lib/env.sh"
 
-TARGET_FILE="${FACTORY_ROOT}/factory/$1"
+TARGET_FILE="${FACTORY_ROOT}/supervisor/$1"
 shift
 
 if [ "$1" = "--from-file" ] && [ -f "$2" ]; then
@@ -40,8 +40,8 @@ else
 fi
 
 cd "$FACTORY_ROOT"
-git add "factory/$1" 2>/dev/null || git add "$TARGET_FILE"
-git commit -m "factory: learned — $(echo "$LESSON" | head -1 | sed 's/^#* *//')" --no-verify 2>/dev/null
+git add "supervisor/$1" 2>/dev/null || git add "$TARGET_FILE"
+git commit -m "supervisor: learned — $(echo "$LESSON" | head -1 | sed 's/^#* *//')" --no-verify 2>/dev/null
 git push origin main 2>/dev/null
 
 log "Updated $(basename "$TARGET_FILE") with new lesson"
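A rename of this size is easy to leave incomplete, so a scan for leftover old names is a reasonable follow-up. The sketch below is one way to do it, assuming lowercase `factory` followed by `-`, `/`, `.` or `}` is a good proxy for an old path or file name; `find_stale_refs` is a hypothetical helper and the pattern is a guess that may need widening. `FACTORY_ROOT` is uppercase and is deliberately not matched, since the commit keeps it.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of this commit): print file:line:text for every
# line under a directory that still mentions an old "factory" path or file name.
find_stale_refs() {
  local dir="$1"
  # Case-sensitive on purpose so FACTORY_ROOT is ignored.
  # "factory}" catches brace expansions such as {dev,review,factory}.
  grep -rnE --exclude-dir=.git 'factory[-/.}]' "$dir" 2>/dev/null || true
}

# Usage: find_stale_refs "$FACTORY_ROOT"
```

Any hits still need a human look: some may be intentional prose rather than paths.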