refactor: rename factory/ → supervisor/, factory-poll → supervisor-poll
The supervisor agent was confusingly named "factory" (same as the project). Rename directory, script, log, lock, status, and escalation files. Update all references across scripts and docs.

FACTORY_ROOT env var unchanged (refers to project root, not agent).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 8d73c2f8f9
commit 77cb4c4643
15 changed files with 68 additions and 68 deletions
71
supervisor/PROMPT.md
Normal file
|
|
@ -0,0 +1,71 @@
|
|||
# Supervisor Agent
|
||||
|
||||
You are the supervisor agent for `$CODEBERG_REPO`. You were called because
|
||||
`supervisor-poll.sh` detected an issue it couldn't auto-fix.
|
||||
|
||||
## Priority Order
|
||||
|
||||
1. **P0 — Memory crisis:** RAM <500MB or swap >3GB
|
||||
2. **P1 — Disk pressure:** Disk >80%
|
||||
3. **P2 — Factory stopped:** Dev-agent dead, CI down, git broken
|
||||
4. **P3 — Factory degraded:** Derailed PR, stuck pipeline, unreviewed PRs
|
||||
5. **P4 — Housekeeping:** Stale processes, log rotation
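The first two thresholds can be sketched as a small bash check. This is an illustration only: `classify_pressure` is a hypothetical helper, not part of `supervisor-poll.sh`, and it assumes the readings (available RAM in MB, used swap in MB, disk usage in percent) are passed in as plain integers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map raw readings to the P0/P1 levels listed above.
# Args: available RAM in MB, used swap in MB, root disk usage in percent.
classify_pressure() {
  local avail_mb="$1" swap_mb="$2" disk_pct="$3"
  if [ "$avail_mb" -lt 500 ] || [ "$swap_mb" -gt 3000 ]; then
    echo "P0"   # memory crisis: RAM <500MB or swap >3GB
  elif [ "$disk_pct" -gt 80 ]; then
    echo "P1"   # disk pressure: disk >80%
  else
    echo "ok"   # nothing at P0/P1; lower priorities need other checks
  fi
}

classify_pressure 350 100 40    # low RAM
classify_pressure 4000 200 91   # full disk
```

Memory is tested before disk so that a host with both problems reports the higher priority.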
|
||||
|
||||
## What You Can Do
|
||||
|
||||
Fix the issue yourself. You have full shell access and `--dangerously-skip-permissions`.
|
||||
|
||||
Before acting, read the relevant best-practices file:
|
||||
- Memory issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/memory.md`
|
||||
- Disk issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/disk.md`
|
||||
- CI issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/ci.md`
|
||||
- Codeberg / rate limits → `cat ${FACTORY_ROOT}/supervisor/best-practices/codeberg.md`
|
||||
- Dev-agent issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/dev-agent.md`
|
||||
- Review-agent issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/review-agent.md`
|
||||
- Git issues → `cat ${FACTORY_ROOT}/supervisor/best-practices/git.md`
|
||||
|
||||
## Credentials & API Access
|
||||
|
||||
Environment variables are set. Source the helper library for convenience functions:
|
||||
```bash
source ${FACTORY_ROOT}/lib/env.sh
```
|
||||
|
||||
This gives you:
|
||||
- `codeberg_api GET "/pulls?state=open"` — Codeberg API (uses $CODEBERG_TOKEN)
|
||||
- `wpdb -c "SELECT ..."` — Woodpecker Postgres (uses $WOODPECKER_DB_PASSWORD)
|
||||
- `woodpecker_api "/repos/$WOODPECKER_REPO_ID/pipelines"` — Woodpecker REST API (uses $WOODPECKER_TOKEN)
|
||||
- `$REVIEW_BOT_TOKEN` — for posting reviews as the review_bot account
|
||||
- `$PROJECT_REPO_ROOT` — path to the target project repo
|
||||
- `$PROJECT_NAME` — short project name (for worktree prefixes, container names)
|
||||
- `$PRIMARY_BRANCH` — main branch (master or main)
|
||||
- `$FACTORY_ROOT` — path to the disinto repo
|
||||
- `matrix_send <prefix> <message>` — send notifications to the Matrix coordination room
|
||||
|
||||
## Escalation
|
||||
|
||||
If you can't fix it, escalate via Matrix:
|
||||
```bash
source ${FACTORY_ROOT}/lib/env.sh
matrix_send "supervisor" "🏭 ESCALATE: <what's wrong and why you can't fix it>"
```
|
||||
|
||||
Do NOT escalate if you can fix it. Do NOT ask permission. Fix first, report after.
|
||||
|
||||
## Output
|
||||
|
||||
```
FIXED: <what you did>
```
or
```
ESCALATE: <what's wrong>
```
|
||||
|
||||
## Learning
|
||||
|
||||
If you discover something new, append it to the relevant best-practices file:
|
||||
```bash
bash ${FACTORY_ROOT}/supervisor/update-prompt.sh "best-practices/<file>.md" "### Lesson title
Description of what you learned."
```
|
||||
40
supervisor/best-practices/ci.md
Normal file
|
|
@ -0,0 +1,40 @@
|
|||
# CI Best Practices
|
||||
|
||||
## Environment
|
||||
- Woodpecker CI at localhost:8000 (Docker backend)
|
||||
- Postgres DB: use `wpdb` helper from env.sh
|
||||
- Woodpecker API: use `woodpecker_api` helper from env.sh
|
||||
- Example (harb): CI images pre-built at `registry.niovi.voyage/harb/*:latest`
|
||||
|
||||
## Safe Fixes
|
||||
- Retrigger CI: push empty commit to PR branch
|
||||
```bash
cd /tmp/${PROJECT_NAME}-worktree-<issue> && git commit --allow-empty -m "ci: retrigger" --no-verify && git push origin <branch> --force
```
|
||||
- Restart woodpecker-agent: `sudo systemctl restart woodpecker-agent`
|
||||
- View pipeline status: `wpdb -c "SELECT number, status FROM pipelines WHERE repo_id=$WOODPECKER_REPO_ID ORDER BY number DESC LIMIT 5;"`
|
||||
- View failed steps: `bash ${FACTORY_ROOT}/lib/ci-debug.sh failures <pipeline-number>`
|
||||
- View step logs: `bash ${FACTORY_ROOT}/lib/ci-debug.sh logs <pipeline-number> <step-name>`
|
||||
|
||||
## Dangerous (escalate)
|
||||
- Restarting woodpecker-server (drops all running pipelines)
|
||||
- Modifying pipeline configs in `.woodpecker/` directory
|
||||
|
||||
## Known Issues
|
||||
- Codeberg rate-limits SSH clones. `git` step fails with exit 128. A retrigger usually works once the rate limit has cleared (see `codeberg.md`; do not retrigger during a storm).
|
||||
- `log_entries` table grows fast (was 5.6GB once). Truncate periodically.
|
||||
- Example (harb): Running CI + harb stack = 14+ containers on 8GB. Memory pressure is real.
|
||||
- CI images take hours to rebuild. Never run `docker system prune -a`.
|
||||
|
||||
## Lessons Learned
|
||||
- Exit code 128 on git step = Codeberg rate limit, not a code problem. Wait out the limit, then retrigger.
|
||||
- Exit code 137 = OOM kill. Check memory, kill stale processes, retrigger.
|
||||
- `node-quality` step fails on eslint/typescript errors — these need code fixes, not CI fixes.
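The exit-code lessons above can be sketched as a triage function. `triage_step` is a hypothetical name used for illustration; the step name and exit code are assumed to come from the `steps` table or from `ci-debug.sh` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a next step for a failed CI step.
# Args: step name, exit code.
triage_step() {
  local step="$1" code="$2"
  if [ "$step" = "git" ] && [ "$code" -eq 128 ]; then
    echo "rate-limit"     # Codeberg rate limit: wait, then retrigger
  elif [ "$code" -eq 137 ]; then
    echo "oom"            # OOM kill: free memory, then retrigger
  else
    echo "inspect-logs"   # likely a code problem: read the step logs
  fi
}

triage_step git 128
triage_step node-quality 1
```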
|
||||
|
||||
### Example (harb): FEE_DEST address must match DeployLocal.sol
|
||||
When DeployLocal.sol changes the feeDest address, bootstrap-common.sh must also be updated.
|
||||
Current feeDest = keccak256('harb.local.feeDest') = 0x8A9145E1Ea4C4d7FB08cF1011c8ac1F0e10F9383.
|
||||
Symptom: bootstrap step exits 1 after 'Granting recenter access to deployer' with no error — setRecenterAccess reverts because wrong address is impersonated.
|
||||
|
||||
### Example (harb): keccak-derived FEE_DEST requires anvil_setBalance before impersonation
|
||||
When FEE_DEST is a keccak-derived address (e.g. keccak256('harb.local.feeDest')), it has zero ETH balance. Any function that calls `anvil_impersonateAccount` then `cast send --from $FEE_DEST --unlocked` will fail silently (output redirected to LOG_FILE) but exit 1 due to gas deduction failure. Fix: add `cast rpc anvil_setBalance "$FEE_DEST" "0xDE0B6B3A7640000"` before impersonation. Applied in both bootstrap-common.sh and red-team.sh.
|
||||
36
supervisor/best-practices/codeberg.md
Normal file
|
|
@ -0,0 +1,36 @@
|
|||
# Codeberg Best Practices
|
||||
|
||||
## Rate Limiting
|
||||
Codeberg rate-limits SSH and HTTPS clones. Symptoms:
|
||||
- Woodpecker `git` step fails with exit code 128
|
||||
- Multiple pipelines fail in quick succession with the same error
|
||||
- Retriggers make it WORSE by adding more clone attempts
|
||||
|
||||
### What To Do
|
||||
- **Do NOT retrigger** during a rate-limit storm. Wait 10-15 minutes.
|
||||
- Check if multiple pipelines failed on `git` step recently:
|
||||
```bash
wpdb -c "SELECT number, status, to_timestamp(started) FROM pipelines WHERE repo_id=$WOODPECKER_REPO_ID AND status='failure' ORDER BY number DESC LIMIT 5;"
wpdb -c "SELECT s.name, s.exit_code FROM steps s JOIN pipelines p ON s.pipeline_id=p.id WHERE p.number=<N> AND p.repo_id=$WOODPECKER_REPO_ID AND s.state='failure';"
```
|
||||
- If multiple `git` failures with exit 128 in the last 15 min → it's rate limiting. Wait.
|
||||
- Only retrigger after 15+ minutes of no CI activity.
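The 15-minute rule can be sketched as a cooldown gate. `cooldown_elapsed` and `COOLDOWN_SECS` are hypothetical names; the timestamp of the last CI activity is assumed to be available as epoch seconds (for example from the `started` column queried above).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if CI has been quiet for 15+ minutes.
# Args: epoch seconds of the last CI activity, and optionally "now".
COOLDOWN_SECS=900

cooldown_elapsed() {
  local last_activity="$1" now="${2:-$(date +%s)}"
  [ $(( now - last_activity )) -ge "$COOLDOWN_SECS" ]
}

# Demo: pretend the last pipeline activity was 20 minutes ago.
if cooldown_elapsed "$(( $(date +%s) - 1200 ))"; then
  echo "quiet for 15+ min: safe to retrigger"
else
  echo "recent CI activity: keep waiting"
fi
```

Passing "now" as an optional second argument keeps the check easy to exercise with fixed timestamps.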
|
||||
|
||||
### How To Retrigger Safely
|
||||
```bash
cd <worktree> && git commit --allow-empty -m "ci: retrigger" --no-verify && git push origin <branch> --force
```
|
||||
|
||||
### Prevention
|
||||
- The system runs 3 agents staggered by 3 minutes. During heavy development, many PRs trigger CI simultaneously.
|
||||
- One pipeline at a time is ideal on this VPS (resource + rate limit reasons).
|
||||
- If >3 pipelines are pending/running, do NOT create more work.
|
||||
|
||||
## OAuth Tokens
|
||||
- OAuth tokens expire after ~2h. If Codeberg is down during a refresh, a re-login is required.
|
||||
- API token is in `~/.netrc` — read via `awk` in env.sh.
|
||||
- Review bot has a separate token ($REVIEW_BOT_TOKEN) for formal reviews.
|
||||
|
||||
## Lessons Learned
|
||||
- Retrigger storm on 2026-03-12: supervisor + dev-agent both retriggered during rate limit, caused 5+ failed pipelines. Added cooldown awareness.
|
||||
- Empty commit retrigger works but adds noise to git history. Acceptable tradeoff.
|
||||
55
supervisor/best-practices/dev-agent.md
Normal file
|
|
@ -0,0 +1,55 @@
|
|||
# Dev-Agent Best Practices
|
||||
|
||||
## Architecture
|
||||
- `dev-poll.sh` (cron */10) → finds ready backlog issues → spawns `dev-agent.sh`
|
||||
- `dev-agent.sh` uses `claude -p` for implementation, runs in git worktree
|
||||
- Lock file: `/tmp/dev-agent.lock` (contains PID)
|
||||
- Status file: `/tmp/dev-agent-status`
|
||||
- Worktrees: `/tmp/${PROJECT_NAME}-worktree-<issue-number>/`
|
||||
|
||||
## Safe Fixes
|
||||
- Remove stale lock: `rm -f /tmp/dev-agent.lock` (only if PID is dead)
|
||||
- Kill stuck agent: `kill <pid>` then clean lock
|
||||
- Restart on derailed PR: `bash ${FACTORY_ROOT}/dev/dev-agent.sh <issue-number> &`
|
||||
- Clean worktree: `cd $PROJECT_REPO_ROOT && git worktree remove /tmp/${PROJECT_NAME}-worktree-<N> --force`
|
||||
- Remove `in-progress` label if agent died without cleanup:
|
||||
```bash
codeberg_api DELETE "/issues/<N>/labels/in-progress"
```
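The "only if PID is dead" condition on lock removal can be sketched as follows. `remove_stale_lock` is a hypothetical helper, and the demo runs on a scratch file rather than the real `/tmp/dev-agent.lock`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a PID lock file only when its owner is gone.
# Prints "removed", "held", or "absent".
remove_stale_lock() {
  local lock="$1" pid
  [ -f "$lock" ] || { echo "absent"; return 0; }
  pid=$(cat "$lock" 2>/dev/null || true)
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "held"      # owner still alive: leave the lock alone
  else
    rm -f "$lock"
    echo "removed"   # owner dead or lock empty: safe to clean up
  fi
}

# Demo on a scratch file: our own PID is alive, so the lock is kept.
demo_lock=$(mktemp)
echo "$$" > "$demo_lock"
remove_stale_lock "$demo_lock"
rm -f "$demo_lock"
```

Note that `kill -0` also fails for a live process owned by another user, so this check assumes the agent and the supervisor run under the same account.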
|
||||
|
||||
## Dangerous (escalate)
|
||||
- Restarting agent on an issue that has an open PR with review changes — may lose context
|
||||
- Anything that modifies the PR branch history
|
||||
- Closing PRs or issues
|
||||
|
||||
## Known Issues
|
||||
- `claude -p -c` (continue) fails if session was compacted — falls back to fresh `-p`
|
||||
- CI_FIX_COUNT is now reset on CI pass (fixed 2026-03-12), so each review phase gets fresh CI fix budget
|
||||
- Worktree creation fails if main repo has stale rebase — auto-heals now
|
||||
- Large text in jq `--arg` can break — write to file first
|
||||
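- `$([ "$VAR" = true ] && echo "...")` in a plain assignment aborts the script under `set -euo pipefail`: when the test is false, the substitution exits non-zero. Use an `if` block or append `|| true`.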
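The last item can be illustrated with a minimal sketch. `VERBOSE` and `verbose_flag` are hypothetical names chosen for the example; the point is only that an explicit `if` leaves a zero exit status when the condition is false.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical flag, for illustration only.
VERBOSE=false

# Unsafe under `set -e`: when the test is false, the substitution exits 1
# and the assignment aborts the script.
#   NOTE=$([ "$VERBOSE" = true ] && echo "--verbose")

# Safe: an `if` whose condition is false (and has no else) returns 0.
verbose_flag() {
  if [ "$1" = true ]; then
    echo "--verbose"
  fi
}

NOTE=$(verbose_flag "$VERBOSE")
echo "flag='${NOTE}'"
```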
|
||||
|
||||
## Lessons Learned
|
||||
- Agents don't have memory between tasks — full context must be in the prompt
|
||||
- Prior art injection (closed PR diffs) prevents rework
|
||||
- Feature issues MUST list affected e2e test files
|
||||
- CI fix loop is essential — first attempt rarely works
|
||||
- CLAUDE_TIMEOUT=7200 (2h) is needed for complex issues
|
||||
|
||||
## Dependency Resolution
|
||||
|
||||
**Trust closed state.** If a dependency issue is closed, the code is on the primary branch. Period.
|
||||
|
||||
DO NOT try to find the specific PR that closed an issue. This is over-engineering that causes false negatives:
|
||||
- Codeberg shares issue/PR numbering — no guaranteed relationship
|
||||
- PRs don't always mention the issue number in title/body
|
||||
- Searching last N closed PRs misses older merges
|
||||
- The dev-agent closes issues after merging, so closed = merged
|
||||
|
||||
The only check needed: `issue.state == "closed"`.
|
||||
|
||||
### False Positive: Status Unchanged Alert
|
||||
The supervisor-poll alert 'status unchanged for Nmin' is a false positive for complex implementation tasks. The status is set to 'claude assessing + implementing' at the START of the `timeout 7200 claude -p ...` call and only updates after Claude finishes. Normal complex tasks (multi-file Solidity changes + forge test) take 45-90 minutes. To distinguish a false positive from a real stuck agent: check that the claude PID is alive (`ps -p <PID>`), consuming CPU (>0%), and has active threads (`pstree -p <PID>`). If the process is alive and using CPU, do NOT restart it — this wastes completed work.
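The liveness check described above can be sketched as a small classifier. `agent_state` is a hypothetical helper, not part of the supervisor scripts, and it only covers the "alive" and "consuming CPU" parts of the check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a PID as "dead", "idle", or "busy".
agent_state() {
  local pid="$1" cpu
  # kill -0 sends no signal; it only tests that the PID can be signalled.
  kill -0 "$pid" 2>/dev/null || { echo "dead"; return 0; }
  # %cpu from ps is averaged over the process lifetime, so treat it as a hint.
  cpu=$(ps -p "$pid" -o %cpu= 2>/dev/null | tr -d ' ')
  if awk -v c="${cpu:-0}" 'BEGIN { exit !(c > 0) }'; then
    echo "busy"
  else
    echo "idle"
  fi
}

agent_state "$$"
```

An "idle" result is not proof of a stuck agent; it is a cue to look at `pstree -p <PID>` before deciding anything.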
|
||||
|
||||
### False Positive: 'Waiting for CI + Review' Alert
|
||||
The 'status unchanged for Nmin' alert is also a false positive when status is 'waiting for CI + review on PR #N (round R)'. This is an intentional sleep/poll loop — the agent is waiting for CI to pass and then for review-poll to post a review. CI can take 20–40 minutes; review follows. Do NOT restart the agent. Confirm by checking: (1) agent PID is alive, (2) CI commit status via `codeberg_api GET /commits/<sha>/status`, (3) review-poll log shows it will pick up the PR on next cycle.
|
||||
24
supervisor/best-practices/disk.md
Normal file
|
|
@ -0,0 +1,24 @@
|
|||
# Disk Best Practices
|
||||
|
||||
## Safe Fixes
|
||||
- Docker cleanup: `sudo docker system prune -f` (keeps images, removes stopped containers + dangling layers)
|
||||
- Truncate supervisor logs >5MB: `truncate -s 0 <file>`
|
||||
- Remove stale worktrees: check `/tmp/${PROJECT_NAME}-worktree-*`, only if dev-agent not running on them
|
||||
- Woodpecker log_entries: `DELETE FROM log_entries WHERE id < (SELECT max(id) - 100000 FROM log_entries);` then `VACUUM FULL log_entries;` (plain `VACUUM` only marks the space as reusable and rarely returns it to the OS)
|
||||
- Node module caches in worktrees: `rm -rf /tmp/${PROJECT_NAME}-worktree-*/node_modules/`
|
||||
- Git garbage collection: `cd $PROJECT_REPO_ROOT && git gc --prune=now`
|
||||
|
||||
## Dangerous (escalate)
|
||||
- `docker system prune -a --volumes` — deletes ALL images including CI build cache
|
||||
- Deleting anything in `$PROJECT_REPO_ROOT/` that's tracked by git
|
||||
- Truncating Woodpecker DB tables other than log_entries
|
||||
|
||||
## Known Disk Hogs
|
||||
- Woodpecker `log_entries` table: grows to 5GB+. Truncate periodically.
|
||||
- Docker overlay layers: survive normal prune. `-a` variant kills everything.
|
||||
- Git worktrees in /tmp: accumulate node_modules, build artifacts
|
||||
- Forge cache in `~/.foundry/cache/`: can grow large with many compilations
|
||||
|
||||
## Lessons Learned
|
||||
- After truncating log_entries, run VACUUM FULL (reclaims actual disk space)
|
||||
- Docker ghost overlay layers need `prune -a` but that kills CI images — only do this if truly desperate
|
||||
61
supervisor/best-practices/git.md
Normal file
|
|
@ -0,0 +1,61 @@
|
|||
# Git Best Practices
|
||||
|
||||
## Environment
|
||||
- Repo: `$PROJECT_REPO_ROOT`, remote: `$PROJECT_REMOTE`
|
||||
- Branch: `$PRIMARY_BRANCH` (protected — no direct push, PRs only)
|
||||
- Worktrees: `/tmp/${PROJECT_NAME}-worktree-<issue>/`
|
||||
|
||||
## Safe Fixes
|
||||
- Abort stale rebase: `cd $PROJECT_REPO_ROOT && git rebase --abort`
|
||||
- Switch to $PRIMARY_BRANCH: `git checkout $PRIMARY_BRANCH`
|
||||
- Prune worktrees: `git worktree prune`
|
||||
- Reset dirty state: `git checkout -- .` (only uncommitted changes)
|
||||
- Fetch latest: `git fetch origin $PRIMARY_BRANCH`
|
||||
|
||||
## Auto-fixable by Supervisor
|
||||
- **Merge conflict on approved PR**: rebase onto $PRIMARY_BRANCH and force-push
|
||||
```bash
cd /tmp/${PROJECT_NAME}-worktree-<issue> || git worktree add /tmp/${PROJECT_NAME}-worktree-<issue> <branch>
cd /tmp/${PROJECT_NAME}-worktree-<issue>
git fetch origin $PRIMARY_BRANCH
git rebase origin/$PRIMARY_BRANCH
# If conflict is trivial (NatSpec, comments): resolve and continue
# If conflict is code logic: escalate to Clawy
git push origin <branch> --force
```
|
||||
- **Stale rebase**: `git rebase --abort && git checkout $PRIMARY_BRANCH`
|
||||
- **Wrong branch**: `git checkout $PRIMARY_BRANCH`
|
||||
|
||||
## Dangerous (escalate)
|
||||
- `git reset --hard` on any branch with unpushed work
|
||||
- Deleting remote branches
|
||||
- Force-pushing to any branch (the only exception is the approved-PR rebase under "Auto-fixable by Supervisor")
|
||||
- Anything on the $PRIMARY_BRANCH branch directly
|
||||
|
||||
## Known Issues
|
||||
- Main repo MUST be on $PRIMARY_BRANCH at all times. Dev work happens in worktrees.
|
||||
- Stale rebases (detached HEAD) break all worktree creation — silent pipeline stall.
|
||||
- `git worktree add` fails if target directory exists (even empty). Remove first.
|
||||
- Many old branches exist locally (100+). Normal — don't bulk-delete.
|
||||
|
||||
## Evolution Pipeline
|
||||
- The evolution pipeline (`tools/push3-evolution/evolve.sh`) temporarily modifies
|
||||
`onchain/src/OptimizerV3.sol` and `onchain/src/OptimizerV3Push3.sol` during runs.
|
||||
- **DO NOT revert these files while evolution is running** (check: `pgrep -f evolve.sh`).
|
||||
- If `/tmp/evolution.pid` exists and the PID is alive, the dirty state is intentional.
|
||||
- Evolution will restore the files when it finishes.
|
||||
|
||||
## Lessons Learned
|
||||
- NEVER delete remote branches before confirming merge. Close PR, rebase locally, force-push if needed.
|
||||
- Stale rebase caused 5h pipeline stall once (2026-03-11). Auto-heal added to dev-agent.
|
||||
- lint-staged hooks fail when `forge` not in PATH. Use `--no-verify` when committing from scripts.
|
||||
|
||||
### PR #608 Post-Mortem (2026-03-12/13)
|
||||
PR sat blocked for 24 hours while 21 other PRs merged. Root causes:
|
||||
1. **Supervisor didn't detect merge conflicts** — only checked CI state, not `mergeable`. Fixed: now checks `mergeable=false` as first condition.
|
||||
2. **Supervisor didn't detect stale REQUEST_CHANGES** — review bot requested changes, dev-agent never came back to fix them, moved on to other issues. Need: detect "PR has REQUEST_CHANGES older than N hours with no new push."
|
||||
3. **No staleness kill switch** — after N merge conflicts or N days, a PR should be auto-closed and the issue reopened for a fresh attempt. Rebasing across 21 commits is more work than starting over.
|
||||
|
||||
**Rules derived:**
|
||||
- Supervisor should close PRs that are >24h old with merge conflicts and no recent activity. Reopen the parent issue with a note pointing to the closed PR as prior art.
|
||||
- Dev-agent must not abandon a PR with REQUEST_CHANGES — either fix or close it before moving to new work.
|
||||
29
supervisor/best-practices/memory.md
Normal file
|
|
@ -0,0 +1,29 @@
|
|||
# Memory Best Practices
|
||||
|
||||
## Environment
|
||||
- VPS: 8GB RAM, 4GB swap, Debian
|
||||
- Running: Docker stack (8 containers), Woodpecker CI, OpenClaw gateway
|
||||
|
||||
## Safe Fixes (no permission needed)
|
||||
- Kill stale agent-spawned `claude -p` processes (>3h old): `pgrep -f "claude -p" --older 10800 | xargs -r kill` (matching `claude -p` leaves interactive sessions alone)
|
||||
- Drop filesystem caches: `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`
|
||||
- Restart bloated Anvil: `sudo docker restart ${PROJECT_NAME}-anvil-1` (grows to 12GB+ over hours)
|
||||
- Kill orphan node processes from dead worktrees
|
||||
|
||||
## Dangerous (escalate)
|
||||
- `docker system prune -a --volumes` — kills CI images, hours to rebuild
|
||||
- Stopping project stack containers — breaks dev environment
|
||||
- OOM that survives all safe fixes — needs human decision on what to kill
|
||||
|
||||
## Known Memory Hogs
|
||||
- `claude` processes from dev-agent: 200MB+ each, can zombie
|
||||
- `dockerd`: 600MB+ baseline (normal)
|
||||
- `openclaw-gateway`: 500MB+ (normal)
|
||||
- Anvil container: starts small, grows unbounded over hours
|
||||
- `forge build` with via_ir: can spike to 4GB+. Use `--skip test script` to reduce.
|
||||
- Vite dev servers inside containers: 150MB+ each
|
||||
|
||||
## Lessons Learned
|
||||
- After killing processes, always `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`
|
||||
- Swap doesn't drain from dropping caches alone — it's actual paged-out process memory
|
||||
- Running CI + full project stack = 14+ containers on 8GB. Only one pipeline at a time.
|
||||
30
supervisor/best-practices/review-agent.md
Normal file
|
|
@ -0,0 +1,30 @@
|
|||
# Review Agent Best Practices
|
||||
|
||||
## Architecture
|
||||
- `review-poll.sh` (cron */10) → finds open PRs with CI pass + no review → spawns `review-pr.sh`
|
||||
- `review-pr.sh` uses `claude -p` to review the diff, posts structured comment
|
||||
- Uses `review_bot` Codeberg account for formal reviews (separate from main account)
|
||||
- Skips WIP/draft PRs (`[WIP]` in title or draft flag)
|
||||
|
||||
## Safe Fixes
|
||||
- Manually trigger review: `bash ${FACTORY_ROOT}/review/review-pr.sh <pr-number>`
|
||||
- Force re-review: `bash ${FACTORY_ROOT}/review/review-pr.sh <pr-number> --force`
|
||||
- Check review log: `tail -20 ${FACTORY_ROOT}/review/review.log`
|
||||
|
||||
## Common Failures
|
||||
- **"SKIP: CI=failure"** — review bot won't review until CI passes. Fix CI first.
|
||||
- **"already reviewed"** — bot checks `<!-- reviewed: SHA -->` comment marker. Use `--force` to override.
|
||||
- **Review error comment** — uses `<!-- review-error: SHA -->` marker, does NOT count as reviewed. Bot should retry automatically.
|
||||
- **Self-narration collapse** — bot sometimes narrates instead of producing structured JSON. JSON output format in the prompt prevents this.
|
||||
- **Hallucinated findings** — bot may flag non-issues. This needs Clawy's judgment — escalate.
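The two comment markers above can be told apart with fixed-string matching. This is a sketch under assumptions: `review_state` is a hypothetical helper, the comment bodies are passed in as one string, and the markers are assumed to carry the head SHA literally in place of `SHA`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide review state from a PR's comment text.
# Args: all comment bodies as one string, and the head commit SHA.
review_state() {
  local comments="$1" sha="$2"
  if printf '%s\n' "$comments" | grep -qF "<!-- reviewed: ${sha} -->"; then
    echo "reviewed"
  elif printf '%s\n' "$comments" | grep -qF "<!-- review-error: ${sha} -->"; then
    echo "error"     # does not count as reviewed; a retry is expected
  else
    echo "pending"
  fi
}

review_state 'LGTM <!-- reviewed: abc123 -->' abc123
review_state 'boom <!-- review-error: abc123 -->' abc123
```

Because the SHA is part of the match, a review of an older commit leaves the PR "pending" at the new head.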
|
||||
|
||||
## Monitoring
|
||||
- Unreviewed PRs with CI pass for >1h → supervisor-poll.sh auto-triggers review
|
||||
- Review errors should resolve on next poll cycle
|
||||
- If same PR fails review 3+ times → likely a prompt issue, escalate
|
||||
|
||||
## Lessons Learned
|
||||
- Review bot must output JSON — prevents self-narration collapse
|
||||
- DISCUSS verdict should be treated same as REQUEST_CHANGES by dev-agent
|
||||
- Error comments must NOT include `<!-- reviewed: SHA -->` — would falsely mark as reviewed
|
||||
- Review bot uses Codeberg formal reviews API — branch protection requires different user than PR author
|
||||
366
supervisor/supervisor-poll.sh
Executable file
|
|
@ -0,0 +1,366 @@
|
|||
#!/usr/bin/env bash
|
||||
# supervisor-poll.sh — Supervisor agent: bash checks + claude -p for fixes
|
||||
#
|
||||
# Runs every 10min via cron. Does all health checks in bash (zero tokens).
|
||||
# Only invokes claude -p when auto-fix fails or issue is complex.
|
||||
#
|
||||
# Cron: */10 * * * * /path/to/disinto/supervisor/supervisor-poll.sh
|
||||
#
|
||||
# Peek: cat /tmp/supervisor-status
|
||||
# Log: tail -f /path/to/disinto/supervisor/supervisor.log
|
||||
|
||||
source "$(dirname "$0")/../lib/env.sh"
|
||||
|
||||
LOGFILE="${FACTORY_ROOT}/supervisor/supervisor.log"
|
||||
STATUSFILE="/tmp/supervisor-status"
|
||||
LOCKFILE="/tmp/supervisor-poll.lock"
|
||||
PROMPT_FILE="${FACTORY_ROOT}/supervisor/PROMPT.md"
|
||||
|
||||
# Prevent overlapping runs
|
||||
if [ -f "$LOCKFILE" ]; then
|
||||
LOCK_PID=$(cat "$LOCKFILE" 2>/dev/null)
|
||||
if kill -0 "$LOCK_PID" 2>/dev/null; then
|
||||
exit 0
|
||||
fi
|
||||
rm -f "$LOCKFILE"
|
||||
fi
|
||||
echo $$ > "$LOCKFILE"
|
||||
trap 'rm -f "$LOCKFILE" "$STATUSFILE"' EXIT
|
||||
|
||||
flog() {
|
||||
printf '[%s] %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*" >> "$LOGFILE"
|
||||
}
|
||||
|
||||
status() {
|
||||
printf '[%s] supervisor: %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')" "$*" > "$STATUSFILE"
|
||||
flog "$*"
|
||||
}
|
||||
|
||||
# ── Check for escalation replies from Matrix ──────────────────────────────
|
||||
ESCALATION_REPLY=""
|
||||
if [ -s /tmp/supervisor-escalation-reply ]; then
|
||||
ESCALATION_REPLY=$(cat /tmp/supervisor-escalation-reply)
|
||||
rm -f /tmp/supervisor-escalation-reply
|
||||
flog "Got escalation reply: $(echo "$ESCALATION_REPLY" | head -1)"
|
||||
fi
|
||||
|
||||
# Alerts by priority
|
||||
P0_ALERTS=""
|
||||
P1_ALERTS=""
|
||||
P2_ALERTS=""
|
||||
P3_ALERTS=""
|
||||
P4_ALERTS=""
|
||||
|
||||
p0() { P0_ALERTS="${P0_ALERTS}• [P0] $*\n"; flog "P0: $*"; }
|
||||
p1() { P1_ALERTS="${P1_ALERTS}• [P1] $*\n"; flog "P1: $*"; }
|
||||
p2() { P2_ALERTS="${P2_ALERTS}• [P2] $*\n"; flog "P2: $*"; }
|
||||
p3() { P3_ALERTS="${P3_ALERTS}• [P3] $*\n"; flog "P3: $*"; }
|
||||
p4() { P4_ALERTS="${P4_ALERTS}• [P4] $*\n"; flog "P4: $*"; }
|
||||
|
||||
FIXES=""
|
||||
fixed() { FIXES="${FIXES}• ✅ $*\n"; flog "FIXED: $*"; }
|
||||
|
||||
# =============================================================================
|
||||
# P0: MEMORY — check first, fix first
|
||||
# =============================================================================
|
||||
status "P0: checking memory"
|
||||
|
||||
AVAIL_MB=$(free -m | awk '/Mem:/{print $7}')
|
||||
SWAP_USED_MB=$(free -m | awk '/Swap:/{print $3}')
|
||||
|
||||
if [ "${AVAIL_MB:-9999}" -lt 500 ] || { [ "${SWAP_USED_MB:-0}" -gt 3000 ] && [ "${AVAIL_MB:-9999}" -lt 2000 ]; }; then
|
||||
flog "MEMORY CRISIS: avail=${AVAIL_MB}MB swap_used=${SWAP_USED_MB}MB — auto-fixing"
|
||||
|
||||
# Kill stale agent-spawned claude processes (>3h old) — skip interactive sessions
|
||||
STALE_CLAUDES=$(pgrep -f "claude -p" --older 10800 2>/dev/null || true)
|
||||
if [ -n "$STALE_CLAUDES" ]; then
|
||||
echo "$STALE_CLAUDES" | xargs kill 2>/dev/null || true
|
||||
fixed "Killed stale claude processes: ${STALE_CLAUDES}"
|
||||
fi
|
||||
|
||||
# Drop filesystem caches
|
||||
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null 2>&1
|
||||
fixed "Dropped filesystem caches"
|
||||
|
||||
# Restart Anvil if it's bloated (>1GB RSS)
|
||||
ANVIL_CONTAINER="${ANVIL_CONTAINER:-${PROJECT_NAME}-anvil-1}"
|
||||
ANVIL_RSS=$(sudo docker stats "$ANVIL_CONTAINER" --no-stream --format '{{.MemUsage}}' 2>/dev/null | grep -oP '^\S+' | head -1 || echo "0")
|
||||
if echo "$ANVIL_RSS" | grep -qP '\dGiB'; then
|
||||
sudo docker restart "$ANVIL_CONTAINER" >/dev/null 2>&1 && fixed "Restarted bloated Anvil (${ANVIL_RSS})"
|
||||
fi
|
||||
|
||||
# Re-check after fixes
|
||||
AVAIL_MB_AFTER=$(free -m | awk '/Mem:/{print $7}')
|
||||
SWAP_AFTER=$(free -m | awk '/Swap:/{print $3}')
|
||||
|
||||
if [ "${AVAIL_MB_AFTER:-0}" -lt 500 ] || [ "${SWAP_AFTER:-0}" -gt 3000 ]; then
|
||||
p0 "Memory still critical after auto-fix: avail=${AVAIL_MB_AFTER}MB swap=${SWAP_AFTER}MB"
|
||||
else
|
||||
flog "Memory recovered: avail=${AVAIL_MB_AFTER}MB swap=${SWAP_AFTER}MB"
|
||||
fi
|
||||
fi
|
||||
|
||||
# =============================================================================
|
||||
# P1: DISK
|
||||
# =============================================================================
|
||||
status "P1: checking disk"
|
||||
|
||||
DISK_PERCENT=$(df -h / | awk 'NR==2{print $5}' | tr -d '%')
|
||||
|
||||
if [ "${DISK_PERCENT:-0}" -gt 80 ]; then
|
||||
flog "DISK PRESSURE: ${DISK_PERCENT}% — auto-cleaning"
|
||||
|
||||
# Docker cleanup (safe — keeps images)
|
||||
sudo docker system prune -f >/dev/null 2>&1 && fixed "Docker prune"
|
||||
|
||||
# Truncate supervisor logs >10MB
|
||||
for logfile in "${FACTORY_ROOT}"/{dev,review,supervisor}/*.log; do
|
||||
if [ -f "$logfile" ]; then
|
||||
SIZE_KB=$(du -k "$logfile" 2>/dev/null | cut -f1)
|
||||
if [ "${SIZE_KB:-0}" -gt 10240 ]; then
|
||||
truncate -s 0 "$logfile"
|
||||
fixed "Truncated $(basename "$logfile") (was ${SIZE_KB}KB)"
|
||||
fi
|
||||
fi
|
||||
done
|
||||
|
||||
# Clean old worktrees
|
||||
IDLE_WORKTREES=$(find /tmp/${PROJECT_NAME}-worktree-* -maxdepth 0 -mmin +360 2>/dev/null || true)
|
||||
if [ -n "$IDLE_WORKTREES" ]; then
|
||||
cd "${PROJECT_REPO_ROOT}" && git worktree prune 2>/dev/null
|
||||
for wt in $IDLE_WORKTREES; do
|
||||
# Only remove if dev-agent is not running on it
|
||||
ISSUE_NUM=$(basename "$wt" | sed "s/${PROJECT_NAME}-worktree-//")
|
||||
if ! pgrep -f "dev-agent.sh ${ISSUE_NUM}" >/dev/null 2>&1; then
|
||||
rm -rf "$wt" && fixed "Removed stale worktree: $wt"
|
||||
fi
|
||||
done
|
||||
fi
|
||||
|
||||
# Woodpecker log_entries cleanup
|
||||
LOG_ENTRIES_MB=$(wpdb -c "SELECT pg_size_pretty(pg_total_relation_size('log_entries'));" 2>/dev/null | xargs)
|
||||
if echo "$LOG_ENTRIES_MB" | grep -qP '\d+\s*(GB|MB)'; then
|
||||
SIZE_NUM=$(echo "$LOG_ENTRIES_MB" | grep -oP '\d+')
|
||||
SIZE_UNIT=$(echo "$LOG_ENTRIES_MB" | grep -oP '(GB|MB)')
|
||||
if [ "$SIZE_UNIT" = "GB" ] || { [ "$SIZE_UNIT" = "MB" ] && [ "$SIZE_NUM" -gt 500 ]; }; then
|
||||
wpdb -c "DELETE FROM log_entries WHERE id < (SELECT max(id) - 100000 FROM log_entries);" 2>/dev/null
|
||||
fixed "Trimmed Woodpecker log_entries (was ${LOG_ENTRIES_MB})"
|
||||
fi
|
||||
fi
|
||||
|
||||
DISK_AFTER=$(df -h / | awk 'NR==2{print $5}' | tr -d '%')
|
||||
if [ "${DISK_AFTER:-0}" -gt 80 ]; then
|
||||
p1 "Disk still ${DISK_AFTER}% after auto-clean"
|
||||
else
|
||||
flog "Disk recovered: ${DISK_AFTER}%"
|
||||
fi
|
||||
fi
|
||||
|
||||
# =============================================================================
|
||||
# P2: FACTORY STOPPED — CI, dev-agent, git
|
||||
# =============================================================================
|
||||
status "P2: checking pipeline"
|
||||
|
||||
# CI stuck
|
||||
STUCK_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=${WOODPECKER_REPO_ID} AND status='running' AND EXTRACT(EPOCH FROM now() - to_timestamp(started)) > 1200;" 2>/dev/null | xargs || true)
|
||||
[ "${STUCK_CI:-0}" -gt 0 ] 2>/dev/null && p2 "CI: ${STUCK_CI} pipeline(s) running >20min"
|
||||
|
||||
PENDING_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=${WOODPECKER_REPO_ID} AND status='pending' AND EXTRACT(EPOCH FROM now() - to_timestamp(created)) > 1800;" 2>/dev/null | xargs || true)
|
||||
[ "${PENDING_CI:-0}" -gt 0 ] && p2 "CI: ${PENDING_CI} pipeline(s) pending >30min"
|
||||
|
||||
# Dev-agent health
|
||||
DEV_LOCK="/tmp/dev-agent.lock"
|
||||
if [ -f "$DEV_LOCK" ]; then
|
||||
DEV_PID=$(cat "$DEV_LOCK" 2>/dev/null)
|
||||
if ! kill -0 "$DEV_PID" 2>/dev/null; then
|
||||
rm -f "$DEV_LOCK"
|
||||
fixed "Removed stale dev-agent lock (PID ${DEV_PID} dead)"
|
||||
else
|
||||
DEV_STATUS_AGE=$(stat -c %Y /tmp/dev-agent-status 2>/dev/null || echo 0)
|
||||
NOW_EPOCH=$(date +%s)
|
||||
STATUS_AGE_MIN=$(( (NOW_EPOCH - DEV_STATUS_AGE) / 60 ))
|
||||
if [ "$STATUS_AGE_MIN" -gt 30 ]; then
|
||||
p2 "Dev-agent: status unchanged for ${STATUS_AGE_MIN}min"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# Git repo health
|
||||
cd "${PROJECT_REPO_ROOT}" 2>/dev/null || true
|
||||
GIT_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
|
||||
GIT_REBASE=$([ -d .git/rebase-merge ] || [ -d .git/rebase-apply ] && echo "yes" || echo "no")
|
||||
|
||||
if [ "$GIT_REBASE" = "yes" ]; then
|
||||
git rebase --abort 2>/dev/null && git checkout "${PRIMARY_BRANCH}" 2>/dev/null && \
|
||||
fixed "Aborted stale rebase, switched to ${PRIMARY_BRANCH}" || \
|
||||
p2 "Git: stale rebase, auto-abort failed"
|
||||
fi
|
||||
if [ "$GIT_BRANCH" != "${PRIMARY_BRANCH}" ] && [ "$GIT_BRANCH" != "unknown" ]; then
|
||||
git checkout "${PRIMARY_BRANCH}" 2>/dev/null && \
|
||||
fixed "Switched main repo from '${GIT_BRANCH}' to ${PRIMARY_BRANCH}" || \
|
||||
p2 "Git: on '${GIT_BRANCH}' instead of ${PRIMARY_BRANCH}"
|
||||
fi
|
||||
|
||||
# =============================================================================
|
||||
# P2b: FACTORY STALLED — backlog exists but no agent running
|
||||
# =============================================================================
|
||||
status "P2: checking pipeline stall"
|
||||
|
||||
BACKLOG_COUNT=$(codeberg_api GET "/issues?state=open&labels=backlog&type=issues&limit=1" 2>/dev/null | jq -r 'length' 2>/dev/null || echo "0")
|
||||
IN_PROGRESS=$(codeberg_api GET "/issues?state=open&labels=in-progress&type=issues&limit=1" 2>/dev/null | jq -r 'length' 2>/dev/null || echo "0")
|
||||
|
||||
if [ "${BACKLOG_COUNT:-0}" -gt 0 ] && [ "${IN_PROGRESS:-0}" -eq 0 ]; then
|
||||
# Backlog exists but nothing in progress — check if dev-agent ran recently
|
||||
DEV_LOG="${FACTORY_ROOT}/dev/dev-agent.log"
|
||||
if [ -f "$DEV_LOG" ]; then
|
||||
LAST_LOG_EPOCH=$(stat -c %Y "$DEV_LOG" 2>/dev/null || echo 0)
|
||||
else
|
||||
LAST_LOG_EPOCH=0
|
||||
fi
|
||||
NOW_EPOCH=$(date +%s)
|
||||
IDLE_MIN=$(( (NOW_EPOCH - LAST_LOG_EPOCH) / 60 ))
|
||||
|
||||
if [ "$IDLE_MIN" -gt 20 ]; then
|
||||
p2 "Pipeline stalled: backlog is non-empty, no agent ran for ${IDLE_MIN}min"
|
||||
fi
|
||||
fi
|
||||
|
||||
# =============================================================================
|
||||
# P3: FACTORY DEGRADED — derailed PRs, unreviewed PRs
|
||||
# =============================================================================
|
||||
status "P3: checking PRs"
|
||||
|
||||
OPEN_PRS=$(codeberg_api GET "/pulls?state=open&limit=10" 2>/dev/null | jq -r '.[].number' 2>/dev/null || true)
|
||||
for pr in $OPEN_PRS; do
|
||||
PR_JSON=$(codeberg_api GET "/pulls/${pr}" 2>/dev/null || true)
|
||||
[ -z "$PR_JSON" ] && continue
|
||||
PR_SHA=$(echo "$PR_JSON" | jq -r '.head.sha // ""')
|
||||
[ -z "$PR_SHA" ] && continue
|
||||
|
||||
CI_STATE=$(codeberg_api GET "/commits/${PR_SHA}/status" 2>/dev/null | jq -r '.state // "unknown"' 2>/dev/null || true)
|
||||
|
||||
# Check for merge conflicts first (CI pass but unmergeable)
|
||||
MERGEABLE=$(echo "$PR_JSON" | jq -r '.mergeable // true')
|
||||
if [ "$MERGEABLE" = "false" ] && [ "$CI_STATE" = "success" ]; then
|
||||
p3 "PR #${pr}: CI pass but merge conflict — needs rebase"
|
||||
elif [ "$CI_STATE" = "failure" ] || [ "$CI_STATE" = "error" ]; then
|
||||
UPDATED=$(echo "$PR_JSON" | jq -r '.updated_at // ""')
|
||||
if [ -n "$UPDATED" ]; then
|
||||
UPDATED_EPOCH=$(date -d "$UPDATED" +%s 2>/dev/null || echo 0)
|
||||
NOW_EPOCH=$(date +%s)
|
||||
AGE_MIN=$(( (NOW_EPOCH - UPDATED_EPOCH) / 60 ))
|
||||
[ "$AGE_MIN" -gt 30 ] && p3 "PR #${pr}: CI=${CI_STATE}, stale ${AGE_MIN}min"
|
||||
fi
|
||||
elif [ "$CI_STATE" = "success" ]; then
|
||||
# Check if reviewed at this SHA
|
||||
HAS_REVIEW=$(codeberg_api GET "/issues/${pr}/comments?limit=50" 2>/dev/null | \
|
||||
jq -r --arg sha "$PR_SHA" '[.[] | select(.body | contains("<!-- reviewed: " + $sha))] | length' 2>/dev/null || echo "0")
|
||||
|
||||
if [ "${HAS_REVIEW:-0}" -eq 0 ]; then
|
||||
UPDATED=$(echo "$PR_JSON" | jq -r '.updated_at // ""')
|
||||
if [ -n "$UPDATED" ]; then
|
||||
UPDATED_EPOCH=$(date -d "$UPDATED" +%s 2>/dev/null || echo 0)
|
||||
NOW_EPOCH=$(date +%s)
|
||||
AGE_MIN=$(( (NOW_EPOCH - UPDATED_EPOCH) / 60 ))
|
||||
if [ "$AGE_MIN" -gt 60 ]; then
|
||||
p3 "PR #${pr}: CI passed, no review for ${AGE_MIN}min"
|
||||
# Auto-trigger review
|
||||
bash "${FACTORY_ROOT}/review/review-pr.sh" "$pr" >> "${FACTORY_ROOT}/review/review.log" 2>&1 &
|
||||
fixed "Auto-triggered review for PR #${pr}"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
done
|
||||
|
||||
# =============================================================================
|
||||
# P4: HOUSEKEEPING — stale processes
|
||||
# =============================================================================
|
||||
# Check for dev-agent escalations
|
||||
ESCALATION_FILE="${FACTORY_ROOT}/supervisor/escalations.jsonl"
|
||||
if [ -s "$ESCALATION_FILE" ]; then
|
||||
ESCALATION_COUNT=$(wc -l < "$ESCALATION_FILE")
|
||||
p3 "Dev-agent escalated ${ESCALATION_COUNT} issue(s) — see ${ESCALATION_FILE}"
|
||||
fi
|
||||
|
||||
status "P4: housekeeping"
|
||||
|
||||
# Stale agent-spawned claude processes (>3h, not caught by P0) — skip interactive sessions
|
||||
STALE_CLAUDES=$(pgrep -f "claude -p" --older 10800 2>/dev/null || true)
|
||||
if [ -n "$STALE_CLAUDES" ]; then
|
||||
echo "$STALE_CLAUDES" | xargs kill 2>/dev/null || true
|
||||
fixed "Killed stale claude processes: $(echo $STALE_CLAUDES | wc -w) procs"
|
||||
fi
|
||||
|
||||
# Clean stale git worktrees (>2h, no active agent)
|
||||
NOW_TS=$(date +%s)
|
||||
for wt in /tmp/${PROJECT_NAME}-worktree-* /tmp/${PROJECT_NAME}-review-*; do
|
||||
[ -d "$wt" ] || continue
|
||||
WT_AGE_MIN=$(( (NOW_TS - $(stat -c %Y "$wt")) / 60 ))
|
||||
if [ "$WT_AGE_MIN" -gt 120 ]; then
|
||||
# Skip if an agent is still using it
|
||||
WT_BASE=$(basename "$wt")
|
||||
if ! pgrep -f "$WT_BASE" >/dev/null 2>&1; then
|
||||
git -C "$PROJECT_REPO_ROOT" worktree remove --force "$wt" 2>/dev/null && \
|
||||
fixed "Removed stale worktree: $wt (${WT_AGE_MIN}min old)" || true
|
||||
fi
|
||||
fi
|
||||
done
|
||||
git -C "$PROJECT_REPO_ROOT" worktree prune 2>/dev/null || true
|
||||
|
||||
# Rotate agent logs if >5MB
|
||||
for logfile in "${FACTORY_ROOT}"/{dev,review,supervisor}/*.log; do
|
||||
if [ -f "$logfile" ]; then
|
||||
SIZE_KB=$(du -k "$logfile" 2>/dev/null | cut -f1)
|
||||
if [ "${SIZE_KB:-0}" -gt 5120 ]; then
|
||||
mv "$logfile" "${logfile}.old" 2>/dev/null
|
||||
fixed "Rotated $(basename "$logfile")"
|
||||
fi
|
||||
fi
|
||||
done
|
||||
|
||||
# =============================================================================
|
||||
# RESULT
|
||||
# =============================================================================
|
||||
|
||||
ALL_ALERTS="${P0_ALERTS}${P1_ALERTS}${P2_ALERTS}${P3_ALERTS}${P4_ALERTS}"
|
||||
|
||||
if [ -n "$ALL_ALERTS" ]; then
|
||||
ALERT_TEXT=$(echo -e "$ALL_ALERTS")
|
||||
|
||||
# Notify Matrix
|
||||
matrix_send "supervisor" "⚠️ Supervisor alerts:
|
||||
${ALERT_TEXT}" 2>/dev/null || true
|
||||
|
||||
flog "Invoking claude -p for alerts"
|
||||
|
||||
CLAUDE_PROMPT="$(cat "$PROMPT_FILE" 2>/dev/null || echo "You are a supervisor agent. Fix the issue below.")
|
||||
|
||||
## Current Alerts
|
||||
${ALERT_TEXT}
|
||||
|
||||
## Auto-fixes already applied by bash
|
||||
$(echo -e "${FIXES:-None}")
|
||||
|
||||
## System State
|
||||
RAM: $(free -m | awk '/Mem:/{printf "avail=%sMB", $7}') $(free -m | awk '/Swap:/{printf "swap=%sMB", $3}')
|
||||
Disk: $(df -h / | awk 'NR==2{printf "%s used of %s (%s)", $3, $2, $5}')
|
||||
Docker: $(sudo docker ps --format '{{.Names}}' 2>/dev/null | wc -l) containers running
|
||||
Claude procs: $(pgrep -f "claude" 2>/dev/null | wc -l)
|
||||
|
||||
$(if [ -n "$ESCALATION_REPLY" ]; then echo "
|
||||
## Human Response to Previous Escalation
|
||||
${ESCALATION_REPLY}
|
||||
|
||||
Act on this response."; fi)
|
||||
|
||||
Fix what you can. Escalate what you can't. Read the relevant best-practices file first."
|
||||
|
||||
CLAUDE_OUTPUT=$(timeout 300 claude -p --model sonnet --dangerously-skip-permissions \
|
||||
"$CLAUDE_PROMPT" 2>&1) || true
|
||||
flog "claude output: $(echo "$CLAUDE_OUTPUT" | tail -20)"
|
||||
status "claude responded"
|
||||
else
|
||||
[ -n "$FIXES" ] && flog "Housekeeping: $(echo -e "$FIXES")"
|
||||
status "all clear"
|
||||
fi
|
||||
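The dev-agent status check, the pipeline-stall check, and the worktree cleanup above all repeat one pattern: read a file's mtime, convert the age to minutes, and compare it to a threshold. A minimal sketch of that pattern as a helper, assuming GNU `stat` (as the script already does); `file_age_min` and `is_stale` are hypothetical names, not part of this commit:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not in the commit): factor out the repeated
# "mtime age in minutes" arithmetic used by the poll script.

# Print the whole minutes elapsed since PATH was last modified.
# Returns non-zero if the path cannot be stat'ed, so the caller
# decides how to treat a missing file.
file_age_min() {
  local path="$1" mtime now
  mtime=$(stat -c %Y "$path" 2>/dev/null) || return 1
  now=$(date +%s)
  echo $(( (now - mtime) / 60 ))
}

# Succeed when PATH exists and is older than THRESHOLD minutes.
is_stale() {
  local age
  age=$(file_age_min "$1") || return 1
  [ "$age" -gt "$2" ]
}

# Example call site, mirroring the 30-minute status check:
#   if is_stale /tmp/dev-agent-status 30; then
#     p2 "Dev-agent: status unchanged for $(file_age_min /tmp/dev-agent-status)min"
#   fi
```

Unlike the inline version, which falls back to epoch 0 and therefore reports a missing file as extremely old, this sketch reports a missing file through the exit status. Whether a missing status file should raise an alert is a design choice left to the caller.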
47
supervisor/update-prompt.sh
Executable file
|
|
@ -0,0 +1,47 @@
|
|||
#!/usr/bin/env bash
|
||||
# update-prompt.sh — Append a lesson to a best-practices file
|
||||
#
|
||||
# Usage:
|
||||
# ./supervisor/update-prompt.sh "best-practices/memory.md" "### Title\nBody text"
|
||||
# ./supervisor/update-prompt.sh "best-practices/memory.md" --from-file /tmp/lesson.md
|
||||
#
|
||||
# Called by claude -p when it learns something during a fix.
|
||||
# Commits and pushes the update to the disinto repo.
|
||||
|
||||
source "$(dirname "$0")/../lib/env.sh"
|
||||
|
||||
TARGET_FILE="${FACTORY_ROOT}/supervisor/$1"
|
||||
shift
|
||||
|
||||
if [ "$1" = "--from-file" ] && [ -f "$2" ]; then
|
||||
LESSON=$(cat "$2")
|
||||
elif [ -n "$1" ]; then
|
||||
LESSON="$1"
|
||||
else
|
||||
echo "Usage: update-prompt.sh <relative-path> '<lesson text>'" >&2
|
||||
echo " or: update-prompt.sh <relative-path> --from-file <path>" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ ! -f "$TARGET_FILE" ]; then
|
||||
echo "Target file not found: $TARGET_FILE" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Append the lesson at the end of the file, adding a "Lessons Learned" heading first if the file lacks one
|
||||
if grep -q "## Lessons Learned" "$TARGET_FILE"; then
|
||||
echo "" >> "$TARGET_FILE"
|
||||
echo "$LESSON" >> "$TARGET_FILE"
|
||||
else
|
||||
echo "" >> "$TARGET_FILE"
|
||||
echo "## Lessons Learned" >> "$TARGET_FILE"
|
||||
echo "" >> "$TARGET_FILE"
|
||||
echo "$LESSON" >> "$TARGET_FILE"
|
||||
fi
|
||||
|
||||
cd "$FACTORY_ROOT"
|
||||
# "$1" was shifted away above, so stage the resolved target path directly
git add "$TARGET_FILE"
|
||||
git commit -m "supervisor: learned — $(echo "$LESSON" | head -1 | sed 's/^#* *//')" --no-verify 2>/dev/null
|
||||
git push origin main 2>/dev/null
|
||||
|
||||
log "Updated $(basename "$TARGET_FILE") with new lesson"
|
||||
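The usage comment in `update-prompt.sh` passes the lesson as `"### Title\nBody text"`, but bash's builtin `echo` does not expand `\n` unless given `-e`, so the heading and body would land on one line. A minimal sketch of the append step that expands the escapes with `printf '%b'`; `append_lesson` is a hypothetical name, and applying `%b` to every lesson (including text read from a file) is an assumption that may not suit content containing literal backslashes:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): append a lesson to a
# best-practices file, creating the "Lessons Learned" heading once.

append_lesson() {
  local target="$1" lesson="$2"

  # Refuse to create new files; the caller must name an existing one.
  [ -f "$target" ] || return 1

  # Add the section heading only if the file does not have it yet.
  if ! grep -q '^## Lessons Learned' "$target"; then
    printf '\n## Lessons Learned\n' >> "$target"
  fi

  # %b expands backslash escapes such as \n inside the lesson text.
  printf '\n%b\n' "$lesson" >> "$target"
}

# Example:
#   append_lesson "${FACTORY_ROOT}/supervisor/best-practices/memory.md" \
#     '### Title\nBody text'
```

Keeping the heading check separate from the write means repeated calls append new lessons without duplicating the section heading.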