refactor: make all scripts multi-project via env vars
Replace hardcoded harb references across the entire codebase: - HARB_REPO_ROOT → PROJECT_REPO_ROOT (with deprecated alias) - Derive PROJECT_NAME from CODEBERG_REPO slug - Add PRIMARY_BRANCH (master/main), WOODPECKER_REPO_ID env vars - Parameterize worktree prefixes, docker container names, branch refs - Genericize agent prompts (gardener, factory supervisor) - Update best-practices docs to use $-vars, prefix harb lessons All project-specific values now flow from .env → lib/env.sh → scripts. Backward-compatible: existing harb setups work without .env changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
f16df6c53e
commit
90ef03a304
16 changed files with 117 additions and 116 deletions
|
|
@ -1,6 +1,6 @@
|
|||
# Factory Supervisor
|
||||
|
||||
You are the factory supervisor for `johba/harb`. You were called because
|
||||
You are the factory supervisor for `$CODEBERG_REPO`. You were called because
|
||||
`factory-poll.sh` detected an issue it couldn't auto-fix.
|
||||
|
||||
## Priority Order
|
||||
|
|
@ -34,9 +34,11 @@ source ${FACTORY_ROOT}/lib/env.sh
|
|||
This gives you:
|
||||
- `codeberg_api GET "/pulls?state=open"` — Codeberg API (uses $CODEBERG_TOKEN)
|
||||
- `wpdb -c "SELECT ..."` — Woodpecker Postgres (uses $WOODPECKER_DB_PASSWORD)
|
||||
- `woodpecker_api "/repos/2/pipelines"` — Woodpecker REST API (uses $WOODPECKER_TOKEN)
|
||||
- `woodpecker_api "/repos/$WOODPECKER_REPO_ID/pipelines"` — Woodpecker REST API (uses $WOODPECKER_TOKEN)
|
||||
- `$REVIEW_BOT_TOKEN` — for posting reviews as the review_bot account
|
||||
- `$HARB_REPO_ROOT` — path to the harb repo
|
||||
- `$PROJECT_REPO_ROOT` — path to the target project repo
|
||||
- `$PROJECT_NAME` — short project name (for worktree prefixes, container names)
|
||||
- `$PRIMARY_BRANCH` — main branch (master or main)
|
||||
- `$FACTORY_ROOT` — path to the dark-factory repo
|
||||
|
||||
## Escalation
|
||||
|
|
|
|||
|
|
@ -4,15 +4,15 @@
|
|||
- Woodpecker CI at localhost:8000 (Docker backend)
|
||||
- Postgres DB: use `wpdb` helper from env.sh
|
||||
- Woodpecker API: use `woodpecker_api` helper from env.sh
|
||||
- CI images: pre-built at `registry.niovi.voyage/harb/*:latest`
|
||||
- Example (harb): CI images pre-built at `registry.niovi.voyage/harb/*:latest`
|
||||
|
||||
## Safe Fixes
|
||||
- Retrigger CI: push empty commit to PR branch
|
||||
```bash
|
||||
cd /tmp/harb-worktree-<issue> && git commit --allow-empty -m "ci: retrigger" --no-verify && git push origin <branch> --force
|
||||
cd /tmp/${PROJECT_NAME}-worktree-<issue> && git commit --allow-empty -m "ci: retrigger" --no-verify && git push origin <branch> --force
|
||||
```
|
||||
- Restart woodpecker-agent: `sudo systemctl restart woodpecker-agent`
|
||||
- View pipeline status: `wpdb -c "SELECT number, status FROM pipelines WHERE repo_id=2 ORDER BY number DESC LIMIT 5;"`
|
||||
- View pipeline status: `wpdb -c "SELECT number, status FROM pipelines WHERE repo_id=$WOODPECKER_REPO_ID ORDER BY number DESC LIMIT 5;"`
|
||||
- View failed steps: `bash ${FACTORY_ROOT}/lib/ci-debug.sh failures <pipeline-number>`
|
||||
- View step logs: `bash ${FACTORY_ROOT}/lib/ci-debug.sh logs <pipeline-number> <step-name>`
|
||||
|
||||
|
|
@ -23,7 +23,7 @@
|
|||
## Known Issues
|
||||
- Codeberg rate-limits SSH clones. `git` step fails with exit 128. Retrigger usually works.
|
||||
- `log_entries` table grows fast (was 5.6GB once). Truncate periodically.
|
||||
- Running CI + harb stack = 14+ containers on 8GB. Memory pressure is real.
|
||||
- Example (harb): Running CI + harb stack = 14+ containers on 8GB. Memory pressure is real.
|
||||
- CI images take hours to rebuild. Never run `docker system prune -a`.
|
||||
|
||||
## Lessons Learned
|
||||
|
|
@ -31,10 +31,10 @@
|
|||
- Exit code 137 = OOM kill. Check memory, kill stale processes, retrigger.
|
||||
- `node-quality` step fails on eslint/typescript errors — these need code fixes, not CI fixes.
|
||||
|
||||
### FEE_DEST address must match DeployLocal.sol
|
||||
### Example (harb): FEE_DEST address must match DeployLocal.sol
|
||||
When DeployLocal.sol changes the feeDest address, bootstrap-common.sh must also be updated.
|
||||
Current feeDest = keccak256('harb.local.feeDest') = 0x8A9145E1Ea4C4d7FB08cF1011c8ac1F0e10F9383.
|
||||
Symptom: bootstrap step exits 1 after 'Granting recenter access to deployer' with no error — setRecenterAccess reverts because wrong address is impersonated.
|
||||
|
||||
### keccak-derived FEE_DEST requires anvil_setBalance before impersonation
|
||||
### Example (harb): keccak-derived FEE_DEST requires anvil_setBalance before impersonation
|
||||
When FEE_DEST is a keccak-derived address (e.g. keccak256('harb.local.feeDest')), it has zero ETH balance. Any function that calls `anvil_impersonateAccount` then `cast send --from $FEE_DEST --unlocked` will fail silently (output redirected to LOG_FILE) but exit 1 due to gas deduction failure. Fix: add `cast rpc anvil_setBalance "$FEE_DEST" "0xDE0B6B3A7640000"` before impersonation. Applied in both bootstrap-common.sh and red-team.sh.
|
||||
|
|
|
|||
|
|
@ -10,8 +10,8 @@ Codeberg rate-limits SSH and HTTPS clones. Symptoms:
|
|||
- **Do NOT retrigger** during a rate-limit storm. Wait 10-15 minutes.
|
||||
- Check if multiple pipelines failed on `git` step recently:
|
||||
```bash
|
||||
wpdb -c "SELECT number, status, to_timestamp(started) FROM pipelines WHERE repo_id=2 AND status='failure' ORDER BY number DESC LIMIT 5;"
|
||||
wpdb -c "SELECT s.name, s.exit_code FROM steps s JOIN pipelines p ON s.pipeline_id=p.id WHERE p.number=<N> AND p.repo_id=2 AND s.state='failure';"
|
||||
wpdb -c "SELECT number, status, to_timestamp(started) FROM pipelines WHERE repo_id=$WOODPECKER_REPO_ID AND status='failure' ORDER BY number DESC LIMIT 5;"
|
||||
wpdb -c "SELECT s.name, s.exit_code FROM steps s JOIN pipelines p ON s.pipeline_id=p.id WHERE p.number=<N> AND p.repo_id=$WOODPECKER_REPO_ID AND s.state='failure';"
|
||||
```
|
||||
- If multiple `git` failures with exit 128 in the last 15 min → it's rate limiting. Wait.
|
||||
- Only retrigger after 15+ minutes of no CI activity.
|
||||
|
|
|
|||
|
|
@ -5,13 +5,13 @@
|
|||
- `dev-agent.sh` uses `claude -p` for implementation, runs in git worktree
|
||||
- Lock file: `/tmp/dev-agent.lock` (contains PID)
|
||||
- Status file: `/tmp/dev-agent-status`
|
||||
- Worktrees: `/tmp/harb-worktree-<issue-number>/`
|
||||
- Worktrees: `/tmp/${PROJECT_NAME}-worktree-<issue-number>/`
|
||||
|
||||
## Safe Fixes
|
||||
- Remove stale lock: `rm -f /tmp/dev-agent.lock` (only if PID is dead)
|
||||
- Kill stuck agent: `kill <pid>` then clean lock
|
||||
- Restart on derailed PR: `bash ${FACTORY_ROOT}/dev/dev-agent.sh <issue-number> &`
|
||||
- Clean worktree: `cd /home/debian/harb && git worktree remove /tmp/harb-worktree-<N> --force`
|
||||
- Clean worktree: `cd $PROJECT_REPO_ROOT && git worktree remove /tmp/${PROJECT_NAME}-worktree-<N> --force`
|
||||
- Remove `in-progress` label if agent died without cleanup:
|
||||
```bash
|
||||
codeberg_api DELETE "/issues/<N>/labels/in-progress"
|
||||
|
|
@ -38,7 +38,7 @@
|
|||
|
||||
## Dependency Resolution
|
||||
|
||||
**Trust closed state.** If a dependency issue is closed, the code is on master. Period.
|
||||
**Trust closed state.** If a dependency issue is closed, the code is on the primary branch. Period.
|
||||
|
||||
DO NOT try to find the specific PR that closed an issue. This is over-engineering that causes false negatives:
|
||||
- Codeberg shares issue/PR numbering — no guaranteed relationship
|
||||
|
|
|
|||
|
|
@ -3,14 +3,14 @@
|
|||
## Safe Fixes
|
||||
- Docker cleanup: `sudo docker system prune -f` (keeps images, removes stopped containers + dangling layers)
|
||||
- Truncate factory logs >5MB: `truncate -s 0 <file>`
|
||||
- Remove stale worktrees: check `/tmp/harb-worktree-*`, only if dev-agent not running on them
|
||||
- Remove stale worktrees: check `/tmp/${PROJECT_NAME}-worktree-*`, only if dev-agent not running on them
|
||||
- Woodpecker log_entries: `DELETE FROM log_entries WHERE id < (SELECT max(id) - 100000 FROM log_entries);` then `VACUUM;`
|
||||
- Node module caches in worktrees: `rm -rf /tmp/harb-worktree-*/node_modules/`
|
||||
- Git garbage collection: `cd /home/debian/harb && git gc --prune=now`
|
||||
- Node module caches in worktrees: `rm -rf /tmp/${PROJECT_NAME}-worktree-*/node_modules/`
|
||||
- Git garbage collection: `cd $PROJECT_REPO_ROOT && git gc --prune=now`
|
||||
|
||||
## Dangerous (escalate)
|
||||
- `docker system prune -a --volumes` — deletes ALL images including CI build cache
|
||||
- Deleting anything in `/home/debian/harb/` that's tracked by git
|
||||
- Deleting anything in `$PROJECT_REPO_ROOT/` that's tracked by git
|
||||
- Truncating Woodpecker DB tables other than log_entries
|
||||
|
||||
## Known Disk Hogs
|
||||
|
|
|
|||
|
|
@ -1,39 +1,39 @@
|
|||
# Git Best Practices
|
||||
|
||||
## Environment
|
||||
- Repo: `/home/debian/harb`, remote: `codeberg.org/johba/harb`
|
||||
- Branch: `master` (protected — no direct push, PRs only)
|
||||
- Worktrees: `/tmp/harb-worktree-<issue>/`
|
||||
- Repo: `$PROJECT_REPO_ROOT`, remote: `$PROJECT_REMOTE`
|
||||
- Branch: `$PRIMARY_BRANCH` (protected — no direct push, PRs only)
|
||||
- Worktrees: `/tmp/${PROJECT_NAME}-worktree-<issue>/`
|
||||
|
||||
## Safe Fixes
|
||||
- Abort stale rebase: `cd /home/debian/harb && git rebase --abort`
|
||||
- Switch to master: `git checkout master`
|
||||
- Abort stale rebase: `cd $PROJECT_REPO_ROOT && git rebase --abort`
|
||||
- Switch to $PRIMARY_BRANCH: `git checkout $PRIMARY_BRANCH`
|
||||
- Prune worktrees: `git worktree prune`
|
||||
- Reset dirty state: `git checkout -- .` (only uncommitted changes)
|
||||
- Fetch latest: `git fetch origin master`
|
||||
- Fetch latest: `git fetch origin $PRIMARY_BRANCH`
|
||||
|
||||
## Auto-fixable by Supervisor
|
||||
- **Merge conflict on approved PR**: rebase onto master and force-push
|
||||
- **Merge conflict on approved PR**: rebase onto $PRIMARY_BRANCH and force-push
|
||||
```bash
|
||||
cd /tmp/harb-worktree-<issue> || git worktree add /tmp/harb-worktree-<issue> <branch>
|
||||
cd /tmp/harb-worktree-<issue>
|
||||
git fetch origin master
|
||||
git rebase origin/master
|
||||
cd /tmp/${PROJECT_NAME}-worktree-<issue> || git worktree add /tmp/${PROJECT_NAME}-worktree-<issue> <branch>
|
||||
cd /tmp/${PROJECT_NAME}-worktree-<issue>
|
||||
git fetch origin $PRIMARY_BRANCH
|
||||
git rebase origin/$PRIMARY_BRANCH
|
||||
# If conflict is trivial (NatSpec, comments): resolve and continue
|
||||
# If conflict is code logic: escalate to Clawy
|
||||
git push origin <branch> --force
|
||||
```
|
||||
- **Stale rebase**: `git rebase --abort && git checkout master`
|
||||
- **Wrong branch**: `git checkout master`
|
||||
- **Stale rebase**: `git rebase --abort && git checkout $PRIMARY_BRANCH`
|
||||
- **Wrong branch**: `git checkout $PRIMARY_BRANCH`
|
||||
|
||||
## Dangerous (escalate)
|
||||
- `git reset --hard` on any branch with unpushed work
|
||||
- Deleting remote branches
|
||||
- Force-pushing to any branch
|
||||
- Anything on the master branch directly
|
||||
- Anything on the $PRIMARY_BRANCH branch directly
|
||||
|
||||
## Known Issues
|
||||
- Main repo MUST be on master at all times. Dev work happens in worktrees.
|
||||
- Main repo MUST be on $PRIMARY_BRANCH at all times. Dev work happens in worktrees.
|
||||
- Stale rebases (detached HEAD) break all worktree creation — silent factory stall.
|
||||
- `git worktree add` fails if target directory exists (even empty). Remove first.
|
||||
- Many old branches exist locally (100+). Normal — don't bulk-delete.
|
||||
|
|
|
|||
|
|
@ -7,12 +7,12 @@
|
|||
## Safe Fixes (no permission needed)
|
||||
- Kill stale `claude` processes (>3h old): `pgrep -f "claude" --older 10800 | xargs kill`
|
||||
- Drop filesystem caches: `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`
|
||||
- Restart bloated Anvil: `sudo docker restart harb-anvil-1` (grows to 12GB+ over hours)
|
||||
- Restart bloated Anvil: `sudo docker restart ${PROJECT_NAME}-anvil-1` (grows to 12GB+ over hours)
|
||||
- Kill orphan node processes from dead worktrees
|
||||
|
||||
## Dangerous (escalate)
|
||||
- `docker system prune -a --volumes` — kills CI images, hours to rebuild
|
||||
- Stopping harb stack containers — breaks dev environment
|
||||
- Stopping project stack containers — breaks dev environment
|
||||
- OOM that survives all safe fixes — needs human decision on what to kill
|
||||
|
||||
## Known Memory Hogs
|
||||
|
|
@ -26,4 +26,4 @@
|
|||
## Lessons Learned
|
||||
- After killing processes, always `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`
|
||||
- Swap doesn't drain from dropping caches alone — it's actual paged-out process memory
|
||||
- Running CI + full harb stack = 14+ containers on 8GB. Only one pipeline at a time.
|
||||
- Running CI + full project stack = 14+ containers on 8GB. Only one pipeline at a time.
|
||||
|
|
|
|||
|
|
@ -75,9 +75,10 @@ if [ "${AVAIL_MB:-9999}" -lt 500 ] || { [ "${SWAP_USED_MB:-0}" -gt 3000 ] && [ "
|
|||
fixed "Dropped filesystem caches"
|
||||
|
||||
# Restart Anvil if it's bloated (>1GB RSS)
|
||||
ANVIL_RSS=$(sudo docker stats harb-anvil-1 --no-stream --format '{{.MemUsage}}' 2>/dev/null | grep -oP '^\S+' | head -1 || echo "0")
|
||||
ANVIL_CONTAINER="${ANVIL_CONTAINER:-${PROJECT_NAME}-anvil-1}"
|
||||
ANVIL_RSS=$(sudo docker stats "$ANVIL_CONTAINER" --no-stream --format '{{.MemUsage}}' 2>/dev/null | grep -oP '^\S+' | head -1 || echo "0")
|
||||
if echo "$ANVIL_RSS" | grep -qP '\dGiB'; then
|
||||
sudo docker restart harb-anvil-1 >/dev/null 2>&1 && fixed "Restarted bloated Anvil (${ANVIL_RSS})"
|
||||
sudo docker restart "$ANVIL_CONTAINER" >/dev/null 2>&1 && fixed "Restarted bloated Anvil (${ANVIL_RSS})"
|
||||
fi
|
||||
|
||||
# Re-check after fixes
|
||||
|
|
@ -116,12 +117,12 @@ if [ "${DISK_PERCENT:-0}" -gt 80 ]; then
|
|||
done
|
||||
|
||||
# Clean old worktrees
|
||||
IDLE_WORKTREES=$(find /tmp/harb-worktree-* -maxdepth 0 -mmin +360 2>/dev/null || true)
|
||||
IDLE_WORKTREES=$(find /tmp/${PROJECT_NAME}-worktree-* -maxdepth 0 -mmin +360 2>/dev/null || true)
|
||||
if [ -n "$IDLE_WORKTREES" ]; then
|
||||
cd "${HARB_REPO_ROOT}" && git worktree prune 2>/dev/null
|
||||
cd "${PROJECT_REPO_ROOT}" && git worktree prune 2>/dev/null
|
||||
for wt in $IDLE_WORKTREES; do
|
||||
# Only remove if dev-agent is not running on it
|
||||
ISSUE_NUM=$(basename "$wt" | sed 's/harb-worktree-//')
|
||||
ISSUE_NUM=$(basename "$wt" | sed "s/${PROJECT_NAME}-worktree-//")
|
||||
if ! pgrep -f "dev-agent.sh ${ISSUE_NUM}" >/dev/null 2>&1; then
|
||||
rm -rf "$wt" && fixed "Removed stale worktree: $wt"
|
||||
fi
|
||||
|
|
@ -153,10 +154,10 @@ fi
|
|||
status "P2: checking factory"
|
||||
|
||||
# CI stuck
|
||||
STUCK_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=2 AND status='running' AND EXTRACT(EPOCH FROM now() - to_timestamp(started)) > 1200;" 2>/dev/null | xargs)
|
||||
STUCK_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=${WOODPECKER_REPO_ID} AND status='running' AND EXTRACT(EPOCH FROM now() - to_timestamp(started)) > 1200;" 2>/dev/null | xargs)
|
||||
[ "${STUCK_CI:-0}" -gt 0 ] && p2 "CI: ${STUCK_CI} pipeline(s) running >20min"
|
||||
|
||||
PENDING_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=2 AND status='pending' AND EXTRACT(EPOCH FROM now() - to_timestamp(created)) > 1800;" 2>/dev/null | xargs)
|
||||
PENDING_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=${WOODPECKER_REPO_ID} AND status='pending' AND EXTRACT(EPOCH FROM now() - to_timestamp(created)) > 1800;" 2>/dev/null | xargs)
|
||||
[ "${PENDING_CI:-0}" -gt 0 ] && p2 "CI: ${PENDING_CI} pipeline(s) pending >30min"
|
||||
|
||||
# Dev-agent health
|
||||
|
|
@ -177,19 +178,19 @@ if [ -f "$DEV_LOCK" ]; then
|
|||
fi
|
||||
|
||||
# Git repo health
|
||||
cd "${HARB_REPO_ROOT}" 2>/dev/null || true
|
||||
cd "${PROJECT_REPO_ROOT}" 2>/dev/null || true
|
||||
GIT_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
|
||||
GIT_REBASE=$([ -d .git/rebase-merge ] || [ -d .git/rebase-apply ] && echo "yes" || echo "no")
|
||||
|
||||
if [ "$GIT_REBASE" = "yes" ]; then
|
||||
git rebase --abort 2>/dev/null && git checkout master 2>/dev/null && \
|
||||
fixed "Aborted stale rebase, switched to master" || \
|
||||
git rebase --abort 2>/dev/null && git checkout "${PRIMARY_BRANCH}" 2>/dev/null && \
|
||||
fixed "Aborted stale rebase, switched to ${PRIMARY_BRANCH}" || \
|
||||
p2 "Git: stale rebase, auto-abort failed"
|
||||
fi
|
||||
if [ "$GIT_BRANCH" != "master" ] && [ "$GIT_BRANCH" != "unknown" ]; then
|
||||
git checkout master 2>/dev/null && \
|
||||
fixed "Switched main repo from '${GIT_BRANCH}' to master" || \
|
||||
p2 "Git: on '${GIT_BRANCH}' instead of master"
|
||||
if [ "$GIT_BRANCH" != "${PRIMARY_BRANCH}" ] && [ "$GIT_BRANCH" != "unknown" ]; then
|
||||
git checkout "${PRIMARY_BRANCH}" 2>/dev/null && \
|
||||
fixed "Switched main repo from '${GIT_BRANCH}' to ${PRIMARY_BRANCH}" || \
|
||||
p2 "Git: on '${GIT_BRANCH}' instead of ${PRIMARY_BRANCH}"
|
||||
fi
|
||||
|
||||
# =============================================================================
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue