refactor: make all scripts multi-project via env vars

Replace hardcoded harb references across the entire codebase:
- HARB_REPO_ROOT → PROJECT_REPO_ROOT (with deprecated alias)
- Derive PROJECT_NAME from CODEBERG_REPO slug
- Add PRIMARY_BRANCH (master/main), WOODPECKER_REPO_ID env vars
- Parameterize worktree prefixes, docker container names, branch refs
- Genericize agent prompts (gardener, factory supervisor)
- Update best-practices docs to use $-vars, prefix harb lessons

All project-specific values now flow from .env → lib/env.sh → scripts.
Backward-compatible: existing harb setups work without .env changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
johba 2026-03-14 13:49:09 +01:00
parent f16df6c53e
commit 90ef03a304
16 changed files with 117 additions and 116 deletions

View file

@ -1,6 +1,6 @@
# Factory Supervisor
You are the factory supervisor for `johba/harb`. You were called because
You are the factory supervisor for `$CODEBERG_REPO`. You were called because
`factory-poll.sh` detected an issue it couldn't auto-fix.
## Priority Order
@ -34,9 +34,11 @@ source ${FACTORY_ROOT}/lib/env.sh
This gives you:
- `codeberg_api GET "/pulls?state=open"` — Codeberg API (uses $CODEBERG_TOKEN)
- `wpdb -c "SELECT ..."` — Woodpecker Postgres (uses $WOODPECKER_DB_PASSWORD)
- `woodpecker_api "/repos/2/pipelines"` — Woodpecker REST API (uses $WOODPECKER_TOKEN)
- `woodpecker_api "/repos/$WOODPECKER_REPO_ID/pipelines"` — Woodpecker REST API (uses $WOODPECKER_TOKEN)
- `$REVIEW_BOT_TOKEN` — for posting reviews as the review_bot account
- `$HARB_REPO_ROOT` — path to the harb repo
- `$PROJECT_REPO_ROOT` — path to the target project repo
- `$PROJECT_NAME` — short project name (for worktree prefixes, container names)
- `$PRIMARY_BRANCH` — main branch (master or main)
- `$FACTORY_ROOT` — path to the dark-factory repo
## Escalation

View file

@ -4,15 +4,15 @@
- Woodpecker CI at localhost:8000 (Docker backend)
- Postgres DB: use `wpdb` helper from env.sh
- Woodpecker API: use `woodpecker_api` helper from env.sh
- CI images: pre-built at `registry.niovi.voyage/harb/*:latest`
- Example (harb): CI images pre-built at `registry.niovi.voyage/harb/*:latest`
## Safe Fixes
- Retrigger CI: push empty commit to PR branch
```bash
cd /tmp/harb-worktree-<issue> && git commit --allow-empty -m "ci: retrigger" --no-verify && git push origin <branch> --force
cd /tmp/${PROJECT_NAME}-worktree-<issue> && git commit --allow-empty -m "ci: retrigger" --no-verify && git push origin <branch> --force
```
- Restart woodpecker-agent: `sudo systemctl restart woodpecker-agent`
- View pipeline status: `wpdb -c "SELECT number, status FROM pipelines WHERE repo_id=2 ORDER BY number DESC LIMIT 5;"`
- View pipeline status: `wpdb -c "SELECT number, status FROM pipelines WHERE repo_id=$WOODPECKER_REPO_ID ORDER BY number DESC LIMIT 5;"`
- View failed steps: `bash ${FACTORY_ROOT}/lib/ci-debug.sh failures <pipeline-number>`
- View step logs: `bash ${FACTORY_ROOT}/lib/ci-debug.sh logs <pipeline-number> <step-name>`
@ -23,7 +23,7 @@
## Known Issues
- Codeberg rate-limits SSH clones. `git` step fails with exit 128. Retrigger usually works.
- `log_entries` table grows fast (was 5.6GB once). Truncate periodically.
- Running CI + harb stack = 14+ containers on 8GB. Memory pressure is real.
- Example (harb): Running CI + harb stack = 14+ containers on 8GB. Memory pressure is real.
- CI images take hours to rebuild. Never run `docker system prune -a`.
## Lessons Learned
@ -31,10 +31,10 @@
- Exit code 137 = OOM kill. Check memory, kill stale processes, retrigger.
- `node-quality` step fails on eslint/typescript errors — these need code fixes, not CI fixes.
### FEE_DEST address must match DeployLocal.sol
### Example (harb): FEE_DEST address must match DeployLocal.sol
When DeployLocal.sol changes the feeDest address, bootstrap-common.sh must also be updated.
Current feeDest = keccak256('harb.local.feeDest') = 0x8A9145E1Ea4C4d7FB08cF1011c8ac1F0e10F9383.
Symptom: bootstrap step exits 1 after 'Granting recenter access to deployer' with no error — setRecenterAccess reverts because wrong address is impersonated.
### keccak-derived FEE_DEST requires anvil_setBalance before impersonation
### Example (harb): keccak-derived FEE_DEST requires anvil_setBalance before impersonation
When FEE_DEST is a keccak-derived address (e.g. keccak256('harb.local.feeDest')), it has zero ETH balance. Any function that calls `anvil_impersonateAccount` then `cast send --from $FEE_DEST --unlocked` will fail silently (output redirected to LOG_FILE) but exit 1 due to gas deduction failure. Fix: add `cast rpc anvil_setBalance "$FEE_DEST" "0xDE0B6B3A7640000"` before impersonation. Applied in both bootstrap-common.sh and red-team.sh.

View file

@ -10,8 +10,8 @@ Codeberg rate-limits SSH and HTTPS clones. Symptoms:
- **Do NOT retrigger** during a rate-limit storm. Wait 10-15 minutes.
- Check if multiple pipelines failed on `git` step recently:
```bash
wpdb -c "SELECT number, status, to_timestamp(started) FROM pipelines WHERE repo_id=2 AND status='failure' ORDER BY number DESC LIMIT 5;"
wpdb -c "SELECT s.name, s.exit_code FROM steps s JOIN pipelines p ON s.pipeline_id=p.id WHERE p.number=<N> AND p.repo_id=2 AND s.state='failure';"
wpdb -c "SELECT number, status, to_timestamp(started) FROM pipelines WHERE repo_id=$WOODPECKER_REPO_ID AND status='failure' ORDER BY number DESC LIMIT 5;"
wpdb -c "SELECT s.name, s.exit_code FROM steps s JOIN pipelines p ON s.pipeline_id=p.id WHERE p.number=<N> AND p.repo_id=$WOODPECKER_REPO_ID AND s.state='failure';"
```
- If multiple `git` failures with exit 128 in the last 15 min → it's rate limiting. Wait.
- Only retrigger after 15+ minutes of no CI activity.

View file

@ -5,13 +5,13 @@
- `dev-agent.sh` uses `claude -p` for implementation, runs in git worktree
- Lock file: `/tmp/dev-agent.lock` (contains PID)
- Status file: `/tmp/dev-agent-status`
- Worktrees: `/tmp/harb-worktree-<issue-number>/`
- Worktrees: `/tmp/${PROJECT_NAME}-worktree-<issue-number>/`
## Safe Fixes
- Remove stale lock: `rm -f /tmp/dev-agent.lock` (only if PID is dead)
- Kill stuck agent: `kill <pid>` then clean lock
- Restart on derailed PR: `bash ${FACTORY_ROOT}/dev/dev-agent.sh <issue-number> &`
- Clean worktree: `cd /home/debian/harb && git worktree remove /tmp/harb-worktree-<N> --force`
- Clean worktree: `cd $PROJECT_REPO_ROOT && git worktree remove /tmp/${PROJECT_NAME}-worktree-<N> --force`
- Remove `in-progress` label if agent died without cleanup:
```bash
codeberg_api DELETE "/issues/<N>/labels/in-progress"
@ -38,7 +38,7 @@
## Dependency Resolution
**Trust closed state.** If a dependency issue is closed, the code is on master. Period.
**Trust closed state.** If a dependency issue is closed, the code is on the primary branch. Period.
DO NOT try to find the specific PR that closed an issue. This is over-engineering that causes false negatives:
- Codeberg shares issue/PR numbering — no guaranteed relationship

View file

@ -3,14 +3,14 @@
## Safe Fixes
- Docker cleanup: `sudo docker system prune -f` (keeps images, removes stopped containers + dangling layers)
- Truncate factory logs >5MB: `truncate -s 0 <file>`
- Remove stale worktrees: check `/tmp/harb-worktree-*`, only if dev-agent not running on them
- Remove stale worktrees: check `/tmp/${PROJECT_NAME}-worktree-*`, only if dev-agent not running on them
- Woodpecker log_entries: `DELETE FROM log_entries WHERE id < (SELECT max(id) - 100000 FROM log_entries);` then `VACUUM;`
- Node module caches in worktrees: `rm -rf /tmp/harb-worktree-*/node_modules/`
- Git garbage collection: `cd /home/debian/harb && git gc --prune=now`
- Node module caches in worktrees: `rm -rf /tmp/${PROJECT_NAME}-worktree-*/node_modules/`
- Git garbage collection: `cd $PROJECT_REPO_ROOT && git gc --prune=now`
## Dangerous (escalate)
- `docker system prune -a --volumes` — deletes ALL images including CI build cache
- Deleting anything in `/home/debian/harb/` that's tracked by git
- Deleting anything in `$PROJECT_REPO_ROOT/` that's tracked by git
- Truncating Woodpecker DB tables other than log_entries
## Known Disk Hogs

View file

@ -1,39 +1,39 @@
# Git Best Practices
## Environment
- Repo: `/home/debian/harb`, remote: `codeberg.org/johba/harb`
- Branch: `master` (protected — no direct push, PRs only)
- Worktrees: `/tmp/harb-worktree-<issue>/`
- Repo: `$PROJECT_REPO_ROOT`, remote: `$PROJECT_REMOTE`
- Branch: `$PRIMARY_BRANCH` (protected — no direct push, PRs only)
- Worktrees: `/tmp/${PROJECT_NAME}-worktree-<issue>/`
## Safe Fixes
- Abort stale rebase: `cd /home/debian/harb && git rebase --abort`
- Switch to master: `git checkout master`
- Abort stale rebase: `cd $PROJECT_REPO_ROOT && git rebase --abort`
- Switch to $PRIMARY_BRANCH: `git checkout $PRIMARY_BRANCH`
- Prune worktrees: `git worktree prune`
- Reset dirty state: `git checkout -- .` (only uncommitted changes)
- Fetch latest: `git fetch origin master`
- Fetch latest: `git fetch origin $PRIMARY_BRANCH`
## Auto-fixable by Supervisor
- **Merge conflict on approved PR**: rebase onto master and force-push
- **Merge conflict on approved PR**: rebase onto $PRIMARY_BRANCH and force-push
```bash
cd /tmp/harb-worktree-<issue> || git worktree add /tmp/harb-worktree-<issue> <branch>
cd /tmp/harb-worktree-<issue>
git fetch origin master
git rebase origin/master
cd /tmp/${PROJECT_NAME}-worktree-<issue> || git worktree add /tmp/${PROJECT_NAME}-worktree-<issue> <branch>
cd /tmp/${PROJECT_NAME}-worktree-<issue>
git fetch origin $PRIMARY_BRANCH
git rebase origin/$PRIMARY_BRANCH
# If conflict is trivial (NatSpec, comments): resolve and continue
# If conflict is code logic: escalate to Clawy
git push origin <branch> --force
```
- **Stale rebase**: `git rebase --abort && git checkout master`
- **Wrong branch**: `git checkout master`
- **Stale rebase**: `git rebase --abort && git checkout $PRIMARY_BRANCH`
- **Wrong branch**: `git checkout $PRIMARY_BRANCH`
## Dangerous (escalate)
- `git reset --hard` on any branch with unpushed work
- Deleting remote branches
- Force-pushing to any branch
- Anything on the master branch directly
- Anything on the $PRIMARY_BRANCH branch directly
## Known Issues
- Main repo MUST be on master at all times. Dev work happens in worktrees.
- Main repo MUST be on $PRIMARY_BRANCH at all times. Dev work happens in worktrees.
- Stale rebases (detached HEAD) break all worktree creation — silent factory stall.
- `git worktree add` fails if target directory exists (even empty). Remove first.
- Many old branches exist locally (100+). Normal — don't bulk-delete.

View file

@ -7,12 +7,12 @@
## Safe Fixes (no permission needed)
- Kill stale `claude` processes (>3h old): `pgrep -f "claude" --older 10800 | xargs kill`
- Drop filesystem caches: `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`
- Restart bloated Anvil: `sudo docker restart harb-anvil-1` (grows to 12GB+ over hours)
- Restart bloated Anvil: `sudo docker restart ${PROJECT_NAME}-anvil-1` (grows to 12GB+ over hours)
- Kill orphan node processes from dead worktrees
## Dangerous (escalate)
- `docker system prune -a --volumes` — kills CI images, hours to rebuild
- Stopping harb stack containers — breaks dev environment
- Stopping project stack containers — breaks dev environment
- OOM that survives all safe fixes — needs human decision on what to kill
## Known Memory Hogs
@ -26,4 +26,4 @@
## Lessons Learned
- After killing processes, always `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`
- Swap doesn't drain from dropping caches alone — it's actual paged-out process memory
- Running CI + full harb stack = 14+ containers on 8GB. Only one pipeline at a time.
- Running CI + full project stack = 14+ containers on 8GB. Only one pipeline at a time.

View file

@ -75,9 +75,10 @@ if [ "${AVAIL_MB:-9999}" -lt 500 ] || { [ "${SWAP_USED_MB:-0}" -gt 3000 ] && [ "
fixed "Dropped filesystem caches"
# Restart Anvil if it's bloated (>1GB RSS)
ANVIL_RSS=$(sudo docker stats harb-anvil-1 --no-stream --format '{{.MemUsage}}' 2>/dev/null | grep -oP '^\S+' | head -1 || echo "0")
ANVIL_CONTAINER="${ANVIL_CONTAINER:-${PROJECT_NAME}-anvil-1}"
ANVIL_RSS=$(sudo docker stats "$ANVIL_CONTAINER" --no-stream --format '{{.MemUsage}}' 2>/dev/null | grep -oP '^\S+' | head -1 || echo "0")
if echo "$ANVIL_RSS" | grep -qP '\dGiB'; then
sudo docker restart harb-anvil-1 >/dev/null 2>&1 && fixed "Restarted bloated Anvil (${ANVIL_RSS})"
sudo docker restart "$ANVIL_CONTAINER" >/dev/null 2>&1 && fixed "Restarted bloated Anvil (${ANVIL_RSS})"
fi
# Re-check after fixes
@ -116,12 +117,12 @@ if [ "${DISK_PERCENT:-0}" -gt 80 ]; then
done
# Clean old worktrees
IDLE_WORKTREES=$(find /tmp/harb-worktree-* -maxdepth 0 -mmin +360 2>/dev/null || true)
IDLE_WORKTREES=$(find /tmp/${PROJECT_NAME}-worktree-* -maxdepth 0 -mmin +360 2>/dev/null || true)
if [ -n "$IDLE_WORKTREES" ]; then
cd "${HARB_REPO_ROOT}" && git worktree prune 2>/dev/null
cd "${PROJECT_REPO_ROOT}" && git worktree prune 2>/dev/null
for wt in $IDLE_WORKTREES; do
# Only remove if dev-agent is not running on it
ISSUE_NUM=$(basename "$wt" | sed 's/harb-worktree-//')
ISSUE_NUM=$(basename "$wt" | sed "s/${PROJECT_NAME}-worktree-//")
if ! pgrep -f "dev-agent.sh ${ISSUE_NUM}" >/dev/null 2>&1; then
rm -rf "$wt" && fixed "Removed stale worktree: $wt"
fi
@ -153,10 +154,10 @@ fi
status "P2: checking factory"
# CI stuck
STUCK_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=2 AND status='running' AND EXTRACT(EPOCH FROM now() - to_timestamp(started)) > 1200;" 2>/dev/null | xargs)
STUCK_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=${WOODPECKER_REPO_ID} AND status='running' AND EXTRACT(EPOCH FROM now() - to_timestamp(started)) > 1200;" 2>/dev/null | xargs)
[ "${STUCK_CI:-0}" -gt 0 ] && p2 "CI: ${STUCK_CI} pipeline(s) running >20min"
PENDING_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=2 AND status='pending' AND EXTRACT(EPOCH FROM now() - to_timestamp(created)) > 1800;" 2>/dev/null | xargs)
PENDING_CI=$(wpdb -c "SELECT count(*) FROM pipelines WHERE repo_id=${WOODPECKER_REPO_ID} AND status='pending' AND EXTRACT(EPOCH FROM now() - to_timestamp(created)) > 1800;" 2>/dev/null | xargs)
[ "${PENDING_CI:-0}" -gt 0 ] && p2 "CI: ${PENDING_CI} pipeline(s) pending >30min"
# Dev-agent health
@ -177,19 +178,19 @@ if [ -f "$DEV_LOCK" ]; then
fi
# Git repo health
cd "${HARB_REPO_ROOT}" 2>/dev/null || true
cd "${PROJECT_REPO_ROOT}" 2>/dev/null || true
GIT_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
GIT_REBASE=$([ -d .git/rebase-merge ] || [ -d .git/rebase-apply ] && echo "yes" || echo "no")
if [ "$GIT_REBASE" = "yes" ]; then
git rebase --abort 2>/dev/null && git checkout master 2>/dev/null && \
fixed "Aborted stale rebase, switched to master" || \
git rebase --abort 2>/dev/null && git checkout "${PRIMARY_BRANCH}" 2>/dev/null && \
fixed "Aborted stale rebase, switched to ${PRIMARY_BRANCH}" || \
p2 "Git: stale rebase, auto-abort failed"
fi
if [ "$GIT_BRANCH" != "master" ] && [ "$GIT_BRANCH" != "unknown" ]; then
git checkout master 2>/dev/null && \
fixed "Switched main repo from '${GIT_BRANCH}' to master" || \
p2 "Git: on '${GIT_BRANCH}' instead of master"
if [ "$GIT_BRANCH" != "${PRIMARY_BRANCH}" ] && [ "$GIT_BRANCH" != "unknown" ]; then
git checkout "${PRIMARY_BRANCH}" 2>/dev/null && \
fixed "Switched main repo from '${GIT_BRANCH}' to ${PRIMARY_BRANCH}" || \
p2 "Git: on '${GIT_BRANCH}' instead of ${PRIMARY_BRANCH}"
fi
# =============================================================================