refactor: make all scripts multi-project via env vars

Replace hardcoded harb references across the entire codebase:
- HARB_REPO_ROOT → PROJECT_REPO_ROOT (with deprecated alias)
- Derive PROJECT_NAME from CODEBERG_REPO slug
- Add PRIMARY_BRANCH (master/main), WOODPECKER_REPO_ID env vars
- Parameterize worktree prefixes, docker container names, branch refs
- Genericize agent prompts (gardener, factory supervisor)
- Update best-practices docs to use $-vars, prefix harb lessons

All project-specific values now flow from .env → lib/env.sh → scripts.
Backward-compatible: existing harb setups work without .env changes.
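The .env → lib/env.sh → scripts flow could look roughly like this in lib/env.sh. This is a hedged sketch: only the variable names and the harb defaults (`repo_id=2`, `master`) appear in this change; the exact derivation logic is assumed.

```shell
# Sketch of lib/env.sh resolution order (variable names from this commit; defaults assumed).
# 1. Honor the deprecated HARB_REPO_ROOT alias for backward compatibility.
PROJECT_REPO_ROOT="${PROJECT_REPO_ROOT:-${HARB_REPO_ROOT:-}}"
# 2. Derive PROJECT_NAME from the CODEBERG_REPO slug (part after the last '/').
CODEBERG_REPO="${CODEBERG_REPO:-johba/harb}"
PROJECT_NAME="${PROJECT_NAME:-${CODEBERG_REPO##*/}}"
# 3. New knobs with harb-compatible defaults, so existing setups need no .env changes.
PRIMARY_BRANCH="${PRIMARY_BRANCH:-master}"
WOODPECKER_REPO_ID="${WOODPECKER_REPO_ID:-2}"
```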

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
johba 2026-03-14 13:49:09 +01:00
parent f16df6c53e
commit 90ef03a304
16 changed files with 117 additions and 116 deletions

View file

@@ -4,15 +4,15 @@
- Woodpecker CI at localhost:8000 (Docker backend)
- Postgres DB: use `wpdb` helper from env.sh
- Woodpecker API: use `woodpecker_api` helper from env.sh
-- CI images: pre-built at `registry.niovi.voyage/harb/*:latest`
+- Example (harb): CI images pre-built at `registry.niovi.voyage/harb/*:latest`
## Safe Fixes
- Retrigger CI: push empty commit to PR branch
```bash
-cd /tmp/harb-worktree-<issue> && git commit --allow-empty -m "ci: retrigger" --no-verify && git push origin <branch> --force
+cd /tmp/${PROJECT_NAME}-worktree-<issue> && git commit --allow-empty -m "ci: retrigger" --no-verify && git push origin <branch> --force
```
- Restart woodpecker-agent: `sudo systemctl restart woodpecker-agent`
-- View pipeline status: `wpdb -c "SELECT number, status FROM pipelines WHERE repo_id=2 ORDER BY number DESC LIMIT 5;"`
+- View pipeline status: `wpdb -c "SELECT number, status FROM pipelines WHERE repo_id=$WOODPECKER_REPO_ID ORDER BY number DESC LIMIT 5;"`
- View failed steps: `bash ${FACTORY_ROOT}/lib/ci-debug.sh failures <pipeline-number>`
- View step logs: `bash ${FACTORY_ROOT}/lib/ci-debug.sh logs <pipeline-number> <step-name>`
@@ -23,7 +23,7 @@
## Known Issues
- Codeberg rate-limits SSH clones. `git` step fails with exit 128. Retrigger usually works.
- `log_entries` table grows fast (was 5.6GB once). Truncate periodically.
-- Running CI + harb stack = 14+ containers on 8GB. Memory pressure is real.
+- Example (harb): Running CI + harb stack = 14+ containers on 8GB. Memory pressure is real.
- CI images take hours to rebuild. Never run `docker system prune -a`.
## Lessons Learned
@@ -31,10 +31,10 @@
- Exit code 137 = OOM kill. Check memory, kill stale processes, retrigger.
- `node-quality` step fails on eslint/typescript errors — these need code fixes, not CI fixes.
-### FEE_DEST address must match DeployLocal.sol
+### Example (harb): FEE_DEST address must match DeployLocal.sol
When DeployLocal.sol changes the feeDest address, bootstrap-common.sh must also be updated.
Current feeDest = keccak256('harb.local.feeDest') = 0x8A9145E1Ea4C4d7FB08cF1011c8ac1F0e10F9383.
Symptom: bootstrap step exits 1 after 'Granting recenter access to deployer' with no error — setRecenterAccess reverts because wrong address is impersonated.
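How such a placeholder address is derived can be sketched with foundry's `cast`. This is a hypothetical helper: truncating the 32-byte keccak hash to the low 20 bytes is an assumption about how DeployLocal.sol computes the constant, not confirmed by this diff.

```shell
# Hypothetical: derive a deterministic placeholder address from a label string.
# Assumes the Solidity side does address(uint160(uint256(keccak256(label)))).
derive_fee_dest() {
  local hash
  hash=$(cast keccak "$1")           # 0x-prefixed 32-byte hash (64 hex chars)
  printf '0x%s\n' "${hash: -40}"     # keep the low 20 bytes as the address
}

derive_fee_dest "harb.local.feeDest"
```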
-### keccak-derived FEE_DEST requires anvil_setBalance before impersonation
+### Example (harb): keccak-derived FEE_DEST requires anvil_setBalance before impersonation
When FEE_DEST is a keccak-derived address (e.g. keccak256('harb.local.feeDest')), it has zero ETH balance. Any function that calls `anvil_impersonateAccount` then `cast send --from $FEE_DEST --unlocked` will fail silently (output redirected to LOG_FILE) but exit 1 due to gas deduction failure. Fix: add `cast rpc anvil_setBalance "$FEE_DEST" "0xDE0B6B3A7640000"` before impersonation. Applied in both bootstrap-common.sh and red-team.sh.
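The fix described here can be sketched as a small helper. Names are illustrative; this is not the actual bootstrap-common.sh code, just the ordering the lesson prescribes.

```shell
# Fund the keccak-derived FEE_DEST before impersonating it, so the gas
# deduction on the subsequent `cast send --from $FEE_DEST --unlocked`
# doesn't fail silently with exit 1.
fund_and_impersonate() {
  local fee_dest="$1"
  cast rpc anvil_setBalance "$fee_dest" "0xDE0B6B3A7640000"   # 1 ETH, covers gas
  cast rpc anvil_impersonateAccount "$fee_dest"
}
```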

View file

@@ -10,8 +10,8 @@ Codeberg rate-limits SSH and HTTPS clones. Symptoms:
- **Do NOT retrigger** during a rate-limit storm. Wait 10-15 minutes.
- Check if multiple pipelines failed on `git` step recently:
```bash
-wpdb -c "SELECT number, status, to_timestamp(started) FROM pipelines WHERE repo_id=2 AND status='failure' ORDER BY number DESC LIMIT 5;"
-wpdb -c "SELECT s.name, s.exit_code FROM steps s JOIN pipelines p ON s.pipeline_id=p.id WHERE p.number=<N> AND p.repo_id=2 AND s.state='failure';"
+wpdb -c "SELECT number, status, to_timestamp(started) FROM pipelines WHERE repo_id=$WOODPECKER_REPO_ID AND status='failure' ORDER BY number DESC LIMIT 5;"
+wpdb -c "SELECT s.name, s.exit_code FROM steps s JOIN pipelines p ON s.pipeline_id=p.id WHERE p.number=<N> AND p.repo_id=$WOODPECKER_REPO_ID AND s.state='failure';"
```
- If multiple `git` failures with exit 128 in the last 15 min → it's rate limiting. Wait.
- Only retrigger after 15+ minutes of no CI activity.
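The wait-or-retrigger decision can be wrapped around queries like the ones above. A hedged sketch: the `rate_limit_storm` name, the threshold of 3 failures, and the psql-style `-tA` flags passed through `wpdb` are all assumptions, not part of this repo's actual scripts.

```shell
# Return success (0) if enough recent git-step failures look like a
# rate-limit storm, in which case the caller should wait, not retrigger.
rate_limit_storm() {
  local n
  n=$(wpdb -tA -c "SELECT count(*) FROM pipelines WHERE repo_id=$WOODPECKER_REPO_ID AND status='failure' AND started > extract(epoch from now())::bigint - 900;")
  [ "${n:-0}" -ge 3 ]   # 3+ failures in 15 min: assume rate limiting
}
```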

View file

@@ -5,13 +5,13 @@
- `dev-agent.sh` uses `claude -p` for implementation, runs in git worktree
- Lock file: `/tmp/dev-agent.lock` (contains PID)
- Status file: `/tmp/dev-agent-status`
-- Worktrees: `/tmp/harb-worktree-<issue-number>/`
+- Worktrees: `/tmp/${PROJECT_NAME}-worktree-<issue-number>/`
## Safe Fixes
- Remove stale lock: `rm -f /tmp/dev-agent.lock` (only if PID is dead)
- Kill stuck agent: `kill <pid>` then clean lock
- Restart on derailed PR: `bash ${FACTORY_ROOT}/dev/dev-agent.sh <issue-number> &`
-- Clean worktree: `cd /home/debian/harb && git worktree remove /tmp/harb-worktree-<N> --force`
+- Clean worktree: `cd $PROJECT_REPO_ROOT && git worktree remove /tmp/${PROJECT_NAME}-worktree-<N> --force`
- Remove `in-progress` label if agent died without cleanup:
```bash
codeberg_api DELETE "/issues/<N>/labels/in-progress"
```
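The "only if PID is dead" guard for the lock file can be sketched as follows. It assumes the lock contains just the PID (as noted above); the function name is illustrative.

```shell
# Remove the dev-agent lock only when the recorded PID no longer exists.
# Returns 1 (and leaves the lock) if the agent is still running.
remove_stale_lock() {
  local lock="${1:-/tmp/dev-agent.lock}" pid
  [ -f "$lock" ] || return 0
  pid=$(cat "$lock")
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    return 1              # process alive: do not touch the lock
  fi
  rm -f "$lock"
}
```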
@@ -38,7 +38,7 @@
## Dependency Resolution
-**Trust closed state.** If a dependency issue is closed, the code is on master. Period.
+**Trust closed state.** If a dependency issue is closed, the code is on the primary branch. Period.
DO NOT try to find the specific PR that closed an issue. This is over-engineering that causes false negatives:
- Codeberg shares issue/PR numbering — no guaranteed relationship
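A minimal "trust closed state" check just asks the forge for the issue's state instead of hunting for the closing PR. Hypothetical sketch: the GET path is an assumption about the `codeberg_api` helper used elsewhere in these docs.

```shell
# Dependency is satisfied iff the issue is closed -- no PR archaeology.
dependency_done() {
  codeberg_api GET "/issues/$1" | grep -q '"state" *: *"closed"'
}
```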

View file

@@ -3,14 +3,14 @@
## Safe Fixes
- Docker cleanup: `sudo docker system prune -f` (keeps images, removes stopped containers + dangling layers)
- Truncate factory logs >5MB: `truncate -s 0 <file>`
-- Remove stale worktrees: check `/tmp/harb-worktree-*`, only if dev-agent not running on them
+- Remove stale worktrees: check `/tmp/${PROJECT_NAME}-worktree-*`, only if dev-agent not running on them
- Woodpecker log_entries: `DELETE FROM log_entries WHERE id < (SELECT max(id) - 100000 FROM log_entries);` then `VACUUM;`
-- Node module caches in worktrees: `rm -rf /tmp/harb-worktree-*/node_modules/`
-- Git garbage collection: `cd /home/debian/harb && git gc --prune=now`
+- Node module caches in worktrees: `rm -rf /tmp/${PROJECT_NAME}-worktree-*/node_modules/`
+- Git garbage collection: `cd $PROJECT_REPO_ROOT && git gc --prune=now`
## Dangerous (escalate)
- `docker system prune -a --volumes` — deletes ALL images including CI build cache
-- Deleting anything in `/home/debian/harb/` that's tracked by git
+- Deleting anything in `$PROJECT_REPO_ROOT/` that's tracked by git
- Truncating Woodpecker DB tables other than log_entries
## Known Disk Hogs

View file

@@ -1,39 +1,39 @@
# Git Best Practices
## Environment
-- Repo: `/home/debian/harb`, remote: `codeberg.org/johba/harb`
-- Branch: `master` (protected — no direct push, PRs only)
-- Worktrees: `/tmp/harb-worktree-<issue>/`
+- Repo: `$PROJECT_REPO_ROOT`, remote: `$PROJECT_REMOTE`
+- Branch: `$PRIMARY_BRANCH` (protected — no direct push, PRs only)
+- Worktrees: `/tmp/${PROJECT_NAME}-worktree-<issue>/`
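A hypothetical `.env` for a second project, using the variables above (all values illustrative, not a real project):

```shell
# Example .env for a non-harb project (illustrative values)
CODEBERG_REPO=johba/myproject            # PROJECT_NAME derives from the slug
PROJECT_REPO_ROOT=/home/debian/myproject
PROJECT_REMOTE=codeberg.org/johba/myproject
PRIMARY_BRANCH=main
WOODPECKER_REPO_ID=3
```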
## Safe Fixes
-- Abort stale rebase: `cd /home/debian/harb && git rebase --abort`
-- Switch to master: `git checkout master`
+- Abort stale rebase: `cd $PROJECT_REPO_ROOT && git rebase --abort`
+- Switch to $PRIMARY_BRANCH: `git checkout $PRIMARY_BRANCH`
- Prune worktrees: `git worktree prune`
- Reset dirty state: `git checkout -- .` (only uncommitted changes)
-- Fetch latest: `git fetch origin master`
+- Fetch latest: `git fetch origin $PRIMARY_BRANCH`
## Auto-fixable by Supervisor
-- **Merge conflict on approved PR**: rebase onto master and force-push
+- **Merge conflict on approved PR**: rebase onto $PRIMARY_BRANCH and force-push
```bash
-cd /tmp/harb-worktree-<issue> || git worktree add /tmp/harb-worktree-<issue> <branch>
-cd /tmp/harb-worktree-<issue>
-git fetch origin master
-git rebase origin/master
+cd /tmp/${PROJECT_NAME}-worktree-<issue> || git worktree add /tmp/${PROJECT_NAME}-worktree-<issue> <branch>
+cd /tmp/${PROJECT_NAME}-worktree-<issue>
+git fetch origin $PRIMARY_BRANCH
+git rebase origin/$PRIMARY_BRANCH
# If conflict is trivial (NatSpec, comments): resolve and continue
# If conflict is code logic: escalate to Clawy
git push origin <branch> --force
```
-- **Stale rebase**: `git rebase --abort && git checkout master`
-- **Wrong branch**: `git checkout master`
+- **Stale rebase**: `git rebase --abort && git checkout $PRIMARY_BRANCH`
+- **Wrong branch**: `git checkout $PRIMARY_BRANCH`
## Dangerous (escalate)
- `git reset --hard` on any branch with unpushed work
- Deleting remote branches
- Force-pushing to any branch
-- Anything on the master branch directly
+- Anything on the $PRIMARY_BRANCH branch directly
## Known Issues
-- Main repo MUST be on master at all times. Dev work happens in worktrees.
+- Main repo MUST be on $PRIMARY_BRANCH at all times. Dev work happens in worktrees.
- Stale rebases (detached HEAD) break all worktree creation — silent factory stall.
- `git worktree add` fails if target directory exists (even empty). Remove first.
- Many old branches exist locally (100+). Normal — don't bulk-delete.
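The "remove first" workaround for `git worktree add` failing on an existing directory can be sketched as a helper (illustrative function and variable names):

```shell
# Create a worktree at $1 for branch $2, clearing any leftover directory first,
# since `git worktree add` refuses to reuse an existing path.
fresh_worktree() {
  local wt="$1" branch="$2"
  if [ -d "$wt" ]; then
    git worktree remove "$wt" --force 2>/dev/null || rm -rf "$wt"
  fi
  git worktree add "$wt" "$branch"
}
```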

View file

@@ -7,12 +7,12 @@
## Safe Fixes (no permission needed)
- Kill stale `claude` processes (>3h old): `pgrep -f "claude" --older 10800 | xargs kill`
- Drop filesystem caches: `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`
-- Restart bloated Anvil: `sudo docker restart harb-anvil-1` (grows to 12GB+ over hours)
+- Restart bloated Anvil: `sudo docker restart ${PROJECT_NAME}-anvil-1` (grows to 12GB+ over hours)
- Kill orphan node processes from dead worktrees
## Dangerous (escalate)
- `docker system prune -a --volumes` — kills CI images, hours to rebuild
-- Stopping harb stack containers — breaks dev environment
+- Stopping project stack containers — breaks dev environment
- OOM that survives all safe fixes — needs human decision on what to kill
## Known Memory Hogs
@@ -26,4 +26,4 @@
## Lessons Learned
- After killing processes, always `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`
- Swap doesn't drain from dropping caches alone — it's actual paged-out process memory
-- Running CI + full harb stack = 14+ containers on 8GB. Only one pipeline at a time.
+- Running CI + full project stack = 14+ containers on 8GB. Only one pipeline at a time.