fix: bug: supervisor hardcodes ops repo expectation — fails silently on deployments without one (#544)

Add ops repo presence detection in supervisor-run.sh with degraded-mode support:
- Detect a missing OPS_REPO_ROOT and log a WARNING message
- Set OPS_REPO_DEGRADED=1 flag and configure fallback paths
- Bundle minimal knowledge files as fallback for degraded mode
- Update formula to use OPS_KNOWLEDGE_ROOT, OPS_JOURNAL_ROOT, OPS_VAULT_ROOT
- Support local vault destination and journal fallback when ops repo absent
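
As a rough illustration of the detection described in the list above, here is a minimal sketch. It is not the code from supervisor-run.sh: the function name `detect_ops_repo`, the fallback root argument, and the `knowledge`/`journal`/`vault` subdirectory names are assumptions made for the example. Only the variable names come from this commit message.

```bash
#!/usr/bin/env bash
# Sketch only: detect_ops_repo and the subdirectory layout are hypothetical.
# Usage: detect_ops_repo <fallback_root>
detect_ops_repo() {
  local fallback_root=$1
  local root

  if [ -n "${OPS_REPO_ROOT:-}" ] && [ -d "$OPS_REPO_ROOT" ]; then
    # Ops repo present: full functionality, paths point into the repo.
    OPS_REPO_DEGRADED=0
    root=$OPS_REPO_ROOT
  else
    # Ops repo absent: make the failure mode explicit, then fall back.
    echo "WARNING: ops repo not found at '${OPS_REPO_ROOT:-<unset>}', running in degraded mode" >&2
    OPS_REPO_DEGRADED=1
    root=$fallback_root
  fi

  OPS_KNOWLEDGE_ROOT="$root/knowledge"
  OPS_JOURNAL_ROOT="$root/journal"
  OPS_VAULT_ROOT="$root/vault"
  export OPS_REPO_DEGRADED OPS_KNOWLEDGE_ROOT OPS_JOURNAL_ROOT OPS_VAULT_ROOT
}
```

Callers can then branch on `OPS_REPO_DEGRADED` instead of probing the ops repo path themselves.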

Knowledge files bundled: disk.md, memory.md, ci.md, git.md, dev-agent.md,
review-agent.md, forge.md

The supervisor now runs with full functionality when ops repo is available,
or gracefully degrades to local paths when absent, making the failure mode
explicit rather than silent.
Claude 2026-04-10 08:16:03 +00:00
parent be5957f127
commit f299bae77b
11 changed files with 278 additions and 16 deletions

knowledge/ci.md Normal file

@ -0,0 +1,28 @@
# CI/CD — Best Practices
## CI Pipeline Issues (P2)
When a CI pipeline has been running for more than 20 minutes or pending for more than 30 minutes:
### Investigation Steps
1. Check pipeline status via Forgejo API:
```bash
curl -sf -H "Authorization: token $FORGE_TOKEN" \
"$FORGE_API/pipelines?limit=50" | jq '.[] | {number, status, created}'
```
2. Check Woodpecker CI if configured:
```bash
curl -sf -H "Authorization: Bearer $WOODPECKER_TOKEN" \
"$WOODPECKER_SERVER/api/repos/${WOODPECKER_REPO_ID}/pipelines?limit=10"
```
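
The thresholds at the top of this section can be applied to the API output with a small helper. This is a sketch under stated assumptions: `classify_pipeline` is a hypothetical name, the status strings `running` and `pending` are assumed to match what the CI server reports, and the timestamp is assumed to be Unix epoch seconds (for example the `created` field selected in step 1).

```bash
#!/usr/bin/env bash
# Sketch only: classify_pipeline is a hypothetical helper.
# Usage: classify_pipeline <status> <since_epoch> <now_epoch>
# Prints "stuck", "stalled", or "ok".
classify_pipeline() {
  local status=$1 since=$2 now=$3
  local age=$(( now - since ))   # seconds spent in the current state

  case "$status" in
    running)
      # Running for more than 20 minutes counts as stuck.
      if (( age > 20 * 60 )); then echo stuck; return; fi ;;
    pending)
      # Pending for more than 30 minutes counts as stalled in the queue.
      if (( age > 30 * 60 )); then echo stalled; return; fi ;;
  esac
  echo ok
}

# Example wiring (commented out, needs a live server and jq):
# now=$(date +%s)
# curl -sf -H "Authorization: Bearer $WOODPECKER_TOKEN" \
#   "$WOODPECKER_SERVER/api/repos/${WOODPECKER_REPO_ID}/pipelines?limit=10" |
#   jq -r '.[] | "\(.number) \(.status) \(.created)"' |
#   while read -r number status created; do
#     echo "#$number: $(classify_pipeline "$status" "$created" "$now")"
#   done
```

For running pipelines, the start time is a better input than the creation time if the API exposes one, since time spent queued would otherwise count against the 20-minute limit.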
### Common Fixes
- **Stuck pipeline**: Cancel via Forgejo API, retrigger
- **Pending pipeline**: Check queue depth, scale CI runners
- **Failed pipeline**: Review logs, fix failing test/step
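
A possible shape for the cancel-and-retrigger fix is sketched below. Note the swap: the bullet above names the Forgejo API, while this example goes through the Woodpecker server already queried in the investigation steps. The two `POST` paths are assumptions based on the Woodpecker REST API and should be checked against the version you run. The helper names and the `DRY_RUN` switch are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and DRY_RUN are hypothetical; the endpoint
# paths are assumptions to verify against your Woodpecker version.

# POST to a Woodpecker URL, or just print the request when DRY_RUN is set.
woodpecker_post() {
  local url=$1
  if [ -n "${DRY_RUN:-}" ]; then
    echo "POST $url"
  else
    curl -sf -X POST -H "Authorization: Bearer $WOODPECKER_TOKEN" "$url"
  fi
}

# Usage: cancel_and_retrigger <pipeline_number>
cancel_and_retrigger() {
  local number=$1
  local base="$WOODPECKER_SERVER/api/repos/${WOODPECKER_REPO_ID}/pipelines/$number"

  # Cancelling can fail if the pipeline already finished; warn and continue.
  woodpecker_post "$base/cancel" ||
    echo "cancel failed for pipeline #$number (already finished?)" >&2

  # Restart the same pipeline number.
  woodpecker_post "$base"
}
```

Running once with `DRY_RUN=1` prints the two requests without touching the server, which is a cheap way to confirm the URLs before acting on a live pipeline.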
### Prevention
- Set timeout limits on CI pipelines
- Monitor runner capacity and scale as needed
- Use caching for dependencies to reduce build time
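
Where the CI system's own timeout setting is unavailable, a step can also be bounded from inside the script. The following is a minimal sketch using the GNU coreutils `timeout` command, which exits with status 124 when the limit is hit. `run_step` is a hypothetical wrapper name.

```bash
#!/usr/bin/env bash
# Sketch only: run_step is a hypothetical wrapper around coreutils timeout.
# Usage: run_step <limit> <command> [args...]
# <limit> accepts timeout(1) durations such as 30, 90s, or 20m.
run_step() {
  local limit=$1
  shift

  timeout "$limit" "$@"
  local rc=$?

  # GNU timeout reports an expired limit with exit status 124.
  if [ "$rc" -eq 124 ]; then
    echo "step timed out after $limit: $*" >&2
  fi
  return "$rc"
}
```

A call such as `run_step 20m ./run-tests.sh` (the script name is illustrative) fails the step instead of letting it occupy a runner indefinitely.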