fix: bug: supervisor hardcodes ops repo expectation — fails silently on deployments without one (#544)

Add ops repo presence detection to supervisor-run.sh, with degraded-mode support:
- Detect when OPS_REPO_ROOT is unset or its directory is missing, and log a WARNING
- Set the OPS_REPO_DEGRADED=1 flag and configure fallback paths
- Bundle minimal knowledge files as a fallback for degraded mode
- Update the formula to use OPS_KNOWLEDGE_ROOT, OPS_JOURNAL_ROOT, and OPS_VAULT_ROOT
- Support a local vault destination and a journal fallback when the ops repo is absent

Knowledge files bundled: disk.md, memory.md, ci.md, git.md, dev-agent.md,
review-agent.md, forge.md

The supervisor now runs with full functionality when the ops repo is available
and degrades gracefully to local paths when it is absent, making the failure
mode explicit rather than silent.
Claude 2026-04-10 08:16:03 +00:00
parent be5957f127
commit f299bae77b
11 changed files with 278 additions and 16 deletions

@@ -40,6 +40,12 @@ P3 (degraded PRs, circular deps, stale deps), P4 (housekeeping).
- `PRIMARY_BRANCH`, `CLAUDE_MODEL` (set to sonnet by supervisor-run.sh)
- `WOODPECKER_TOKEN`, `WOODPECKER_SERVER`, `WOODPECKER_DB_PASSWORD`, `WOODPECKER_DB_USER`, `WOODPECKER_DB_HOST`, `WOODPECKER_DB_NAME` — CI database queries
**Degraded mode (Issue #544)**: When `OPS_REPO_ROOT` is not set or the directory doesn't exist, the supervisor runs in degraded mode:
- Uses bundled knowledge files from `$FACTORY_ROOT/knowledge/` instead of ops repo playbooks
- Writes journal locally to `$FACTORY_ROOT/state/supervisor-journal/` (not committed to git)
- Files vault items locally to `$PROJECT_REPO_ROOT/vault/pending/`
- Logs a WARNING message at startup indicating degraded mode
**Lifecycle**: supervisor-run.sh (cron */20) → lock + memory guard → run
preflight.sh (collect metrics) → load formula + context → run claude -p via agent-sdk.sh
→ Claude assesses health, auto-fixes, writes journal → `PHASE:done`.
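
The "lock + memory guard" step in the lifecycle above is not shown in this diff. As a rough illustration only, a guard of that kind could look like the sketch below. The function names, the lock path, and the 2000 MB threshold are all assumptions, and the memory check assumes a Linux-style `/proc/meminfo`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: function names, lock path, and threshold are
# assumptions, not taken from supervisor-run.sh.

# Take an exclusive lock. mkdir is atomic, so it fails if another run holds it.
acquire_lock() {
  local lock_dir="$1"
  mkdir "$lock_dir" 2>/dev/null
}

# Succeed only if MemAvailable (reported in kB) is at least min_mb megabytes.
# The meminfo path is a parameter so the check can be exercised with a fixture.
memory_ok() {
  local min_mb="$1" meminfo="${2:-/proc/meminfo}" avail_kb
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo" 2>/dev/null)
  [ -n "$avail_kb" ] && [ "$((avail_kb / 1024))" -ge "$min_mb" ]
}

# Example wiring (illustrative values):
#   acquire_lock /tmp/supervisor-run.lock || exit 0
#   trap 'rmdir /tmp/supervisor-run.lock' EXIT
#   memory_ok 2000 || { echo "low memory, skipping run" >&2; exit 0; }
```

A directory lock is used here only because it needs no extra tools; the real script may well use `flock` or a PID file instead.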

@@ -214,7 +214,9 @@ echo ""
echo "## Pending Vault Items"
_found_vault=false
-for _vf in "${OPS_REPO_ROOT}/vault/pending/"*.md; do
+# Use OPS_VAULT_ROOT if set (from supervisor-run.sh degraded mode detection), otherwise default to OPS_REPO_ROOT
+_va_root="${OPS_VAULT_ROOT:-${OPS_REPO_ROOT}/vault/pending}"
+for _vf in "${_va_root}"/*.md; do
[ -f "$_vf" ] || continue
_found_vault=true
_vtitle=$(grep -m1 '^# ' "$_vf" | sed 's/^# //' || basename "$_vf")
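
One observation on the unchanged `_vtitle` line in the hunk above: unless `pipefail` is enabled, the exit status of `grep ... | sed ...` is that of `sed`, so the `|| basename` fallback would not fire for a file without a `# ` heading. Whether the script sets `pipefail` is not visible in this hunk. The following is a minimal, self-contained sketch of the same loop with an explicit fallback; the function name `list_pending_vault` and the output format are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the commit: list pending vault items
# under a given directory, one "- title" line per markdown file.
list_pending_vault() {
  local vault_root="$1" found=false vf title
  for vf in "${vault_root}"/*.md; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$vf" ] || continue
    found=true
    # Use the first markdown H1 as the title.
    title=$(grep -m1 '^# ' "$vf" | sed 's/^# //')
    # Fall back to the file name when no heading was found.
    [ -n "$title" ] || title=$(basename "$vf")
    echo "- ${title}"
  done
  [ "$found" = true ] || echo "(none)"
}

# Same default as the diff: prefer OPS_VAULT_ROOT, else the ops repo path.
# list_pending_vault "${OPS_VAULT_ROOT:-${OPS_REPO_ROOT}/vault/pending}"
```

Testing the captured title for emptiness keeps the fallback independent of shell options.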

@@ -50,6 +50,26 @@ WORKTREE="/tmp/${PROJECT_NAME}-supervisor-run"
# shellcheck disable=SC2034 # consumed by agent-sdk.sh and env.sh log()
LOG_AGENT="supervisor"
# ── OPS Repo Detection (Issue #544) ──────────────────────────────────────
# Detect if OPS_REPO_ROOT is available and set degraded mode flag if not.
# This allows the supervisor to run with fallback knowledge files and
# local journal/vault paths when the ops repo is absent.
if [ -z "${OPS_REPO_ROOT:-}" ] || [ ! -d "${OPS_REPO_ROOT}" ]; then
log "WARNING: OPS_REPO_ROOT not set or directory missing — running in degraded mode (no playbooks, no journal continuity, no vault destination)"
export OPS_REPO_DEGRADED=1
# Set fallback paths for degraded mode
export OPS_KNOWLEDGE_ROOT="${FACTORY_ROOT}/knowledge"
export OPS_JOURNAL_ROOT="${FACTORY_ROOT}/state/supervisor-journal"
export OPS_VAULT_ROOT="${PROJECT_REPO_ROOT}/vault/pending"
mkdir -p "$OPS_JOURNAL_ROOT" "$OPS_VAULT_ROOT" 2>/dev/null || true
else
export OPS_REPO_DEGRADED=0
export OPS_KNOWLEDGE_ROOT="${OPS_REPO_ROOT}/knowledge"
export OPS_JOURNAL_ROOT="${OPS_REPO_ROOT}/journal/supervisor"
export OPS_VAULT_ROOT="${OPS_REPO_ROOT}/vault/pending"
mkdir -p "$OPS_JOURNAL_ROOT" "$OPS_VAULT_ROOT" 2>/dev/null || true
fi
# Override log() to append to supervisor-specific log file
# shellcheck disable=SC2034
log() {
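
The detection block in this hunk can be exercised in isolation by wrapping it in a function. The sketch below is one way to do that under stated assumptions: `resolve_ops_paths` is a hypothetical name, the two root directories are passed as arguments instead of being read from `FACTORY_ROOT` and `PROJECT_REPO_ROOT`, and, unlike the committed code, a failed `mkdir` is reported instead of being discarded.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the detection logic shown in the diff.
# Arguments: $1 = factory root, $2 = project repo root (both assumed inputs).
resolve_ops_paths() {
  local factory_root="$1" project_root="$2"
  if [ -z "${OPS_REPO_ROOT:-}" ] || [ ! -d "${OPS_REPO_ROOT}" ]; then
    # Degraded mode: fall back to bundled knowledge and local paths.
    OPS_REPO_DEGRADED=1
    OPS_KNOWLEDGE_ROOT="${factory_root}/knowledge"
    OPS_JOURNAL_ROOT="${factory_root}/state/supervisor-journal"
    OPS_VAULT_ROOT="${project_root}/vault/pending"
  else
    # Full mode: everything lives under the ops repo.
    OPS_REPO_DEGRADED=0
    OPS_KNOWLEDGE_ROOT="${OPS_REPO_ROOT}/knowledge"
    OPS_JOURNAL_ROOT="${OPS_REPO_ROOT}/journal/supervisor"
    OPS_VAULT_ROOT="${OPS_REPO_ROOT}/vault/pending"
  fi
  export OPS_REPO_DEGRADED OPS_KNOWLEDGE_ROOT OPS_JOURNAL_ROOT OPS_VAULT_ROOT
  # Assumption: surface directory creation failures rather than hiding them.
  if ! mkdir -p "$OPS_JOURNAL_ROOT" "$OPS_VAULT_ROOT"; then
    echo "WARNING: could not create journal or vault directory" >&2
    return 1
  fi
}
```

Calling it with `OPS_REPO_ROOT` unset should leave `OPS_REPO_DEGRADED=1` and the journal path under the factory root; pointing `OPS_REPO_ROOT` at an existing directory should switch all three paths to the ops repo.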
@@ -105,6 +125,25 @@ export CLAUDE_MODEL="sonnet"
# ── Create worktree (before prompt assembly so trap is set early) ────────
formula_worktree_setup "$WORKTREE"
# Inject OPS repo status into prompt
if [ "${OPS_REPO_DEGRADED:-0}" = "1" ]; then
OPS_STATUS="
## OPS Repo Status
**DEGRADED MODE**: OPS repo is not available. Using bundled knowledge files and local journal/vault paths.
- Knowledge files: ${OPS_KNOWLEDGE_ROOT:-<unset>}
- Journal: ${OPS_JOURNAL_ROOT:-<unset>}
- Vault destination: ${OPS_VAULT_ROOT:-<unset>}
"
else
OPS_STATUS="
## OPS Repo Status
**FULL MODE**: OPS repo available at ${OPS_REPO_ROOT}
- Knowledge files: ${OPS_KNOWLEDGE_ROOT:-<unset>}
- Journal: ${OPS_JOURNAL_ROOT:-<unset>}
- Vault destination: ${OPS_VAULT_ROOT:-<unset>}
"
fi
PROMPT="You are the supervisor agent for ${FORGE_REPO}. Work through the formula below.
You have full shell access and --dangerously-skip-permissions.
@@ -117,6 +156,7 @@ ${PREFLIGHT_OUTPUT}
${CONTEXT_BLOCK}$(formula_lessons_block)
${SCRATCH_CONTEXT:+${SCRATCH_CONTEXT}
}
${OPS_STATUS}
Priority order: P0 memory > P1 disk > P2 stopped > P3 degraded > P4 housekeeping
${FORMULA_CONTENT}
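
The two `OPS_STATUS` branches above differ only in their first line. As a possible simplification, and not what the commit does, the block could be produced by a small function that picks the mode line and prints the shared path list once. `build_ops_status` is a hypothetical name; the variable names and wording come from the diff, except that `OPS_REPO_ROOT` also gets an `<unset>` default here.

```shell
#!/usr/bin/env bash
# Hypothetical refactor sketch: build the prompt status block in one place.
build_ops_status() {
  local header
  if [ "${OPS_REPO_DEGRADED:-0}" = "1" ]; then
    header="**DEGRADED MODE**: OPS repo is not available. Using bundled knowledge files and local journal/vault paths."
  else
    header="**FULL MODE**: OPS repo available at ${OPS_REPO_ROOT:-<unset>}"
  fi
  # The three path lines are identical in both modes.
  printf '## OPS Repo Status\n%s\n- Knowledge files: %s\n- Journal: %s\n- Vault destination: %s\n' \
    "$header" \
    "${OPS_KNOWLEDGE_ROOT:-<unset>}" \
    "${OPS_JOURNAL_ROOT:-<unset>}" \
    "${OPS_VAULT_ROOT:-<unset>}"
}

OPS_STATUS=$(build_ops_status)
```

Note that command substitution strips trailing newlines, so the surrounding blank lines in the prompt would come from the `PROMPT` template itself.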