fix: [nomad-prep] P3 — add load_secret() abstraction to lib/env.sh (#793)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/pr/smoke-init Pipeline failed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Claude 2026-04-15 19:15:50 +00:00
parent 1d4e28843e
commit 9dbc43ab23
3 changed files with 225 additions and 1 deletions

View file

@ -6,7 +6,7 @@ sourced as needed.
| File | What it provides | Sourced by |
|---|---|---|
| `lib/env.sh` | Loads `.env`, sets `FACTORY_ROOT`, exports project config (`FORGE_REPO`, `PROJECT_NAME`, etc.), defines `log()`, `forge_api()`, `forge_api_all()` (paginates all pages; accepts optional second TOKEN parameter, defaults to `$FORGE_TOKEN`; handles invalid/empty JSON responses gracefully — returns empty on parse error instead of crashing), `woodpecker_api()`, `wpdb()`, `memory_guard()` (skips agent if RAM < threshold). Auto-loads project TOML if `PROJECT_TOML` is set. Exports per-agent tokens (`FORGE_PLANNER_TOKEN`, `FORGE_GARDENER_TOKEN`, `FORGE_VAULT_TOKEN`, `FORGE_SUPERVISOR_TOKEN`, `FORGE_PREDICTOR_TOKEN`) each falls back to `$FORGE_TOKEN` if not set. **Vault-only token guard (AD-006)**: `unset GITHUB_TOKEN CLAWHUB_TOKEN` so agents never hold external-action tokens only the runner container receives them. **Container note**: when `DISINTO_CONTAINER=1`, `.env` is NOT re-sourced compose already injects env vars (including `FORGE_URL=http://forgejo:3000`) and re-sourcing would clobber them. **Save/restore scope (#364)**: only `FORGE_URL` is preserved across `.env` re-sourcing (compose injects `http://forgejo:3000`, `.env` has `http://localhost:3000`). `FORGE_TOKEN` is NOT preserved so refreshed tokens in `.env` take effect immediately. **Per-agent token override (#762)**: agent run scripts export `FORGE_TOKEN_OVERRIDE=<agent-specific-token>` BEFORE sourcing `env.sh`; `env.sh` applies this override at lines 98-100, ensuring the correct identity survives any re-sourcing of `env.sh` by nested shells or `claude -p` invocations. **Required env var**: `FORGE_PASS` bot password for git HTTP push (Forgejo 11.x rejects API tokens for `git push`, #361). **Hard preconditions (#674)**: `USER` and `HOME` must be exported by the entrypoint before sourcing. When `PROJECT_TOML` is set, `PROJECT_REPO_ROOT`, `PRIMARY_BRANCH`, and `OPS_REPO_ROOT` must also be set (by entrypoint or TOML). | Every agent |
| `lib/env.sh` | Loads `.env`, sets `FACTORY_ROOT`, exports project config (`FORGE_REPO`, `PROJECT_NAME`, etc.), defines `log()`, `forge_api()`, `forge_api_all()` (paginates all pages; accepts optional second TOKEN parameter, defaults to `$FORGE_TOKEN`; handles invalid/empty JSON responses gracefully — returns empty on parse error instead of crashing), `woodpecker_api()`, `wpdb()`, `memory_guard()` (skips agent if RAM < threshold), `load_secret()` (secret-source abstraction see below). Auto-loads project TOML if `PROJECT_TOML` is set. Exports per-agent tokens (`FORGE_PLANNER_TOKEN`, `FORGE_GARDENER_TOKEN`, `FORGE_VAULT_TOKEN`, `FORGE_SUPERVISOR_TOKEN`, `FORGE_PREDICTOR_TOKEN`) each falls back to `$FORGE_TOKEN` if not set. **Vault-only token guard (AD-006)**: `unset GITHUB_TOKEN CLAWHUB_TOKEN` so agents never hold external-action tokens only the runner container receives them. **Container note**: when `DISINTO_CONTAINER=1`, `.env` is NOT re-sourced compose already injects env vars (including `FORGE_URL=http://forgejo:3000`) and re-sourcing would clobber them. **Save/restore scope (#364)**: only `FORGE_URL` is preserved across `.env` re-sourcing (compose injects `http://forgejo:3000`, `.env` has `http://localhost:3000`). `FORGE_TOKEN` is NOT preserved so refreshed tokens in `.env` take effect immediately. **Per-agent token override (#762)**: agent run scripts export `FORGE_TOKEN_OVERRIDE=<agent-specific-token>` BEFORE sourcing `env.sh`; `env.sh` applies this override at lines 98-100, ensuring the correct identity survives any re-sourcing of `env.sh` by nested shells or `claude -p` invocations. **Required env var**: `FORGE_PASS` bot password for git HTTP push (Forgejo 11.x rejects API tokens for `git push`, #361). **Hard preconditions (#674)**: `USER` and `HOME` must be exported by the entrypoint before sourcing. When `PROJECT_TOML` is set, `PROJECT_REPO_ROOT`, `PRIMARY_BRANCH`, and `OPS_REPO_ROOT` must also be set (by entrypoint or TOML). **`load_secret NAME [DEFAULT]` (#793)**: backend-agnostic secret resolution. Precedence: (1) `/secrets/<NAME>.env` Nomad-rendered template, (2) current environment already set by `.env.enc` / compose, (3) `secrets/<NAME>.enc` age-encrypted per-key file (decrypted on demand, cached in process env), (4) DEFAULT or empty. Consumers call `$(load_secret GITHUB_TOKEN)` instead of `${GITHUB_TOKEN}` identical behavior whether secrets come from Docker compose injection or Nomad Vault templates. | Every agent |
| `lib/ci-helpers.sh` | `ci_passed()` — returns 0 if CI state is "success" (or no CI configured). `ci_required_for_pr()` — returns 0 if PR has code files (CI required), 1 if non-code only (CI not required). `is_infra_step()` — returns 0 if a single CI step failure matches infra heuristics (clone/git exit 128, any exit 137, log timeout patterns). `classify_pipeline_failure()` — returns "infra \<reason>" if any failed Woodpecker step matches infra heuristics via `is_infra_step()`, else "code". `ensure_priority_label()` — looks up (or creates) the `priority` label and returns its ID; caches in `_PRIORITY_LABEL_ID`. `ci_commit_status <sha>` — queries Woodpecker directly for CI state, falls back to forge commit status API. `ci_pipeline_number <sha>` — returns the Woodpecker pipeline number for a commit, falls back to parsing forge status `target_url`. `ci_promote <repo_id> <pipeline_num> <environment>` — promotes a pipeline to a named Woodpecker environment (vault-gated deployment: vault approves, vault-fire calls this — vault redesign in progress, see #73-#77). `ci_get_logs <pipeline_number> [--step <name>]` — reads CI logs from Woodpecker SQLite database via `lib/ci-log-reader.py`; outputs last 200 lines to stdout. Requires mounted woodpecker-data volume at /woodpecker-data. | dev-poll, review-poll, review-pr |
| `lib/ci-debug.sh` | CLI tool for Woodpecker CI: `list`, `status`, `logs`, `failures` subcommands. Not sourced — run directly. | Humans / dev-agent (tool access) |
| `lib/ci-log-reader.py` | Python tool: reads CI logs from Woodpecker SQLite database. `<pipeline_number> [--step <name>]` — returns last 200 lines from failed steps (or specified step). Used by `ci_get_logs()` in ci-helpers.sh. Requires `WOODPECKER_DATA_DIR` (default: /woodpecker-data). | ci-helpers.sh |

View file

@ -313,6 +313,68 @@ memory_guard() {
fi
}
# =============================================================================
# SECRET LOADING ABSTRACTION
# =============================================================================
# load_secret NAME [DEFAULT]
#
# Resolves a secret value using the following precedence:
# 1. /secrets/<NAME>.env — Nomad-rendered template (future)
# 2. Current environment — already set by .env.enc, compose, etc.
# 3. secrets/<NAME>.enc — age-encrypted per-key file (decrypted on demand)
# 4. DEFAULT (or empty)
#
# Prints the resolved value to stdout. Caches age-decrypted values in the
# process environment so subsequent calls are free.
# =============================================================================
load_secret() {
local name="$1"
local default="${2:-}"
# 1. Nomad-rendered template (future: Nomad writes /secrets/<NAME>.env)
local nomad_path="/secrets/${name}.env"
if [ -f "$nomad_path" ]; then
# Source into a subshell to extract just the value
local _nomad_val
_nomad_val=$(
set -a
# shellcheck source=/dev/null
source "$nomad_path"
set +a
printf '%s' "${!name:-}"
)
if [ -n "$_nomad_val" ]; then
export "$name=$_nomad_val"
printf '%s' "$_nomad_val"
return 0
fi
fi
# 2. Already in environment (set by .env.enc, compose injection, etc.)
if [ -n "${!name:-}" ]; then
printf '%s' "${!name}"
return 0
fi
# 3. Age-encrypted per-key file: secrets/<NAME>.enc (#777)
local _age_key="${HOME}/.config/sops/age/keys.txt"
local _enc_path="${FACTORY_ROOT}/secrets/${name}.enc"
if [ -f "$_enc_path" ] && [ -f "$_age_key" ] && command -v age &>/dev/null; then
local _dec_val
if _dec_val=$(age -d -i "$_age_key" "$_enc_path" 2>/dev/null) && [ -n "$_dec_val" ]; then
export "$name=$_dec_val"
printf '%s' "$_dec_val"
return 0
fi
fi
# 4. Default (or empty)
if [ -n "$default" ]; then
printf '%s' "$default"
fi
return 0
}
# Source tea helpers (available when tea binary is installed)
if command -v tea &>/dev/null; then
# shellcheck source=tea-helpers.sh

162
tests/smoke-load-secret.sh Normal file
View file

@ -0,0 +1,162 @@
#!/usr/bin/env bash
# tests/smoke-load-secret.sh — Unit tests for load_secret() precedence chain
#
# Covers the 4 precedence cases:
# 1. /secrets/<NAME>.env (Nomad template)
# 2. Current environment
# 3. secrets/<NAME>.enc (age-encrypted per-key file)
# 4. Default / empty fallback
#
# Required tools: bash, age (for case 3)
set -euo pipefail
FACTORY_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
FAILED=0
fail() { printf 'FAIL: %s\n' "$*" >&2; FAILED=1; }
pass() { printf 'PASS: %s\n' "$*"; }
# Set up a temp workspace and fake HOME so age key paths work
test_dir=$(mktemp -d)
fake_home=$(mktemp -d)
trap 'rm -rf "$test_dir" "$fake_home"' EXIT
# Minimal env for sourcing env.sh's load_secret function without the full boot
# We source the function definition directly to isolate the unit under test.
# shellcheck disable=SC2034
export USER="${USER:-test}"
export HOME="$fake_home"
# Source env.sh to get load_secret (and FACTORY_ROOT)
source "${FACTORY_ROOT}/lib/env.sh"
# ── Case 4: Default / empty fallback ────────────────────────────────────────
echo "=== 1/5 Case 4: default fallback ==="
unset TEST_SECRET_FALLBACK 2>/dev/null || true
val=$(load_secret TEST_SECRET_FALLBACK "my-default")
if [ "$val" = "my-default" ]; then
pass "load_secret returns default when nothing is set"
else
fail "Expected 'my-default', got '${val}'"
fi
val=$(load_secret TEST_SECRET_FALLBACK)
if [ -z "$val" ]; then
pass "load_secret returns empty when no default and nothing set"
else
fail "Expected empty, got '${val}'"
fi
# ── Case 2: Environment variable already set ────────────────────────────────
echo "=== 2/5 Case 2: environment variable ==="
export TEST_SECRET_ENV="from-environment"
val=$(load_secret TEST_SECRET_ENV "ignored-default")
if [ "$val" = "from-environment" ]; then
pass "load_secret returns env value over default"
else
fail "Expected 'from-environment', got '${val}'"
fi
unset TEST_SECRET_ENV
# ── Case 3: Age-encrypted per-key file ──────────────────────────────────────
echo "=== 3/5 Case 3: age-encrypted secret ==="
if command -v age &>/dev/null && command -v age-keygen &>/dev/null; then
# Generate a test age key
age_key_dir="${fake_home}/.config/sops/age"
mkdir -p "$age_key_dir"
age-keygen -o "${age_key_dir}/keys.txt" 2>/dev/null
pub_key=$(age-keygen -y "${age_key_dir}/keys.txt")
# Create encrypted secret
secrets_dir="${FACTORY_ROOT}/secrets"
mkdir -p "$secrets_dir"
printf 'age-test-value' | age -r "$pub_key" -o "${secrets_dir}/TEST_SECRET_AGE.enc"
unset TEST_SECRET_AGE 2>/dev/null || true
val=$(load_secret TEST_SECRET_AGE "fallback")
if [ "$val" = "age-test-value" ]; then
pass "load_secret decrypts age-encrypted secret"
else
fail "Expected 'age-test-value', got '${val}'"
fi
# Verify caching: call load_secret directly (not in subshell) so export propagates
unset TEST_SECRET_AGE 2>/dev/null || true
load_secret TEST_SECRET_AGE >/dev/null
if [ "${TEST_SECRET_AGE:-}" = "age-test-value" ]; then
pass "load_secret caches decrypted value in environment (direct call)"
else
fail "Decrypted value not cached in environment"
fi
# Clean up test secret
rm -f "${secrets_dir}/TEST_SECRET_AGE.enc"
rmdir "$secrets_dir" 2>/dev/null || true
unset TEST_SECRET_AGE
else
echo "SKIP: age/age-keygen not found — skipping age decryption test"
fi
# ── Case 1: Nomad template path ────────────────────────────────────────────
echo "=== 4/5 Case 1: Nomad template (/secrets/<NAME>.env) ==="
nomad_dir="/secrets"
if [ -w "$(dirname "$nomad_dir")" ] 2>/dev/null || [ -w "$nomad_dir" ] 2>/dev/null; then
mkdir -p "$nomad_dir"
printf 'TEST_SECRET_NOMAD=from-nomad-template\n' > "${nomad_dir}/TEST_SECRET_NOMAD.env"
# Even with env set, Nomad path takes precedence
export TEST_SECRET_NOMAD="from-env-should-lose"
val=$(load_secret TEST_SECRET_NOMAD "default")
if [ "$val" = "from-nomad-template" ]; then
pass "load_secret prefers Nomad template over env"
else
fail "Expected 'from-nomad-template', got '${val}'"
fi
rm -f "${nomad_dir}/TEST_SECRET_NOMAD.env"
rmdir "$nomad_dir" 2>/dev/null || true
unset TEST_SECRET_NOMAD
else
echo "SKIP: /secrets not writable — skipping Nomad template test (needs root or container)"
fi
# ── Precedence: env beats age ────────────────────────────────────────────
echo "=== 5/5 Precedence: env beats age-encrypted ==="
if command -v age &>/dev/null && command -v age-keygen &>/dev/null; then
age_key_dir="${fake_home}/.config/sops/age"
mkdir -p "$age_key_dir"
[ -f "${age_key_dir}/keys.txt" ] || age-keygen -o "${age_key_dir}/keys.txt" 2>/dev/null
pub_key=$(age-keygen -y "${age_key_dir}/keys.txt")
secrets_dir="${FACTORY_ROOT}/secrets"
mkdir -p "$secrets_dir"
printf 'age-value-should-lose' | age -r "$pub_key" -o "${secrets_dir}/TEST_SECRET_PREC.enc"
export TEST_SECRET_PREC="env-value-wins"
val=$(load_secret TEST_SECRET_PREC "default")
if [ "$val" = "env-value-wins" ]; then
pass "load_secret prefers env over age-encrypted file"
else
fail "Expected 'env-value-wins', got '${val}'"
fi
rm -f "${secrets_dir}/TEST_SECRET_PREC.enc"
rmdir "$secrets_dir" 2>/dev/null || true
unset TEST_SECRET_PREC
else
echo "SKIP: age not found — skipping precedence test"
fi
# ── Summary ───────────────────────────────────────────────────────────────
echo ""
if [ "$FAILED" -ne 0 ]; then
echo "=== SMOKE-LOAD-SECRET TEST FAILED ==="
exit 1
fi
echo "=== SMOKE-LOAD-SECRET TEST PASSED ==="