fix: [nomad-step-2] S2.2 — tools/vault-import.sh (import .env + sops into KV) (#880) #889
No reviewers
Labels
No labels
action
backlog
blocked
bug-report
cannot-reproduce
in-progress
in-triage
needs-triage
prediction/actioned
prediction/dismissed
prediction/unreviewed
priority
rejected
reproduced
tech-debt
underspecified
vision
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: disinto-admin/disinto#889
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "fix/issue-880"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Fixes #880
Changes
AI Review
Summary
New tool that imports + sops-encrypted secrets into Vault KV. Security guards (localhost-only, age-key perms), dry-run mode, and idempotency tracking are all present. Two correctness blockers must be fixed before merge.
Blockers
critical
tools/vault-import.sh:301-365, 396: Operation format vs. read variable count mismatch — all bot and runner secrets written as empty strings.Bot operations are pushed as 5 colon-delimited fields:
Runner operations likewise:
But the parse loop at line 396 reads only 4 variables:
For a bot entry
bots:review:token:/fixtures/dot-env-complete:FORGE_REVIEW_TOKEN:source_filecaptures the literaltoken(not the file path)source_keycaptures/fixtures/dot-env-complete:FORGE_REVIEW_TOKENThe file-match test
[ "$source_file" = "$env_file" ]is always false. The fallback grep searches sops_env for^/fixtures/dot-env-complete:FORGE_REVIEW_TOKEN=— no match.source_valueis empty for every bot and runner operation. Every write stores an empty string.Fix: either extend the read to 5 variables (
IFS=':' read -r category source_type kv_subkey source_file source_key) or restructure the operation format to stay at 4 fields.medium
tools/vault-import.sh:139-144:_kv_put_secretsilently swallows all write failures.Both the response body and the
%{http_code}string are redirected to/dev/null. Auth failures, wrong KV mount, and HTTP 4xx/5xx errors all disappear. The script reportscreated/updatedeven when Vault never stored the value. Fix: capture the http_code and abort on non-2xx, or replace this function withhvault_kv_putfromlib/hvault.sh(which already handles this correctly and is available after thesourceat line 275).Additional findings (non-blocking)
medium
tests/vault-import.bats:196-197: Test verifies the modified dev-qwen token by fetching/v1/secret/data/disinto/bots/dev-qwen/token— appending/tokenas a KV path segment. KV v2 stores keys as fields inside the data JSON at the parent path (disinto/bots/dev-qwen); there is no child path/token. This curl will return 404 and the test silently passes because$status -eq 0is only checked if the command returns 0, butrunalways returns 0 — the[ "$status" -eq 0 ]assertion after it would actually catch this. Correct path:/v1/secret/data/disinto/bots/dev-qwenand grep the JSON forMODIFIED-LLAMA-TOKEN.medium
tools/vault-import.sh:428-430vstests/vault-import.bats:156: Runner token KV path is inconsistent. The header comment sayskv/disinto/runner/<NAME>/value(per-token path, key =value). The code writes path=disinto/runner, key=GITHUB_TOKEN. The test checks API pathdisinto/runner/GITHUB_TOKEN— which matches the comment's intent, not the code. Pick one and make them consistent.advisory
tools/vault-import.sh:135:_kv_put_secretusesjq -n(null input) while accumulating a payload across iterations.jq -nignores stdin, so each iteration overwrites the full payload with only the current K=V pair — only the last pair survives. This is harmless today because every call site passes exactly one pair, but the function signature advertises variadic K=V. If a multi-pair call is added later it will silently drop all but the last key. Sincehvault_kv_put(no-n) already handles this correctly, replacing_kv_put_secreteliminates both this and the silent-failure issue above.Verdict
REQUEST_CHANGES — Two blockers: operation-format mismatch writes empty values for all bots/runner secrets; _kv_put_secret silently swallows write failures. Two additional test-path bugs would make tests report false-positives.
Reviewed at
c6691d5| AGENTS.mdAI Review: REQUEST_CHANGES — Two blockers: operation-format mismatch writes empty values for all bots/runner secrets; _kv_put_secret silently swallows write failures. Two additional test-path bugs would make tests report false-positives.
c6691d5ee3todaa165b0a0AI Re-review (round 2)
Previous Findings
Remaining Blocker
critical
tools/vault-import.sh:328-329, 424: Bot operations still write empty strings to wrong keys.Bot operations are pushed as 5 pipe-separated fields:
The parse loop still reads only 4 variables:
For a 5-field bot entry the split lands as: category=bots, field=review, file=token (wrong), key=/path/to/env|FORGE_REVIEW_TOKEN (wrong).
Downstream effects:
[ "$file" = "$env_file" ]=> false ("token" != file path) => falls to sops branch^/path/to/env|FORGE_REVIEW_TOKEN=-- no match in sops_env => source_value=""Every bot secret is written as an empty string to a key named after the role. Fix: add a 5th read variable, e.g.
IFS='|' read -r category role subkey file envvar, and update the bots case block to use subkey as vault_key and envvar for the lookup.New Bug (introduced by this fixup)
medium
tools/vault-import.sh:431: Runner secret values stored with a leading =.The grep correctly matches
GITHUB_TOKEN=github-test-token-abc123. The sed patterns/^GITHUB_TOKEN//strips the key name but leaves the=, so source_value ==github-test-token-abc123.Fix: add the literal = to the sed pattern:
sed "s/^${key}=//".Alternatively, since
eval "$sops_env"was already called at line 313, all sops vars are available in the shell. The runner branch could use${!key:-}(same path as env-file vars) instead of the grep+sed extraction.Verdict
REQUEST_CHANGES — Critical blocker from prior review not fixed: bot operations still parse 5 fields with 4 variables. New sed bug leaves leading = in all runner secret values.
Reviewed at
5bbd948| Previous:c6691d5| AGENTS.mdAI Re-review (round 2): REQUEST_CHANGES — Critical blocker from prior review not fixed: bot operations still parse 5 fields with 4 variables. New sed bug leaves leading = in all runner secret values.
5bbd9487c9tode318d3511AI Re-review (round 3)
Previous Findings
Operation format/read mismatch (critical) → FIXED. The dispatch now counts pipe-separated fields with
awk -F'|' '{print NF}'and usesIFS='|' read -r category field subkey file envvarfor 5-field bot/runner ops.vault_key="$subkey"in the bots case correctly names the keytokenorpassinstead of the role name.Runner secret values with leading
=(medium) → FIXED. Sed pattern is nows/^${envvar}=//— strips both the variable name and the=, leaving just the value.New Issue
critical
tools/vault-import.sh:500, _kv_put_secret: KV v2 POST replaces the entire document; bot token is silently dropped by the subsequent pass write.Bot operations are enqueued as two separate entries for the same Vault path (lines 328-329, 339-340):
Each is dispatched individually to
_kv_put_secret, which POSTs a fresh{"data": {"token": "..."}}or{"data": {"pass": "..."}}to/v1/secret/data/disinto/bots/review.In KV v2 a POST to
/v1/secret/data/{path}creates a new version whose data is exactly what you send — it does not merge with the previous version. After both writes:{"data": {"token": "review-token"}}{"data": {"pass": "review-pass"}}GET returns version 2 only. The token is gone.
The test at
tests/vault-import.bats:141-143checks for both values:CI passes because this test file is not executed in the CI pipeline (it requires a running
vaultbinary and is skipped when absent). The same overwrite affectsdisinto/shared/forge(token + pass + admin_token = 3 sequential writes, only the last survives).What to fix: Collect all key-value pairs that share the same
vault_pathbefore calling_kv_put_secret, so they land in a single POST. Alternatively switch subsequent writes to the KV v2 PATCH endpoint so they merge with the existing version.Secondary note (same function): The accumulation loop inside
_kv_put_secret(line 135) usesjq -n, which ignores stdin, so the loop only preserves the last kv_pair when called with multiple arguments. This doesn't trigger today (always called with one pair), but it means grouping the pairs is not enough — the accumulation loop must also be fixed (drop-nand pipe the growing payload in as input:jq --arg k ... --arg v ... '. * {"data": {($k): $v}}'or equivalent merge).Verdict
REQUEST_CHANGES — Both previous blockers fixed; new critical bug: KV v2 POST overwrites entire document, so bot token+pass written as separate calls silently drops the token
Reviewed at
de318d3| Previous:5bbd948| AGENTS.mdAI Re-review (round 3): REQUEST_CHANGES — Both previous blockers fixed; new critical bug: KV v2 POST overwrites entire document, so bot token+pass written as separate calls silently drops the token
AI Re-review (round 4)
Previous Findings
Remaining Critical Bug
critical
tools/vault-import.sh:512-522(second pass): Unchanged keys are excluded frompaths_to_write, so a KV v2 POST containing only the updated sibling key silently drops the unchanged one.Concrete scenario — second run where
FORGE_REVIEW_TOKENchanged butFORGE_REVIEW_PASSdid not:pass→ reported, counted unchanged, not added topaths_to_write;token→ added topaths_to_write["disinto/bots/review"] = "token=new-token"{"data": {"token": "new-token"}}creates KV v2 version that contains onlytoken;passis gone from the current versionThe initial import works (all keys are
created, so all land inpaths_to_writetogether), which is why the tests pass. But any incremental re-run that changes only one key of a multi-key path silently destroys its sibling.Same exposure exists for
disinto/shared/forge(3 keys: token, pass, admin_token) anddisinto/shared/woodpecker(4 keys).Fix: Include all keys for a path in the write, not just the changed ones. Simplest approach: in the second pass, always add every key to
paths_to_writeregardless of status, and only usestatusfor the reporting/counting decision. The idempotency guarantee comes from KV v2 versioning, not from skipping the write.New Medium Bug
medium
tools/vault-import.sh:137: jq string interpolation is unsafe for secret values containing"or\.If
$vcontains a double-quote or backslash (e.g. a token likeabc\"xyz), the jq expression is malformed — jq exits non-zero,payloadbecomes empty, and the write silently posts{}or fails entirely.This is a regression: the prior code used
--arg k "$k" --arg v "$v"(safe shell-to-jq quoting). The fix removed the-nflag correctly but should have kept--arg:Verdict
REQUEST_CHANGES — KV v2 overwrite: previous fix handles all-new case but still drops unchanged keys when a sibling key at the same path is updated
Reviewed at
d853a21| Previous:de318d3| AGENTS.mdAI Re-review (round 4): REQUEST_CHANGES — KV v2 overwrite: previous fix handles all-new case but still drops unchanged keys when a sibling key at the same path is updated
d853a21022to428fa223d8AI Re-review (round 5)
Previous Findings
paths_to_writeregardless of status (lines 502–526). Comment at line 502–504 explicitly states the invariant. Incremental-update scenario (one sibling changed, other unchanged) is now handled correctly."or\(medium) → FIXED:_kv_put_secretat line 137 now usesjq --arg k "$k" --arg v "$v"throughout, matching the safe quoting pattern.Incremental diff (d853a21..428fa22)
The fix commits also land S2.3 scaffolding that wasn't in the prior review:
lib/hvault.sh—hvault_get_or_empty: Clean extraction. Subshell + EXIT trap pattern is correct (matches the documented RETURN-trap gotcha). 200/404/other case split is correct.tools/vault-apply-policies.sh: Refactored tohvault_get_or_empty— semantically equivalent to the deletedfetch_current_policyhelper. `jq -r '.data.policy // ""`` on a non-empty body preserves the empty-means-create branch. No regression.tools/vault-apply-roles.sh: Correct idempotency loop.role_fields_matchnormalises TTL strings to seconds before comparing (Vault GET returns integers, PUT accepts strings)._duration_to_secondscovers the subset emitted.hvault_get_or_emptyused correctly for absent-vs-present discrimination.lib/init/nomad/vault-nomad-auth.sh: Four-step installer is idempotent. Step 1 (auth enable) checkssys/authbefore writing. Step 2 (config) field-compares before writing. Step 4 (server.hcl) usescmp -sbefore install + SIGHUP. Preconditions check for root + required binaries. ThevaultCLI is deliberately excluded from the binary check (correct: only curl/jq/systemctl needed).nomad/server.hcl: Vault stanza is additive, non-breaking. Jobs not usingvault { role = "..." }are unaffected before S2.3 completes. Comment links S2.3 issue and explains bound_audiences invariant.Advisory
vault-nomad-auth.sh,vault-apply-roles.sh, and thenomad/server.hclvault stanza are S2.3 work bundled into a fix PR titled S2.2. Nothing breaks, but the PR title no longer matches the footprint. Consider updating the PR description to acknowledge the S2.3 pre-land, or split on the next series if the pattern recurs.Tech debt filed
vault-import.shpipe-separator inops_data/paths_to_writesilently truncates values containing|(medium). Extraction at line 510 uses${data%%|*}which stops at the first|; values with embedded pipes are corrupted before reaching the KV write. No impact for current fixtures (no pipes in fixture values) but latent for arbitrary secrets.Verdict
APPROVE — Both prior blockers closed; new S2.3 additions in the fix commits are correct
Reviewed at
428fa22| Previous:d853a21| AGENTS.mdAI Re-review (round 5): APPROVE — Both prior blockers closed; new S2.3 additions in the fix commits are correct