Skip to content

test(KB-037): verify LoopExit contract acceptance criteria and add gap coverage tests#3442

Merged
serrrfirat merged 4 commits into
reborn-integrationfrom
kb/kb-037
May 10, 2026
Merged

test(KB-037): verify LoopExit contract acceptance criteria and add gap coverage tests#3442
serrrfirat merged 4 commits into
reborn-integrationfrom
kb/kb-037

Conversation

@serrrfirat
Copy link
Copy Markdown
Collaborator

@serrrfirat serrrfirat commented May 9, 2026

Summary

Verifies all 22 acceptance criteria from issue #3232 against the merged LoopExit contract (PR #3295, commit 02088937c) and adds 7 new tests covering identified gaps.

Acceptance Criteria Verification (22/22 pass)

AC Description Status
AC1 LoopExit and TurnRunnerOutcome are distinct types
AC2 Completed with raw reply text impossible (deny_unknown_fields)
AC3 Completed { AskUserReply } maps to terminal completed
AC4 Blocked requires gate ref + checkpoint id (non-optional fields)
AC5 Blocked with raw payloads rejected (deny_unknown_fields)
AC6 Blocked(approval/auth/resource) maps to corresponding blocked status
AC7 Cancelled accepted only with host cancellation observed
AC8 Arbitrary graceful stop maps to failed/interrupted
AC9 Failed carries stable sanitized kind
AC10 v1 MaxIterations maps to Failed { IterationLimit }
AC11 RecoveryRequired not in LoopExit enum (only in LoopExitMapping)
AC12 Invalid completed missing refs → failure/recovery
AC13 Invalid blocked missing checkpoint → violation (required field)
AC14 Terminal lock release (is_terminal returns true for Completed/Cancelled/Failed)
AC15 Blocked and recovery keep lock (keeps_active_lock returns true)
AC16 Profile requiring final checkpoint rejects without it
AC17 Interactive completes without final checkpoint (default false)
AC18 Usage/cost not trusted from LoopExit (usage_summary_ref is a ref, not value)
AC19 Tool/process waits dont use Blocked MVP (only Approval/Auth/Resource)
AC20 Delegated outcomes via refs under core variants (DelegatedResult + result_refs)
AC21 No raw secrets in serialized LoopExit (bounded refs + deny_unknown_fields)
AC22 Duplicate exit_id handling — contract accepted / store idempotency deferred to KB-059/KB-060 ✅ contract / ⏭ store

AC22 Note: This PR verifies the contract-layer behavior: duplicate exit_id handling is explicitly not enforced by LoopExit validation. Store-level idempotency remains intentionally deferred to KB-059/KB-060, where durable store tests should add duplicate_exit_id_idempotent_at_store_layer.

New Tests Added (7)

  1. no_reply_with_empty_refs_maps_to_missing_completion_reference_violation — NoReply + empty refs → MissingCompletionReference
  2. delegated_result_with_result_refs_maps_to_trusted_completed — DelegatedResult with result_refs satisfies completion evidence
  3. blocked_auth_and_resource_map_to_correct_blocked_reason — Auth/Resource LoopBlockedKind → correct BlockedReason
  4. all_failure_kinds_produce_stable_sanitized_category_strings — All 9 LoopFailureKind variants → stable snake_case categories
  5. recovery_required_is_not_a_loop_exit_variant — Extended serde rejection of recovery_required shapes
  6. cancelled_with_checkpoint_and_interrupted_refs_maps_to_cancelled_outcome — Full LoopCancelled roundtrip
  7. terminal_statuses_release_lock_and_non_terminal_keep_it — TurnStatus lock semantics for all variants

Quality Gate

  • cargo test -p ironclaw_turns — 76 tests pass (19 in loop_exit_contract, 7 new)
  • cargo clippy -p ironclaw_turns --tests -- -D warnings — zero warnings
  • cargo check -p ironclaw_turns — clean build
  • No production source files modified

Closes #3232

serrrfirat added 3 commits May 7, 2026 09:51
… contract

New tests covering acceptance criteria gaps:
- NoReply with empty refs → MissingCompletionReference violation
- DelegatedResult via result_refs → trusted Completed
- Auth and Resource LoopBlockedKind → correct BlockedReason mapping
- All 9 LoopFailureKind variants → stable sanitized category strings
- RecoveryRequired not a LoopExit variant (extended serde coverage)
- Cancelled with checkpoint + interrupted refs → Cancelled outcome
- Terminal vs non-terminal TurnStatus lock semantics
@github-actions github-actions Bot added size: XS < 10 changed lines (excluding docs) risk: low Changes to docs, tests, or low-risk modules contributor: core 20+ merged PRs labels May 9, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces comprehensive gap coverage tests for loop exit contracts in crates/ironclaw_turns/tests/loop_exit_contract.rs. The new tests verify validation logic for completed, blocked, failed, and cancelled loop exits, ensure correct mapping of failure categories and blocked reasons, and validate terminal status behavior regarding lock retention. Additionally, the result_ref helper function is now utilized, and its dead_code allowance has been removed. I have no feedback to provide.

@ilblackdragon
Copy link
Copy Markdown
Member

Code Review: PR #3442 — test(KB-037): verify LoopExit contract acceptance criteria

Overview

Pure test-only PR (189 additions, 5 deletions, single file: crates/ironclaw_turns/tests/loop_exit_contract.rs) targeting reborn-integration. Verifies all 22 acceptance criteria from issue #3232 against the merged LoopExit contract (PR #3295) and adds 7 new gap-coverage tests. No production code is touched.

The new tests are well-organized, each maps cleanly to one or more ACs, and the test file's existing helper functions (exit_id, message_ref, result_ref, loop_gate_ref) are reused.

Strengths

  • all_failure_kinds_produce_stable_sanitized_category_strings is the standout test. Enumerates all 9 LoopFailureKind variants in a &[(kind, expected_category)] table and asserts each maps to its snake_case category. This is exactly the regression scaffolding that catches "someone added a new variant without wiring the category" — a class of bug that's invisible to per-variant tests.
  • Test 1 is a sharp edge case. NoReply + empty refs + completion_refs_verified: true would vacuously satisfy a naive verification check (empty list is "all verified"), but the contract correctly produces MissingCompletionReference. Good distinction between "refs you have were verified" vs "you have any refs at all."
  • Test 5 (recovery_required_is_not_a_loop_exit_variant) round-trips three plausible JSON shapes to confirm serde rejects them. Important: the contract says RecoveryRequired is mapping-only, not driver-returnable, and this test makes that contract enforceable.
  • AC checklist in the PR description is thorough. Each AC has a concrete artifact or test reference. That's how to land a verification PR.

Issues / Suggestions

Correctness / clarity

  • AC22 is marked ✅ but the footnote describes it as deferred ("contract-level idempotency not enforced at store level... separate concern for KB-059/KB-060"). That's a pass at the contract layer and a defer at the store layer. Worth restating that in the table — e.g., "✅ contract / ⏭ store (KB-059/KB-060)" — so a future reader doesn't think AC22 is fully covered when in fact the store-level check is intentionally out of scope.
  • Test 3 has an unreachable arm: LoopBlockedKind::Approval => unreachable!() inside a loop that only iterates over [Auth, Resource]. The unreachable!() is technically defensive (forces a compile error if a 4th variant is added), but the cleaner shape is to test all three variants in one parametric loop and drop the unreachable. Either restructure to a 3-element iteration, or drop the Approval arm entirely since the iterator can't produce it. Minor.
  • Test 7 will need updating soon. PR feat(reborn): add concrete TurnRunner worker composition #3446 (kb/kb-001) is adding TurnStatus::BlockedProcess. When that merges, this exhaustive-match-style test on is_terminal() / keeps_active_lock() won't fail-loud — it'll silently miss the new variant since it's iterating over a fixed list. Consider rewriting test 7 with an exhaustive match on TurnStatus (one arm per variant asserting is_terminal/keeps_active_lock) instead of for status in [...]. That way, adding BlockedProcess becomes a compile error rather than a silent test gap.
  • Test 5 could be a bit more thorough. The three rejected shapes all use recovery_required as a top-level field. Worth also trying json!("recovery_required") (string), json!({"type": "recovery_required"}) (tagged-union shape), and a mistaken-tag-with-content shape. Cheap to add and hardens against future serde tag-changes.

Style / conventions

  • Removing #[allow(dead_code)] from result_ref is correct — it's now used by test 2.
  • unwrap() in tests is fine per CLAUDE.md.
  • Imports are grouped sensibly; the new ones (LoopCancelled, LoopCancelledReasonKind, LoopExitViolationKind, TurnStatus) are all justified by the new tests.

Test coverage gaps still worth noting

  • No test of validation-policy permutations against a single exit. All seven new tests fix one validation-policy and check one exit shape. The existing earlier tests in this file likely cover the cross-product, but a quick validation_policy_required_final_checkpoint_when_set_and_relaxed_when_unset parametric test (covering AC16+AC17 in one loop) would be a tighter match to the AC table.
  • AC22 is deferred; KB-059/KB-060 should add an explicit duplicate_exit_id_idempotent_at_store_layer test when those land. Tracking-only — out of scope here.

Risk

risk: low is accurate. Pure additive tests, no production change, contract is already merged. Approve once the AC22 wording in the description is clarified and (ideally) test 7 is rewritten as an exhaustive match so KB-001's BlockedProcess doesn't slip through. The other points are nits.

@serrrfirat
Copy link
Copy Markdown
Collaborator Author

@ilblackdragon addressed the latest review notes in 463467e4:

  • Clarified AC22 in the PR description as ✅ contract / ⏭ store, with store-level idempotency deferred to KB-059/KB-060.
  • Reworked the blocked-kind test to cover Approval/Auth/Resource in one parametric loop and removed the unreachable arm.
  • Rewrote the TurnStatus lock semantics test to use an exhaustive match, so new variants fail loud at compile time.
  • Expanded recovery_required rejection coverage with string/tagged-union/mistaken-tag shapes.
  • Added a validation-policy permutation test for strict vs relaxed final-checkpoint handling.

Verification is green: cargo fmt --check, cargo test -p ironclaw_turns, cargo clippy -p ironclaw_turns --tests -- -D warnings, pre-commit safety, and PR CI all pass.

@serrrfirat serrrfirat merged commit db25f1f into reborn-integration May 10, 2026
14 checks passed
@serrrfirat serrrfirat deleted the kb/kb-037 branch May 10, 2026 19:54
@serrrfirat serrrfirat mentioned this pull request May 11, 2026
23 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: low Changes to docs, tests, or low-risk modules size: XS < 10 changed lines (excluding docs)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants