feat: Add property tests for evaluation safety and determinism (#133) #134

andreasronge merged 2 commits into main
Adds two new describe blocks to test/support/lisp_generators_test.exs:

1. "evaluation safety" - Property test that verifies generated PTC-Lisp expressions evaluate without crashing the interpreter. Handles both successful evaluation and expected runtime errors gracefully.
2. "determinism" - Property test that verifies the same input always produces the same output across multiple runs with identical context and tools.

Both properties dynamically extract tool names from generated source code and provide matching mock tool implementations to ensure evaluation can complete.

Fixes #133

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
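The two properties described above can be sketched without the real interpreter. The following is a minimal illustration, not the PR's test code: `PropertySketch` and `fake_run/2` are hypothetical stand-ins for `PtcRunner.Lisp.run/2`, and a fixed list of sample sources replaces the generated expressions.

```elixir
defmodule PropertySketch do
  # Hypothetical stand-in for PtcRunner.Lisp.run/2 (assumption for this sketch).
  # It returns {:ok, value} or {:error, reason}, and raises on an empty program
  # to simulate an unexpected runtime failure.
  def fake_run(source, _opts) do
    case Integer.parse(source) do
      {n, ""} ->
        {:ok, n}

      _ ->
        if source == "" do
          raise ArgumentError, "empty program"
        else
          {:error, :parse_error}
        end
    end
  end

  # Convert raised exceptions into an error tuple so a failing evaluation
  # is reported as a result instead of aborting the test run.
  def safe_run(source, opts) do
    try do
      fake_run(source, opts)
    rescue
      _e -> {:error, :runtime_error}
    end
  end

  # Evaluation safety: every run ends in {:ok, _} or {:error, _}.
  def safe?(source, opts) do
    result = safe_run(source, opts)
    match?({:ok, _}, result) or match?({:error, _}, result)
  end

  # Determinism: two runs with identical input and options agree.
  def deterministic?(source, opts) do
    safe_run(source, opts) == safe_run(source, opts)
  end
end

samples = ["42", "(oops", ""]
IO.inspect(Enum.all?(samples, &PropertySketch.safe?(&1, [])))
IO.inspect(Enum.all?(samples, &PropertySketch.deterministic?(&1, [])))
```

In the actual tests the sources would come from the generators in the test support file rather than from a hand-written list.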
PR Review: Add property tests for evaluation safety and determinism

Summary

This PR adds two new property tests ("evaluation safety" and "determinism") as specified in issue #133. The implementation correctly follows the spec from

What's Good
Issues (Must Fix)
Suggestions (Optional)
Security

No concerns. Property tests run in test environment only.

Documentation

No updates needed. These are internal test additions that do not affect public API.

Verdict

Request Changes - The duplicated code should be extracted into helper functions before merge. Both extractions are mechanical and follow the same pattern the codebase uses elsewhere (the file already has
@claude please fix these issues from PR review:

Issue 1: Tool setup logic duplication

Location:

    defp build_tools_for_source(source, default_result \\ :result) do
      base_tools = %{"test_tool" => fn _args -> default_result end}

      Regex.scan(~r/\(call "([^"]+)"/, source)
      |> Enum.reduce(base_tools, fn [_full, tool_name], acc ->
        Map.put_new(acc, tool_name, fn _args -> default_result end)
      end)
    end

Then replace both duplicated blocks with calls to this helper.

Issue 2: Try/rescue evaluation pattern duplication

Location:

    defp safe_run(source, opts) do
      try do
        PtcRunner.Lisp.run(source, opts)
      rescue
        _e -> {:error, :runtime_error}
      end
    end

Then replace all three duplicated try/rescue blocks with calls to this helper.

Please fix ALL issues above in a SINGLE commit. Make minimal changes to resolve each issue.
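To make the effect of the proposed `build_tools_for_source/2` helper concrete, here is a small standalone sketch. The wrapper module `ToolSetupSketch` and the sample source string are assumptions made for illustration; the helper body is the one quoted in the review, changed from `defp` to `def` only so it can be called from outside the module.

```elixir
defmodule ToolSetupSketch do
  # Same body as the helper proposed in the review; public here for demonstration.
  def build_tools_for_source(source, default_result \\ :result) do
    # Always provide "test_tool", then add a stub for every tool name
    # that appears in a (call "name" ...) form in the source text.
    base_tools = %{"test_tool" => fn _args -> default_result end}

    Regex.scan(~r/\(call "([^"]+)"/, source)
    |> Enum.reduce(base_tools, fn [_full, tool_name], acc ->
      Map.put_new(acc, tool_name, fn _args -> default_result end)
    end)
  end
end

# Hypothetical source text; only the (call "...") forms matter to the regex.
source = ~S|(let [x (call "fetch" {})] (call "test_tool" {:id x}))|
tools = ToolSetupSketch.build_tools_for_source(source, :stubbed)

# One stub per distinct tool name, each returning the default result.
IO.inspect(Map.keys(tools) |> Enum.sort())
IO.inspect(tools["fetch"].(%{}))
```

Because `Map.put_new/3` leaves existing keys alone, a tool that appears several times in the source, or that matches the pre-seeded `"test_tool"` entry, still ends up with a single stub.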
Auto-Triage Summary

Decisions Made

Reasoning

FIX_NOW decisions:
IGNORE decision:
Status
…nerators_test.exs

- Extract tool extraction logic into `build_tools_for_source/2` helper (previously duplicated in evaluation safety and determinism tests)
- Extract try/rescue pattern into `safe_run/2` helper (previously duplicated 3 times across both tests)

Resolves PR review issues from #134.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fixed PR Review Issues

All requested issues from the PR review have been fixed in commit 379154b:

Issue 1: Tool setup logic duplication ✅
Issue 2: Try/rescue evaluation pattern duplication ✅
Test Results
The changes follow DRY principles and maintain consistency with existing helper functions in the file.
PR Review: Add property tests for evaluation safety and determinism

Summary

This PR adds two property tests for evaluation safety and determinism as specified in issue #133. The implementation correctly follows the spec from

What's Good

Issues (Must Fix)

None.

Suggestions (Optional)

None.

Security

No concerns. Property tests run in test environment only.

Documentation

No updates needed. These are internal test additions that do not affect public API.

Verdict

Approve - The implementation matches the spec requirements from issue #133 and
Auto-Triage Summary (Cycle 1)

Review Status

✅ Latest review verdict: APPROVE

The PR has been successfully reviewed and all previous issues have been resolved:
Decisions Made
Status
Next Steps

Adding
Summary
Implements property tests for evaluation safety and determinism as specified in issue #133.
Changes:
Test Results
`mix precommit` passes all quality checks

Implementation Notes
Fixes #133