Skip to content

feat: Add structured output support with generate_program_structured! for E2E tests (#65)#67

Merged
andreasronge merged 2 commits intomainfrom
claude/issue-65-structured-output
Dec 2, 2025
Merged

feat: Add structured output support with generate_program_structured! for E2E tests (#65)#67
andreasronge merged 2 commits intomainfrom
claude/issue-65-structured-output

Conversation

@andreasronge
Copy link
Copy Markdown
Owner

Summary

Implements structured output support for E2E tests using ReqLLM's generate_object! API. This guarantees valid JSON output from LLMs, eliminating markdown fence cleanup issues and improving test reliability.

  • ✅ Adds PtcRunner.Schema.to_llm_schema/0 with flattened schema optimized for LLMs
  • ✅ Adds PtcRunner.TestSupport.LLMClient.generate_program_structured!/1 for structured output mode
  • ✅ Includes comprehensive E2E tests for both text and structured modes
  • ✅ All existing tests pass, no breaking changes

Implementation Details

Schema Flattening (to_llm_schema/0)

  • Uses anyOf pattern to list all operations at the top level
  • Avoids recursive $ref patterns that LLMs struggle with
  • Inline nested operation schemas for clarity
  • Optimized for ReqLLM's structured output API

Structured Output Function (generate_program_structured!/1)

  • Takes only task parameter (simplified API compared to text mode)
  • Uses ReqLLM.generate_object! for guaranteed JSON output
  • Wraps result in program envelope automatically
  • Removes need for markdown fence cleanup

E2E Tests

  • 6 new tests: 3 for text mode (existing), 3 for structured output mode
  • Tests verify filter operations, aggregations, and chained operations
  • Use constraint assertions matching existing test patterns
  • Can be run with: mix test --include e2e

Test Plan

  • ✅ All existing tests pass (300 tests)
  • ✅ Code formatting passes
  • ✅ Credo checks pass (no issues)
  • ✅ Both text and structured output modes work correctly

Fixes #65

… for E2E tests

- Implement PtcRunner.Schema.to_llm_schema/0 with flattened schema optimized for LLMs
- Use anyOf pattern to list all operations at top level, avoiding  recursion
- Add PtcRunner.TestSupport.LLMClient.generate_program_structured!/1 for structured output mode
- Uses ReqLLM.generate_object! for guaranteed valid JSON from LLMs
- Add E2E tests for both text mode (text mode) and structured output mode
- E2E tests verify filter, sum, and chained operations work correctly
- Addresses issue #65 review comments on schema validation and error handling

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 2, 2025

PR Review: feat: Add structured output support with generate_program_structured! for E2E tests (#65)

Summary

This PR adds structured output support for E2E tests by introducing to_llm_schema/0 and generate_program_structured!/1. The implementation is well-structured and follows existing patterns. However, there is a significant gap in unit test coverage for the new schema function.

What's Good

  • Clean separation of concerns: The flattened LLM schema (to_llm_schema/0) correctly avoids recursive $ref patterns that LLMs struggle with
  • Consistent patterns: generate_program_structured!/1 follows the same structure as generate_program!/1 (ensure_api_key!, model/timeout constants)
  • Proper error handling: The case pattern matching handles both wrapped and unwrapped LLM responses (lines 96-104 in llm_client.ex)
  • Good test organization: E2E tests are well-organized into separate describe blocks for text and structured modes
  • Comprehensive documentation: Both functions have proper @doc and @spec

Issues (Must Fix)

  1. Missing unit tests for to_llm_schema/0 - test/ptc_runner/schema_test.exs
    • Problem: The PR adds a new public function to_llm_schema/0 to PtcRunner.Schema but does not add any unit tests for it. The existing to_json_schema/0 has 16 comprehensive unit tests (lines 317-488) that verify structure, operation schemas, type mappings, and JSON validity.
    • Impact: Without unit tests, regressions to the LLM schema generation (e.g., breaking changes to type mappings, missing operations, incorrect structure) would only be caught by E2E tests which are slower and require external API calls.
    • Fix: Add unit tests for to_llm_schema/0 following the same patterns as to_json_schema/0 tests. At minimum, test:
      • Returns valid JSON Schema structure with anyOf (not $ref)
      • Has program property with correct structure
      • Generates 33 operation schemas
      • Each operation schema has required structure
      • Expr types use inline object schema (not $ref)
      • List of expr types have correct inline schema
      • Generated schema is valid JSON
    • Complexity: Mechanical (follow existing test patterns from lines 317-488)

Suggestions (Optional)

  1. OPTIONAL: Consider adding a synchronization test for LLM schema
    • Similar to the test on line 473-487 that verifies to_json_schema/0 matches priv/ptc_schema.json, consider whether the LLM schema should also be persisted to a file for documentation/reference purposes.
    • Benefit: Would provide a static reference for the LLM schema format
    • Complexity: Low (mechanical)

Security

No security concerns. The changes are additive and confined to test infrastructure.

Verdict

Request Changes - The missing unit tests for to_llm_schema/0 is a significant gap. The PR adds a new public API function that is tested only through E2E tests. Following the established pattern in this codebase (where to_json_schema/0 has comprehensive unit tests), to_llm_schema/0 should have equivalent coverage before merge.

@github-actions github-actions bot added the auto-triage-pending Auto-triage in progress label Dec 2, 2025
@andreasronge
Copy link
Copy Markdown
Owner Author

@claude please fix these issues from PR review:

Issue 1: Missing unit tests for to_llm_schema/0

Location: test/ptc_runner/schema_test.exs (after line 488)
Problem: The PR adds a new public function to_llm_schema/0 to PtcRunner.Schema but does not add any unit tests for it. The existing to_json_schema/0 has 16 comprehensive unit tests (lines 317-488) that verify structure, operation schemas, type mappings, and JSON validity. Without unit tests, regressions to the LLM schema generation would only be caught by E2E tests.

Fix: Add unit tests for to_llm_schema/0 in a new describe block following the same patterns as to_json_schema/0 tests. At minimum, add these tests:

  1. test "returns a valid JSON Schema structure" - Verify basic structure with title, type, properties, anyOf (not $ref)
  2. test "schema has program property with correct structure" - Verify program property uses anyOf with operation schemas
  3. test "generates 33 operation schemas" - Count operation schemas in anyOf array
  4. test "each operation schema has required structure" - Verify each operation has type, description, properties, required, additionalProperties
  5. test "literal operation schema is correct" - Test a specific operation
  6. test "operations with no fields have only 'op' as required" - Test count operation
  7. test "operations with optional fields do not include them in required" - Test get operation with optional default field
  8. test "expr types use inline object schema (not $ref)" - Verify let operation's value/in fields use inline object schema with op property (not $ref)
  9. test "list of expr types have correct inline schema" - Verify and operation's conditions field uses array with inline object items
  10. test "list of string types have correct schema" - Test select operation
  11. test "non_neg_integer types have minimum constraint" - Test nth operation
  12. test "recursive operations like pipe validate correctly" - Test pipe operation
  13. test "generated schema is valid JSON" - Verify schema can be encoded and decoded

Follow the exact testing patterns from lines 317-471. The key difference is that to_llm_schema/0 uses anyOf instead of $ref and inline object schemas for expr types.

Please fix ALL issues above in a SINGLE commit. Make minimal changes to resolve each issue.

@andreasronge
Copy link
Copy Markdown
Owner Author

Auto-Triage Summary

Decisions Made

# Issue Decision Action Reasoning
1 Missing unit tests for to_llm_schema/0 FIX_NOW Included in fix request above Mechanical change following existing test patterns (lines 317-488). In-scope: PR added this public function, completing test coverage is directly related. No design decisions needed.
2 Optional: schema synchronization test IGNORE None Explicitly marked as OPTIONAL by reviewer. Not required for PR approval. LLM schema is intentionally different from JSON schema (flattened).

Status

  • FIX_NOW items: 1 (batched in single fix comment above)
  • Issues created: 0
  • Items ignored: 1

Next Steps

Claude Code will process the fix request and add comprehensive unit tests for to_llm_schema/0 following the established testing patterns. Once the fix is committed and all checks pass, the PR will be ready for merge.

@github-actions github-actions bot added auto-triage-cycle-1 Auto-triage cycle 1 auto-triage-complete Auto-triage completed and removed auto-triage-pending Auto-triage in progress labels Dec 2, 2025
Adds 13 new tests for the to_llm_schema/0 function following the same
patterns as the existing to_json_schema/0 tests. Tests verify:
- Basic schema structure with title, type, properties, anyOf
- Program property with correct structure
- Correct count of 33 operation schemas
- Each operation has required structure
- Literal operation schema correctness
- Operations with no fields have only 'op' as required
- Operations with optional fields exclude them from required
- Expr types use inline object schema (not $ref)
- List of expr types have correct inline schema
- List of string types have correct schema
- Non-negative integer types have minimum constraint
- Recursive operations like pipe validate correctly
- Generated schema is valid JSON

Fixes issue from PR #67 review.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@andreasronge
Copy link
Copy Markdown
Owner Author

Summary

Fixed the missing unit tests for to_llm_schema/0 as identified in the PR review.

Changes Made

Added 13 comprehensive unit tests in a new describe block in test/ptc_runner/schema_test.exs (after line 488) that verify:

  1. ✅ Returns a valid JSON Schema structure (title, type, properties, anyOf)
  2. ✅ Schema has program property with correct structure (anyOf, no $ref)
  3. ✅ Generates 33 operation schemas
  4. ✅ Each operation schema has required structure (type, properties, required, additionalProperties)
  5. ✅ Literal operation schema is correct
  6. ✅ Operations with no fields have only 'op' as required (count operation)
  7. ✅ Operations with optional fields exclude them from required (get operation)
  8. ✅ Expr types use inline object schema (not $ref) - let operation
  9. ✅ List of expr types have correct inline schema (and operation)
  10. ✅ List of string types have correct schema (select operation)
  11. ✅ Non-negative integer types have minimum constraint (nth operation)
  12. ✅ Recursive operations like pipe validate correctly
  13. ✅ Generated schema is valid JSON (can be encoded and decoded)

All tests follow the exact patterns from the existing to_json_schema/0 tests (lines 317-471), with the key difference being verification of anyOf and inline object schemas instead of $ref.

Test Results

  • ✅ All 313 tests pass (6 excluded E2E tests)
  • ✅ Code formatting passes
  • ✅ Credo checks pass
  • ✅ Compilation successful

Commit: 0e85186

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 2, 2025

PR Review: feat: Add structured output support with generate_program_structured! for E2E tests (#65)

Summary

Well-implemented PR that adds LLM structured output support with a flattened JSON Schema (to_llm_schema/0) and new generate_program_structured!/1 function. The implementation is clean, follows existing patterns, and includes comprehensive tests.

What's Good

  • Clean schema design: The flattened anyOf pattern correctly avoids $ref recursion that LLMs struggle with
  • Good code reuse: Shares build_required/1 with to_json_schema/0, avoiding duplication
  • Comprehensive tests: 14 new tests for to_llm_schema/0 and 3 E2E structured output tests, following existing test patterns
  • Proper documentation: Updated moduledoc and doc comments explain the difference between text and structured modes
  • Graceful handling: The case in generate_program_structured!/1 handles both wrapped and unwrapped LLM responses
  • Test organization: Existing text mode tests moved to a labeled describe block for clarity

Issues (Must Fix)

None identified. The implementation is complete and consistent.

Suggestions (Optional)

  1. Missing additionalProperties on nested expr objects - Nice-to-have

    • Suggestion: The inline :expr schema (type_to_llm_json_schema(:expr)) at lib/ptc_runner/schema.ex:427-435 does not include "additionalProperties" => false
    • Benefit: Would make the schema stricter and more consistent with the top-level operation schemas
    • Complexity: Trivial - just add the key to both :expr and {:list, :expr} clauses
    • Recommendation: Could be addressed now or in a future PR
  2. Spec on generate_program_structured!/1 could be more specific - Nice-to-have

    • Suggestion: The @spec could document that it raises on API key issues or LLM failures
    • Complexity: Documentation change only
    • Recommendation: Low priority, current docs explain the raises behavior

Security

No concerns. The implementation does not introduce new attack vectors:

  • LLM output is validated by the PTC parser/interpreter before execution
  • API key handling follows existing patterns

Verdict

Approve - Implementation is complete, well-tested, and follows project conventions. The flattened schema approach is the right solution for LLM structured output, and the code quality is high.

@github-actions github-actions bot added auto-triage-pending Auto-triage in progress auto-triage-cycle-2 Auto-triage cycle 2 and removed auto-triage-cycle-1 Auto-triage cycle 1 auto-triage-pending Auto-triage in progress labels Dec 2, 2025
@andreasronge andreasronge enabled auto-merge (squash) December 2, 2025 15:18
@andreasronge andreasronge merged commit fbdaea7 into main Dec 2, 2025
3 checks passed
@andreasronge andreasronge deleted the claude/issue-65-structured-output branch December 2, 2025 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-triage-complete Auto-triage completed auto-triage-cycle-2 Auto-triage cycle 2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add structured output support with generate_object! for E2E tests

1 participant