feat: Add structured output support with generate_program_structured! for E2E tests (#65) by andreasronge · Pull Request #67 · andreasronge/ptc_runner

andreasronge · 2025-12-02T15:06:09Z

Summary

Implements structured output support for E2E tests using ReqLLM's generate_object! API. This guarantees valid JSON output from LLMs, eliminating markdown fence cleanup issues and improving test reliability.

✅ Adds PtcRunner.Schema.to_llm_schema/0 with flattened schema optimized for LLMs
✅ Adds PtcRunner.TestSupport.LLMClient.generate_program_structured!/1 for structured output mode
✅ Includes comprehensive E2E tests for both text and structured modes
✅ All existing tests pass, no breaking changes

Implementation Details

Schema Flattening (`to_llm_schema/0`)

Uses anyOf pattern to list all operations at the top level
Avoids recursive $ref patterns that LLMs struggle with
Inline nested operation schemas for clarity
Optimized for ReqLLM's structured output API

Structured Output Function (`generate_program_structured!/1`)

Takes only task parameter (simplified API compared to text mode)
Uses ReqLLM.generate_object! for guaranteed JSON output
Wraps result in program envelope automatically
Removes need for markdown fence cleanup

E2E Tests

6 new tests: 3 for text mode (existing), 3 for structured output mode
Tests verify filter operations, aggregations, and chained operations
Use constraint assertions matching existing test patterns
Can be run with: mix test --include e2e

Test Plan

✅ All existing tests pass (300 tests)
✅ Code formatting passes
✅ Credo checks pass (no issues)
✅ Both text and structured output modes work correctly

Fixes #65

… for E2E tests - Implement PtcRunner.Schema.to_llm_schema/0 with flattened schema optimized for LLMs - Use anyOf pattern to list all operations at top level, avoiding recursion - Add PtcRunner.TestSupport.LLMClient.generate_program_structured!/1 for structured output mode - Uses ReqLLM.generate_object! for guaranteed valid JSON from LLMs - Add E2E tests for both text mode (text mode) and structured output mode - E2E tests verify filter, sum, and chained operations work correctly - Addresses issue #65 review comments on schema validation and error handling 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>

github-actions · 2025-12-02T15:08:54Z

PR Review: feat: Add structured output support with generate_program_structured! for E2E tests (#65)

Summary

This PR adds structured output support for E2E tests by introducing to_llm_schema/0 and generate_program_structured!/1. The implementation is well-structured and follows existing patterns. However, there is a significant gap in unit test coverage for the new schema function.

What's Good

Clean separation of concerns: The flattened LLM schema (to_llm_schema/0) correctly avoids recursive $ref patterns that LLMs struggle with
Consistent patterns: generate_program_structured!/1 follows the same structure as generate_program!/1 (ensure_api_key!, model/timeout constants)
Proper error handling: The case pattern matching handles both wrapped and unwrapped LLM responses (lines 96-104 in llm_client.ex)
Good test organization: E2E tests are well-organized into separate describe blocks for text and structured modes
Comprehensive documentation: Both functions have proper @doc and @spec

Issues (Must Fix)

Missing unit tests for to_llm_schema/0 - test/ptc_runner/schema_test.exs
- Problem: The PR adds a new public function to_llm_schema/0 to PtcRunner.Schema but does not add any unit tests for it. The existing to_json_schema/0 has 16 comprehensive unit tests (lines 317-488) that verify structure, operation schemas, type mappings, and JSON validity.
- Impact: Without unit tests, regressions to the LLM schema generation (e.g., breaking changes to type mappings, missing operations, incorrect structure) would only be caught by E2E tests which are slower and require external API calls.
- Fix: Add unit tests for to_llm_schema/0 following the same patterns as to_json_schema/0 tests. At minimum, test:
  - Returns valid JSON Schema structure with anyOf (not $ref)
  - Has program property with correct structure
  - Generates 33 operation schemas
  - Each operation schema has required structure
  - Expr types use inline object schema (not $ref)
  - List of expr types have correct inline schema
  - Generated schema is valid JSON
- Complexity: Mechanical (follow existing test patterns from lines 317-488)

Suggestions (Optional)

OPTIONAL: Consider adding a synchronization test for LLM schema
- Similar to the test on line 473-487 that verifies to_json_schema/0 matches priv/ptc_schema.json, consider whether the LLM schema should also be persisted to a file for documentation/reference purposes.
- Benefit: Would provide a static reference for the LLM schema format
- Complexity: Low (mechanical)

Security

No security concerns. The changes are additive and confined to test infrastructure.

Verdict

Request Changes - The missing unit tests for to_llm_schema/0 is a significant gap. The PR adds a new public API function that is tested only through E2E tests. Following the established pattern in this codebase (where to_json_schema/0 has comprehensive unit tests), to_llm_schema/0 should have equivalent coverage before merge.

andreasronge · 2025-12-02T15:10:56Z

@claude please fix these issues from PR review:

Issue 1: Missing unit tests for to_llm_schema/0

Location: test/ptc_runner/schema_test.exs (after line 488)
Problem: The PR adds a new public function to_llm_schema/0 to PtcRunner.Schema but does not add any unit tests for it. The existing to_json_schema/0 has 16 comprehensive unit tests (lines 317-488) that verify structure, operation schemas, type mappings, and JSON validity. Without unit tests, regressions to the LLM schema generation would only be caught by E2E tests.

Fix: Add unit tests for to_llm_schema/0 in a new describe block following the same patterns as to_json_schema/0 tests. At minimum, add these tests:

test "returns a valid JSON Schema structure" - Verify basic structure with title, type, properties, anyOf (not $ref)
test "schema has program property with correct structure" - Verify program property uses anyOf with operation schemas
test "generates 33 operation schemas" - Count operation schemas in anyOf array
test "each operation schema has required structure" - Verify each operation has type, description, properties, required, additionalProperties
test "literal operation schema is correct" - Test a specific operation
test "operations with no fields have only 'op' as required" - Test count operation
test "operations with optional fields do not include them in required" - Test get operation with optional default field
test "expr types use inline object schema (not $ref)" - Verify let operation's value/in fields use inline object schema with op property (not $ref)
test "list of expr types have correct inline schema" - Verify and operation's conditions field uses array with inline object items
test "list of string types have correct schema" - Test select operation
test "non_neg_integer types have minimum constraint" - Test nth operation
test "recursive operations like pipe validate correctly" - Test pipe operation
test "generated schema is valid JSON" - Verify schema can be encoded and decoded

Follow the exact testing patterns from lines 317-471. The key difference is that to_llm_schema/0 uses anyOf instead of $ref and inline object schemas for expr types.

Please fix ALL issues above in a SINGLE commit. Make minimal changes to resolve each issue.

andreasronge · 2025-12-02T15:11:13Z

Auto-Triage Summary

Decisions Made

#	Issue	Decision	Action	Reasoning
1	Missing unit tests for to_llm_schema/0	FIX_NOW	Included in fix request above	Mechanical change following existing test patterns (lines 317-488). In-scope: PR added this public function, completing test coverage is directly related. No design decisions needed.
2	Optional: schema synchronization test	IGNORE	None	Explicitly marked as OPTIONAL by reviewer. Not required for PR approval. LLM schema is intentionally different from JSON schema (flattened).

Status

FIX_NOW items: 1 (batched in single fix comment above)
Issues created: 0
Items ignored: 1

Next Steps

Claude Code will process the fix request and add comprehensive unit tests for to_llm_schema/0 following the established testing patterns. Once the fix is committed and all checks pass, the PR will be ready for merge.

Adds 13 new tests for the to_llm_schema/0 function following the same patterns as the existing to_json_schema/0 tests. Tests verify: - Basic schema structure with title, type, properties, anyOf - Program property with correct structure - Correct count of 33 operation schemas - Each operation has required structure - Literal operation schema correctness - Operations with no fields have only 'op' as required - Operations with optional fields exclude them from required - Expr types use inline object schema (not $ref) - List of expr types have correct inline schema - List of string types have correct schema - Non-negative integer types have minimum constraint - Recursive operations like pipe validate correctly - Generated schema is valid JSON Fixes issue from PR #67 review. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

andreasronge · 2025-12-02T15:13:00Z

Summary

Fixed the missing unit tests for to_llm_schema/0 as identified in the PR review.

Changes Made

Added 13 comprehensive unit tests in a new describe block in test/ptc_runner/schema_test.exs (after line 488) that verify:

✅ Returns a valid JSON Schema structure (title, type, properties, anyOf)
✅ Schema has program property with correct structure (anyOf, no $ref)
✅ Generates 33 operation schemas
✅ Each operation schema has required structure (type, properties, required, additionalProperties)
✅ Literal operation schema is correct
✅ Operations with no fields have only 'op' as required (count operation)
✅ Operations with optional fields exclude them from required (get operation)
✅ Expr types use inline object schema (not $ref) - let operation
✅ List of expr types have correct inline schema (and operation)
✅ List of string types have correct schema (select operation)
✅ Non-negative integer types have minimum constraint (nth operation)
✅ Recursive operations like pipe validate correctly
✅ Generated schema is valid JSON (can be encoded and decoded)

All tests follow the exact patterns from the existing to_json_schema/0 tests (lines 317-471), with the key difference being verification of anyOf and inline object schemas instead of $ref.

Test Results

✅ All 313 tests pass (6 excluded E2E tests)
✅ Code formatting passes
✅ Credo checks pass
✅ Compilation successful

Commit: 0e85186

github-actions · 2025-12-02T15:15:16Z

PR Review: feat: Add structured output support with generate_program_structured! for E2E tests (#65)

Summary

Well-implemented PR that adds LLM structured output support with a flattened JSON Schema (to_llm_schema/0) and new generate_program_structured!/1 function. The implementation is clean, follows existing patterns, and includes comprehensive tests.

What's Good

Clean schema design: The flattened anyOf pattern correctly avoids $ref recursion that LLMs struggle with
Good code reuse: Shares build_required/1 with to_json_schema/0, avoiding duplication
Comprehensive tests: 14 new tests for to_llm_schema/0 and 3 E2E structured output tests, following existing test patterns
Proper documentation: Updated moduledoc and doc comments explain the difference between text and structured modes
Graceful handling: The case in generate_program_structured!/1 handles both wrapped and unwrapped LLM responses
Test organization: Existing text mode tests moved to a labeled describe block for clarity

Issues (Must Fix)

None identified. The implementation is complete and consistent.

Suggestions (Optional)

Missing additionalProperties on nested expr objects - Nice-to-have
- Suggestion: The inline :expr schema (type_to_llm_json_schema(:expr)) at lib/ptc_runner/schema.ex:427-435 does not include "additionalProperties" => false
- Benefit: Would make the schema stricter and more consistent with the top-level operation schemas
- Complexity: Trivial - just add the key to both :expr and {:list, :expr} clauses
- Recommendation: Could be addressed now or in a future PR
Spec on generate_program_structured!/1 could be more specific - Nice-to-have
- Suggestion: The @spec could document that it raises on API key issues or LLM failures
- Complexity: Documentation change only
- Recommendation: Low priority, current docs explain the raises behavior

Security

No concerns. The implementation does not introduce new attack vectors:

LLM output is validated by the PTC parser/interpreter before execution
API key handling follows existing patterns

Verdict

Approve - Implementation is complete, well-tested, and follows project conventions. The flattened schema approach is the right solution for LLM structured output, and the code quality is high.

andreasronge mentioned this pull request Dec 2, 2025

Add structured output support with generate_object! for E2E tests #65

Closed

4 tasks

github-actions bot added the auto-triage-pending Auto-triage in progress label Dec 2, 2025

github-actions bot added auto-triage-cycle-1 Auto-triage cycle 1 auto-triage-complete Auto-triage completed and removed auto-triage-pending Auto-triage in progress labels Dec 2, 2025

github-actions bot added auto-triage-pending Auto-triage in progress auto-triage-cycle-2 Auto-triage cycle 2 and removed auto-triage-cycle-1 Auto-triage cycle 1 auto-triage-pending Auto-triage in progress labels Dec 2, 2025

andreasronge enabled auto-merge (squash) December 2, 2025 15:18

andreasronge merged commit fbdaea7 into main Dec 2, 2025
3 checks passed

andreasronge deleted the claude/issue-65-structured-output branch December 2, 2025 15:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add structured output support with generate_program_structured! for E2E tests (#65)#67

feat: Add structured output support with generate_program_structured! for E2E tests (#65)#67
andreasronge merged 2 commits intomainfrom
claude/issue-65-structured-output

andreasronge commented Dec 2, 2025

Uh oh!

github-actions bot commented Dec 2, 2025

Uh oh!

andreasronge commented Dec 2, 2025

Uh oh!

andreasronge commented Dec 2, 2025

Uh oh!

andreasronge commented Dec 2, 2025

Uh oh!

github-actions bot commented Dec 2, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

andreasronge commented Dec 2, 2025

Summary

Implementation Details

Schema Flattening (to_llm_schema/0)

Structured Output Function (generate_program_structured!/1)

E2E Tests

Test Plan

Uh oh!

github-actions bot commented Dec 2, 2025

PR Review: feat: Add structured output support with generate_program_structured! for E2E tests (#65)

Summary

What's Good

Issues (Must Fix)

Suggestions (Optional)

Security

Verdict

Uh oh!

andreasronge commented Dec 2, 2025

Issue 1: Missing unit tests for to_llm_schema/0

Uh oh!

andreasronge commented Dec 2, 2025

Auto-Triage Summary

Decisions Made

Status

Next Steps

Uh oh!

andreasronge commented Dec 2, 2025

Summary

Changes Made

Test Results

Uh oh!

github-actions bot commented Dec 2, 2025

PR Review: feat: Add structured output support with generate_program_structured! for E2E tests (#65)

Summary

What's Good

Issues (Must Fix)

Suggestions (Optional)

Security

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Schema Flattening (`to_llm_schema/0`)

Structured Output Function (`generate_program_structured!/1`)