fix: Improve LLM schema descriptions and use Haiku 4.5 #73

Merged
andreasronge merged 4 commits into main from
fix/llm-schema-improvements
Dec 2, 2025

@andreasronge
Owner

Summary

Follow-up to #71 addressing review suggestions:

  • Add consistent descriptions with examples to all comparison operations (neq, gte, lt, lte, contains)
  • Add consistent descriptions with examples to aggregation operations (avg, min, max)
  • Update e2e tests to use Claude Haiku 4.5 (faster, cheaper, and reliably follows nested schemas)
  • Add model compatibility section to README explaining nested anyOf schema requirements

Changes

Schema descriptions

All comparison and aggregation operations now have consistent descriptions with examples:

"gte" => "Field >= value. Example: {op:'gte', field:'age', value:18}"
"avg" => "Average of numeric field. Example: {op:'avg', field:'rating'}"
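As a rough illustration of the "Description. Example: {op:'xxx', ...}" convention, the sketch below checks the two descriptions quoted above. It is written in Python purely for illustration and is not part of the project's Elixir code; the `DESCRIPTIONS` dict, the regular expression, and `follows_pattern` are hypothetical names introduced here.

```python
import re

# Hypothetical excerpt: the two descriptions quoted in this PR, keyed by op name.
DESCRIPTIONS = {
    "gte": "Field >= value. Example: {op:'gte', field:'age', value:18}",
    "avg": "Average of numeric field. Example: {op:'avg', field:'rating'}",
}

# Assumed shape of the convention: free text, then ". Example: {op:'<name>', ...}".
PATTERN = re.compile(r"^.+?\. Example: \{op:'(?P<op>\w+)'.*\}$")


def follows_pattern(op_name, description):
    """Return True when the description ends with an example for the same op."""
    match = PATTERN.match(description)
    return match is not None and match.group("op") == op_name


results = {op: follows_pattern(op, text) for op, text in DESCRIPTIONS.items()}
```

A plain description such as "Reference a variable" would fail this check, which is the kind of gap the later review rounds in this thread pick up.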

Model update

Changed from claude-sonnet-4 to claude-haiku-4.5 - both work reliably, but Haiku is faster and cheaper.

README updates

  • Updated example to use Haiku 4.5
  • Added "Model Compatibility" section explaining that not all models handle complex nested anyOf schemas
  • Removed hardcoded description examples that could diverge from actual schema

Test plan

  • All 318 unit tests pass
  • All 6 e2e tests pass with Haiku 4.5
  • Quality checks pass (format, compile, credo)

🤖 Generated with Claude Code

- Add consistent descriptions with examples to all comparison ops
  (neq, gte, lt, lte, contains)
- Add consistent descriptions with examples to aggregation ops
  (avg, min, max)
- Update e2e tests to use Claude Haiku 4.5 (faster, cheaper, reliable)
- Add model compatibility section to README explaining nested anyOf
  schema requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Dec 2, 2025

PR Review: fix: Improve LLM schema descriptions and use Haiku 4.5

Summary

This PR improves the LLM schema by adding consistent example-based descriptions to comparison and aggregation operations, and switches the default model to Claude Haiku 4.5. The changes are well-scoped and correctly applied.

What's Good

  • The example-based descriptions follow a consistent format: "Description. Example: {op:'xxx', ...}"
  • Model change from claude-sonnet-4 to claude-haiku-4.5 is applied consistently across both the LLM client and README
  • The new "Model Compatibility" section in the README is informative without being overly prescriptive
  • Correctly removed the hardcoded description examples from README that could diverge from the actual schema

Issues (Must Fix)

None identified. The PR accomplishes what it sets out to do.

Suggestions (Optional)

  1. Incomplete pattern: Some operations lack examples - Nice-to-have

    • Observation: The PR adds examples to comparison ops (neq, gte, lt, lte, contains) and aggregation ops (avg, min, max), but several other operations still have plain descriptions without examples:
      • literal: "A literal JSON value"
      • load: "Load a resource by name. Use name='input' to load the input data"
      • var: "Reference a variable"
      • let: "Bind a value to a variable"
      • if: "Conditional expression"
      • and/or/not: "Logical AND/OR/NOT of conditions"
      • merge/concat/zip: "Merge/Concatenate/Zip multiple..."
      • select: "Select specific fields from objects"
      • reject: "Reject collection elements based on condition"
      • get: "Get value at path"
      • first/last: "Get first/last element"
      • nth: "Get element at index"
      • call: "Call a tool"
    • Benefit: Complete consistency across all operations would maximize LLM understanding
    • Complexity: Low - mechanical changes following established pattern
    • Recommendation: Out of scope for this PR. Consider a follow-up issue if this becomes a problem with LLM generation quality.
  2. JSON schema sync - Informational

    • Observation: The priv/ptc_schema.json file is updated but appears to be generated from PtcRunner.Schema. Ensure there's a process (or task) to keep this file in sync when schema.ex changes.
    • Recommendation: No action needed for this PR, but worth noting for future maintenance.

Security

No concerns. The changes are purely documentation and configuration updates.

Verdict

Approve - The PR cleanly accomplishes its stated goals: improved descriptions with examples for comparison/aggregation operations and model update to Haiku 4.5. The changes are consistent and well-documented in the README.

@github-actions bot added the auto-triage-pending (Auto-triage in progress) label on Dec 2, 2025
@andreasronge
Owner Author

@claude please fix these issues from PR review:

Issue 1: Complete the example-based description pattern for remaining operations

Location: lib/ptc_runner/schema.ex (multiple lines)
Problem: The PR added examples to comparison ops (neq, gte, lt, lte, contains) and aggregation ops (avg, min, max), but several other operations still have plain descriptions without examples. This creates inconsistency in the schema.

Operations needing examples:

  • literal (line 15): "A literal JSON value"
  • load (line 21): "Load a resource by name. Use name='input' to load the input data"
  • var (line 27): "Reference a variable"
  • let (line 35): "Bind a value to a variable"
  • if (line 45): "Conditional expression"
  • and (line 53): "Logical AND of conditions"
  • or (line 59): "Logical OR of conditions"
  • not (line 65): "Logical NOT of a condition"
  • merge (line 73): "Merge multiple objects"
  • concat (line 79): "Concatenate multiple lists"
  • zip (line 85): "Zip multiple lists together"
  • select (line 112): "Select specific fields from objects"
  • reject (line 119): "Reject collection elements based on condition"
  • get (line 179): "Get value at path"
  • first (line 216): "Get first element"
  • last (line 221): "Get last element"
  • nth (line 224): "Get element at index"
  • call (line 232): "Call a tool"

Fix: Add example JSON snippets to each description following the pattern already established in this PR: "Description. Example: {op:'xxx', field:'yyy', ...}"

Examples for guidance:

  • literal: "A literal JSON value. Example: {op:'literal', value:42}"
  • var: "Reference a variable. Example: {op:'var', name:'count'}"
  • let: "Bind a value to a variable. Example: {op:'let', name:'x', value:{op:'literal', value:5}, in:{op:'var', name:'x'}}"
  • if: "Conditional expression. Example: {op:'if', condition:{op:'gt', field:'age', value:18}, then:{op:'literal', value:'adult'}, else:{op:'literal', value:'minor'}}"
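The examples inside the descriptions use a compact shorthand (unquoted keys, single quotes) rather than strict JSON. As a hedged illustration, the Python sketch below writes the `let` example from the list above as strict JSON and round-trips it; the variable names are hypothetical, and whether the runner accepts exactly this shape is an assumption based only on the description text.

```python
import json

# Strict-JSON form of the shorthand `let` example quoted above.
# "in" is an ordinary string key here, so the Python keyword is not a problem.
let_program = {
    "op": "let",
    "name": "x",
    "value": {"op": "literal", "value": 5},
    "in": {"op": "var", "name": "x"},
}

encoded = json.dumps(let_program)   # what a model would actually have to emit
decoded = json.loads(encoded)       # parses back to the same nested structure
```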

Please fix ALL operations listed above in a SINGLE commit. Make minimal changes to resolve the inconsistency. After updating schema.ex, regenerate priv/ptc_schema.json by running: mix test test/ptc_runner/schema_test.exs:473 (the test will show the diff if schema is out of sync).

@andreasronge
Owner Author

Auto-Triage Summary

Decisions Made

| # | Issue | Decision | Action |
|---|-------|----------|--------|
| 1 | Incomplete pattern: Some operations lack examples | FIX_NOW | Included in fix request above |
| 2 | JSON schema sync process | IGNORE | Already handled by test at schema_test.exs:473-487 |

Analysis Details

Issue 1: Incomplete example pattern

  • Reasoning: This is IN-SCOPE because the PR specifically improves LLM schema descriptions with examples. The fix completes the pattern started by the PR across remaining operations.
  • Complexity: Mechanical - simply applying the same "Description. Example: {...}" pattern established by this PR
  • Impact: Low risk, improves LLM understanding consistency

Issue 2: JSON schema sync

  • Reasoning: The reviewer's concern is already addressed. A test exists at test/ptc_runner/schema_test.exs:473-487 that validates priv/ptc_schema.json matches the generated schema. This test will fail if they get out of sync.
  • No action needed: Existing infrastructure handles this
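For readers unfamiliar with this kind of guard, here is a minimal sketch of the idea behind a schema sync check: regenerate the schema in memory and compare it with the committed file. It is in Python for illustration only and is not the project's test; `generate_schema` and `schema_in_sync` are hypothetical stand-ins, and a temporary file plays the role of priv/ptc_schema.json.

```python
import json
import os
import tempfile


def generate_schema():
    # Hypothetical stand-in for the real generator; returns a tiny fragment.
    return {"gte": {"description": "Field >= value. Example: {op:'gte', field:'age', value:18}"}}


def schema_in_sync(path, generated):
    """Return True when the JSON file at `path` equals the generated schema."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle) == generated


with tempfile.TemporaryDirectory() as tmp:
    schema_path = os.path.join(tmp, "ptc_schema.json")
    with open(schema_path, "w", encoding="utf-8") as handle:
        json.dump(generate_schema(), handle)

    # File was just written from the generator, so the two match.
    fresh_result = schema_in_sync(schema_path, generate_schema())

    # Simulate a description edited in code without regenerating the file.
    changed = generate_schema()
    changed["gte"]["description"] = "Field greater than or equal to value"
    stale_result = schema_in_sync(schema_path, changed)
```

The second comparison fails, which is the behaviour the triage relies on: a description change without a regenerated file is caught by the test run.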

Status

  • FIX_NOW items: 1 (batched in single fix comment above)
  • Issues created: 0
  • Items ignored: 1 (with reasoning documented above)

Next Steps

Waiting for @claude to apply the fixes in the comment above. Once fixes are committed and all checks pass, the PR will be ready to merge.

@github-actions bot added the auto-triage-cycle-1 (Auto-triage cycle 1) and auto-triage-complete (Auto-triage completed) labels and removed the auto-triage-pending (Auto-triage in progress) label on Dec 2, 2025
Add example JSON snippets to operation descriptions for consistency:
- literal, load, var, let, if, and, or, not: control flow and data ops
- merge, concat, zip, select, reject: collection operations
- get, first, last, nth: access operations
- call: tool integration

Also regenerate priv/ptc_schema.json and update schema test expectations
to reflect the new descriptions with examples.

Fixes the inconsistency where only comparison and aggregation operations
had examples while other operations had plain descriptions.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
@andreasronge
Owner Author

Summary of Changes

Fixed Issue 1: Completed example-based description pattern for all remaining operations.

Changes Made:

  1. Schema descriptions updated (lib/ptc_runner/schema.ex):

    • Added example JSON snippets to 18 operations that had plain descriptions
    • Operations updated: literal, load, var, let, if, and, or, not, merge, concat, zip, select, reject, get, first, last, nth, call
    • Examples follow the established pattern: "Description. Example: {op:'xxx', ...}"
  2. Test expectations updated (test/ptc_runner/schema_test.exs):

    • Updated schema test to expect the new description format with examples
    • Test now validates that 'literal' operation includes the example in its description
  3. Schema file regenerated (priv/ptc_schema.json):

    • Regenerated JSON schema to reflect all description updates
    • Schema synchronization test now passes

Testing:

  • Ran mix precommit successfully (format, compile checks, credo, and all tests pass)
  • All 318 tests pass with 0 failures
  • Schema synchronization test verified

Commit:

All changes committed in a single commit: fix: Complete example-based description pattern for all operations

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

PR Review: fix: Improve LLM schema descriptions and use Haiku 4.5

Summary

This PR follows up on #71 by adding consistent example-based descriptions to comparison and aggregation operations, switching to Claude Haiku 4.5 for e2e tests, and updating the README with model compatibility guidance. The changes are well-motivated and improve LLM usability.

What's Good

  • Example-based descriptions follow a consistent pattern: "Description. Example: {op:'op_name', ...}"
  • Good choice to use Haiku 4.5 - faster and cheaper while still reliable
  • README update appropriately sets expectations about model compatibility with complex schemas
  • Tests pass (318 tests, 0 failures)

Issues (Must Fix)

  1. Incomplete pattern application - lib/ptc_runner/schema.ex:109, 115

    • Problem: The PR adds example-based descriptions to many operations but misses filter and map, which still lack examples:
      • Line 109: "Keep items matching condition. Use with gt/lt/eq in 'where' field" - no example
      • Line 115: "Transform collection elements" - no example
    • Impact: These are core collection operations and key for LLM program generation. Missing examples creates inconsistency in the schema descriptions.
    • Fix: Add examples following the same pattern:
      • filter: "Keep items matching condition. Example: {op:'filter', where:{op:'gt', field:'price', value:10}}"
      • map: "Transform collection elements. Example: {op:'map', expr:{op:'get', path:['name']}}"
    • Complexity: Mechanical (same pattern as rest of PR)
  2. Inconsistent pipe description format - lib/ptc_runner/schema.ex:100-101

    • Problem: The pipe description uses a different format than other operations:
      "Sequence of operations. Steps: [load input, then filter/map/sum/count]. Example: pipe with steps [load, filter, count]"
      
      This doesn't follow the {op:'...'} JSON-like format used everywhere else.
    • Impact: Inconsistent guidance to LLMs may affect generation quality.
    • Fix: Use consistent format, e.g., "Sequence of operations. Example: {op:'pipe', steps:[{op:'load', name:'input'}, {op:'filter', where:{op:'gt', field:'price', value:10}}]}"
    • Complexity: Mechanical
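To make the proposed pipe example concrete, the sketch below is a toy interpreter for just the operations it mentions (pipe, load, filter, gt) plus count from the original description. It is a Python illustration under assumed semantics, not PtcRunner's implementation: the `evaluate` function, the `context` dict, and the idea that comparison ops read a field from the current item are all assumptions made for this sketch.

```python
def evaluate(node, context, current=None):
    """Evaluate a tiny subset of the DSL under assumed semantics."""
    op = node["op"]
    if op == "load":
        # Assumed: look up a named resource, e.g. name='input'.
        return context[node["name"]]
    if op == "pipe":
        # Assumed: each step receives the previous step's result.
        value = current
        for step in node["steps"]:
            value = evaluate(step, context, value)
        return value
    if op == "filter":
        # Keep items for which the 'where' condition holds.
        return [item for item in current if evaluate(node["where"], context, item)]
    if op == "gt":
        # Assumed: compare a field of the current item against a constant.
        return current[node["field"]] > node["value"]
    if op == "count":
        return len(current)
    raise ValueError(f"unsupported op: {op}")


program = {
    "op": "pipe",
    "steps": [
        {"op": "load", "name": "input"},
        {"op": "filter", "where": {"op": "gt", "field": "price", "value": 10}},
        {"op": "count"},
    ],
}
items = [{"price": 5}, {"price": 12}, {"price": 30}]
result = evaluate(program, {"input": items})  # two items have price > 10
```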

Suggestions (Optional)

None - the scope is appropriate.

Security

No concerns.

Verdict

Request Changes - The PR establishes a pattern of adding example-based descriptions but doesn't apply it consistently to all operations. The filter and map operations are missing examples, and pipe uses a different format. These are mechanical fixes following the pattern already established in the PR.

@github-actions bot added the auto-triage-pending (Auto-triage in progress) label on Dec 2, 2025
@andreasronge
Owner Author

@claude please fix these issues from PR review:

Issue 1: Add missing examples for filter and map operations

Location: lib/ptc_runner/schema.ex:109, 115
Problem: The PR adds example-based descriptions to operations but misses filter and map, which still lack examples. These are core collection operations important for LLM program generation.
Fix: Add examples following the established pattern:

  • Line 109 filter: Change to "Keep items matching condition. Example: {op:'filter', where:{op:'gt', field:'price', value:10}}"
  • Line 115 map: Change to "Transform collection elements. Example: {op:'map', expr:{op:'get', path:['name']}}"

Issue 2: Inconsistent pipe description format

Location: lib/ptc_runner/schema.ex:100-101
Problem: The pipe description uses a different format than other operations: "Example: pipe with steps [load, filter, count]" instead of the JSON-like format used everywhere else.
Fix: Use consistent format: "Sequence of operations. Example: {op:'pipe', steps:[{op:'load', name:'input'}, {op:'filter', where:{op:'gt', field:'price', value:10}}]}"

Please fix ALL issues above in a SINGLE commit. After updating schema.ex, regenerate priv/ptc_schema.json by running the schema sync test.

@andreasronge
Owner Author

Auto-Triage Summary

Decisions Made

| # | Issue | Decision | Action |
|---|-------|----------|--------|
| 1 | Missing examples for filter and map operations | FIX_NOW | Included in fix request above |
| 2 | Inconsistent pipe description format | FIX_NOW | Included in fix request above |

Analysis Details

Issue 1: Missing examples for filter/map

  • Reasoning: IN-SCOPE - The PR's entire purpose is adding example-based descriptions. Missing filter and map is incomplete work from the established pattern.
  • Complexity: Mechanical - applying the same "Description. Example: {op:'xxx', ...}" pattern used throughout the PR
  • Impact: Low risk, completes consistency for core collection operations

Issue 2: Inconsistent pipe description format

  • Reasoning: IN-SCOPE - The PR establishes a consistent JSON-like format for all examples, but pipe uses plain text format
  • Complexity: Mechanical - converting to the same format pattern
  • Impact: Low risk, improves consistency for LLM understanding

Status

  • FIX_NOW items: 2 (batched in single fix comment above)
  • Issues created: 0
  • Items ignored: 0

Next Steps

Waiting for @claude to apply the fixes in the comment above. Once fixes are committed and all checks pass, the PR will be ready to merge.

@github-actions bot added the auto-triage-cycle-2 (Auto-triage cycle 2) label and removed the auto-triage-cycle-1 (Auto-triage cycle 1) and auto-triage-pending (Auto-triage in progress) labels on Dec 2, 2025
- Add example to filter operation: {op:'filter', where:{op:'gt', field:'price', value:10}}
- Add example to map operation: {op:'map', expr:{op:'get', path:['name']}}
- Standardize pipe description format to JSON example: {op:'pipe', steps:[...]}
- Regenerate priv/ptc_schema.json with updated descriptions

All core collection and control flow operations now have consistent,
example-based descriptions to support LLM program generation.
@andreasronge
Owner Author

Summary

Fixed both issues from the PR review, which required three description changes:

Changes made:

  1. filter operation (line 109): Added example-based description: "Keep items matching condition. Example: {op:'filter', where:{op:'gt', field:'price', value:10}}"

  2. map operation (line 115): Added example-based description: "Transform collection elements. Example: {op:'map', expr:{op:'get', path:['name']}}"

  3. pipe operation (lines 100-101): Standardized description format to JSON example format: "Sequence of operations. Example: {op:'pipe', steps:[{op:'load', name:'input'}, {op:'filter', where:{op:'gt', field:'price', value:10}}]}" (changed from the previous informal format)

Additional work:

  • Regenerated priv/ptc_schema.json by running the schema sync mechanism to ensure the file is up-to-date with the schema definitions

Verification:

  • All tests pass (318 tests, 0 failures)
  • Credo code quality checks pass
  • mix precommit successful

Commit: 4dd7818 - "fix: Complete example-based description pattern for all operations"

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

PR Review: fix: Improve LLM schema descriptions and use Haiku 4.5

Summary

This PR completes the example-based description pattern for all DSL operations and updates the LLM model from claude-sonnet-4 to claude-haiku-4.5. All operations now have consistent descriptions with inline examples.

What's Good

  • Consistent pattern: All 33 operations now follow the "Description. Example: {op:'name', ...}" format
  • Practical examples: Examples demonstrate actual field usage (e.g., {op:'filter', where:{op:'gt', field:'price', value:10}})
  • Model choice: Haiku 4.5 is a reasonable choice for faster/cheaper E2E tests while maintaining reliability
  • README update: Added helpful model compatibility section explaining nested anyOf schema requirements
  • Synchronization: Both schema.ex and priv/ptc_schema.json are properly updated
  • Test update: Schema test updated to reflect new description format

Issues (Must Fix)

None. The PR is well-scoped and consistently applies the description pattern across all operations.

Suggestions (Optional)

  1. Minor inconsistency in description style - Nice-to-have

    • Observation: Some descriptions use shorthand (Field >= value, Field <= value) while others use full phrases (Field greater than value, Field less than value)
    • Benefit: Perfect consistency in description style
    • Complexity: Trivial (word changes)
    • Recommendation: Not blocking - the examples are the important part for LLM guidance
  2. README model recommendation - Nice-to-have

    • Suggestion: The "May have issues" section is vague ("Some models may not enforce...")
    • Benefit: More actionable guidance for users
    • Complexity: Low (documentation only)
    • Recommendation: Could be deferred to when specific problematic models are identified

Security

No concerns. Changes are limited to operation descriptions and test configuration.

Verdict

Approve - This PR successfully completes the example-based description pattern established in previous work. All operations are consistently updated, tests pass, and the model change is reasonable. Ready to merge.

@github-actions bot added the auto-triage-pending (Auto-triage in progress) label on Dec 2, 2025
@andreasronge
Owner Author

@claude please fix these issues from PR review:

Issue 1: Inconsistent description style in comparison operations

Location: lib/ptc_runner/schema.ex:161, 175
Problem: The gte and lte operations use shorthand symbols (>= and <=) in their descriptions, while gt and lt use full phrases ("greater than", "less than"). This creates inconsistency in description style.
Fix: Change to full phrases for consistency:

  • Line 161: Change "Field >= value" to "Field greater than or equal to value"
  • Line 175: Change "Field <= value" to "Field less than or equal to value"
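A wording inconsistency like this can be caught mechanically. The Python sketch below is a hypothetical lint, not something in the repository: it flags comparison descriptions that use operator symbols instead of words. Only the summary part of each description is shown, the gt/lt wording is taken from the review text, and the names `BEFORE`, `AFTER`, and `ops_with_symbols` are introduced here for illustration.

```python
import re

# Summaries before the fix (example suffixes omitted for brevity).
BEFORE = {
    "gt": "Field greater than value",
    "gte": "Field >= value",
    "lt": "Field less than value",
    "lte": "Field <= value",
}

# Summaries after applying the two requested changes.
AFTER = dict(
    BEFORE,
    gte="Field greater than or equal to value",
    lte="Field less than or equal to value",
)

# Matches <, >, <=, >=, == and != anywhere in the text.
SYMBOLIC = re.compile(r"[<>]=?|[!=]=")


def ops_with_symbols(descriptions):
    """Return the sorted op names whose description uses operator symbols."""
    return sorted(op for op, text in descriptions.items() if SYMBOLIC.search(text))


flagged_before = ops_with_symbols(BEFORE)
flagged_after = ops_with_symbols(AFTER)
```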

Please fix in a SINGLE commit. Make minimal changes to resolve the inconsistency.

@andreasronge
Owner Author

Auto-Triage Summary (Cycle 3 of 3)

Decisions Made

| # | Issue | Decision | Action |
|---|-------|----------|--------|
| 1 | Minor inconsistency in description style (gte/lte use symbols, gt/lt use words) | FIX_NOW | Included in fix request above |
| 2 | README model recommendation vagueness | IGNORE | Out of PR scope and current text is adequate |

Analysis Details

Issue 1: Description style inconsistency

  • Reasoning: IN-SCOPE - The PR improves LLM schema descriptions. Having gte/lte use symbols (>=, <=) while gt/lt use words ("greater than", "less than") is inconsistent.
  • Complexity: Mechanical - just replacing "Field >= value" with "Field greater than or equal to value" (and same for lte)
  • Impact: Low risk, improves consistency across comparison operations

Issue 2: README model recommendation vagueness

  • Reasoning: OUT-OF-SCOPE - The PR is about improving schema operation descriptions in schema.ex, not about README documentation quality
  • Complexity: Complex - Would require testing multiple models to identify specific problematic ones
  • Value: Current README text ("Some models may not enforce required fields...") is informative without being misleading. More specificity would require actual testing evidence.
  • Decision: The current documentation is adequate for now

Status

  • FIX_NOW items: 1 (batched in single fix comment above)
  • Issues created: 0
  • Items ignored: 1 (with reasoning documented above)

Next Steps

Waiting for @claude to apply the fix in the comment above. Once the fix is committed and all checks pass, the PR will be ready to merge.

@github-actions bot added the auto-triage-cycle-3 (Auto-triage cycle 3 (max)) label and removed the auto-triage-cycle-2 (Auto-triage cycle 2) and auto-triage-pending (Auto-triage in progress) labels on Dec 2, 2025
@andreasronge
Owner Author

Summary

Fixed the inconsistent description style in comparison operations:

Changes Made

  • Line 161 (schema.ex): Changed gte description from 'Field >= value' to 'Field greater than or equal to value'
  • Line 175 (schema.ex): Changed lte description from 'Field <= value' to 'Field less than or equal to value'
  • Lines 127 & 420 (priv/ptc_schema.json): Updated the corresponding descriptions in the generated schema file

Verification

  • All tests pass (318 tests, 0 failures)
  • Schema file synchronized with generated schema
  • Code formatted and compiled without warnings

Commit

Committed as: fix: Use full phrases for gte and lte comparison operation descriptions

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

PR Review: fix: Improve LLM schema descriptions and use Haiku 4.5

Summary

Follow-up PR to #71 that completes the pattern of adding example-based descriptions to all operations and updates the LLM model from Sonnet 4 to Haiku 4.5. The PR is well-scoped and delivers on its stated goals.

What's Good

  • Pattern completion: All 27 operations now have consistent "Example: {op:'...', ...}" descriptions. This addresses the review feedback on #71 (fix: Use anyOf for nested expressions in LLM schema) about inconsistent descriptions.
  • Model pragmatism: Switching from Sonnet 4 to Haiku 4.5 is sensible - faster, cheaper, and the PR confirms it handles the schema reliably.
  • Documentation update: README now includes a "Model Compatibility" section with clear recommendations, replacing hardcoded example descriptions that could drift.
  • Test synchronization: The test at test/ptc_runner/schema_test.exs:476-490 ensures priv/ptc_schema.json stays synchronized with the generated schema.
  • All tests pass: 318 tests, 6 e2e tests (with Haiku 4.5), formatting, and compilation with warnings-as-errors all pass.

Issues (Must Fix)

None identified. The PR scope is well-defined and complete.

Suggestions (Optional)

  1. Minor inconsistency in description style - Nice-to-have
    • Observation: The sum operation uses "Sum numeric field" while count uses "Count items in collection" - slightly different structures.
    • Benefit: More uniform documentation
    • Complexity: Trivial - single word change
    • Recommendation: Not blocking - the current style is clear enough

Security

No concerns. The changes are limited to documentation strings and a model configuration change.

Verdict

Approve - The PR completes the description pattern established in #71, adds useful model compatibility documentation, and all quality checks pass. Ready to merge.

@github-actions bot added the needs-human-review (Requires human review - automation paused) label on Dec 2, 2025
@github-actions
Contributor

github-actions bot commented Dec 2, 2025

Auto-Triage: Max Cycles Reached

This PR has gone through 3 triage cycles (review → triage → fix → review...).

What this means:

  • The automated system keeps finding new issues after fixes
  • To prevent infinite loops, no further automated triage will run

Next steps for a human:

  1. Review the PR and any outstanding issues
  2. Make manual fixes if needed
  3. Remove the needs-human-review label to re-enable automation (if desired)

This is an automated message from the auto-triage workflow.

@andreasronge andreasronge merged commit b7a1fd5 into main Dec 2, 2025
2 of 3 checks passed
@andreasronge andreasronge deleted the fix/llm-schema-improvements branch December 2, 2025 20:34

Labels

  • auto-triage-complete: Auto-triage completed
  • auto-triage-cycle-3: Auto-triage cycle 3 (max)
  • needs-human-review: Requires human review - automation paused
