@dmarcotte dmarcotte commented Jun 20, 2025

This pull request adds a (near) full Json Schema parser and validator to Kson. It is only "(near) full" because:

  • we do not yet support fetching and loading $refs
  • we do not yet expose this ability to any tooling

We now support specifying a Json Schema to validate against as part of a Kson compile. Simple example:

```kotlin
val parseResult = Kson.parseToAst(
    """
        "this string is longer than 10 characters"
    """,
    CoreCompileConfig(schemaJson = """
        {"type": "string", "maxLength": 10 }
    """)
)
// `parseResult` contains an error for the given string's
// location that notes "String length must be <= 10"
```

Note that this implementation targets Json Schema Draft 7; support for other drafts is planned. We started with Draft 7 since there are many signals that it is the most widely used, and we are now well-positioned to expand support to other drafts. The fidelity of our implementation is confirmed against the excellent test suite provided by JSON-Schema-Test-Suite, which we integrated in #77.

Note also that I've been using this particular project as something of a test for LLM coding agents: as a well-specified and well-scoped non-trivial project with an extensive test suite, it seemed ideal as a "real" project to test agentic coding tools. I was not yet able to coax any agentic coding system into fully implementing this, but they (primarily Claude 4) finally came close recently and were very helpful here. For more color on this story, the individual commits document some of the saga.

Comments from the source JSON-Schema-Test-Suite tests were being
inserted into the wrong place in our generated tests (into the schema
Json of the test, making invalid Json).  Fix by rendering them over the
assertion.

Given the KsonApi, the TODO about Json Schema support, and access to
running our `SchemaSuiteTest`, Claude 4 was able to design and implement
this reasonable-looking approach to adding Json Schema support to Kson.

Claude initially set up an object model for various types of schema
it wanted to implement.  When that was in place, I repeatedly prompted
it with the following until it had fixed all the tests:

```
Can you help me get some of these failing tests passing?
`./gradlew jvmTest --tests "org.kson.parser.json.generated.SchemaSuiteTest"`
```

It did a very impressive job of inspecting the failures, categorizing
them, picking a high-value pattern to address, modifying the code to
address that, then reporting back "X tests were failing, now only Y
tests are failing" (Note: I interrupted it at one point to note that
we were deferring support for `$ref`, so any test related to refs should
be left on the SchemaSuite exclude list)

This solution is ultimately the wrong object model (validations should
not be directly associated with types, for instance), and due to this,
hid some gaps in spite of getting the tests passing.  And there is a
LOT of code duplication here.  Regardless: impressive first pass, and
we're going to refactor from there to a robust solution in subsequent
commits.

Refactor the errors produced by the new schema validation subsystem
to integrate with our parser's MessageSink strategy, properly logging
the errors found as full `LoggedMessages` with location data embedded.

The initial agent-assisted implementation of our Json Schema validation
code not only created a complicated monolithic approach to comparing
`KsonValue` objects for equality, but it also duplicated that approach a
number of times.  Greatly improve things by refactoring into proper
equals and hashcode methods on our `KsonValue` classes.
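
As a rough illustration of why per-class `equals`/`hashCode` removes the need for a separate comparison routine, here is a minimal sketch. The `Value`, `Str`, `Num`, `Lst`, and `Obj` classes and the `hasUniqueItems` helper are hypothetical stand-ins, not Kson's actual `KsonValue` hierarchy; the `line` field is an assumed piece of location data that is deliberately left out of equality.

```kotlin
// Hypothetical stand-ins for Kson's value classes; the real `KsonValue`
// hierarchy is not shown in this pull request description.
sealed class Value

class Str(val value: String, val line: Int = 0) : Value() {
    // `line` is location data, so it does not take part in equality.
    override fun equals(other: Any?) = other is Str && value == other.value
    override fun hashCode() = value.hashCode()
}

class Num(val value: Int, val line: Int = 0) : Value() {
    override fun equals(other: Any?) = other is Num && value == other.value
    override fun hashCode() = value.hashCode()
}

class Lst(val elements: List<Value>, val line: Int = 0) : Value() {
    // List equality delegates to each element's own equals().
    override fun equals(other: Any?) = other is Lst && elements == other.elements
    override fun hashCode() = elements.hashCode()
}

class Obj(val properties: Map<String, Value>, val line: Int = 0) : Value() {
    // Map equality ignores property order.
    override fun equals(other: Any?) = other is Obj && properties == other.properties
    override fun hashCode() = properties.hashCode()
}

// With equals/hashCode in place, a check such as `uniqueItems` is a set operation.
fun hasUniqueItems(items: List<Value>): Boolean = items.toSet().size == items.size

fun main() {
    val a = Obj(mapOf("x" to Num(1), "tags" to Lst(listOf(Str("a")))), line = 1)
    val b = Obj(mapOf("tags" to Lst(listOf(Str("a"))), "x" to Num(1)), line = 7)
    println(a == b)                      // true
    println(hasUniqueItems(listOf(a, b))) // false
}
```

Once equality lives on the value classes, schema keywords that compare values (such as `enum`, `const`, and `uniqueItems`) can lean on ordinary collection operations instead of each carrying its own comparison code.
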
More work towards reshaping the agent-generated JsonSchema support
into something we fully own, understand and endorse.  This commit
fixes a bunch more duplication and provides very good insight into not
only how this implementation actually works, but also how JsonSchema
itself works (and where this impl diverges from that)

The new JsonSchemaTest class demonstrates some of the gaps found.
This commit addresses the problem in `JsonSchemaNumberTest`, but
leaves `JsonSchemaTestType` failing to be fixed more holistically
with a new object model for the validation subsystem.

Fully implement our improved Json Schema support design (modulo
a few todos around interface polish and error messaging).  This commit
is larger than I usually like, but it is focused and mostly represents
a refactor, so I won't artificially break it up.

We now organize around schema validations rather than schema
types.  The schema `type` checker becomes just another validation which
runs up front, and if it passes, then all other validations that
apply to the current types will be applied.  This mirrors the actual
Json Schema design much better, solving all problems detected in the
previous implementation, including all the code duplication.

SchemaParser now tries to parse every supported schema element exactly
once, creates the appropriate validation classes, and organizes the
validations into a JsonSchema class that can perform the parsed
validations.
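
The validation-centric organization described above might look roughly like the following sketch. Every name here (`Validation`, `TypeValidation`, `MaxLengthValidation`, `Schema`, `typeNameOf`) is hypothetical and operates on plain Kotlin values rather than `KsonValue`s; it only illustrates the "type check first, then the remaining validations" flow, not Kson's actual classes.

```kotlin
// All names here are hypothetical; they illustrate the shape described in
// the commit message, not Kson's actual classes.

// A validation inspects a value and returns zero or more error messages.
fun interface Validation {
    fun validate(value: Any?): List<String>
}

// Maps a plain Kotlin value to a Json Schema type name (simplified: no "integer").
fun typeNameOf(value: Any?): String = when (value) {
    null -> "null"
    is String -> "string"
    is Boolean -> "boolean"
    is Number -> "number"
    is List<*> -> "array"
    is Map<*, *> -> "object"
    else -> "unknown"
}

// The `type` check is just another validation, run before the rest.
class TypeValidation(private val allowed: Set<String>) : Validation {
    override fun validate(value: Any?): List<String> {
        val actual = typeNameOf(value)
        return if (actual in allowed) emptyList()
        else listOf("Expected one of $allowed but found $actual")
    }
}

// Applies only to strings; values of other types pass untouched.
// Simplification: uses String.length rather than counting code points.
class MaxLengthValidation(private val maxLength: Int) : Validation {
    override fun validate(value: Any?): List<String> =
        if (value is String && value.length > maxLength)
            listOf("String length must be <= $maxLength")
        else emptyList()
}

class Schema(private val type: TypeValidation?, private val others: List<Validation>) {
    fun validate(value: Any?): List<String> {
        // Run the type check up front; only continue if it passes.
        val typeErrors = type?.validate(value).orEmpty()
        if (typeErrors.isNotEmpty()) return typeErrors
        return others.flatMap { it.validate(value) }
    }
}

fun main() {
    val schema = Schema(TypeValidation(setOf("string")), listOf(MaxLengthValidation(10)))
    println(schema.validate("this string is longer than 10 characters"))
    println(schema.validate("short"))
}
```

Because each validation decides for itself whether it applies to the value it is given, a schema without a `type` keyword still runs `maxLength` against strings and ignores it for everything else, which is hard to express when validations hang off type classes.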

The clarity and simplicity this brings to the implementation is very
validating for this approach, and we are now well positioned to
maintain this going forward.

Progress on making refined, helpful, and localizable error messages for
Json Schema validation.

Claude 4 helped replace the placeholder SCHEMA_VALIDATION_ERROR messages
with more refined, helpful, and localizable messages.  It got cranky at
the end and started making some unrelated changes and deleting some
other MessageType declarations that were being used elsewhere, but
mostly it was super helpful and understood the design goals here.

Prompt: %%
  Everywhere in the @Schema dir that uses SCHEMA_VALIDATION_ERROR is a
  place where we would like to make more refined, localizable.  Note how
  MessageType is organized to be parametrized in such a way that when we
  localize these messages, it will be trivial, as the only dynamic part of
  the messages are params that don't need to be localized.  Do you see
  what I mean? Would you be able to create new MessageTypes to replace
  the too-informal "SCHEMA_VALIDATION_ERROR.create() messages?  Please
  have a look at how the other MessageTypes are implemented for
  inspiration on how to implement your new messages.
%%
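
To make the design goal in that prompt concrete, here is a minimal sketch of a parametrized message type. The enum entries, the `{0}`-style placeholders, and the `create` signature are assumptions made for illustration; Kson's real `MessageType` declarations may be shaped differently.

```kotlin
// Hypothetical sketch: these entries and this `create` helper are
// illustrative and are not Kson's actual `MessageType` declarations.
enum class MessageType(private val template: String) {
    SCHEMA_STRING_TOO_LONG("String length must be <= {0}"),
    SCHEMA_TYPE_MISMATCH("Expected one of: {0}, but found: {1}");

    // Only the template would need translating; the arguments are plain data.
    fun create(vararg args: Any): String =
        args.foldIndexed(template) { i, msg, arg -> msg.replace("{$i}", arg.toString()) }
}

fun main() {
    println(MessageType.SCHEMA_STRING_TOO_LONG.create(10))
    println(MessageType.SCHEMA_TYPE_MISMATCH.create("string", "number"))
}
```

Keeping all human-readable wording inside the templates means a future localization pass only has to swap the template strings, while call sites keep passing the same parameters.
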
Rename schema `node` params to `ksonValue`: Schemas validate KsonValues,
but some of our props/params were called `node`, which, sure, but
`ksonValue` is much more descriptive here.

Also add/improve some docs around our Json Schema implementation while
I'm thinking about code clarity.
@dmarcotte dmarcotte marked this pull request as ready for review June 20, 2025 01:03
@dmarcotte dmarcotte merged commit 54d9763 into kson-org:main Jun 20, 2025
1 check passed
@dmarcotte dmarcotte deleted the kson-json-schema branch June 20, 2025 18:50