[CORE-16282] json: make the schema walk keyword-aware #30865
Open
mnajda-redpanda wants to merge 1 commit into
The bundled-schema walk descended into every nested object, so a key matching a schema keyword (e.g. a property named "id" under draft-04, or "$id"/"$ref" elsewhere) was misread as that keyword and the schema was rejected. Classify keywords by where subschemas appear and only descend into genuine subschema positions, gated by dialect, so keys under "properties", "$defs", etc. are treated as names.
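A minimal sketch of the traversal described above, not the PR's actual code. The `node` tree, `classify`, `collect_ids`, and `example_schema` are hypothetical names invented for illustration, and the keyword list is a deliberately small, dialect-agnostic subset. It shows a walk that treats keys under a "map" keyword as names and only descends into their values:

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, minimal JSON-like tree: just enough to show the traversal.
struct node {
    std::string key;            // member name within the parent object
    std::string text;           // value when the node is a string leaf
    std::vector<node> members;  // members when the node is an object
};

enum class position { none, schema, map };

// Simplified classification (assumed subset of keywords, no dialect gating).
inline position classify(std::string_view key) {
    if (key == "properties" || key == "$defs" || key == "definitions") {
        return position::map;
    }
    if (key == "not" || key == "additionalProperties") {
        return position::schema;
    }
    return position::none;
}

// Collect "$id" values, descending only into genuine subschema positions.
inline void collect_ids(const node& schema, std::vector<std::string>& out) {
    for (const auto& m : schema.members) {
        if (m.key == "$id") {
            out.push_back(m.text);
            continue;
        }
        switch (classify(m.key)) {
        case position::map:
            // Keys here are user-chosen names; only their values are schemas.
            for (const auto& named : m.members) {
                collect_ids(named, out);
            }
            break;
        case position::schema:
            collect_ids(m, out);
            break;
        case position::none:
            break;
        }
    }
}

// Builds {"$id": "root", "properties": {"$id": {"type": "string"}}}.
inline node example_schema() {
    node type{"type", "string", {}};
    node prop{"$id", "", {type}};
    node props{"properties", "", {prop}};
    node id{"$id", "root", {}};
    return node{"", "", {id, props}};
}
```

With this sketch, the property named `$id` under `properties` is skipped as a name, so only the root identifier is collected. A walk that descended into every nested object would instead reach that member and see an `$id` whose value is an object rather than a string.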
14dccc9 to b5ac365
Pull request overview
This PR fixes schema “bundling” traversal in pandaproxy’s JSON Schema handling by making the schema walk keyword-aware (and dialect-aware), so object member names like id, $id, or $ref inside properties, $defs, etc. are not misinterpreted as schema keywords.
Changes:
- Introduces keyword classification to only descend into true subschema-bearing positions (map vs schema vs none), gated by dialect.
- Refactors the bundled-schema collection walk to use the keyword classification rather than descending into all nested objects/arrays.
- Expands unit tests to cover keyword-named properties, dialect gating, and array-position traversal cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/v/pandaproxy/schema_registry/json.cc | Adds dialect-gated keyword classification and uses it to restrict schema descent during bundled-schema collection/ref fixing. |
| src/v/pandaproxy/schema_registry/test/test_json_schema.cc | Adds/adjusts test cases for keyword-named properties, dialect gating, and bundled-schema validation behavior. |
Comment on lines 159 to 161

    "$comment": "dialect is unkown",
    "$id": "https://example.com/mismatch_id",
    "$schema": "http://json-schema.org/draft-3000/schema#"
Comment on lines +340 to +341

    // draft-06 also uses "id" (not "$id"), so it is affected the same way
    R"json(
Comment on lines +2306 to +2309

    classify_keyword(json_schema_dialect dialect, std::string_view key) {
        using enum subschema_position;
        using enum json_schema_dialect;
        constexpr auto specs = std::to_array<keyword_spec>({
Collaborator
CI test results: test results on build#86101
Backports Required
Release Notes