[CORE-16282] json: make the schema walk keyword-aware #30865

Open
mnajda-redpanda wants to merge 1 commit into redpanda-data:dev from mnajda-redpanda:mnajda/json-keyword-aware-walk
Conversation

@mnajda-redpanda

@mnajda-redpanda mnajda-redpanda commented Jun 22, 2026

Contributor

The bundled-schema walk descended into every nested object, so a key matching a schema keyword (e.g. a property named "id" under draft-04, or "$id"/"$ref" elsewhere) was misread as that keyword and the schema was rejected.

Classify keywords by where subschemas appear and only descend into genuine subschema positions, gated by dialect, so keys under "properties", "$defs", etc. are treated as names.
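
As a rough illustration of the approach described above (not the code in this PR), the sketch below classifies a few keywords by where subschemas can appear, gates each entry by dialect, and walks only into those positions. All names (`dialect`, `position`, `classify`, `collect_ids`, `node`) are hypothetical, the keyword table is deliberately incomplete, and `node` is a toy stand-in for a real JSON document type.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Illustrative sketch only: hypothetical names, incomplete keyword table.
enum class dialect { draft4, draft6, draft7, draft2019_09, draft2020_12 };

// Where, if anywhere, subschemas live under a keyword.
enum class position { none, schema, schema_map, schema_array };

struct keyword_spec {
    std::string_view key;
    position pos;
    dialect first; // earliest dialect in which the keyword applies
    dialect last;  // latest dialect in which the keyword applies
};

constexpr keyword_spec specs[] = {
  {"properties", position::schema_map, dialect::draft4, dialect::draft2020_12},
  {"$defs", position::schema_map, dialect::draft2019_09, dialect::draft2020_12},
  {"allOf", position::schema_array, dialect::draft4, dialect::draft2020_12},
  {"not", position::schema, dialect::draft4, dialect::draft2020_12},
  {"if", position::schema, dialect::draft7, dialect::draft2020_12},
};

// Unknown keys, and keys outside their dialect range, are never descended into.
constexpr position classify(dialect d, std::string_view key) {
    for (const auto& s : specs) {
        if (s.key == key && s.first <= d && d <= s.last) {
            return s.pos;
        }
    }
    return position::none;
}

// draft-04 spells the identifier keyword "id"; later drafts use "$id".
constexpr std::string_view id_keyword(dialect d) {
    return d == dialect::draft4 ? "id" : "$id";
}

// Toy JSON tree: a string leaf, an object, or an array.
struct node {
    std::string text;
    std::vector<std::pair<std::string, node>> members;
    std::vector<node> items;
};

node str(std::string s) {
    node n;
    n.text = std::move(s);
    return n;
}

node obj(std::vector<std::pair<std::string, node>> m) {
    node n;
    n.members = std::move(m);
    return n;
}

// Collect identifiers of schemas reachable through genuine subschema positions.
void collect_ids(dialect d, const node& schema, std::vector<std::string>& out) {
    for (const auto& [key, value] : schema.members) {
        if (key == id_keyword(d)) {
            out.push_back(value.text);
            continue;
        }
        switch (classify(d, key)) {
        case position::schema:
            collect_ids(d, value, out);
            break;
        case position::schema_map:
            // Member names are user-chosen; only the values are schemas.
            for (const auto& entry : value.members) {
                collect_ids(d, entry.second, out);
            }
            break;
        case position::schema_array:
            for (const auto& sub : value.items) {
                collect_ids(d, sub, out);
            }
            break;
        case position::none:
            break;
        }
    }
}
```

With this shape, a draft-04 schema that has a property named "id" under "properties" contributes only its real "id" values, because the walk treats the members of "properties" as names and starts keyword matching again only inside each property's schema.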

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

@mnajda-redpanda mnajda-redpanda force-pushed the mnajda/json-keyword-aware-walk branch from 14dccc9 to b5ac365 Compare June 22, 2026 15:38
@mnajda-redpanda mnajda-redpanda marked this pull request as ready for review June 22, 2026 16:09
@mnajda-redpanda mnajda-redpanda requested review from a team, bartoszpiekny-redpanda, Copilot, dotnwat, nguyen-andrew and pgellert and removed request for a team June 22, 2026 16:09

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes schema “bundling” traversal in pandaproxy’s JSON Schema handling by making the schema walk keyword-aware (and dialect-aware), so object member names like id, $id, or $ref inside properties, $defs, etc. are not misinterpreted as schema keywords.

Changes:

  • Introduces keyword classification to only descend into true subschema-bearing positions (map vs schema vs none), gated by dialect.
  • Refactors the bundled-schema collection walk to use the keyword classification rather than descending into all nested objects/arrays.
  • Expands unit tests to cover keyword-named properties, dialect gating, and array-position traversal cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/v/pandaproxy/schema_registry/json.cc | Adds dialect-gated keyword classification and uses it to restrict schema descent during bundled-schema collection/ref fixing. |
| src/v/pandaproxy/schema_registry/test/test_json_schema.cc | Adds/adjusts test cases for keyword-named properties, dialect gating, and bundled-schema validation behavior. |

Comment on lines 159 to 161
"$comment": "dialect is unkown",
"$id": "https://example.com/mismatch_id",
"$schema": "http://json-schema.org/draft-3000/schema#"
Comment on lines +340 to +341
// draft-06 also uses "id" (not "$id"), so it is affected the same way
R"json(
Comment on lines +2306 to +2309
classify_keyword(json_schema_dialect dialect, std::string_view key) {
using enum subschema_position;
using enum json_schema_dialect;
constexpr auto specs = std::to_array<keyword_spec>({
@vbotbuildovich

Collaborator

CI test results

test results on build#86101
| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FLAKY(PASS) | ShadowLinkTopicFailoverTests | test_producer_ids_failover | {"storage_mode": "tiered"} | integration | https://buildkite.com/redpanda/redpanda/builds/86101#019ef022-129c-4cde-8558-84712dbf1e15 | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0032, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkTopicFailoverTests&test_method=test_producer_ids_failover |

@pgellert pgellert left a comment

Contributor

lgtm
