pandaproxy/sr: add global_context compatibility fallback #29713

Merged
nguyen-andrew merged 7 commits into redpanda-data:dev from nguyen-andrew:sr/global-context-config on Mar 11, 2026
nguyen-andrew (Member) commented Feb 26, 2026

Previously, the schema registry's GET /config and GET /config/{subject} endpoints did not support the global context (.__GLOBAL) fallback tier. When no context-level compatibility config was set, lookups returned the hardcoded default directly, ignoring any config set on the global context. This diverges from the reference implementation's fallback hierarchy, where .__GLOBAL sits between per-context config and the hardcoded default.

This PR implements the full multi-tier fallback chain. With the defaultToGlobal query parameter enabled:

  • default_context: default → global → hardcoded
  • non-default context: context → global → hardcoded
  • global_context: global → hardcoded
  • subject in default_context: subject → default → global → hardcoded
  • subject in non-default: subject → context → global → hardcoded
  • subject in global_context: subject → global → hardcoded

With the defaultToGlobal query parameter disabled, no fallback to other tiers takes place: when nothing is set at the queried level, only the default context and the global context resolve to the hardcoded default, and all other lookups return compatibility_not_found.

GET /config and GET /config/{subject} now consult the global context (.__GLOBAL) when defaultToGlobal=true and no context-level config is set. Users who have set compatibility on .__GLOBAL will see it reflected in lookups from other contexts.
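
The fallback chains described above can be sketched as a small in-memory resolver. This is only an illustration of the resolution order, not the code in store.h: the `compat` enum, the `config_store` struct, the function names, the use of `"."` as the default context name, and BACKWARD as the hardcoded default are all assumptions made for the example. An empty optional stands in for compatibility_not_found.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins; the real types live in
// src/v/pandaproxy/schema_registry/types.h and may differ.
enum class compat { none, backward, forward, full };
enum class default_to_global { no, yes };

const std::string default_context = ".";        // assumed name
const std::string global_context = ".__GLOBAL"; // from the PR description
constexpr compat hardcoded_default = compat::backward; // assumed value

struct config_store {
    std::map<std::string, compat> context_config;
    std::map<std::pair<std::string, std::string>, compat> subject_config;

    // Context lookup: own config -> (global config) -> hardcoded default.
    // An empty optional models compatibility_not_found.
    std::optional<compat>
    resolve_context(const std::string& ctx, default_to_global fallback) const {
        if (auto it = context_config.find(ctx); it != context_config.end()) {
            return it->second;
        }
        // The global context has no further tier besides the hardcoded one.
        if (ctx == global_context) {
            return hardcoded_default;
        }
        if (fallback == default_to_global::yes) {
            auto it = context_config.find(global_context);
            return it != context_config.end() ? it->second : hardcoded_default;
        }
        // Without fallback, only the default context gets the hardcoded value.
        if (ctx == default_context) {
            return hardcoded_default;
        }
        return std::nullopt;
    }

    // Subject lookup: subject config first, then the context chain above.
    // The subject does not need to exist for the fallback to apply.
    std::optional<compat> resolve_subject(
      const std::string& ctx,
      const std::string& subject,
      default_to_global fallback) const {
        auto it = subject_config.find(std::make_pair(ctx, subject));
        if (it != subject_config.end()) {
            return it->second;
        }
        if (fallback == default_to_global::yes || ctx == global_context) {
            return resolve_context(ctx, fallback);
        }
        return std::nullopt;
    }
};
```

In this sketch, setting a value on `.__GLOBAL` changes what an unconfigured non-default context returns with `default_to_global::yes`, while the same lookup with `default_to_global::no` still yields an empty optional.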

Part of CORE-15192.

This change also fixes a bug where retrieving the config for a non-existent subject immediately returned a not-found error regardless of the defaultToGlobal parameter, which did not match the reference implementation.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

Bug Fixes

  • Compatibility lookups on non-existent subjects can now fall through to context-level resolution with defaultToGlobal=true instead of always returning an error.

nguyen-andrew self-assigned this Feb 26, 2026
Copilot AI review requested due to automatic review settings February 26, 2026 05:13

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates Schema Registry compatibility lookups to include the .__GLOBAL context as an intermediate fallback tier (behind the defaultToGlobal query parameter), aligning the resolution hierarchy with the reference implementation.

Changes:

  • Implement multi-tier compatibility fallback for context-level and subject-level lookups, including .__GLOBAL as an intermediate fallback when defaultToGlobal=true.
  • Update store/sharded-store APIs and add extensive unit tests covering fallback behavior across default, non-default, and global contexts.
  • (Stacked) Add reserved subject/context validation (__GLOBAL, __EMPTY, .__GLOBAL) with a new subject_invalid error mapping and corresponding tests.
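
As a rough illustration of the reserved-name check mentioned in the last bullet, the sketch below rejects the three listed names and reports error code 42208. The names `is_reserved`, `validate_name`, and `subject_invalid_error` are hypothetical, and exact-match comparison is an assumption; the actual `validate_context_subject()` signature and matching rules are not shown in this PR page.

```cpp
#include <algorithm>
#include <array>
#include <stdexcept>
#include <string>
#include <string_view>

// Reply error code listed for subject_invalid in the file summary.
constexpr int subject_invalid_code = 42208;

// Hypothetical exception type standing in for the subject_invalid error.
struct subject_invalid_error : std::runtime_error {
    explicit subject_invalid_error(std::string_view name)
      : std::runtime_error(
          "Invalid subject or context name: " + std::string(name)) {}
    int code() const { return subject_invalid_code; }
};

// Reserved names as listed in the review summary.
constexpr std::array<std::string_view, 3> reserved_names{
  "__GLOBAL", "__EMPTY", ".__GLOBAL"};

// Assumption: reserved names are matched exactly, not by prefix.
bool is_reserved(std::string_view name) {
    return std::find(reserved_names.begin(), reserved_names.end(), name)
           != reserved_names.end();
}

// Throws subject_invalid_error when the name is reserved.
void validate_name(std::string_view name) {
    if (is_reserved(name)) {
        throw subject_invalid_error(name);
    }
}
```

A handler would call something like `validate_name` before touching the store, so that a reserved name is rejected early with a dedicated error rather than a generic one.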

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/rptest/tests/schema_registry_test.py | Adds an rptest covering reserved subject/context rejection behavior. |
| src/v/pandaproxy/schema_registry/types.h | Introduces global_context and default_top_level_compat; declares subject/context validation helper. |
| src/v/pandaproxy/schema_registry/types.cc | Implements validate_context_subject() throwing subject_invalid. |
| src/v/pandaproxy/schema_registry/store.h | Reworks compatibility resolution to support default_to_global and consult global_context appropriately. |
| src/v/pandaproxy/schema_registry/sharded_store.h | Updates get_compatibility(context, ...) signature to include fallback flag. |
| src/v/pandaproxy/schema_registry/sharded_store.cc | Wires fallback-aware context compatibility through to the underlying store. |
| src/v/pandaproxy/schema_registry/handlers.cc | Plumbs defaultToGlobal into GET /config and context-only GET /config/{subject} paths; adds reserved-name validation calls. |
| src/v/pandaproxy/schema_registry/seq_writer.cc | Updates internal compatibility reads to use the new fallback-aware context API. |
| src/v/pandaproxy/schema_registry/errors.h | Adds compatibility_not_found(context) and subject_invalid(...) helpers. |
| src/v/pandaproxy/schema_registry/error.h / error.cc | Adds subject_invalid to SR error codes and maps it to reply error codes. |
| src/v/pandaproxy/error.h / error.cc | Adds mappings for reply error code 42208 (subject_invalid). |
| src/v/pandaproxy/schema_registry/test/sharded_store.cc | Adds comprehensive sharded-store tests validating fallback chains across contexts/subjects. |
| src/v/pandaproxy/schema_registry/test/store.cc | Updates store tests for the new fallback-aware context compatibility API; removes an invalid/obsolete test. |
| src/v/pandaproxy/schema_registry/test/consume_to_store.cc | Updates test to call the new context compatibility API. |
| src/v/pandaproxy/schema_registry/test/context_subject.cc | Adds unit tests for reserved subject/context validation and error code correctness. |

Comment on lines +316 to +324
      auto fallback = parse::query_param<std::optional<default_to_global>>(
                        *rq.req, "defaultToGlobal")
                        .value_or(default_to_global::no);

      // Ensure we see latest writes
      co_await rq.service().writer().read_sync();

      auto res = co_await rq.service().schema_store().get_compatibility(
    -   default_context);
    +   default_context, fallback);

Copilot AI commented Feb 26, 2026

The new global_context fallback behavior is exposed via the defaultToGlobal query param on GET /config and GET /config/{subject}, but there is no rptest/integration coverage asserting that setting compatibility on :.__GLOBAL: is reflected when defaultToGlobal=true (and not reflected when it is absent/false). Adding an integration test would help catch regressions in the handler/query-param wiring (as opposed to only store-level resolution logic).
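
The query-param wiring this comment refers to amounts to "parse an optional flag, default to no". A minimal standalone sketch of that step follows. It does not use Redpanda's `parse::query_param`; the helper names `parse_default_to_global` and `effective_fallback`, the accepted spellings `true`/`false`, and the choice to throw on any other value are assumptions for illustration.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

enum class default_to_global { no, yes };

// Hypothetical parser: an absent parameter yields an empty optional,
// "true"/"false" map to yes/no, anything else is rejected.
std::optional<default_to_global>
parse_default_to_global(std::optional<std::string_view> raw) {
    if (!raw) {
        return std::nullopt;
    }
    if (*raw == "true") {
        return default_to_global::yes;
    }
    if (*raw == "false") {
        return default_to_global::no;
    }
    throw std::invalid_argument(
      "defaultToGlobal must be 'true' or 'false', got: " + std::string(*raw));
}

// Mirrors the .value_or(default_to_global::no) in the handler snippet:
// omitting the parameter behaves like defaultToGlobal=false.
default_to_global effective_fallback(std::optional<std::string_view> raw) {
    return parse_default_to_global(raw).value_or(default_to_global::no);
}
```

An integration test of the kind suggested here would exercise both branches: a request without the parameter (or with `false`) and one with `true`, after setting compatibility on the global context.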

nguyen-andrew force-pushed the sr/global-context-config branch from e3c7db8 to 1c15ab3 on February 26, 2026 05:24

nguyen-andrew (Member, Author) commented

Force push to rebase against latest base

vbotbuildovich (Collaborator) commented Feb 26, 2026

Retry command for Build#81116

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/audit_log_test.py::AuditLogTestSchemaRegistryACLs.test_sr_audit_context_config_authz@{"audit_transport_mode":"kclient"}
tests/rptest/tests/audit_log_test.py::AuditLogTestSchemaRegistryACLs.test_sr_audit_context_config_authz@{"audit_transport_mode":"rpc"}

vbotbuildovich (Collaborator) commented Feb 26, 2026

CI test results

test results on build#81116

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AuditLogTestSchemaRegistryACLs | test_sr_audit_context_config_authz | {"audit_transport_mode": "kclient"} | integration | https://buildkite.com/redpanda/redpanda/builds/81116#019c9877-daa5-4104-8eb7-7fd21ab0f543 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=AuditLogTestSchemaRegistryACLs&test_method=test_sr_audit_context_config_authz |
| AuditLogTestSchemaRegistryACLs | test_sr_audit_context_config_authz | {"audit_transport_mode": "kclient"} | integration | https://buildkite.com/redpanda/redpanda/builds/81116#019c987b-bc91-48c8-8328-c287cf59389e | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=AuditLogTestSchemaRegistryACLs&test_method=test_sr_audit_context_config_authz |
| AuditLogTestSchemaRegistryACLs | test_sr_audit_context_config_authz | {"audit_transport_mode": "rpc"} | integration | https://buildkite.com/redpanda/redpanda/builds/81116#019c9877-daa6-4794-beaf-bd3f133d959d | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=AuditLogTestSchemaRegistryACLs&test_method=test_sr_audit_context_config_authz |
| AuditLogTestSchemaRegistryACLs | test_sr_audit_context_config_authz | {"audit_transport_mode": "rpc"} | integration | https://buildkite.com/redpanda/redpanda/builds/81116#019c987b-bc92-49d6-a2cc-44456788a5a4 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=AuditLogTestSchemaRegistryACLs&test_method=test_sr_audit_context_config_authz |
| ScalingUpTest | test_moves_with_local_retention | {"use_topic_property": false} | integration | https://buildkite.com/redpanda/redpanda/builds/81116#019c9877-daa4-4e18-8145-5ce01420e84a | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0063, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_moves_with_local_retention |

test results on build#81192

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RedpandaNodeOperationsSmokeTest | test_node_ops_smoke_test | {"cloud_storage_type": 1, "mixed_versions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/81192#019c9c9f-b55e-4897-89d0-6e974debb9fe | FAIL | 0/1 | | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test |
| RpkDebugBundleTest | test_debug_bundle | null | integration | https://buildkite.com/redpanda/redpanda/builds/81192#019c9ca2-5a01-462c-9cca-310c9a80c695 | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RpkDebugBundleTest&test_method=test_debug_bundle |
| SchemaRegistryContextTest | test_reserved_subject_names_rejected | null | integration | https://buildkite.com/redpanda/redpanda/builds/81192#019c9c9f-b55b-4b13-bc9b-d5ba39aac72f | FAIL | 0/11 | The test was found to be new, and no failures are allowed | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=SchemaRegistryContextTest&test_method=test_reserved_subject_names_rejected |
| SchemaRegistryContextTest | test_reserved_subject_names_rejected | null | integration | https://buildkite.com/redpanda/redpanda/builds/81192#019c9ca2-5a00-40d2-9db6-1db904f6a844 | FAIL | 0/11 | The test was found to be new, and no failures are allowed | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=SchemaRegistryContextTest&test_method=test_reserved_subject_names_rejected |

test results on build#81202

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CloudTopicsL0GCNodeFailureTest | test_node_failure_mid_gc | {"cloud_storage_type": 2} | integration | https://buildkite.com/redpanda/redpanda/builds/81202#019c9d1e-69e7-4e6d-88af-df58147c805f | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0091, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=CloudTopicsL0GCNodeFailureTest&test_method=test_node_failure_mid_gc |

test results on build#81221

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RedpandaNodeOperationsSmokeTest | test_node_ops_smoke_test | {"cloud_storage_type": 1, "mixed_versions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/81221#019c9fd5-4297-4986-a0b6-0e5572f7f0ed | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0013, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test |

test results on build#81274

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CloudTopicsL0GCNodeFailureTest | test_node_failure_mid_gc | {"cloud_storage_type": 2} | integration | https://buildkite.com/redpanda/redpanda/builds/81274#019ca263-699e-48d4-89c1-2a6a609a85e3 | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0058, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=CloudTopicsL0GCNodeFailureTest&test_method=test_node_failure_mid_gc |

test results on build#81336

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VerifyConsumerOffsetsThruUpgrades | test_consumer_group_offsets | {"versions_to_upgrade": 1} | integration | https://buildkite.com/redpanda/redpanda/builds/81336#019cb243-a185-49b0-8df3-bf104e71d29e | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0041, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=VerifyConsumerOffsetsThruUpgrades&test_method=test_consumer_group_offsets |

test results on build#81512

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NodesDecommissioningTest | test_flipping_decommission_recommission | {"cloud_topic": true, "node_is_alive": true} | integration | https://buildkite.com/redpanda/redpanda/builds/81512#019cd45f-25e8-476d-bbb9-fe819e86257a | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodesDecommissioningTest&test_method=test_flipping_decommission_recommission |

pgellert (Contributor) left a comment

looks good so far

Comment thread src/v/pandaproxy/schema_registry/test/store.cc
Comment thread src/v/pandaproxy/schema_registry/store.h Outdated
Comment thread src/v/pandaproxy/schema_registry/store.h Outdated
nguyen-andrew force-pushed the sr/global-context-config branch 2 times, most recently from 9b807fb to 19012db, on February 27, 2026 00:40

nguyen-andrew (Member, Author) commented

Force pushes:

  • 1: Needed to tweak test_sr_audit_context_config_authz test to use fallback.
  • 2: Rebase on latest base

vbotbuildovich (Collaborator) commented Feb 27, 2026

Retry command for Build#81192

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/random_node_operations_smoke_test.py::RedpandaNodeOperationsSmokeTest.test_node_ops_smoke_test@{"cloud_storage_type":1,"mixed_versions":false}
tests/rptest/tests/schema_registry_test.py::SchemaRegistryContextTest.test_reserved_subject_names_rejected

nguyen-andrew force-pushed the sr/global-context-config branch from 19012db to c749564 on February 27, 2026 02:55

nguyen-andrew (Member, Author) commented

Force push to condense fallback code.

nguyen-andrew force-pushed the sr/global-context-config branch from c749564 to d273045 on February 27, 2026 03:12

nguyen-andrew (Member, Author) commented

Force push to rebase on latest base.

nguyen-andrew (Member, Author) commented

Force push to add back test_store_invalid_subject_compat, changed to use pps::default_to_global::no so that it preserves the originally intended test behavior and coverage. This reflects the behavior seq_writer expects when it calls _store.get_compatibility(sub, default_to_global::no).

nguyen-andrew force-pushed the sr/global-context-config branch from 6616407 to 8fdbe6a on February 28, 2026 03:33

nguyen-andrew (Member, Author) commented

Force push to rebase on base.

Comment thread tests/rptest/tests/audit_log_test.py
Comment thread src/v/pandaproxy/schema_registry/store.h
Comment thread src/v/pandaproxy/schema_registry/store.h
Comment thread src/v/pandaproxy/schema_registry/test/store.cc
nguyen-andrew force-pushed the sr/global-context-config branch 2 times, most recently from 933631f to 6e0d3ef, on March 3, 2026 05:31

nguyen-andrew (Member, Author) commented Mar 3, 2026

Force pushes:
1. Rebase against latest dev.
2. Incorporated PR comment suggestions & restructured commits to better separate the changes into 3 distinct sections:

  • Implementing context config fallback resolution (this was missing before this PR)
  • Fixing bug in subject config fallback resolution (this was previously not aligned with the reference implementation).
  • Adding :.__GLOBAL: context to fallback chain

The resulting code is roughly the same as before these force pushes; unit tests were moved around for better organization. It'd likely be easier for the reviewer to review the new series of commits instead of looking at the changes between force pushes.

nguyen-andrew requested a review from pgellert March 3, 2026 05:44
pgellert previously approved these changes Mar 4, 2026

pgellert (Contributor) left a comment

Looks great! Thanks for bearing with all the change requests!

nguyen-andrew (Member, Author) commented

Force pushes:
1 - rebase to stack PR on top of 29768.
2 - adding global context/defaultToGlobal support to /mode & /mode/{subject}

nguyen-andrew force-pushed the sr/global-context-config branch from 6a8e0c5 to 6d30353 on March 9, 2026 20:35

nguyen-andrew (Member, Author) commented

Force push for formatting

pgellert (Contributor) left a comment

I think you need to update this with a rebase, but otherwise this looks good to go

Comment thread src/v/pandaproxy/schema_registry/handlers.cc
Wires the `default_to_global` parameter into the context-only overload
of `get_compatibility` for consistency with the subject-based overload,
and threads the `defaultToGlobal` query param through to context-only
lookups. The parameter itself is unused for now and the fallback logic
is left as a TODO for a follow-up commit.

Replace the placeholder get_compatibility(context) implementation with
proper fallback semantics: non-default contexts return an error when no
config is set and fallback is disabled, while the default context always
resolves to the hardcoded default. Adds sharded_store tests covering all
context/fallback combinations and updates audit_log_test accordingly.

The reference implementation does not require a subject to exist
before its compatibility config can be queried. Fix the fallback
path so a missing subject falls through to context-level resolution
instead of immediately returning compatibility_not_found. Add
sharded_store tests covering subject config resolution in both
default and non-default contexts with and without fallback enabled.

The reference implementation supports a special global_context
(.__GLOBAL) that sits at the top of the compatibility fallback
hierarchy. Redpanda's get_compatibility had no awareness of this
context: when no context-level config was set, both the context
and subject overloads fell through to the hardcoded default
directly, skipping global_context entirely.

Add global_context as a fallback tier so that the resolution
chains become:

  With fallback enabled:
    context:  context config → global config → hardcoded default
    subject:  subject config → context config → global → hardcoded

  global_context itself always resolves to its own config or the
  hardcoded default regardless of the fallback flag, and subjects
  within global_context always fall through to global_context
  resolution.

Extend sharded_store tests to cover all context type (default,
non-default, global) * subject presence * fallback flag
combinations, verifying that each tier in the chain is consulted
in order and that clearing a tier correctly exposes the next one.

Wire the `default_to_global` parameter into the context-only overload
of `get_mode` for consistency with the subject-based overload,
and thread the `defaultToGlobal` query param through to context-only
lookups. The parameter itself is unused for now and the fallback logic
is left for a follow-up commit.

Replace the placeholder get_mode(context) implementation with
proper fallback semantics: non-default contexts return an error when no
mode is set and fallback is disabled, while the default context always
resolves to the hardcoded default. Adds sharded_store tests covering all
context/fallback combinations for both context-level and subject-level
mode lookups.

The reference implementation supports a special global_context
(.__GLOBAL) that sits at the top of the mode fallback
hierarchy. Redpanda's get_mode had no awareness of this
context: when no context-level mode was set, both the context
and subject overloads fell through to the hardcoded default
directly, skipping global_context entirely.

Add global_context as a fallback tier so that the resolution
chains become:

  With fallback enabled:
    context:  context mode → global mode → hardcoded default
    subject:  subject mode → context mode → global → hardcoded

  global_context itself always resolves to its own mode or the
  hardcoded default regardless of the fallback flag, and subjects
  within global_context always fall through to global_context
  resolution.

Extend sharded_store tests to cover all context type (default,
non-default, global) * subject presence * fallback flag
combinations, verifying that each tier in the chain is consulted
in order and that clearing a tier correctly exposes the next one.
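
The mode commits describe the same tier ordering as the compatibility commits, so both can be pictured as "the first tier that is set wins, and clearing a tier exposes the next one". The sketch below shows only that idea with a generic helper. The `first_set` function, the `mode` enum and its enumerator names, and `read_write` as the hardcoded default are assumptions for illustration and are not taken from store.h.

```cpp
#include <initializer_list>
#include <optional>

// Hypothetical mode values; the real enum may differ.
enum class mode { import_mode, read_only, read_write };

// Returns the first tier that has a value, in the order given,
// falling back to the hardcoded value when every tier is empty.
template<typename T>
T first_set(std::initializer_list<std::optional<T>> tiers, T hardcoded) {
    for (const auto& tier : tiers) {
        if (tier.has_value()) {
            return *tier;
        }
    }
    return hardcoded;
}

// Subject lookup with fallback enabled:
// subject mode, then context mode, then global mode, then hardcoded.
mode resolve_subject_mode(
  std::optional<mode> subject_mode,
  std::optional<mode> context_mode,
  std::optional<mode> global_mode) {
    return first_set<mode>(
      {subject_mode, context_mode, global_mode}, mode::read_write);
}
```

Because the helper is generic over the value type, the same shape would cover compatibility levels as well, which is one way to read the parallel structure of the two commit series.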
nguyen-andrew force-pushed the sr/global-context-config branch from 6d30353 to 304348c on March 10, 2026 14:15

nguyen-andrew (Member, Author) commented

Force push to rebase on latest dev

nguyen-andrew requested a review from pgellert March 10, 2026 15:34
nguyen-andrew merged commit cb2647a into redpanda-data:dev Mar 11, 2026
22 checks passed
nguyen-andrew deleted the sr/global-context-config branch March 11, 2026 13:29