kafka: add a reserved Redpanda-specific Kafka API-key range by nguyen-andrew · Pull Request #30731 · redpanda-data/redpanda

nguyen-andrew · 2026-06-06T03:35:01Z

Shadow linking needs to read the roles on its source cluster. Those roles are
normally read over the Admin API, but some deployment modes do not expose it,
leaving the Kafka API as the only available path to the source cluster. Reading
roles over that path requires a Redpanda-specific Kafka API, which did not exist
before.

Apache assigns Kafka API keys sequentially and is still under 100 today, so
placing Redpanda's range well above that keeps it clear of future standard
assignments and of keys other Kafka implementations may already use. This PR
reserves a Redpanda range starting at key 15000 and makes request dispatch,
flex-version (tagged-field) parsing, and per-key metrics aware of it, without
bloating the standard-range data structures. The plumbing is kept separate from
the standard tables so max_api_key() continues to derive only from standard
Apache keys.

DescribeRedpandaRoles (key 15000) is the first API in the range and the one
shadow linking needs; here it serves as the proving example and simply returns
an empty role list. The API is recognized and dispatched internally but is
intentionally not advertised in ApiVersions yet, so it is not externally
discoverable until it returns real data. That, along with the role_store wiring
and enumeration, lands in the stacked follow-up PR.

This is the base of a two-PR stack; the follow-up (describe-redpanda-roles-api)
wires the cluster role_store into the handler, returns real role data with
name filters, and advertises the API in ApiVersions as its final step.

Backports Required

Release Notes

none

Copilot

Pull request overview

Introduces a reserved Redpanda-specific Kafka API key range (base key 15000) and the supporting plumbing so these keys can be dispatched, parsed (flex/tagged fields), and measured (per-key metrics/probes) without inflating the standard “dense” key-indexed tables. Adds DescribeRedpandaRoles (15000) as the first custom API, with wire schema + round-trip tests and a stub handler that enforces cluster DESCRIBE authorization and returns an empty role list.

Changes:

Add DescribeRedpandaRoles protocol schemata, generated message types, and encode/decode round-trip tests.
Extend handler dispatch, flex-version lookup, and handler probe/metrics plumbing to support a rebased custom-key table for the reserved range.
Fix throughput-controlled API-key bitmap indexing so out-of-range keys don’t throw and disconnect clients.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/v/kafka/server/tests/handler_probe_test.cc	Adds a regression test ensuring reserved-range keys map to custom probe storage without OOB access.
src/v/kafka/server/tests/handler_interface_test.cc	Adds tests for reserved-range dispatch and flex-version/schema recognition.
src/v/kafka/server/tests/BUILD	Registers new server-side unit tests and adds needed deps.
src/v/kafka/server/handlers/handlers.h	Adds a separate `redpanda_request_types` handler type-list for reserved-range APIs.
src/v/kafka/server/handlers/handler_probe.h	Adds `_custom_probes` storage for reserved-range per-handler probes.
src/v/kafka/server/handlers/handler_probe.cc	Initializes and routes reserved-range probes via rebased offsets.
src/v/kafka/server/handlers/handler_interface.cc	Adds reserved-range dispatch LUT (`make_custom_lut`) and lookup path in `handler_for_key`.
src/v/kafka/server/handlers/describe_redpanda_roles.h	Declares the new `DescribeRedpandaRoles` handler type.
src/v/kafka/server/handlers/describe_redpanda_roles.cc	Implements the handler with authorization + audit failure handling and empty response.
src/v/kafka/server/connection_context.cc	Bounds-checkes throughput-controlled API-key bitmap accesses to avoid `std::out_of_range`.
src/v/kafka/server/BUILD	Adds the new handler sources/headers and protocol deps to the server library.
src/v/kafka/protocol/types.h	Introduces `redpanda_api_key_base` constant (`15000`).
src/v/kafka/protocol/tests/describe_redpanda_roles_test.cc	Adds round-trip encode/decode tests for new request/response data types.
src/v/kafka/protocol/tests/BUILD	Registers the new protocol unit test.
src/v/kafka/protocol/schemata/generator.py	Adds new struct type names to the schema generator’s struct-type list.
src/v/kafka/protocol/schemata/generator.bzl	Adds `describe_redpanda_roles` to the list of generated messages.
src/v/kafka/protocol/schemata/describe_redpanda_roles_response.json	Defines the wire schema for the new response (flexible v0+).
src/v/kafka/protocol/schemata/describe_redpanda_roles_request.json	Defines the wire schema for the new request (nullable filter list, flexible v0+).
src/v/kafka/protocol/messages.h	Adds new request schema include and defines protocol-side `redpanda_request_types`.
src/v/kafka/protocol/flex_versions.cc	Adds a rebased flex-version table for reserved-range API keys and routes lookups accordingly.
src/v/kafka/protocol/describe_redpanda_roles.h	Adds protocol request/response wrapper types for `DescribeRedpandaRoles`.

Copilot

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated no new comments.

nguyen-andrew · 2026-06-06T04:51:12Z

Force pushes to address copilot review comments and pull auth out of describe_redpanda_roles_handler (to be added in the follow up stacked PR).

nguyen-andrew · 2026-06-06T04:51:40Z

/ci-repeat 1

vbotbuildovich · 2026-06-06T06:14:07Z

CI test results

test results on build#85479

test_status	test_class	test_method	test_arguments	test_kind	job_url	passed	reason	test_history
FLAKY(PASS)	ShadowLinkingReplicationTests	test_with_restart	{"storage_mode": "tiered"}	integration	https://buildkite.com/redpanda/redpanda/builds/85479#019e9b52-f2b3-416b-a8da-c1b55f67fff1	10/11	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0460, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1317, p1=0.2436, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_with_restart

test results on build#85564

test_status	test_class	test_method	test_arguments	test_kind	job_url	passed	reason	test_history
FLAKY(PASS)	ShadowLinkTopicFailoverTests	test_link_failover	{"source_cluster_spec": {"cluster_type": "redpanda"}, "storage_mode": "tiered_cloud", "with_failures": false}	integration	https://buildkite.com/redpanda/redpanda/builds/85564#019eae81-1655-41c3-b35b-cea4bb5a3673	10/11	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkTopicFailoverTests&test_method=test_link_failover
FLAKY(PASS)	ShadowLinkingReplicationTests	test_auto_prefix_trimming	{"source_cluster_spec": {"cluster_type": "redpanda"}, "storage_mode": "cloud", "with_failures": true}	integration	https://buildkite.com/redpanda/redpanda/builds/85564#019eae81-1651-4092-a0f7-3ac192c498af	10/11	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0044, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_auto_prefix_trimming
FLAKY(PASS)	WriteCachingFailureInjectionE2ETest	test_crash_all	{"use_transactions": false}	integration	https://buildkite.com/redpanda/redpanda/builds/85564#019eae85-b4ac-40f2-ab7d-d3768feca5b8	10/11	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0954, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.2598, p1=0.0494, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all

test results on build#85837

test_status	test_class	test_method	test_arguments	test_kind	job_url	passed	reason	test_history
FLAKY(PASS)	ShadowLinkingReplicationTests	test_auto_prefix_trimming	{"source_cluster_spec": {"cluster_type": "redpanda"}, "storage_mode": "tiered", "with_failures": false}	integration	https://buildkite.com/redpanda/redpanda/builds/85837#019ecdce-e0c4-4eaa-aecf-1333982145ec	28/31	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0218, p0=0.1392, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.4114, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_auto_prefix_trimming
FLAKY(PASS)	NodeWiseRecoveryTest	test_recovery_local_data_missing	{"wait_for_final_manifest_uploads": true}	integration	https://buildkite.com/redpanda/redpanda/builds/85837#019ecdcb-ce5f-4fed-a173-f09ff36681d6	28/31	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0428, p0=0.3697, reject_threshold=0.0100. adj_baseline=0.1230, p1=0.2684, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodeWiseRecoveryTest&test_method=test_recovery_local_data_missing
FLAKY(PASS)	ShadowLinkingRandomOpsTest	test_node_operations	{"failures": true, "workload_set": "cloud_combos"}	integration	https://buildkite.com/redpanda/redpanda/builds/85837#019ecdce-e0c4-4684-a44c-b979524cb162	10/11	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0189, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingRandomOpsTest&test_method=test_node_operations

test results on build#85886

test_status	test_class	test_method	test_arguments	test_kind	job_url	passed	reason	test_history
FLAKY(PASS)	ShadowLinkingReplicationTests	test_auto_prefix_trimming	{"source_cluster_spec": {"cluster_type": "redpanda"}, "storage_mode": "cloud", "with_failures": false}	integration	https://buildkite.com/redpanda/redpanda/builds/85886#019ed316-8bdd-4c6f-9c71-ecfc55bf7543	19/21	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0283, p0=0.4373, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3917, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_auto_prefix_trimming
FLAKY(PASS)	ConsumerGroupBalancingTest	test_coordinator_nodes_balance	null	integration	https://buildkite.com/redpanda/redpanda/builds/85886#019ed316-8bde-4a33-bfdf-822a8137bae9	10/11	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerGroupBalancingTest&test_method=test_coordinator_nodes_balance
FLAKY(PASS)	NodeWiseRecoveryTest	test_recovery_local_data_missing	{"wait_for_final_manifest_uploads": true}	integration	https://buildkite.com/redpanda/redpanda/builds/85886#019ed313-3ab0-4ee8-ba86-0cdefba7cc12	10/11	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0169, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodeWiseRecoveryTest&test_method=test_recovery_local_data_missing

test results on build#86041

test_status	test_class	test_method	test_arguments	test_kind	job_url	passed	reason	test_history
FLAKY(PASS)	ShadowLinkingReplicationTests	test_auto_prefix_trimming	{"source_cluster_spec": {"cluster_type": "redpanda"}, "storage_mode": "cloud", "with_failures": true}	integration	https://buildkite.com/redpanda/redpanda/builds/86041#019ee103-c18d-4cc6-9b07-1f44c389adcb	10/11	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0353, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1022, p1=0.3402, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_auto_prefix_trimming

test results on build#86063

test_status	test_class	test_method	test_arguments	test_kind	job_url	passed	reason	test_history
FLAKY(PASS)	NodeWiseRecoveryTest	test_recovery_local_data_missing	{"wait_for_final_manifest_uploads": false}	integration	https://buildkite.com/redpanda/redpanda/builds/86063#019ee280-83ed-460b-b8b1-feee0e9cc178	19/21	Test PASSES after retries.No significant increase in flaky rate(baseline=0.0455, p0=0.6062, reject_threshold=0.0100. adj_baseline=0.1305, p1=0.2443, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodeWiseRecoveryTest&test_method=test_recovery_local_data_missing

pgellert · 2026-06-11T10:02:16Z

    ],
 )
+
+redpanda_cc_btest(


New tests should use gtest > btest

pgellert · 2026-06-11T10:25:31Z

+    const auto& tc_keys = _kafka_throughput_controlled_api_keys();
+    const auto rk = r_data.request_key;
+    if (
+      rk() >= 0 && static_cast<size_t>(rk()) < tc_keys.size()
+      && tc_keys[rk()]) {


What are the implications of not including the new endpoints in request_types and instead splitting it out into a separate redpanda_request_types? Are there any other uses of max_api_key and the request_types type that we would need to think about other than this _kafka_throughput_controlled_api_keys?

The main implications are:

The reserved range doesn't modify or impact the densely packed/key-indexed array structures for the standard request types.

Any code on the standard range that also wants the custom range needs a separate path to do so.

Both are intentional to have minimal impact on the standard Kafka API hot path and its structures. Bringing the reserved keys into request_types would resize those key-indexed structures with mostly empty space, so I split it out to avoid that.

The structures that get a rebased parallel structure are the flex-version table, the dispatch LUT, and the handler-probe vector. The throughput bitmap in this comment is the fourth key-indexed structure, handled with a bounds-check instead since I don't believe the reserved APIs are throughput-controlled and just need to be ignored, but happy to change it though if you think they should be.

The other request_types users (serialize_apis for ApiVersions, the client variant in api_types.h) iterate the type list; we'll be updating those in follow-on PRs too (ApiVersions for advertising the API, and the client variant when we implement the client side for shadow linking).

pgellert · 2026-06-11T10:26:52Z

+    const auto& tc_keys = _kafka_throughput_controlled_api_keys();
+    const auto hk = hdr.key;
+    if (
+      hk() >= 0 && static_cast<size_t>(hk()) < tc_keys.size()
+      && tc_keys[hk()]) {


nit: this 3x duplicated code could be shared through a helper

pgellert · 2026-06-11T10:35:14Z

+ * standard, max-key-sized LUT remains untouched.
+ */
+template<typename... Ts>
+constexpr auto make_custom_lut(type_list<Ts...>) {


nit: Could make_lut and make_custom_lut be the same if we add a function argument for the base offset?

pgellert · 2026-06-11T10:37:43Z

+  , _custom_probes([]<typename... Ts>(type_list<Ts...>) {
+      static_assert(
+        sizeof...(Ts) > 0,
+        "redpanda_request_types must be non-empty: std::max() over an empty "
+        "pack is ill-formed, and the reserved-range probe table assumes at "
+        "least one custom API");
+      constexpr int base = static_cast<int>(redpanda_api_key_base());
+      static_assert(
+        ((static_cast<int>(Ts::api::key()) >= base) && ...),
+        "redpanda_request_types keys must be >= redpanda_api_key_base; a "
+        "smaller key gives a negative rebased table offset (out of bounds)");
+      constexpr int max_offset = std::max(
+        {(static_cast<int>(Ts::api::key()) - base)...});
+      static_assert(
+        max_offset < static_cast<int>(sizeof...(Ts)) * 10,
+        "custom Redpanda API probe table is too sparse; allocate keys densely "
+        "from redpanda_api_key_base");
+      return static_cast<size_t>(max_offset + 1);
+  }(redpanda_request_types{})) {


Should we just define a max_redpanda_api_key() instead and then this would just be probably something like max_redpanda_api_key() - redpanda_api_key_base() + 1?

pgellert · 2026-06-11T10:40:21Z

+
+    // Reserved Redpanda API key range: separate, rebased storage so the dense
+    // _probes vector is not enlarged by five-digit keys.
+    const int base = static_cast<int>(redpanda_api_key_base());
+    [&]<typename... Ts>(type_list<Ts...>) {


How come this is so complex and templated vs the for loop above? Could we simplify this by just iterating over the index range of _custom_probes instead, similar to above?

pgellert · 2026-06-11T10:50:21Z

+    // Stub proving reserved-range dispatch end to end: returns an empty,
+    // well-formed response. Role enumeration and the cluster DESCRIBE
+    // authorization that guards it are added in the follow-on PR, once the API
+    // returns real data.


cluster DESCRIBE

This effectively lowers the access level required for listing roles from superuser (through the admin API) to requiring a cluster DESCRIBE access level. That seems reasonable to me, but we should think through how it fits into the BYOC / Serverless access model.

Yes, very good point. We might also be able to work backwards: what is the access level we want, and then force that even in a bespoke way here. For example, maybe part of setting up the shadow woudl be adding a special user just for the one connection that will be invoking this API.

Good points, I can nail that down outside of this PR in the follow-on that implements the handler. I think the bespoke suggestion makes sense. Maybe a separately-managed set of shadow-linking users (similar to how superusers is its own configured set), or maybe custom ACL resource(s) related to shadow linking or the role store. Wdyt?

It's fine to tackle things in follow up PRs. Can you make tickets so we can lose track?

Filed CORE-16643 to track this and capture some of my findings.

TL;DR of some things I found:

Serverless might be out of scope, since it looks like it doesn't support roles.

For BYOC/Dedicated, it looks like DescribeAcls also requires describe cluster permissions, so a shadow principal will need that permission to shadow ACLs anyways. So the question is whether role names warrant tighter control than the broad describe cluster required for ACL reads. Can settle that outside of this PR.

dotnwat · 2026-06-11T23:14:51Z

-    if (_kafka_throughput_controlled_api_keys().at(r_data.request_key)) {
+    const auto& tc_keys = _kafka_throughput_controlled_api_keys();
+    const auto rk = r_data.request_key;
+    if (
+      rk() >= 0 && static_cast<size_t>(rk()) < tc_keys.size()


arguably this should throw if rk() < 0 ?

I don't think we need to since invalid keys (including negative) get rejected at the request router, so ignoring/skipping them here in the throughput-control path seems okay

just in isolation here it's pretty hard to tell. the .at accessor would throw no matter the value of r_data.request_key. but now, we fall through if rk() < 0 which is inherently incorrect? my point is just that the semantics here changed, so reviewing it needs the extra explanation to understand the context, or, the code can preserve the same semantics.

Yea you're right, it does change the semantics. I've reworked it to preserve the original behavior and carve out only the reserved range to not throw.

dotnwat · 2026-06-11T23:24:45Z

+// resized by the five-digit custom keys.
+using redpanda_request_types
+  = make_request_types<describe_redpanda_roles_handler>;
+


With this PR does the new APIs show up in ApiVersionResponse? I'm guessing no because I don't see the above request_types() changed, but maybe there is another way to accomplish the same thing. Is that something you intentionally left out?

Right, intentionally left out of ApiVersionResponse here. This PR is just the dispatch/flex/metrics plumbing plus a stub describe_redpanda_roles handler that returns an empty response, so I left the API unadvertised for now. Planning to advertise it in the follow-on PR that actually implements the handler.

nguyen-andrew · 2026-06-16T00:04:31Z

Force-pushed to address review feedback:

Deduplicated the repeated table-sizing/guard logic (LUT, flex map, probe vector) into one api_table_span helper; make_lut now takes a Base param so make_custom_lut is gone. Added api_table_span_test.
Simplified custom-probe setup to a plain runtime loop over handler_for_key (gap-tolerant).
Switch to gtest.

dotnwat · 2026-06-16T21:33:52Z

-      connection_ctx->_kafka_throughput_controlled_api_keys().at(request_key)) {
+      request_key() >= 0 && static_cast<size_t>(request_key()) < tc_keys.size()
+      && tc_keys[request_key()]) {


i dont' think i quite understand these changes to avoid .at and do direct indexing. it would make sense if .at would throw when it shouldn't, but then i'd expect us to do have additional checks for the special cases. what am i missing?

Updated to keep the previous behavior while carving out the new reserved range.

nguyen-andrew · 2026-06-17T00:45:08Z

Force pushed to address the throughput-control feedback:

Added max_redpanda_api_key()
Replaced the bitmap bounds-check with a shared is_throughput_controlled() helper that exempts only the reserved range while keeping the previous behavior.
Folded the throughput-bitmap commit into the reserved-range commit.

Copilot

Pull request overview

Copilot reviewed 35 out of 35 changed files in this pull request and generated 1 comment.

nguyen-andrew · 2026-06-19T05:11:38Z

/microbench

nguyen-andrew · 2026-06-19T17:19:11Z

Force push to reword some commits and address copilot comments.
Force push to optimize for the common case + address PR comment

nguyen-andrew · 2026-06-19T17:19:44Z

/ci-repeat 1
release
skip-rebase

nguyen-andrew · 2026-06-19T19:22:11Z

/microbench

vbotbuildovich · 2026-06-19T21:17:33Z

Performance change detected in https://buildkite.com/redpanda/redpanda/builds/86046#019ee156-5d06-47b4-baac-3efac73e8cf8:

Performance changes detected in 3 tests
async_algorithm_rpbench.algo_bench.std_for_each_small: allocs -> 0.00pct
async_algorithm_rpbench.algo_bench.std_for_each_small: inst -> +0.10pct
raft_replicate_rpbench.raft_benchmark_fixture.quorum_ack_replicate_16KiB: inst -> +18.24pct

See https://redpandadata.atlassian.net/wiki/x/LQAqLg for docs

nguyen-andrew · 2026-06-19T23:48:40Z

raft_replicate_rpbench.raft_benchmark_fixture.quorum_ack_replicate_16KiB: inst +18.24% appears to be a false signal.

There doesn't appear to be a raft dependency path to the changed code

❯❯ bazel query 'somepath(//src/v/raft/tests:raft_replicate_rpbench, //src/v/kafka/...)'
INFO: Empty results

For comparison, a bench that does depend on the changed code returns a path, and that one wasn't flagged by /microbench:

❯❯ bazel query 'somepath(//src/v/kafka/server/tests:kafka_produce_partition_rpbench, //src/v/kafka/...)'
//src/v/kafka/server/tests:kafka_produce_partition_rpbench
//src/v/kafka/server/tests:kafka_produce_partition_rpbench_binary
//src/v/kafka/server:server
//src/v/kafka/protocol/schemata:delete_acls_response
//src/v/kafka/protocol/schemata:delete_acls_response.cc
//src/v/kafka/protocol/schemata:delete_acls_response_genrule

Looking at the historical perf results db, this run's value (439,092) is +3.6% above the recent-dev median (~424k), and a dev commit measured 439,114 three days earlier. The +18.24% appears to be computed against a lower stale baseline, likely before dev drifted up to ~424k.

The compile-time type_list helper was defined identically in both kafka/protocol/messages.h and kafka/server/handlers/handlers.h. Extract the duplicate into a new kafka/protocol/type_list.h that both include, so there is a single definition. Pure refactor: no behavior change.

The handler-side request type-list and its alias share the names request_types/make_request_types with the wire-side lists in messages.h, which makes the two ambiguous to talk about as the code grows. Rename the handler-side list to handler_request_types and its alias to make_handler_request_types; the wire-side names are left unchanged. Pure refactor: no behavior change.

Introduce the wire protocol for DescribeRedpandaRoles, the first Redpanda-specific Kafka API. It occupies api key 15000, the start of a reserved range (>= 15000) carved out for Redpanda APIs so they never collide with upstream Kafka key assignments. Add the request/response schema JSON, register the message and its three nested struct types with the schemata generator, and add the wire-type header with its encode/decode/format_to surface. A gtest round-trips a populated request and response to lock the encoding. This commit only defines wire encoding/decoding; the API is not yet reachable through dispatch.

Add the server-side handler for DescribeRedpandaRoles as a single_stage_handler over the wire type. The handle() specialization decodes the request, logs it, and returns a well-formed empty response with error_code::none; real role enumeration and authorization are left to follow-on work. Wire the handler .cc/.h into the server library and depend on the DescribeRedpandaRoles protocol target. The handler compiles and links but is not yet routed by api-key dispatch.

A reserved Redpanda API key (>= 15000) sits ~14k indices above the largest standard Kafka key, so a single dense array indexed by api_key would have to allocate the entire gap. Add api_key_indexed_array, a dense array-backed map from api_key to T that stores two disjoint regions: the standard range [0, StandardSize) and the reserved range [ReservedBase, ReservedBase + ReservedSize). Indexing routes a key to its region, so callers treat it as one array without paying for the gap. Every member is constexpr, so the same type backs both the compile-time dispatch/flex tables and the run-time probe/throughput tables. A static_assert enforces that the reserved region cannot overlap the standard range, which would otherwise let standard keys silently alias reserved storage in operator[]. Also introduce redpanda_api_key_base (15000), the start of the reserved range, used as the container's default ReservedBase.

Stand up the scaffolding that sizes the dual-region container from the actual API lists. Add the reserved-range wire and handler type-lists (redpanda_request_types in messages.h, redpanda_handler_request_types in handlers.h) alongside the existing standard lists, kept separate so the standard dispatch/flex/metering tables stay sized to the standard range. Define standard_api_key_table_size and reserved_api_key_table_size in api_key_table.h and static_assert them against the parallel api lists so they cannot silently drift, then define the api_key_table<T> alias over api_key_indexed_array sized to both. A dense array is only worthwhile when the keys it spans are packed. Guard that here against the api lists (the real sizing inputs) with a named sparsity factor, so an accidentally sparse key fails to compile rather than silently bloating the backing array.

The flexible-version lookup built g_flex_mapping as a single dense array sized to the largest standard key and filled only from the standard wire list, so a reserved key (>= 15000) was out of range and never resolved as in-schema or flexible. Rebuild g_flex_mapping as an api_key_table<api_version> filled from both the standard and reserved wire lists. Drop the local consteval max_api_key helper (the table now owns its sizing), index by api_key directly, and replace the hand-rolled bounds check in is_api_in_schema with the table's contains(). Reserved-range APIs now participate in flex negotiation. A gtest covers that standard keys are unchanged and that the reserved DescribeRedpandaRoles key resolves in-schema and as flexible.

Handler-probe storage was a single std::vector<handler_probe> sized max_api_key()+2, with the trailing slot doubling as the synthetic unknown-handler bucket. A reserved key (>= 15000) would have forced that vector to span the whole gap, and the trailing-slot trick does not generalize to two disjoint regions. Store the probes in an api_key_table<handler_probe> and lift the unknown-handler probe out into its own member, since it is not a real key. Build the table by iterating both regions with for_each, skipping keys without a registered handler, and set up the unknown-handler probe explicitly so its prior behavior (histogram only under latency-all, labelled via the max_api_key()+1 sentinel) is preserved. Reserved-range handlers are now metered like any other.

The dispatch lookup built a single flat std::array indexed directly by api_key, which both rejected reserved keys (>= 15000) as too sparse and could not address them without spanning the whole gap. Rebuild make_lut to take the standard and reserved handler lists and populate an api_key_table<handler>, so a reserved request routes to its handler. handler_for_key and api_name_to_key now resolve keys and names across both regions. Add dispatch tests for a reserved key and for every name resolving back to its key.

The throughput-controlled API-key bitmap was a std::vector<bool> sized only to the largest standard key, so a reserved request (>= 15000) threaded through the pre-dispatch throttle check would index out of range and throw. Retrofit convert_api_names_to_key_bitmap and the connection_context binding/member from std::vector<bool> to api_key_table<bool>, covering both ranges. Standard-key behavior is preserved: a key in the gap still throws out_of_range exactly as the old vector's at() did. Reserved keys now read false (not controlled) by default and can be opted in by naming the API in kafka_throughput_controlled_api_keys. A gtest covers named standard keys, reserved keys returning false rather than throwing, out-of-range still throwing, and opting a reserved key in. The redpanda_thread_fixture call site is updated in lockstep so every translation unit constructing a connection_context still compiles.

Add a fixture test proving the full reserved-range path: a DescribeRedpandaRoles request at api key 15000 survives the pre-dispatch throttle check, routes through the dispatch LUT to the stub handler, and returns a well-formed response. This exercises the wire schema, the handler, reserved-range dispatch, and the dual-region throughput bitmap together end to end.

nguyen-andrew · 2026-06-20T00:32:54Z

Force push to rebase on latest dev.

nguyen-andrew · 2026-06-20T05:49:33Z

/microbench

vbotbuildovich · 2026-06-20T07:49:00Z

Performance change detected in https://buildkite.com/redpanda/redpanda/builds/86069#019ee394-c751-4598-9faf-f9fd95b91810:

Performance changes detected in 4 tests
cloud_storage_rpbench.cstore_bench.column_store_append_with_intervening_lookup_10k_1k: inst -> +0.04pct
cloud_storage_rpbench.cstore_bench.column_store_lower_bound_result: inst -> -1.59pct
cloud_storage_rpbench.cstore_bench.column_store_upper_bound_result: inst -> -1.59pct
role_store_bench_rpbench.role_store_bench.role_authz_512_roles_1Ki_members: inst -> -0.01pct

See https://redpandadata.atlassian.net/wiki/x/LQAqLg for docs

dotnwat

lgtm. nice work

dotnwat · 2026-06-22T23:05:57Z

+        }
+        return nullptr;
+    }
+    constexpr T* find(api_key key) noexcept {


fyi, you should be able to use the c++23 deducing self feature to avoid duplicating code for each of the const/non-const versions of these functions.

Copilot AI review requested due to automatic review settings June 6, 2026 03:35

Copilot started reviewing on behalf of nguyen-andrew June 6, 2026 03:35 View session

github-actions Bot added area/build area/redpanda labels Jun 6, 2026

Copilot AI reviewed Jun 6, 2026

View reviewed changes

Comment thread src/v/kafka/server/handlers/handler_probe.cc Outdated

Comment thread src/v/kafka/server/tests/handler_interface_test.cc Outdated

Comment thread src/v/kafka/protocol/tests/describe_redpanda_roles_test.cc Outdated

nguyen-andrew self-assigned this Jun 6, 2026

nguyen-andrew marked this pull request as draft June 6, 2026 03:40

nguyen-andrew force-pushed the kafka-redpanda-api-range branch from 5f702f7 to fc9aa50 Compare June 6, 2026 04:04

nguyen-andrew requested a review from Copilot June 6, 2026 04:07

Copilot started reviewing on behalf of nguyen-andrew June 6, 2026 04:07 View session

Copilot AI reviewed Jun 6, 2026

View reviewed changes

nguyen-andrew force-pushed the kafka-redpanda-api-range branch from fc9aa50 to 335a5bd Compare June 6, 2026 04:44

nguyen-andrew mentioned this pull request Jun 6, 2026

kafka: implement DescribeRedpandaRoles role enumeration #30732

Open

7 tasks

nguyen-andrew marked this pull request as ready for review June 9, 2026 22:07

nguyen-andrew requested review from a team, dotnwat, nvartolomei and pgellert and removed request for a team June 9, 2026 22:08

pgellert reviewed Jun 11, 2026

View reviewed changes

dotnwat reviewed Jun 11, 2026

View reviewed changes

nguyen-andrew force-pushed the kafka-redpanda-api-range branch from 335a5bd to 65ae2e2 Compare June 15, 2026 23:57

nguyen-andrew requested a review from pgellert June 16, 2026 14:51

dotnwat reviewed Jun 16, 2026

View reviewed changes

nguyen-andrew force-pushed the kafka-redpanda-api-range branch from 65ae2e2 to 055ec44 Compare June 17, 2026 00:33

nguyen-andrew force-pushed the kafka-redpanda-api-range branch 2 times, most recently from effec6f to 0c5da3c Compare June 19, 2026 03:53

nguyen-andrew requested a review from Copilot June 19, 2026 03:57

Copilot started reviewing on behalf of nguyen-andrew June 19, 2026 03:58 View session

Copilot AI reviewed Jun 19, 2026

View reviewed changes

Comment thread src/v/kafka/server/handlers/handler_interface.h

nguyen-andrew force-pushed the kafka-redpanda-api-range branch from 0c5da3c to 778fd29 Compare June 19, 2026 17:17

nguyen-andrew added 11 commits June 20, 2026 00:12

nguyen-andrew force-pushed the kafka-redpanda-api-range branch from 778fd29 to 2c4b827 Compare June 20, 2026 00:30

nguyen-andrew requested a review from dotnwat June 22, 2026 14:29

dotnwat approved these changes Jun 22, 2026

View reviewed changes

nguyen-andrew merged commit 84702c7 into redpanda-data:dev Jun 23, 2026
19 checks passed

Conversation

nguyen-andrew commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Backports Required

Release Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

nguyen-andrew commented Jun 6, 2026

Uh oh!

nguyen-andrew commented Jun 6, 2026

Uh oh!

vbotbuildovich commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CI test results

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nguyen-andrew commented Jun 16, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nguyen-andrew commented Jun 17, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

nguyen-andrew commented Jun 6, 2026 •

edited

Loading

vbotbuildovich commented Jun 6, 2026 •

edited

Loading