kafka: add a reserved Redpanda-specific Kafka API-key range#30731
Conversation
There was a problem hiding this comment.
Pull request overview
Introduces a reserved Redpanda-specific Kafka API key range (base key 15000) and the supporting plumbing so these keys can be dispatched, parsed (flex/tagged fields), and measured (per-key metrics/probes) without inflating the standard “dense” key-indexed tables. Adds DescribeRedpandaRoles (15000) as the first custom API, with wire schema + round-trip tests and a stub handler that enforces cluster DESCRIBE authorization and returns an empty role list.
Changes:
- Add
DescribeRedpandaRolesprotocol schemata, generated message types, and encode/decode round-trip tests. - Extend handler dispatch, flex-version lookup, and handler probe/metrics plumbing to support a rebased custom-key table for the reserved range.
- Fix throughput-controlled API-key bitmap indexing so out-of-range keys don’t throw and disconnect clients.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/v/kafka/server/tests/handler_probe_test.cc | Adds a regression test ensuring reserved-range keys map to custom probe storage without OOB access. |
| src/v/kafka/server/tests/handler_interface_test.cc | Adds tests for reserved-range dispatch and flex-version/schema recognition. |
| src/v/kafka/server/tests/BUILD | Registers new server-side unit tests and adds needed deps. |
| src/v/kafka/server/handlers/handlers.h | Adds a separate redpanda_request_types handler type-list for reserved-range APIs. |
| src/v/kafka/server/handlers/handler_probe.h | Adds _custom_probes storage for reserved-range per-handler probes. |
| src/v/kafka/server/handlers/handler_probe.cc | Initializes and routes reserved-range probes via rebased offsets. |
| src/v/kafka/server/handlers/handler_interface.cc | Adds reserved-range dispatch LUT (make_custom_lut) and lookup path in handler_for_key. |
| src/v/kafka/server/handlers/describe_redpanda_roles.h | Declares the new DescribeRedpandaRoles handler type. |
| src/v/kafka/server/handlers/describe_redpanda_roles.cc | Implements the handler with authorization + audit failure handling and empty response. |
| src/v/kafka/server/connection_context.cc | Bounds-checkes throughput-controlled API-key bitmap accesses to avoid std::out_of_range. |
| src/v/kafka/server/BUILD | Adds the new handler sources/headers and protocol deps to the server library. |
| src/v/kafka/protocol/types.h | Introduces redpanda_api_key_base constant (15000). |
| src/v/kafka/protocol/tests/describe_redpanda_roles_test.cc | Adds round-trip encode/decode tests for new request/response data types. |
| src/v/kafka/protocol/tests/BUILD | Registers the new protocol unit test. |
| src/v/kafka/protocol/schemata/generator.py | Adds new struct type names to the schema generator’s struct-type list. |
| src/v/kafka/protocol/schemata/generator.bzl | Adds describe_redpanda_roles to the list of generated messages. |
| src/v/kafka/protocol/schemata/describe_redpanda_roles_response.json | Defines the wire schema for the new response (flexible v0+). |
| src/v/kafka/protocol/schemata/describe_redpanda_roles_request.json | Defines the wire schema for the new request (nullable filter list, flexible v0+). |
| src/v/kafka/protocol/messages.h | Adds new request schema include and defines protocol-side redpanda_request_types. |
| src/v/kafka/protocol/flex_versions.cc | Adds a rebased flex-version table for reserved-range API keys and routes lookups accordingly. |
| src/v/kafka/protocol/describe_redpanda_roles.h | Adds protocol request/response wrapper types for DescribeRedpandaRoles. |
5f702f7 to
fc9aa50
Compare
fc9aa50 to
335a5bd
Compare
|
Force pushes to address copilot review comments and pull auth out of |
|
/ci-repeat 1 |
CI test resultstest results on build#85479
test results on build#85564
test results on build#85837
test results on build#85886
test results on build#86041
test results on build#86063
|
| ], | ||
| ) | ||
|
|
||
| redpanda_cc_btest( |
There was a problem hiding this comment.
New tests should use gtest > btest
| const auto& tc_keys = _kafka_throughput_controlled_api_keys(); | ||
| const auto rk = r_data.request_key; | ||
| if ( | ||
| rk() >= 0 && static_cast<size_t>(rk()) < tc_keys.size() | ||
| && tc_keys[rk()]) { |
There was a problem hiding this comment.
What are the implications of not including the new endpoints in request_types and instead splitting it out into a separate redpanda_request_types? Are there any other uses of max_api_key and the request_types type that we would need to think about other than this _kafka_throughput_controlled_api_keys?
There was a problem hiding this comment.
The main implications are:
- The reserved range doesn't modify or impact the densely packed/key-indexed array structures for the standard request types.
- Any code on the standard range that also wants the custom range needs a separate path to do so.
Both are intentional to have minimal impact on the standard Kafka API hot path and its structures. Bringing the reserved keys into request_types would resize those key-indexed structures with mostly empty space, so I split it out to avoid that.
The structures that get a rebased parallel structure are the flex-version table, the dispatch LUT, and the handler-probe vector. The throughput bitmap in this comment is the fourth key-indexed structure, handled with a bounds-check instead since I don't believe the reserved APIs are throughput-controlled and just need to be ignored, but happy to change it though if you think they should be.
The other request_types users (serialize_apis for ApiVersions, the client variant in api_types.h) iterate the type list; we'll be updating those in follow-on PRs too (ApiVersions for advertising the API, and the client variant when we implement the client side for shadow linking).
| const auto& tc_keys = _kafka_throughput_controlled_api_keys(); | ||
| const auto hk = hdr.key; | ||
| if ( | ||
| hk() >= 0 && static_cast<size_t>(hk()) < tc_keys.size() | ||
| && tc_keys[hk()]) { |
There was a problem hiding this comment.
nit: this 3x duplicated code could be shared through a helper
| * standard, max-key-sized LUT remains untouched. | ||
| */ | ||
| template<typename... Ts> | ||
| constexpr auto make_custom_lut(type_list<Ts...>) { |
There was a problem hiding this comment.
nit: Could make_lut and make_custom_lut be the same if we add a function argument for the base offset?
| , _custom_probes([]<typename... Ts>(type_list<Ts...>) { | ||
| static_assert( | ||
| sizeof...(Ts) > 0, | ||
| "redpanda_request_types must be non-empty: std::max() over an empty " | ||
| "pack is ill-formed, and the reserved-range probe table assumes at " | ||
| "least one custom API"); | ||
| constexpr int base = static_cast<int>(redpanda_api_key_base()); | ||
| static_assert( | ||
| ((static_cast<int>(Ts::api::key()) >= base) && ...), | ||
| "redpanda_request_types keys must be >= redpanda_api_key_base; a " | ||
| "smaller key gives a negative rebased table offset (out of bounds)"); | ||
| constexpr int max_offset = std::max( | ||
| {(static_cast<int>(Ts::api::key()) - base)...}); | ||
| static_assert( | ||
| max_offset < static_cast<int>(sizeof...(Ts)) * 10, | ||
| "custom Redpanda API probe table is too sparse; allocate keys densely " | ||
| "from redpanda_api_key_base"); | ||
| return static_cast<size_t>(max_offset + 1); | ||
| }(redpanda_request_types{})) { |
There was a problem hiding this comment.
Should we just define a max_redpanda_api_key() instead and then this would just be probably something like max_redpanda_api_key() - redpanda_api_key_base() + 1?
|
|
||
| // Reserved Redpanda API key range: separate, rebased storage so the dense | ||
| // _probes vector is not enlarged by five-digit keys. | ||
| const int base = static_cast<int>(redpanda_api_key_base()); | ||
| [&]<typename... Ts>(type_list<Ts...>) { |
There was a problem hiding this comment.
How come this is so complex and templated vs the for loop above? Could we simplify this by just iterating over the index range of _custom_probes instead, similar to above?
| // Stub proving reserved-range dispatch end to end: returns an empty, | ||
| // well-formed response. Role enumeration and the cluster DESCRIBE | ||
| // authorization that guards it are added in the follow-on PR, once the API | ||
| // returns real data. |
There was a problem hiding this comment.
cluster DESCRIBE
This effectively lowers the access level required for listing roles from superuser (through the admin API) to requiring a cluster DESCRIBE access level. That seems reasonable to me, but we should think through how it fits into the BYOC / Serverless access model.
There was a problem hiding this comment.
Yes, very good point. We might also be able to work backwards: what is the access level we want, and then force that even in a bespoke way here. For example, maybe part of setting up the shadow woudl be adding a special user just for the one connection that will be invoking this API.
There was a problem hiding this comment.
Good points, I can nail that down outside of this PR in the follow-on that implements the handler. I think the bespoke suggestion makes sense. Maybe a separately-managed set of shadow-linking users (similar to how superusers is its own configured set), or maybe custom ACL resource(s) related to shadow linking or the role store. Wdyt?
There was a problem hiding this comment.
It's fine to tackle things in follow up PRs. Can you make tickets so we can lose track?
There was a problem hiding this comment.
Filed CORE-16643 to track this and capture some of my findings.
TL;DR of some things I found:
- Serverless might be out of scope, since it looks like it doesn't support roles.
- For BYOC/Dedicated, it looks like DescribeAcls also requires
describeclusterpermissions, so a shadow principal will need that permission to shadow ACLs anyways. So the question is whether role names warrant tighter control than the broaddescribeclusterrequired for ACL reads. Can settle that outside of this PR.
| if (_kafka_throughput_controlled_api_keys().at(r_data.request_key)) { | ||
| const auto& tc_keys = _kafka_throughput_controlled_api_keys(); | ||
| const auto rk = r_data.request_key; | ||
| if ( | ||
| rk() >= 0 && static_cast<size_t>(rk()) < tc_keys.size() |
There was a problem hiding this comment.
arguably this should throw if rk() < 0 ?
There was a problem hiding this comment.
I don't think we need to since invalid keys (including negative) get rejected at the request router, so ignoring/skipping them here in the throughput-control path seems okay
There was a problem hiding this comment.
just in isolation here it's pretty hard to tell. the .at accessor would throw no matter the value of r_data.request_key. but now, we fall through if rk() < 0 which is inherently incorrect? my point is just that the semantics here changed, so reviewing it needs the extra explanation to understand the context, or, the code can preserve the same semantics.
There was a problem hiding this comment.
Yea you're right, it does change the semantics. I've reworked it to preserve the original behavior and carve out only the reserved range to not throw.
| // resized by the five-digit custom keys. | ||
| using redpanda_request_types | ||
| = make_request_types<describe_redpanda_roles_handler>; | ||
|
|
There was a problem hiding this comment.
With this PR does the new APIs show up in ApiVersionResponse? I'm guessing no because I don't see the above request_types() changed, but maybe there is another way to accomplish the same thing. Is that something you intentionally left out?
There was a problem hiding this comment.
Right, intentionally left out of ApiVersionResponse here. This PR is just the dispatch/flex/metrics plumbing plus a stub describe_redpanda_roles handler that returns an empty response, so I left the API unadvertised for now. Planning to advertise it in the follow-on PR that actually implements the handler.
335a5bd to
65ae2e2
Compare
|
Force-pushed to address review feedback:
|
| connection_ctx->_kafka_throughput_controlled_api_keys().at(request_key)) { | ||
| request_key() >= 0 && static_cast<size_t>(request_key()) < tc_keys.size() | ||
| && tc_keys[request_key()]) { |
There was a problem hiding this comment.
i dont' think i quite understand these changes to avoid .at and do direct indexing. it would make sense if .at would throw when it shouldn't, but then i'd expect us to do have additional checks for the special cases. what am i missing?
There was a problem hiding this comment.
Updated to keep the previous behavior while carving out the new reserved range.
65ae2e2 to
055ec44
Compare
|
Force pushed to address the throughput-control feedback:
|
effec6f to
0c5da3c
Compare
|
/microbench |
0c5da3c to
778fd29
Compare
|
Force push to reword some commits and address copilot comments. |
|
/ci-repeat 1 |
|
/microbench |
|
Performance change detected in https://buildkite.com/redpanda/redpanda/builds/86046#019ee156-5d06-47b4-baac-3efac73e8cf8: See https://redpandadata.atlassian.net/wiki/x/LQAqLg for docs |
|
There doesn't appear to be a raft dependency path to the changed code For comparison, a bench that does depend on the changed code returns a path, and that one wasn't flagged by Looking at the historical perf results db, this run's value (439,092) is +3.6% above the recent-dev median (~424k), and a dev commit measured 439,114 three days earlier. The +18.24% appears to be computed against a lower stale baseline, likely before dev drifted up to ~424k. |
The compile-time type_list helper was defined identically in both kafka/protocol/messages.h and kafka/server/handlers/handlers.h. Extract the duplicate into a new kafka/protocol/type_list.h that both include, so there is a single definition. Pure refactor: no behavior change.
The handler-side request type-list and its alias share the names request_types/make_request_types with the wire-side lists in messages.h, which makes the two ambiguous to talk about as the code grows. Rename the handler-side list to handler_request_types and its alias to make_handler_request_types; the wire-side names are left unchanged. Pure refactor: no behavior change.
Introduce the wire protocol for DescribeRedpandaRoles, the first Redpanda-specific Kafka API. It occupies api key 15000, the start of a reserved range (>= 15000) carved out for Redpanda APIs so they never collide with upstream Kafka key assignments. Add the request/response schema JSON, register the message and its three nested struct types with the schemata generator, and add the wire-type header with its encode/decode/format_to surface. A gtest round-trips a populated request and response to lock the encoding. This commit only defines wire encoding/decoding; the API is not yet reachable through dispatch.
Add the server-side handler for DescribeRedpandaRoles as a single_stage_handler over the wire type. The handle() specialization decodes the request, logs it, and returns a well-formed empty response with error_code::none; real role enumeration and authorization are left to follow-on work. Wire the handler .cc/.h into the server library and depend on the DescribeRedpandaRoles protocol target. The handler compiles and links but is not yet routed by api-key dispatch.
A reserved Redpanda API key (>= 15000) sits ~14k indices above the largest standard Kafka key, so a single dense array indexed by api_key would have to allocate the entire gap. Add api_key_indexed_array, a dense array-backed map from api_key to T that stores two disjoint regions: the standard range [0, StandardSize) and the reserved range [ReservedBase, ReservedBase + ReservedSize). Indexing routes a key to its region, so callers treat it as one array without paying for the gap. Every member is constexpr, so the same type backs both the compile-time dispatch/flex tables and the run-time probe/throughput tables. A static_assert enforces that the reserved region cannot overlap the standard range, which would otherwise let standard keys silently alias reserved storage in operator[]. Also introduce redpanda_api_key_base (15000), the start of the reserved range, used as the container's default ReservedBase.
Stand up the scaffolding that sizes the dual-region container from the actual API lists. Add the reserved-range wire and handler type-lists (redpanda_request_types in messages.h, redpanda_handler_request_types in handlers.h) alongside the existing standard lists, kept separate so the standard dispatch/flex/metering tables stay sized to the standard range. Define standard_api_key_table_size and reserved_api_key_table_size in api_key_table.h and static_assert them against the parallel api lists so they cannot silently drift, then define the api_key_table<T> alias over api_key_indexed_array sized to both. A dense array is only worthwhile when the keys it spans are packed. Guard that here against the api lists (the real sizing inputs) with a named sparsity factor, so an accidentally sparse key fails to compile rather than silently bloating the backing array.
The flexible-version lookup built g_flex_mapping as a single dense array sized to the largest standard key and filled only from the standard wire list, so a reserved key (>= 15000) was out of range and never resolved as in-schema or flexible. Rebuild g_flex_mapping as an api_key_table<api_version> filled from both the standard and reserved wire lists. Drop the local consteval max_api_key helper (the table now owns its sizing), index by api_key directly, and replace the hand-rolled bounds check in is_api_in_schema with the table's contains(). Reserved-range APIs now participate in flex negotiation. A gtest covers that standard keys are unchanged and that the reserved DescribeRedpandaRoles key resolves in-schema and as flexible.
Handler-probe storage was a single std::vector<handler_probe> sized max_api_key()+2, with the trailing slot doubling as the synthetic unknown-handler bucket. A reserved key (>= 15000) would have forced that vector to span the whole gap, and the trailing-slot trick does not generalize to two disjoint regions. Store the probes in an api_key_table<handler_probe> and lift the unknown-handler probe out into its own member, since it is not a real key. Build the table by iterating both regions with for_each, skipping keys without a registered handler, and set up the unknown-handler probe explicitly so its prior behavior (histogram only under latency-all, labelled via the max_api_key()+1 sentinel) is preserved. Reserved-range handlers are now metered like any other.
The dispatch lookup built a single flat std::array indexed directly by api_key, which both rejected reserved keys (>= 15000) as too sparse and could not address them without spanning the whole gap. Rebuild make_lut to take the standard and reserved handler lists and populate an api_key_table<handler>, so a reserved request routes to its handler. handler_for_key and api_name_to_key now resolve keys and names across both regions. Add dispatch tests for a reserved key and for every name resolving back to its key.
The throughput-controlled API-key bitmap was a std::vector<bool> sized only to the largest standard key, so a reserved request (>= 15000) threaded through the pre-dispatch throttle check would index out of range and throw. Retrofit convert_api_names_to_key_bitmap and the connection_context binding/member from std::vector<bool> to api_key_table<bool>, covering both ranges. Standard-key behavior is preserved: a key in the gap still throws out_of_range exactly as the old vector's at() did. Reserved keys now read false (not controlled) by default and can be opted in by naming the API in kafka_throughput_controlled_api_keys. A gtest covers named standard keys, reserved keys returning false rather than throwing, out-of-range still throwing, and opting a reserved key in. The redpanda_thread_fixture call site is updated in lockstep so every translation unit constructing a connection_context still compiles.
Add a fixture test proving the full reserved-range path: a DescribeRedpandaRoles request at api key 15000 survives the pre-dispatch throttle check, routes through the dispatch LUT to the stub handler, and returns a well-formed response. This exercises the wire schema, the handler, reserved-range dispatch, and the dual-region throughput bitmap together end to end.
778fd29 to
2c4b827
Compare
|
Force push to rebase on latest dev. |
|
/microbench |
|
Performance change detected in https://buildkite.com/redpanda/redpanda/builds/86069#019ee394-c751-4598-9faf-f9fd95b91810: See https://redpandadata.atlassian.net/wiki/x/LQAqLg for docs |
| } | ||
| return nullptr; | ||
| } | ||
| constexpr T* find(api_key key) noexcept { |
There was a problem hiding this comment.
fyi, you should be able to use the c++23 deducing self feature to avoid duplicating code for each of the const/non-const versions of these functions.
Shadow linking needs to read the roles on its source cluster. Those roles are
normally read over the Admin API, but some deployment modes do not expose it,
leaving the Kafka API as the only available path to the source cluster. Reading
roles over that path requires a Redpanda-specific Kafka API, which did not exist
before.
Apache assigns Kafka API keys sequentially and is still under 100 today, so
placing Redpanda's range well above that keeps it clear of future standard
assignments and of keys other Kafka implementations may already use. This PR
reserves a Redpanda range starting at key 15000 and makes request dispatch,
flex-version (tagged-field) parsing, and per-key metrics aware of it, without
bloating the standard-range data structures. The plumbing is kept separate from
the standard tables so
max_api_key()continues to derive only from standardApache keys.
DescribeRedpandaRoles(key 15000) is the first API in the range and the oneshadow linking needs; here it serves as the proving example and simply returns
an empty role list. The API is recognized and dispatched internally but is
intentionally not advertised in ApiVersions yet, so it is not externally
discoverable until it returns real data. That, along with the
role_storewiringand enumeration, lands in the stacked follow-up PR.
This is the base of a two-PR stack; the follow-up (
describe-redpanda-roles-api)wires the cluster
role_storeinto the handler, returns real role data withname filters, and advertises the API in ApiVersions as its final step.
Backports Required
Release Notes