
kafka: implement DescribeRedpandaRoles role enumeration #30732

Open
nguyen-andrew wants to merge 5 commits into redpanda-data:dev from nguyen-andrew:describe-redpanda-roles-api

@nguyen-andrew nguyen-andrew commented Jun 6, 2026

Member

Shadow linking needs to read the roles on its source cluster, and in some
deployment modes the Admin API is not exposed, leaving the Kafka API as the only
available path. #30731 (now merged) built the reserved Redpanda API-key range and
dispatched DescribeRedpandaRoles as a stub returning an empty role list, proving
the dispatch machinery; see that PR for the full rationale.

This PR makes the API functional: it reads the cluster role_store, returns
roles and their members, and honors the request's name filters. Advertising the
API in ApiVersions lands after enumeration and filters work, so it becomes
externally discoverable exactly when its behavior is complete.

A final drive-by commit reuses the new role_store::roles_with_members() helper
to collapse the admin v2 list_roles endpoint to a single pass. It is unrelated
to the Kafka API and could land independently.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

@nguyen-andrew nguyen-andrew marked this pull request as ready for review June 6, 2026 05:04
Copilot AI review requested due to automatic review settings June 6, 2026 05:04
@nguyen-andrew nguyen-andrew self-assigned this Jun 6, 2026

Copilot AI left a comment

Contributor

Pull request overview

Implements the Redpanda-reserved Kafka API DescribeRedpandaRoles (key 15000) end-to-end: schema + protocol types, server handler that enumerates roles/members from the controller’s role_store (with name filters + authz), dispatch/metrics/flex-version plumbing for the reserved key range, and finally advertises the API via ApiVersions.

Changes:

  • Add DescribeRedpandaRoles request/response schemas + protocol types, and implement the Kafka handler to enumerate roles/members with optional name filters and cluster-level DESCRIBE authorization.
  • Extend Kafka dispatch tables, flex-version lookups, and per-handler probe metrics to support the reserved Redpanda API-key range without resizing dense standard-range tables.
  • Wire security::role_store into the Kafka server/request context and add focused unit/integration tests plus ApiVersions advertisement coverage.
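
The reserved-range routing described in the second bullet can be pictured with a minimal sketch. It assumes a small second lookup table indexed by `key - redpanda_api_key_base`, so the dense standard-range table keeps its size. Only the base value 15000 and the name `handler_for_key` come from this PR; `handler_info`, the table contents, and the table sizes are hypothetical stand-ins, not Redpanda's actual types.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical, simplified model of key-based dispatch; not Redpanda's real types.
using api_key = int16_t;

// From the PR: first key of the reserved Redpanda range.
constexpr api_key redpanda_api_key_base = 15000;

struct handler_info {
    std::string_view name;
};

// Dense table indexed directly by standard Kafka API key (truncated to 4 entries here).
inline const std::array<handler_info, 4> standard_handlers{
  {{"produce"}, {"fetch"}, {"list_offsets"}, {"metadata"}}};

// Separate table for the reserved range, indexed by (key - redpanda_api_key_base).
inline const std::array<handler_info, 1> reserved_handlers{
  {{"describe_redpanda_roles"}}};

// Returns the handler for a key, or nullptr when the key is unknown.
inline const handler_info* handler_for_key(api_key key) {
    if (key >= 0 && static_cast<std::size_t>(key) < standard_handlers.size()) {
        return &standard_handlers[static_cast<std::size_t>(key)];
    }
    if (key >= redpanda_api_key_base) {
        // Rebase so the reserved table starts at index 0.
        auto idx = static_cast<std::size_t>(key - redpanda_api_key_base);
        if (idx < reserved_handlers.size()) {
            return &reserved_handlers[idx];
        }
    }
    return nullptr;
}
```

In this sketch, keys between the two ranges and negative keys fall through to `nullptr` rather than indexing out of bounds, which is the same class of problem the `connection_context.cc` hardening in the file list addresses.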

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/v/redpanda/tests/fixture.cc | Pass role_store into Kafka server wiring in the test fixture. |
| src/v/redpanda/application_services.cc | Wire controller role_store into Kafka server construction. |
| src/v/kafka/server/tests/handler_probe_test.cc | New test ensuring reserved-range keys map to custom probe storage safely. |
| src/v/kafka/server/tests/handler_interface_test.cc | Extend tests to cover reserved-range handler lookup + flex-version behavior. |
| src/v/kafka/server/tests/describe_redpanda_roles_test.cc | New fixture-based integration tests for role enumeration, filtering, and authz. |
| src/v/kafka/server/tests/BUILD | Add new tests and required deps. |
| src/v/kafka/server/tests/api_versions_test.cc | Assert reserved-range APIs are advertised via ApiVersions. |
| src/v/kafka/server/server.h | Add role_store dependency + accessor on the Kafka server. |
| src/v/kafka/server/server.cc | Plumb role_store through server constructor initialization. |
| src/v/kafka/server/request_context.h | Expose role_store() via request context for handlers. |
| src/v/kafka/server/handlers/handlers.h | Register DescribeRedpandaRoles as a custom (reserved-range) handler type list. |
| src/v/kafka/server/handlers/handler_probe.h | Add rebased probe storage for reserved-range API keys. |
| src/v/kafka/server/handlers/handler_probe.cc | Implement reserved-range probe table sizing/setup and routing in get_probe. |
| src/v/kafka/server/handlers/handler_interface.cc | Add reserved-range rebased dispatch LUT and extend handler_for_key. |
| src/v/kafka/server/handlers/describe_redpanda_roles.h | New handler definition for DescribeRedpandaRoles. |
| src/v/kafka/server/handlers/describe_redpanda_roles.cc | New handler implementation: authz + audit, enumerate roles/members, apply name filters. |
| src/v/kafka/server/handlers/api_versions.cc | Include custom reserved-range APIs in the supported API list. |
| src/v/kafka/server/connection_context.cc | Harden throughput-control key checks to avoid out-of-bounds for non-standard keys. |
| src/v/kafka/server/BUILD | Add new handler sources/headers and protocol deps to the Kafka server library. |
| src/v/kafka/server/app.h | Add role_store forward decl + init signature updates. |
| src/v/kafka/server/app.cc | Pass role_store into server construction. |
| src/v/kafka/protocol/types.h | Define redpanda_api_key_base (15000) for reserved-range routing. |
| src/v/kafka/protocol/tests/describe_redpanda_roles_test.cc | New protocol round-trip tests for request/response encoding/decoding. |
| src/v/kafka/protocol/tests/BUILD | Add protocol test target and deps. |
| src/v/kafka/protocol/schemata/generator.py | Allow new struct types used by the schema generator. |
| src/v/kafka/protocol/schemata/generator.bzl | Register the new message schemata for generation. |
| src/v/kafka/protocol/schemata/describe_redpanda_roles_response.json | New schema: response with roles + members. |
| src/v/kafka/protocol/schemata/describe_redpanda_roles_request.json | New schema: request with optional role-name filters. |
| src/v/kafka/protocol/messages.h | Add custom protocol redpanda_request_types entry for schema/flex plumbing. |
| src/v/kafka/protocol/flex_versions.cc | Add reserved-range rebased flexible-version mapping and schema membership checks. |
| src/v/kafka/protocol/describe_redpanda_roles.h | New protocol wrapper types for request/response. |

Comment thread src/v/kafka/server/handlers/describe_redpanda_roles.cc Outdated
Comment thread src/v/kafka/server/tests/handler_probe_test.cc Outdated
Comment thread src/v/kafka/server/tests/api_versions_test.cc
vbotbuildovich commented Jun 6, 2026

Collaborator

CI test results

test results on build#85480
| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FLAKY(PASS) | ShadowLinkingReplicationTests | test_auto_prefix_trimming | {"source_cluster_spec": {"cluster_type": "redpanda"}, "storage_mode": "cloud", "with_failures": false} | integration | https://buildkite.com/redpanda/redpanda/builds/85480#019e9b60-3443-48d7-aa69-2fd2f3b617bf | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0034, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_auto_prefix_trimming |
| FLAKY(PASS) | ShadowLinkingReplicationTests | test_auto_prefix_trimming | {"source_cluster_spec": {"cluster_type": "redpanda"}, "storage_mode": "cloud", "with_failures": true} | integration | https://buildkite.com/redpanda/redpanda/builds/85480#019e9b63-3c60-4555-9f46-9313ee76d790 | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0034, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_auto_prefix_trimming |
| FLAKY(PASS) | TopicRecoveryTest | test_many_partitions | {"check_mode": "check_manifest_and_segment_metadata", "cloud_storage_type": 2} | integration | https://buildkite.com/redpanda/redpanda/builds/85480#019e9b60-3441-4e20-8fd9-988c95918e70 | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0006, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TopicRecoveryTest&test_method=test_many_partitions |
test results on build#86134
| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FLAKY(PASS) | NodeWiseRecoveryTest | test_node_wise_recovery | {"dead_node_count": 1} | integration | https://buildkite.com/redpanda/redpanda/builds/86134#019ef346-1a71-4d00-a137-91fcbed76492 | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0290, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodeWiseRecoveryTest&test_method=test_node_wise_recovery |
| FLAKY(PASS) | NodeWiseRecoveryTest | test_recovery_local_data_missing | {"wait_for_final_manifest_uploads": true} | integration | https://buildkite.com/redpanda/redpanda/builds/86134#019ef346-1a6c-47f5-8e61-ce363768da63 | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0497, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1418, p1=0.2167, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodeWiseRecoveryTest&test_method=test_recovery_local_data_missing |
| FLAKY(FAIL) | ShadowLinkingRandomOpsTest | test_node_operations | {"failures": true, "workload_set": "cloud_combos"} | integration | https://buildkite.com/redpanda/redpanda/builds/86134#019ef346-1a6b-45b5-a7b0-dcfe0576858a | 17/21 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0163, p0=0.0040, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingRandomOpsTest&test_method=test_node_operations |

@nguyen-andrew nguyen-andrew marked this pull request as draft June 8, 2026 15:11
role_store::get() reconstructs a role's members by scanning the entire
member store, so enumerating roles with a per-role get() is quadratic in
the total number of memberships. Add roles_with_members(pred), which
returns the roles whose name satisfies pred together with their members
in one pass over the member store (linear), including matching roles with
no members. Results are returned as role_with_members (a role_name paired
with its role).

Thread the cluster role_store from the controller through
server_app::init into the kafka server, exposed via server::role_store()
and request_context::role_store(), mirroring the existing
credential_store and authorizer plumbing. No behavior change yet; the
DescribeRedpandaRoles handler consumes it in a later commit.
@nguyen-andrew nguyen-andrew force-pushed the describe-redpanda-roles-api branch from 96cf239 to c69d69a Compare June 23, 2026 04:46
Replace the placeholder empty response with a read from the cluster
role_store, guarded by cluster DESCRIBE authorization. Roles and their
members are gathered in a single pass via
role_store::roles_with_members(); name filters become a predicate
applied during that pass. A null or empty role_name_filters describes
all roles; otherwise only the named roles are returned, with
nonexistent names skipped silently (the v0 response schema carries only
a top-level error code, no per-role error field).
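
The filter semantics above can be sketched as a predicate builder. This is a minimal illustration under stated assumptions, not the handler's code: `make_name_predicate` and `select_roles` are hypothetical names, role names are modeled as plain `std::string`, and the nullable filter array is modeled as `std::optional<std::vector<...>>`.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real role-name and predicate types.
using role_name = std::string;
using role_predicate = std::function<bool(const role_name&)>;

// A null or empty filter list matches every role; otherwise only the named roles match.
inline role_predicate
make_name_predicate(const std::optional<std::vector<role_name>>& filters) {
    if (!filters.has_value() || filters->empty()) {
        return [](const role_name&) { return true; };
    }
    std::unordered_set<role_name> wanted(filters->begin(), filters->end());
    return [wanted = std::move(wanted)](const role_name& name) {
        return wanted.count(name) > 0;
    };
}

// Applies the predicate to the roles that exist. A filter entry naming a
// nonexistent role never matches anything, so it is skipped without an error.
inline std::vector<role_name> select_roles(
  const std::vector<role_name>& existing, const role_predicate& pred) {
    std::vector<role_name> out;
    for (const auto& name : existing) {
        if (pred(name)) {
            out.push_back(name);
        }
    }
    return out;
}
```

Because the predicate is evaluated only against roles that exist, a request for `{"dev", "ghost"}` on a cluster holding `admin` and `dev` would yield just `dev`, which matches the "skipped silently" behavior described above.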

A fixture test covers the populated, empty-cluster, name-filtered,
missing-name, and unauthorized cases.

list_roles built its response with range() plus a per-role get(),
quadratic in total memberships. Reuse role_store::roles_with_members(),
added earlier on this branch for the DescribeRedpandaRoles Kafka API, to
gather all roles and their members in one pass.
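
To make the one-pass idea concrete, here is a minimal sketch under an assumed storage layout: a set of role names plus a flat list of (member, role) pairs. The real role_store layout is not shown in this PR page, and `role_entry` and this free-function signature are hypothetical. A per-role lookup would rescan the membership list once per role; grouping visits each membership once.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real store and role types differ.
using role_name = std::string;
using member = std::string;

struct role_entry {
    role_name name;
    std::vector<member> members;
};

// Groups memberships by role in a single pass over the membership list.
// Roles accepted by pred are seeded first, so roles without members still appear.
template<typename Pred>
std::vector<role_entry> roles_with_members(
  const std::set<role_name>& roles,
  const std::vector<std::pair<member, role_name>>& memberships,
  Pred pred) {
    std::map<role_name, std::vector<member>> grouped;
    for (const auto& name : roles) {
        if (pred(name)) {
            grouped.emplace(name, std::vector<member>{});
        }
    }
    // One visit per membership; entries for non-matching roles are ignored.
    for (const auto& [who, role] : memberships) {
        auto it = grouped.find(role);
        if (it != grouped.end()) {
            it->second.push_back(who);
        }
    }
    std::vector<role_entry> out;
    out.reserve(grouped.size());
    for (auto& [name, members] : grouped) {
        out.push_back({name, std::move(members)});
    }
    return out;
}
```

This sketch uses `std::map`, so each membership costs a logarithmic lookup; a hash map would bring it closer to the linear bound the commit message describes.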

This is a drive-by cleanup of the existing admin v2 endpoint; it is not
part of the new Kafka API and could land independently once the helper
is in tree.
@nguyen-andrew nguyen-andrew force-pushed the describe-redpanda-roles-api branch from c69d69a to 4dd66c2 Compare June 23, 2026 04:47
@nguyen-andrew

Member Author

/ci-repeat 1

vbotbuildovich commented Jun 23, 2026

Collaborator

Retry command for Build#86127

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkingReplicationTests.test_auto_prefix_trimming@{"source_cluster_spec":{"cluster_type":"redpanda"},"storage_mode":"tiered","with_failures":false}
tests/rptest/tests/partition_force_reconfiguration_test.py::NodeWiseRecoveryTest.test_node_wise_recovery@{"dead_node_count":2}

describe_redpanda_roles_response response;
response.data.error_code = error_code::none;
co_return co_await ctx.respond(std::move(response));
auto authz = ctx.authorized(

Member Author

Decided to gate on DESCRIBE CLUSTER, the same as DescribeAcls.

Roles are otherwise readable only via the superuser-only Admin API, but the consumer (shadow linking) reads them over the Kafka API from a source where the Admin API may be unreachable (BYOC). Therefore, the gate must be non-superuser and ACL-grantable.

Reusing DESCRIBE CLUSTER is effectively free, but coarse-grained because it bundles role and ACL reads together.

A dedicated role ACL resource type would provide a least-privilege solution, but it requires meaningful implementation work, including support in downstream clients such as rpk and Console.

If role names later require tighter access control, we could switch and introduce a dedicated role resource type.
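
As a rough illustration of the gate discussed in this comment, the sketch below models an ACL-grantable DESCRIBE CLUSTER check in front of role enumeration. Everything here is a hypothetical stand-in (`acl_grant`, `authorized`, `describe_roles`); the handler's real call is `ctx.authorized(...)`, whose arguments are not shown on this page. Returning Kafka's CLUSTER_AUTHORIZATION_FAILED (code 31) on denial is an assumption of this sketch; the page does not state which error code the handler uses.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified ACL model; not Redpanda's authorizer.
enum class acl_operation { describe, alter };
enum class resource_type { cluster, topic };

// none and cluster_authorization_failed use the Kafka protocol values 0 and 31.
enum class error_code : int16_t { none = 0, cluster_authorization_failed = 31 };

struct acl_grant {
    std::string principal;
    acl_operation op;
    resource_type resource;
};

// True if any grant gives `principal` the operation on the resource type.
inline bool authorized(
  const std::vector<acl_grant>& grants,
  const std::string& principal,
  acl_operation op,
  resource_type resource) {
    for (const auto& g : grants) {
        if (g.principal == principal && g.op == op && g.resource == resource) {
            return true;
        }
    }
    return false;
}

struct describe_roles_response {
    error_code error;
    std::vector<std::string> role_names;
};

// Gate role enumeration on DESCRIBE CLUSTER (assumed failure code; see lead-in).
inline describe_roles_response describe_roles(
  const std::vector<acl_grant>& grants,
  const std::string& principal,
  const std::vector<std::string>& roles) {
    if (!authorized(grants, principal, acl_operation::describe, resource_type::cluster)) {
        return {error_code::cluster_authorization_failed, {}};
    }
    return {error_code::none, roles};
}
```

The sketch shows why the gate is coarse-grained: the same single grant that lets a principal read ACLs also unlocks role reads, with no way to allow one without the other.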

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Comment thread src/v/security/role.h
Comment on lines +178 to +183
/// A role paired with its name, as produced by role_store enumeration
/// (role itself stores only members, not its name).
struct role_with_members {
    role_name name;
    role role;
};
Comment on lines +61 to +70
void create_user(
  const ss::sstring& username, security::scram_credential credentials) {
    app.controller->get_security_frontend()
      .local()
      .create_user(
        security::credential_user{username},
        std::move(credentials),
        model::timeout_clock::now() + 5s)
      .get();
}
@vbotbuildovich

Collaborator

Retry command for Build#86134

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/shadow_linking_rnot_test.py::ShadowLinkingRandomOpsTest.test_node_operations@{"failures":true,"workload_set":"cloud_combos"}

3 participants