cluster/observability: fix empty partition in decom / reconfiguration status outputs #29047
Pull request overview
This PR fixes an issue where decommissioning and reconfiguration status outputs show empty partition metadata (partition size and completion percentage) even when partitions are actively moving. The root cause was that reconciliation state was being queried from local shards instead of partition leaders, resulting in incomplete information when the queried node doesn't host the replica or isn't the leader.
Key changes:
- Routes reconciliation state queries to partition leaders instead of local shards
- Adds `local_size` to serialization fields in `recovery_state`
- Adds comprehensive test coverage for decommission status reporting
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/v/cluster/controller_api.h | Adds partition_leaders_table dependency and new method get_partition_leader_reconciliation_state |
| src/v/cluster/controller_api.cc | Implements leader-based reconciliation state queries and refactors partition reconfiguration state gathering to use concurrent queries |
| src/v/cluster/types.h | Updates recovery_state serde version and includes local_size in serialization |
| src/v/cluster/types.cc | Updates recovery_state output operator to include local_size |
| src/v/redpanda/admin/partition.cc | Changes admin handler to use new leader-based reconfiguration state API |
| src/v/cluster/controller.cc | Passes partition_leaders reference to controller_api constructor |
| tests/rptest/tests/nodes_decommissioning_test.py | Adds test validating decommission status is properly reported across all nodes |
    co_await ss::max_concurrent_for_each(
      partitions,
      16,
The magic number 16 for max concurrency should be extracted as a named constant or configuration parameter to make its purpose clear and allow easier tuning.
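The suggestion above can be illustrated with a minimal Python sketch of bounded concurrency behind a named limit. This is an `asyncio` analogue, not Seastar's `ss::max_concurrent_for_each`, and the names `MAX_CONCURRENT_STATE_QUERIES`, `for_each_bounded`, and `measure_peak` are hypothetical and do not appear in the PR.

```python
import asyncio

# Hypothetical named constant replacing the bare literal 16.
MAX_CONCURRENT_STATE_QUERIES = 16


async def for_each_bounded(items, limit, fn):
    """Run fn(item) for every item with at most `limit` calls in flight."""
    sem = asyncio.Semaphore(limit)

    async def run_one(item):
        async with sem:
            await fn(item)

    await asyncio.gather(*(run_one(item) for item in items))


async def measure_peak(n_items, limit):
    """Drive for_each_bounded with a fake query and record peak concurrency."""
    in_flight = 0
    peak = 0
    seen = []

    async def fake_query(partition):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0)  # yield so other tasks get a chance to run
        seen.append(partition)
        in_flight -= 1

    await for_each_bounded(range(n_items), limit, fake_query)
    return peak, sorted(seen)


if __name__ == "__main__":
    peak, seen = asyncio.run(measure_peak(50, MAX_CONCURRENT_STATE_QUERIES))
    print(f"processed {len(seen)} items, peak concurrency {peak}")
```

With the limit behind a name, a later change to the concurrency bound touches one line and the call site documents its own intent.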
        err_msg="Decommission status not reported as in_progress on all nodes",
        retry_on_exc=True,
    )
    self._set_recovery_rate(2 << 30)
The magic number 2 << 30 (2 GiB) should be extracted as a named constant to clarify that this is setting a high recovery rate to allow decommissioning to complete.
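A minimal sketch of what such a constant might look like. The names `GIB` and `HIGH_RECOVERY_RATE` are hypothetical, and the unit that `_set_recovery_rate` expects is assumed here to be bytes per second; the PR excerpt does not confirm it.

```python
# Hypothetical constants; neither name appears in the PR.
GIB = 1 << 30  # one gibibyte, in bytes

# Assumed to be bytes per second: a rate high enough that recovery
# throttling does not hold back the rest of the decommission.
HIGH_RECOVERY_RATE = 2 * GIB

# Same value as the original literal, so behaviour is unchanged.
assert HIGH_RECOVERY_RATE == 2 << 30
```

The call site would then read `self._set_recovery_rate(HIGH_RECOVERY_RATE)`, which states the intent without a comment.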
CI test results: test results on build #78085
/backport v25.3.x
/backport v25.2.x
Failed to create a backport PR to v25.2.x branch. I tried:
We often see decommission and reconfiguration status outputs with empty partition fields.
Sometimes the partition size info and completion percentage don’t show up even though the partition is actually moving just fine. This happens because the recovery state isn’t being polled from the leader; instead, it’s gathered from the shards local to whatever node you’re querying. If that node doesn’t have the replica or isn’t the leader, the info is missing. The fix is to route reconciliation state queries to the leader node instead.
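The difference between the two query paths can be shown with a small Python model. This is a toy sketch under stated assumptions, not Redpanda's implementation: `Node`, `query_local`, `query_leader`, `build_demo_cluster`, and the `bytes_left` field are all hypothetical, and only `local_size` and `recovery_state` are names taken from the PR.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RecoveryState:
    local_size: int  # bytes the replica currently holds
    bytes_left: int  # hypothetical field: bytes still to recover


@dataclass
class Node:
    node_id: int
    # Partition name -> state, only for replicas this node hosts.
    replicas: Dict[str, RecoveryState] = field(default_factory=dict)


def query_local(node: Node, ntp: str) -> Optional[RecoveryState]:
    """Old behaviour: ask only the node that received the request."""
    return node.replicas.get(ntp)


def query_leader(nodes: Dict[int, Node], leaders: Dict[str, int],
                 ntp: str) -> Optional[RecoveryState]:
    """New behaviour: look up the partition leader and ask that node."""
    leader_id = leaders.get(ntp)
    if leader_id is None:
        return None  # no known leader, nothing to report
    return nodes[leader_id].replicas.get(ntp)


def build_demo_cluster():
    """Three nodes; only node 1 hosts (and leads) the demo partition."""
    nodes = {i: Node(node_id=i) for i in range(3)}
    nodes[1].replicas["kafka/topic/0"] = RecoveryState(local_size=1024,
                                                       bytes_left=512)
    leaders = {"kafka/topic/0": 1}
    return nodes, leaders


if __name__ == "__main__":
    nodes, leaders = build_demo_cluster()
    print(query_local(nodes[0], "kafka/topic/0"))            # empty result
    print(query_leader(nodes, leaders, "kafka/topic/0"))     # leader's state
```

In this model, a request that lands on node 0 gets nothing back from the local path, while the leader path returns the same state regardless of which node receives the request.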
Fixes: https://redpandadata.atlassian.net/browse/CORE-14975
Backports Required
Release Notes
Bug Fixes