-
Notifications
You must be signed in to change notification settings - Fork 651
cluster/utils: unified partition notifications API #26929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/dt |
/dt |
Retry command for Build#69395please wait until all jobs are finished before running the slash command
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the direction - overall I do think we could simplify it further.
src/v/cluster/utils/notifications.h
Outdated
virtual notification_id_type | ||
register_leadership_notification(notification_cb_t) | ||
= 0; | ||
virtual void unregister_leadership_notification(notification_id_type) = 0; | ||
|
||
virtual notification_id_type | ||
register_partition_manage_notification(const model::ns&, notification_cb_t) | ||
= 0; | ||
virtual void unregister_partition_manage_notification(notification_id_type) | ||
= 0; | ||
|
||
virtual notification_id_type register_partition_unmanage_notification( | ||
const model::ns&, notification_cb_t) | ||
= 0; | ||
virtual void | ||
unregister_partition_unmanage_notification(notification_id_type) | ||
= 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ever care about the distinction between these 3 cases anywhere for this "follow leadership" use case? Can we just merge them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a good point .. we could probably merge all of them into a single call.
src/v/cluster/utils/notifications.h
Outdated
// Unified interface for partition level notifications. | ||
// Supports notifications for partition leadership changes, | ||
// partition management (creation, deletion), and partition properties changes. | ||
class notifications { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will think about naming, perhaps a bit too generic. Maybe partition_leadership_notifier
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about partition_change_notifier? (since this is not limited to leadership alone)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah SGTM 👍
src/v/cluster/utils/notifications.h
Outdated
, is_leader(is_leader) | ||
, topic_cfg(std::move(topic_cfg)) {} | ||
model::ntp ntp; | ||
model::term_id term; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YAGNI? I don't know where we would care about term for these use cases...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started without term but perhaps need it for DR.. the rationale is term_id acts as a simple fence in some cases.. for example we try to process a notification at term=0 but the processing doesn't succeed (we queue it again) but meanwhile there is a notification processed in term term=1 which succeeds and then the future term=0 op becomes a noop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'd personally rather see added when needed, but fine to keep it...
CI test resultstest results on build#69395
test results on build#69528
test results on build#69557
test results on build#69779
|
/dt |
please hold off on reviews until ci is green (will request reviews then). |
Retry command for Build#69498please wait until all jobs are finished before running the slash command
|
/dt |
/dt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to keep as draft?
virtual notification_id_type | ||
register_partition_notifications(notification_cb_t) | ||
= 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some use cases (at least Wasm) doesn't care about topic configuration changes... Maybe those changes are rare enough it doesn't matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
configuration changes are rare but they probably are useful (eg: datalake and DR both need them)..
void partition_change_notifier_impl::unregister_partition_notifications( | ||
notification_id_type id) { | ||
auto it = _notification_ids.find(id); | ||
if (it != _notification_ids.end()) [[unlikely]] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not [[unlikely]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops I mixed up early return and unlikely.. nice spot fixed.
leadership_change, id, partition->ntp(), partition); | ||
}); | ||
nstate.managed = _partition_mgr.local().register_manage_notification( | ||
// do we need any other? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For cloud topics I need it for kafka_internal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
enum notification_type { | ||
leadership_change, | ||
partition_management, | ||
partition_unmanagement, | ||
partition_properties_change | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a newcomer it was not clear what "partition_management" or "partition_unmanagement" means. Can we add a brief description of these events and when they happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added some docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also renamed to replica_assigned and replica_unassigned..feels like that is a bit clearer than manage/unmanage
// Unified interface for partition level notifications. | ||
// Supports notifications for partition leadership changes, | ||
// partition management (creation, deletion), and partition properties changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we elaborate on this? If you register a notification using this class then you get notifications for all kafka namespace'd NTPs that are owned by this shard, as well as notifications when they are not "owned" by this shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, its only for partition replicas on the shard.
Retry command for Build#69528please wait until all jobs are finished before running the slash command
|
I didn't want to kill the running BK job when moving out of draft. |
b590390
to
0eda5fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits/suggestions, but otherwise LGTM. Thanks for this!
nstate.properties = _topic_table.local().register_ntp_delta_notification( | ||
[this, id](cluster::topic_table::ntp_delta_range_t range) mutable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to filter out redpanda
namespace here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left it as is for now since its easy to filter them out on the subscriptions.
// kafka namespace | ||
nstate.kafka_replica_assigned | ||
= _partition_mgr.local().register_manage_notification( | ||
model::kafka_namespace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If most use cases only need one or the other, maybe we want to pass in a namespace to the constructor. For Cloud Topics, I only need a single internal topic to follow leadership on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DR needs all updates (eg: controller and kafka), datalake API only needs kafka.. so its bit nonunform.. for now I left it as is since its straight forward to filter out on the caller side.
, is_leader(is_leader) | ||
, topic_cfg(std::move(topic_cfg)) {} | ||
model::term_id term; | ||
bool is_leader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am wondering if it would be sometimes helpful to actually include current leader_id ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the usecases this is intended for rn (Datalake and DR), they don't use leader_id, how about we extend this if needed.
auto id = _notification_id++; | ||
auto exceptional_cleanup = ss::defer( | ||
[this, &nstate] { do_unregister_partition_notifications(nstate); }); | ||
nstate.leadership = _group_mgr.local().register_leadership_notification( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the notification will only be triggered on a node/shard that is currently hosting a replica for given partition. Is that intentional ?
Mayb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's intentional. Currently, non-local changes are only supported for leadership changes, not for the manage/unmanage API. To maintain uniformity, this API only supports shard-local updates.
This design targets use cases that follow leadership. For other patterns there are low level APIs anyway.
Unified interface for all partition level notifications. Avoids direct dependencies on raft::group_mgr/partition_mgr/topic_table.
This is a bit half baked because cluster partition stuff is deeply intertwined with datalake_mgr but still showcases the usage of this API.
great stuff |
, is_leader(is_leader) | ||
, topic_cfg(std::move(topic_cfg)) {} | ||
model::term_id term; | ||
bool is_leader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: bool_class?
chunked_hash_map<notification_id_type, notification_state> | ||
_notification_ids; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You can use src/v/utils/notification_list.h
to remove this boilerplate code
One stop shop for all partition notifications. Supports leadership changes, manage unmanage scenarios and property change notifications. Lately we have been implementing a lot of follow the current leader scenarios and this API helps simplify that flow while avoiding explicit dependency on raft::group_mgr and friends.
Preparatory change for a future PR.
Backports Required
Release Notes