bugfix: race among _connecting and cluster metadata #2189

Closed

@hackaugusto hackaugusto commented Jan 5, 2021

Fixes #2188

A call to `maybe_connect` can be performed while the cluster metadata is
being updated. If that happens, the assumption that every entry in
`_connecting` has metadata won't hold. The existing assert will then
raise on every subsequent call to `poll`, leaving the client instance
unusable.

This fixes the issue by ignoring connection requests to nodes whose
metadata is no longer available.
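
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the kafka-python code: `ToyClient`, `update_metadata`, `poll_with_assert`, and `poll_ignoring_missing` are hypothetical names, and the plain `brokers` dict stands in for the cluster metadata. Only `maybe_connect`, `_connecting`, and `poll` are borrowed from the description above.

```python
class ToyClient:
    """Hypothetical stand-in for the client; not kafka-python's real class."""

    def __init__(self, brokers):
        self.brokers = dict(brokers)  # node_id -> (host, port): the "metadata"
        self._connecting = set()      # node ids with a pending connect request
        self.connected = {}           # node_id -> (host, port) once connected

    def maybe_connect(self, node_id):
        # Only records the request; the connection is completed in poll.
        self._connecting.add(node_id)

    def update_metadata(self, brokers):
        # A metadata refresh may drop nodes that are still in _connecting.
        self.brokers = dict(brokers)

    def poll_with_assert(self):
        # Assumes every pending node still has metadata. When the assert
        # fails, the node stays in _connecting, so later calls fail too.
        for node_id in list(self._connecting):
            broker = self.brokers.get(node_id)
            assert broker is not None, "no metadata for node %s" % (node_id,)
            self.connected[node_id] = broker
            self._connecting.remove(node_id)

    def poll_ignoring_missing(self):
        # Drops pending requests for nodes whose metadata is gone.
        for node_id in list(self._connecting):
            broker = self.brokers.get(node_id)
            if broker is None:
                self._connecting.discard(node_id)
                continue
            self.connected[node_id] = broker
            self._connecting.remove(node_id)


def demo():
    client = ToyClient({1: ("broker-a", 9092), 2: ("broker-b", 9092)})
    client.maybe_connect(2)                          # request arrives first
    client.update_metadata({1: ("broker-a", 9092)})  # node 2 then disappears

    failures = 0
    for _ in range(3):
        try:
            client.poll_with_assert()
        except AssertionError:
            failures += 1                            # raised on every call

    client.poll_ignoring_missing()                   # stale request is dropped
    return failures, client._connecting
```

In this sketch the assert variant never removes the stale node id, which is why it fails on every call, while the guarded variant discards the request and carries on. Whether callers of `maybe_connect` cope with the request being dropped is a separate question that the sketch does not address.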


@hackaugusto
Author

I'm not 100% sure the fix is good enough: there are multiple paths that lead to calls to `maybe_connect`, and I'm not sure they can all recover from the node disappearing.

@hackaugusto hackaugusto force-pushed the fix-broker-id-metadata-assert branch from 8214728 to e15ed37 Compare January 7, 2021 11:56
hackaugusto added a commit to aiven/kafka-python that referenced this pull request Jan 18, 2021
hackaugusto added a commit to aiven/kafka-python that referenced this pull request Jan 19, 2021
ivanyu added a commit to aiven/kafka-python that referenced this pull request Jan 20, 2021
bugfix: race among _connecting and cluster metadata (dpkp#2189)
@hackaugusto hackaugusto closed this Aug 7, 2023
@hackaugusto hackaugusto deleted the fix-broker-id-metadata-assert branch August 7, 2023 09:26
rushidave pushed a commit to aiven/kafka-python that referenced this pull request Jan 3, 2024
rushidave pushed a commit to aiven/kafka-python that referenced this pull request Feb 29, 2024