Skip to content

bugfix: fix infinite loop on KafkaAdminClient #2194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

hackaugusto
Copy link

@hackaugusto hackaugusto commented Jan 14, 2021

PR for #2193

An infinite loop may happen with the following pattern:

self._send_request_to_node(self._client.least_loaded_node(), request)

The problem happens when self._client's cluster metadata is out-of-date, and the
result of least_loaded_node() is a node that has been removed from the cluster but
the client is unware of it. When this happens _send_request_to_node will enter an
infinite loop waiting for the chosen node to become available, which won't happen,
resulting in an infinite loop.

This commit introduces a new method named _send_request_to_least_loaded_node which
handles the case above. This is done by regularly checking if the target node is
available in the cluster metadata, and if not, a new node is chosen.

Notes:

  • This does not yet cover every call site to _send_request_to_node, there are some
    other places were similar race conditions may happen.
  • The code above does not guarantee that the request itself will be sucessful, since
    it is still possible for the target node to exit, however, it does remove the
    infinite loop which can render client code unusable.

This change is Reviewable

An infinite loop may happen with the following pattern:

    self._send_request_to_node(self._client.least_loaded_node(), request)

The problem happens when `self._client`'s cluster metadata is out-of-date, and the
result of `least_loaded_node()` is a node that has been removed from the cluster but
the client is unware of it. When this happens `_send_request_to_node` will enter an
infinite loop waiting for the chosen node to become available, which won't happen,
resulting in an infinite loop.

This commit introduces a new method named `_send_request_to_least_loaded_node` which
handles the case above. This is done by regularly checking if the target node is
available in the cluster metadata, and if not, a new node is chosen.

Notes:

- This does not yet cover every call site to `_send_request_to_node`, there are some
  other places were similar race conditions may happen.
- The code above does not guarantee that the request itself will be sucessful, since
  it is still possible for the target node to exit, however, it does remove the
  infinite loop which can render client code unusable.
If the value `_controller_id` is out-of-date and the node was removed
from the cluster, `_send_request_to_node` would enter an infinite loop.
hackaugusto added a commit to aiven/kafka-python that referenced this pull request Jan 18, 2021
An infinite loop may happen with the following pattern:

    self._send_request_to_node(self._client.least_loaded_node(), request)

The problem happens when `self._client`'s cluster metadata is out-of-date, and the
result of `least_loaded_node()` is a node that has been removed from the cluster but
the client is unware of it. When this happens `_send_request_to_node` will enter an
infinite loop waiting for the chosen node to become available, which won't happen,
resulting in an infinite loop.

This commit introduces a new method named `_send_request_to_least_loaded_node` which
handles the case above. This is done by regularly checking if the target node is
available in the cluster metadata, and if not, a new node is chosen.

Notes:

- This does not yet cover every call site to `_send_request_to_node`, there are some
  other places were similar race conditions may happen.
- The code above does not guarantee that the request itself will be sucessful, since
  it is still possible for the target node to exit, however, it does remove the
  infinite loop which can render client code unusable.
hackaugusto added a commit to aiven/kafka-python that referenced this pull request Jan 18, 2021
If the value `_controller_id` is out-of-date and the node was removed
from the cluster, `_send_request_to_node` would enter an infinite loop.
@hackaugusto hackaugusto closed this Aug 7, 2023
@hackaugusto hackaugusto deleted the fix-infinite-loop-with-kafka-admin branch August 7, 2023 09:26
rushidave pushed a commit to aiven/kafka-python that referenced this pull request Jan 3, 2024
If the value `_controller_id` is out-of-date and the node was removed
from the cluster, `_send_request_to_node` would enter an infinite loop.
rushidave pushed a commit to aiven/kafka-python that referenced this pull request Feb 29, 2024
An infinite loop may happen with the following pattern:

    self._send_request_to_node(self._client.least_loaded_node(), request)

The problem happens when `self._client`'s cluster metadata is out-of-date, and the
result of `least_loaded_node()` is a node that has been removed from the cluster but
the client is unware of it. When this happens `_send_request_to_node` will enter an
infinite loop waiting for the chosen node to become available, which won't happen,
resulting in an infinite loop.

This commit introduces a new method named `_send_request_to_least_loaded_node` which
handles the case above. This is done by regularly checking if the target node is
available in the cluster metadata, and if not, a new node is chosen.

Notes:

- This does not yet cover every call site to `_send_request_to_node`, there are some
  other places were similar race conditions may happen.
- The code above does not guarantee that the request itself will be sucessful, since
  it is still possible for the target node to exit, however, it does remove the
  infinite loop which can render client code unusable.
rushidave pushed a commit to aiven/kafka-python that referenced this pull request Feb 29, 2024
If the value `_controller_id` is out-of-date and the node was removed
from the cluster, `_send_request_to_node` would enter an infinite loop.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants