
Fix ClusterClient behavior when cluster topology is refreshed. Fix several places where connections might leak. #3917

Merged
petyaslavova merged 6 commits into master from ps_dont_close_conn_pool_on_cluster_topology_refresh on Jan 27, 2026
Conversation

@petyaslavova
Collaborator

Preserve connection pools during cluster topology refresh

Previously, when a cluster topology refresh occurred (due to connection errors, failovers, or MOVED responses), the entire connection pool was disconnected and the node's redis connection was set to None. This caused unnecessary connection churn and could lead to race conditions where in-use connections were abruptly disconnected mid-command.

This change improves the behavior by preserving existing connection pools during topology changes.
In-use connections are now marked for lazy reconnection (they complete their current operation and reconnect on next use), while free/idle connections are disconnected immediately to clear stale state like READONLY mode.
Additionally, failing nodes are moved to the end of the cache rather than being removed, so they are tried last during reinitialization; this also improves the behavior described in #3693.
Transaction connection lifecycle has also been improved to properly release connections back to the pool on errors, preventing connection leaks.
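
The description above maps to a simple pattern. Below is a minimal sketch of the lazy-reconnect idea, assuming a should-reconnect flag on each connection and a pool split into free and in-use sets; all class, attribute, and method names here are illustrative assumptions, not redis-py's actual internals.

    # Minimal sketch of lazy reconnection during a topology refresh.
    # Names are assumptions for illustration, not redis-py's internals.

    class Connection:
        def __init__(self) -> None:
            self._should_reconnect = False

        def mark_for_reconnect(self) -> None:
            # Set while the connection is busy running a command.
            self._should_reconnect = True

        def reset_should_reconnect(self) -> None:
            self._should_reconnect = False

        def disconnect(self) -> None:
            pass  # close the underlying socket

        def connect(self) -> None:
            pass  # (re)open the underlying socket


    class NodePool:
        def __init__(self) -> None:
            self._free: list[Connection] = []      # idle connections
            self._in_use: set[Connection] = set()  # busy connections

        def on_topology_refresh(self) -> None:
            # Idle connections are disconnected immediately, clearing stale
            # state such as READONLY mode; the pool object itself survives.
            for conn in self._free:
                conn.disconnect()
            # Busy connections finish their current command and reconnect
            # lazily on their next checkout.
            for conn in self._in_use:
                conn.mark_for_reconnect()

        def get_connection(self) -> Connection:
            conn = self._free.pop() if self._free else Connection()
            if conn._should_reconnect:
                # Lazy reconnect: drop the stale socket, then reconnect.
                conn.disconnect()
                conn.reset_should_reconnect()
                conn.connect()
            self._in_use.add(conn)
            return conn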

@jit-ci

jit-ci bot commented Jan 22, 2026

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.


Contributor

Copilot AI left a comment


Pull request overview

This PR improves cluster topology refresh behavior by preserving connection pools during topology changes rather than destroying and recreating them. Previously, cluster topology refreshes would disconnect all connections and set the node's redis connection to None, causing unnecessary connection churn and potential race conditions with in-use connections.

Changes:

  • Connection pools are now preserved during topology refresh with lazy reconnection for in-use connections and immediate disconnection for idle connections to clear stale state
  • Failing nodes are moved to the end of the cache rather than removed, improving node selection behavior during reinitialization (see the sketch after this list)
  • Transaction connection lifecycle improved to properly release connections on errors, preventing leaks
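
A sketch of the move-to-end idea, relying on the fact that Python dicts preserve insertion order; the real move_node_to_end_of_cached_nodes added by this PR may differ in signature and details:

    # Hypothetical helper mirroring the idea behind
    # move_node_to_end_of_cached_nodes; the actual method may differ.
    def move_node_to_end_of_cached_nodes(nodes_cache: dict, node_name: str) -> None:
        # Python dicts keep insertion order, so pop + re-insert pushes
        # the key to the end of iteration order.
        node = nodes_cache.pop(node_name, None)
        if node is not None:
            nodes_cache[node_name] = node

    nodes = {"10.0.0.1:6379": "nodeA", "10.0.0.2:6379": "nodeB"}
    move_node_to_end_of_cached_nodes(nodes, "10.0.0.1:6379")
    # The failing node now comes last, so it is tried last on reinit.
    print(list(nodes))  # ['10.0.0.2:6379', '10.0.0.1:6379']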

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Summary per file:

redis/cluster.py: Implements connection pool preservation during topology refresh, adds the move_node_to_end_of_cached_nodes method, and fixes transaction connection leaks in error paths.
redis/asyncio/cluster.py: Async version of connection pool preservation; adds ClusterNode methods for connection lifecycle management (disconnect_if_needed, update_active_connections_for_reconnect, disconnect_free_connections).
redis/asyncio/connection.py: Adds a reset_should_reconnect method to clear the reconnect flag after disconnect.
tests/test_cluster.py: Updates a test assertion to reflect the new behavior where all node connections are reused during topology refresh.
tests/test_cluster_transaction.py: Adds cleanup in try/finally blocks so mock connections don't interfere with teardown; sets host/port on mock connections for find_connection_owner.
tests/test_asyncio/test_cluster_transaction.py: Similar cleanup improvements for the async tests; removes mock connections from the node's _free list after tests.
tests/test_asyncio/test_cluster.py: Adds try/finally cleanup for mock connections; adds unit tests for the new ClusterNode connection-handling methods and for move_node_to_end_of_cached_nodes.
tests/conftest.py: Adds host/port attributes to the mock_connection fixture, needed by find_connection_owner.


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (2)

redis/asyncio/cluster.py:1354

  • If disconnect_if_needed raises an exception, the connection that was acquired will not be released back to the pool, potentially causing a connection leak. Consider wrapping the disconnect call in a try-except block or moving the disconnect handling into a try-finally structure to ensure the connection is always released. Alternatively, handle disconnect errors gracefully within disconnect_if_needed itself.
    async def execute_command(self, *args: Any, **kwargs: Any) -> Any:
        # Acquire connection
        connection = self.acquire_connection()
        # Handle lazy disconnect for connections marked for reconnect
        await self.disconnect_if_needed(connection)

        # Execute command
        await connection.send_packed_command(connection.pack_command(*args), False)

        # Read response
        try:
            return await self.parse_response(connection, args[0], **kwargs)
        finally:
            # Release connection
            self._free.append(connection)
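
One way to address this, following the comment's suggestion, is to widen the try/finally so the connection is released even when disconnect_if_needed or the send raises; a sketch of that shape (not necessarily the merged fix):

    async def execute_command(self, *args: Any, **kwargs: Any) -> Any:
        # Acquire connection
        connection = self.acquire_connection()
        try:
            # Handle lazy disconnect for connections marked for reconnect
            await self.disconnect_if_needed(connection)

            # Execute command
            await connection.send_packed_command(connection.pack_command(*args), False)

            # Read response
            return await self.parse_response(connection, args[0], **kwargs)
        finally:
            # Release connection on every exit path, including errors raised
            # by disconnect_if_needed or send_packed_command
            self._free.append(connection)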

redis/asyncio/cluster.py:1381

  • If disconnect_if_needed raises an exception at line 1360, the connection that was acquired will not be released back to the pool at line 1379, potentially causing a connection leak. Consider wrapping the entire method body in a try-finally block to ensure the connection is always released, or handle disconnect errors gracefully within disconnect_if_needed itself.
    async def execute_pipeline(self, commands: List["PipelineCommand"]) -> bool:
        # Acquire connection
        connection = self.acquire_connection()
        # Handle lazy disconnect for connections marked for reconnect
        await self.disconnect_if_needed(connection)

        # Execute command
        await connection.send_packed_command(
            connection.pack_commands(cmd.args for cmd in commands), False
        )

        # Read responses
        ret = False
        for cmd in commands:
            try:
                cmd.result = await self.parse_response(
                    connection, cmd.args[0], **cmd.kwargs
                )
            except Exception as e:
                cmd.result = e
                ret = True

        # Release connection
        self._free.append(connection)

        return ret
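
The same widening applies here; a sketch with the whole body under try/finally (again, not necessarily the merged fix):

    async def execute_pipeline(self, commands: List["PipelineCommand"]) -> bool:
        connection = self.acquire_connection()
        try:
            # Handle lazy disconnect for connections marked for reconnect
            await self.disconnect_if_needed(connection)

            # Execute commands as one packed payload
            await connection.send_packed_command(
                connection.pack_commands(cmd.args for cmd in commands), False
            )

            # Read responses; record per-command errors without aborting
            ret = False
            for cmd in commands:
                try:
                    cmd.result = await self.parse_response(
                        connection, cmd.args[0], **cmd.kwargs
                    )
                except Exception as e:
                    cmd.result = e
                    ret = True
            return ret
        finally:
            # Release the connection back to the free list on every exit path
            self._free.append(connection)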


@petyaslavova force-pushed the ps_dont_close_conn_pool_on_cluster_topology_refresh branch from f2de9c1 to 71a5913 on January 26, 2026 at 08:05
@petyaslavova added the "maintenance" (Maintenance (CI, Releases, etc)) label on Jan 26, 2026
@petyaslavova force-pushed the ps_dont_close_conn_pool_on_cluster_topology_refresh branch from 1087ffd to bdd9d11 on January 27, 2026 at 14:39
@petyaslavova force-pushed the ps_dont_close_conn_pool_on_cluster_topology_refresh branch from 1a57ee1 to 854dfa9 on January 27, 2026 at 15:25
@petyaslavova merged commit c40ec52 into master on Jan 27, 2026
101 of 104 checks passed
@petyaslavova deleted the ps_dont_close_conn_pool_on_cluster_topology_refresh branch on January 27, 2026 at 18:36