Implementing failover retries #50

jeffreyzant · 2025-07-17T13:46:43Z

It's been a while since I opened #17, but a rolling update of Redis caused some data not to be stored because of the downtime that occurs while Sentinel elects a new leader. The basic way of connecting hasn't changed and should work on most installs, but it might be best to release this as a new major version if you're okay with the changes.

In short:

- The Redis methods perform a limited number of retries when a RedisException is retryable or when a DNS resolution error occurs. The number of retries and the delay between them are configurable (by default, 20 retries with a 1-second delay).
- When the maximum number of retries is reached, a RetryRedisException is thrown.
- Replaced php-cs-fixer with laravel/pint.
- Dropped PHP 8.1 support.
- The GitHub Actions tests now use the same start-redis-cluster.sh script that we use for local testing.
- The start-redis-cluster.sh script now supports automatic restarts when a node goes down.
- Added and improved unit tests to verify that retrying works as expected.

We rely heavily on this logic in production, so I’d love to help keep maintaining it.
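A minimal sketch of the bounded-retry behaviour described in the list above, written in Python for illustration rather than the package's PHP. The names retry_redis_call, RedisError and RetryRedisError are hypothetical stand-ins (RetryRedisError loosely mirrors the PR's RetryRedisException), and the is_retryable predicate stands in for whatever check the package applies to connection and DNS errors. The defaults follow the stated 20 retries with a 1-second delay, interpreted here as 20 attempts after the first one.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RedisError(Exception):
    """Hypothetical stand-in for a client-side Redis exception."""


class RetryRedisError(Exception):
    """Hypothetical analogue of RetryRedisException: raised once retries run out."""


def retry_redis_call(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int = 20,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying up to `max_retries` extra times on retryable errors."""
    attempt = 0
    while True:
        try:
            return call()
        except RedisError as exc:
            if not is_retryable(exc):
                raise  # non-connection errors propagate immediately, unchanged
            if attempt >= max_retries:
                # Budget exhausted: wrap the last error so callers can tell
                # "gave up after retrying" apart from an ordinary failure.
                raise RetryRedisError(f"gave up after {max_retries} retries") from exc
            attempt += 1
            sleep(delay_seconds)  # e.g. wait for Sentinel to finish the failover
```

Injecting `sleep` keeps tests fast: a test can pass a no-op function instead of waiting out the real delay on every retry.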

Namoshek

Thank you for this PR; it looks great and is really thorough. I'm pretty much OK with everything; I just noticed one minor detail. But maybe I'm just blind...

Also, it looks like some of the assertions are flaky. Maybe there is something else to assert on, like a unique connection ID (CLIENT ID)?

src/Connectors/PhpRedisSentinelConnector.php

jeffreyzant · 2025-07-18T09:15:40Z

I've also adjusted the tests to check whether the Redis run_id has changed. This identifier is unique to an instance and changes on every restart.
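To illustrate why run_id works as an assertion, here is a small Python sketch that compares the run_id field of two `INFO server` snapshots. It assumes the raw `key:value` text format returned by the INFO command; the helper names are hypothetical and any sample values are made up. Because Redis generates a new random run_id each time a server process starts, a changed value shows that a different (or restarted) instance answered, even if the node comes back on the same host and port.

```python
def parse_run_id(info_server: str) -> str:
    """Extract run_id from the text returned by `INFO server`."""
    for line in info_server.splitlines():  # handles the \r\n line endings
        key, sep, value = line.partition(":")
        # Section headers such as "# Server" contain no colon and are skipped.
        if sep and key.strip() == "run_id":
            return value.strip()
    raise ValueError("run_id not found in INFO output")


def instance_was_replaced(info_before: str, info_after: str) -> bool:
    """True if the two INFO snapshots come from different server processes."""
    return parse_run_id(info_before) != parse_run_id(info_after)
```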

Namoshek · 2025-07-18T13:15:00Z

Looks good to me, I'm releasing this as a major version as suggested.

streamingsystems · 2025-08-02T01:41:37Z

Hi All,

We are running into an intermittent error on our system where Redis times out with this message:

connection timed out

Tracing through the code, I see that this message is not in the array:

ERROR_MESSAGES_INDICATING_UNAVAILABILITY

As such, the shouldRetryRedisException method is returning false, and (from what I can tell) the system is not retrying.

I am wondering whether this error was omitted on purpose, perhaps because a timeout does not qualify as an indication of unavailability.

We have Redis running in production, and as you might guess there are lots and lots of hits to Redis. It all works perfectly, but occasionally we get this timeout. Since it's not retried, the exception propagates up and causes our job to fail.

Thanks!

-Rob

Namoshek · 2025-08-02T07:45:05Z

@streamingsystems I think you should open a new issue with this information. To me, this sounds like you are using some kind of connection pool that is not used frequently enough, so some connections time out. However, I'm not opposed to adding retry handling for this case. Maybe we can even make the error messages configurable (on top of the defaults), so that it is easier to test the changes before merging them.
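A rough Python sketch of the configurable variant suggested here: a retry predicate that matches an error message against a default list plus user-supplied extras. The DEFAULT_UNAVAILABILITY_MESSAGES tuple and the should_retry function are hypothetical placeholders, not the actual contents of ERROR_MESSAGES_INDICATING_UNAVAILABILITY or the package's shouldRetryRedisException, and case-insensitive substring matching is an assumption about how such a check might work.

```python
from typing import Iterable

# Hypothetical defaults for illustration only; NOT the package's actual
# ERROR_MESSAGES_INDICATING_UNAVAILABILITY list.
DEFAULT_UNAVAILABILITY_MESSAGES = (
    "connection lost",
    "connection refused",
    "loading",
    "readonly",
)


def should_retry(message: str, extra_messages: Iterable[str] = ()) -> bool:
    """Case-insensitive substring match against the defaults plus configured extras."""
    haystack = message.lower()
    needles = (*DEFAULT_UNAVAILABILITY_MESSAGES, *extra_messages)
    return any(needle.lower() in haystack for needle in needles)
```

With only the placeholder defaults, "connection timed out" is not retried; adding "timed out" as an extra entry makes it retryable without touching the defaults, which is the kind of opt-in testing described above.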

streamingsystems · 2025-08-02T15:34:57Z

OK, thanks. I forked your repo, am running some tests, and will open a separate issue. Have a great day.

jeffreyzant · 2025-08-11T10:03:44Z

Looks like a good addition! 👍

jeffreyzant added 15 commits June 27, 2025 16:38

setup retry logic

2669998

move retry logic, only retry exceptions that are connection related

cda6a06

restructure and rename logic

ddd6b51

refactor, improve tests and add monitor/restart to the testing cluster

4526dd8

use laravel/pint for codestyle fixer

1159b14

add debug command to the testing cluster

b6e8b32

ignore name resolution errors on disconnecting

67964eb

allow the initial connection to be retried aswell

a812e0a

move retry logic to the RetryService

0dc479a

try to trigger github actions

8920076

fix styling and enable debug command on redis

ac1baa2

drop php8.2 in tests

95dac31

fix workflow tests, rename RetryService to Manager, improve testability

eb49118

wrap retries for testability, add initial connection test

b94706c

format RetryManager

8f2c5cd

jeffreyzant changed the title ~~#17 implementing failover retries~~ Implementing failover retries Jul 17, 2025

Namoshek reviewed Jul 17, 2025

src/Connectors/PhpRedisSentinelConnector.php

add RetryContext for config sharing, test on server/run_id instead of port

ef4ef25

Namoshek merged commit 978d82b into Namoshek:master Jul 18, 2025
6 checks passed
