@jeffreyzant
Contributor

It's been a while since I opened #17, but a rolling update of Redis caused some information not to be stored because of the downtime that occurs while Sentinel elects a new leader. The basic way of connecting hasn't changed and should work on most installs, but it might be best to release this as a new major version if you're okay with the changes.

In short:

  • The Redis methods perform a limited number of retries when the RedisException is retryable or when DNS resolution errors occur. The number of retries and the delay are configurable — by default, 20 retries with a 1-second delay.
  • When the maximum number of retries is reached, a RetryRedisException is thrown.
  • Replaced php-cs-fixer with laravel/pint.
  • Dropped PHP 8.1 support.
  • The GitHub tests now use the same start-redis-cluster.sh script we use for local testing.
  • The start-redis-cluster.sh script now supports automatic restarts when a node goes down.
  • Added and improved unit tests to verify that retrying works as expected.
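
The retry behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `retryOnFailure`, `RetryLimitReached` (a stand-in for the `RetryRedisException` mentioned above), and the `$isRetryable` callback are hypothetical names. Only the defaults (20 retries, 1-second delay) come from the description.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the RetryRedisException named in the description.
final class RetryLimitReached extends RuntimeException
{
}

/**
 * Runs $operation and retries it while $isRetryable() reports a transient failure.
 * The operation is called at most 1 + $maxRetries times.
 */
function retryOnFailure(
    callable $operation,
    callable $isRetryable,
    int $maxRetries = 20,
    int $delayMs = 1000
): mixed {
    $retries = 0;

    while (true) {
        try {
            return $operation();
        } catch (Throwable $e) {
            // Non-retryable failures are passed through unchanged.
            if (! $isRetryable($e)) {
                throw $e;
            }

            // Retry budget used up: wrap the last failure and give up.
            if ($retries >= $maxRetries) {
                throw new RetryLimitReached("Gave up after {$maxRetries} retries.", 0, $e);
            }

            $retries++;
            usleep($delayMs * 1000); // wait before the next attempt
        }
    }
}
```

A caller would wrap each Redis call in a closure and pass a predicate that decides which exceptions count as transient, for example during a Sentinel failover.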

We rely heavily on this logic in production, so I’d love to help keep maintaining it.

@jeffreyzant changed the title from "#17 implementing failover retries" to "Implementing failover retries" on Jul 17, 2025
Owner

@Namoshek Namoshek left a comment

Thank you for this PR, this looks great and really thorough. I'm pretty much ok with everything, I just noticed one minor detail. But maybe I'm just blind...

Also looks like some of the assertions are flaky, maybe there is something else to assert like a unique connection id (CLIENT ID) or so?

@jeffreyzant
Contributor Author

I've also adjusted the tests to check whether the Redis run_id has changed. This identifier is unique to an instance and changes on every restart.
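
A rough sketch of such a check is shown below. It is not the test code from this PR: `extractRunId` and `instanceWasRestarted` are hypothetical helpers, and the sketch assumes the raw text of an `INFO server` reply (one `key:value` pair per line) has already been fetched before and after the restart.

```php
<?php

declare(strict_types=1);

/**
 * Extracts the run_id field from the raw text of an `INFO server` reply.
 * Returns null when the field is not present.
 */
function extractRunId(string $infoReply): ?string
{
    foreach (preg_split('/\r?\n/', $infoReply) as $line) {
        if (str_starts_with($line, 'run_id:')) {
            return trim(substr($line, strlen('run_id:')));
        }
    }

    return null;
}

/**
 * Reports a restart only when both replies carry a run_id and the two differ.
 */
function instanceWasRestarted(string $infoBefore, string $infoAfter): bool
{
    $before = extractRunId($infoBefore);
    $after = extractRunId($infoAfter);

    return $before !== null && $after !== null && $before !== $after;
}
```

Comparing run_id values avoids relying on timing or connection state, which is what tends to make such assertions flaky.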

@Namoshek Namoshek merged commit 978d82b into Namoshek:master Jul 18, 2025
6 checks passed
@Namoshek
Owner

Looks good to me, I'm releasing this as a major version as suggested.

@streamingsystems
Contributor

Hi All,

We are running into an intermittent error on our system where Redis times out with this message:

connection timed out

In tracing through the code I see that this message is not in the array:

ERROR_MESSAGES_INDICATING_UNAVAILABILITY

As such, the shouldRetryRedisException method is returning false, and (from what I can tell) the system is not retrying.
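
To illustrate the behaviour described here, the following sketch shows how a message-list check of this kind would reject a timeout. It assumes the check is a case-insensitive substring match, which may not be how the package does it. The list entries are illustrative only and are not the package's actual `ERROR_MESSAGES_INDICATING_UNAVAILABILITY` values; `indicatesUnavailability` is a hypothetical name.

```php
<?php

declare(strict_types=1);

// Illustrative entries only; NOT the package's actual list.
const EXAMPLE_UNAVAILABILITY_MESSAGES = [
    'connection refused',
    'went away',
    'readonly',
];

/**
 * Returns true when the exception message contains one of the listed fragments.
 */
function indicatesUnavailability(
    Throwable $e,
    array $fragments = EXAMPLE_UNAVAILABILITY_MESSAGES
): bool {
    $message = strtolower($e->getMessage());

    foreach ($fragments as $fragment) {
        if (str_contains($message, strtolower($fragment))) {
            return true;
        }
    }

    return false;
}
```

With a list like this, an exception carrying "connection timed out" matches no fragment, so the check returns false and the exception is not retried.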

I am wondering if the omission of this error is on purpose as maybe the timeout does not qualify as something that indicates unavailability.

We have Redis running in production and, as you might guess, there are lots and lots of hits to Redis. It all works perfectly, but at random intervals we get this timeout. Since it is not retried, the exception propagates up and causes our job to fail.

Thanks!

-Rob

@Namoshek
Owner

Namoshek commented Aug 2, 2025

@streamingsystems I think you should open a new issue with this information. To me this sounds like you are using some kind of connection pool that is not utilized frequently enough, so some connections time out. However, I'm not opposed to adding retry handling for this case. Maybe we can even make the error messages configurable (on top of the defaults), so that it is easier to test the changes before merging them.
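
One possible shape for "configurable on top of the defaults" is sketched below. Everything here is hypothetical: the class name `RetryableMessageList`, its constructor parameter, and the default entries are placeholders for illustration, not an existing or planned API of the package.

```php
<?php

declare(strict_types=1);

final class RetryableMessageList
{
    // Placeholder defaults; the real package defines its own list.
    private const DEFAULTS = ['connection refused', 'went away'];

    /** @var list<string> */
    private array $fragments;

    /**
     * @param list<string> $additionalMessages extra fragments supplied via configuration
     */
    public function __construct(array $additionalMessages = [])
    {
        // User-supplied fragments extend the defaults; they never replace them.
        $merged = array_merge(self::DEFAULTS, $additionalMessages);
        $this->fragments = array_values(array_unique(array_map('strtolower', $merged)));
    }

    /** @return list<string> */
    public function all(): array
    {
        return $this->fragments;
    }

    public function matches(Throwable $e): bool
    {
        $message = strtolower($e->getMessage());

        foreach ($this->fragments as $fragment) {
            if (str_contains($message, $fragment)) {
                return true;
            }
        }

        return false;
    }
}
```

An application could then pass `['connection timed out']` from its own configuration to try out retrying on timeouts without a change to the package defaults.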

@streamingsystems
Contributor

OK, thanks. I forked your repo and am running some tests, and I will open a separate issue. Have a great day.

@jeffreyzant
Contributor Author

Looks like a good addition! 👍

3 participants