Implement "pinning" cluster connection to a single node - RedisClient::Cluster#with
#298
Conversation
lib/redis_client/cluster/router.rb (outdated diff)

@@ -221,6 +223,26 @@ def close
   @node.each(&:close)
 end

 def with(key:, write: true, retry_count: 0)
[IMO] It looks like this method has two responsibilities: finding a node and yielding a block. I'd say it would be better to dedicate this method to finding a node only, and to call #try_delegate from the RedisClient::Cluster side.
That seems fair enough. I pushed up a new version without this method at all. I wrote a new method, find_node_key_by_key (it's a bit wordy, but it does what it says), and re-implemented find_node_key and find_primary_node_key in terms of it. RedisClient::Cluster calls @router.find_node_key_by_key, then @router.find_node, and finally @router.try_delegate itself.
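Roughly, the calling pattern described here might look like the following sketch. The method names come from this thread, but the signatures and the use of Object#then to yield the node are my assumptions, not the actual code:

class RedisClient
  class Cluster
    def with(key:, write: true, retry_count: 0, &block)
      # The router only finds things; Cluster drives the delegation itself.
      node_key = @router.find_node_key_by_key(key, primary: write) # signature assumed
      node = @router.find_node(node_key)
      # Object#then yields its receiver, so the block receives the node.
      @router.try_delegate(node, :then, retry_count: retry_count, &block)
    end
  end
end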
(Force-pushed from 6bb5e91 to 2208738.)
Uh, so sorry @supercaracal - when you ran the tests I realised I'd forgotten to push one of the files up! I've done that now; hopefully it makes more sense!
We need this in order to get the list of keys from a WATCH command. It's probably a little overkill, but we may as well implement the whole thing.
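Since every argument to WATCH after the command name is a key, the extraction itself is simple. A hypothetical helper (the name is illustrative, not from the PR):

def extract_watch_keys(command)
  # For WATCH, every argument after the command name is a key.
  raise ArgumentError, 'not a WATCH command' unless command.first.to_s.casecmp('watch').zero?

  command[1..] # ["WATCH", "{u1}:a", "{u1}:b"] => ["{u1}:a", "{u1}:b"]
end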
The fact that a block is not going to be retried doesn't mean we should skip processing topology updates and redirections in response to errors, I think.
(Force-pushed from 2208738 to 660f693.)
I'd say that we can simplify it further, like the following fix. The above example accepts some compromises.
It seems that the Redis server itself throws an error when we execute a command spanning multiple slots, and discards it, so I think we don't necessarily have to validate keys beforehand.
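For reference, this is the server-side behaviour being relied on: a cluster node rejects multi-key commands whose keys hash to different slots. A quick illustration with plain redis-client (the node address is an assumption):

require 'redis_client'

node = RedisClient.config(host: '127.0.0.1', port: 6379).new_client # assumed cluster node
node.call('MGET', 'foo', 'bar')
# raises RedisClient::CommandError:
#   CROSSSLOT Keys in request don't hash to the same slot
node.call('MGET', '{tag}foo', '{tag}bar') # same hashtag, same slot: succeeds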
What do you think?
Thanks for your thoughts.
This I'm fine with - it's easy enough to implement in terms of
In terms of the rest of the complexity in this PR, I would respectfully say that I think it's worth it. The complexity really comes in two main chunks:
Doing the delegation instead of exposing the underlying
I understand that as an open-source maintainer, you don't want people to throw complex code at you and then walk away when they're done. It is not at all my intention to do that with this contribution! @learoyklinginsmith and I have been working for the last few months on upgrading redis-rb in our large monolith application at Zendesk, and we (and the people who do our jobs after we're gone) will have an ongoing interest in keeping this code working well. We're going to find more bugs in this stuff as we roll it out to thousands of instances in production, and we're going to fix them and send the patches back. And if other people find bugs in any of this code, we have a real interest in fixing them because they probably affect us too! I guess the summary of this long message is: I know this code is complex, but its complexity buys real features that we, and I believe others, will want (not throwing errors during resharding), and we plan to be actively engaged here to help you maintain it, rather than simply throwing code over the wall.
Previously, extract_first_key would perform hashtag extraction on the key it pulled from the command; that meant that the "key" it was returning might not actually be the key in the command. This commit refactors things so that:
- extract_first_key actually extracts the first key
- KeySlotConverter.convert does hashtag extraction (sketched below)
- Router's find_node_key and find_primary_node_key can now be implemented in terms of a function "find_node_by_key", because they actually get the key from the command.
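For context, hashtag extraction follows the cluster spec: if the key contains a non-empty {...} section, only its content is hashed. A minimal sketch of the division of labour described above (the crc16 helper is assumed; the real KeySlotConverter carries the full CRC16 lookup table):

HASH_SLOTS = 16_384

def hashtag(key)
  # Hash only the content of the first non-empty {...} pair, if present.
  open = key.index('{')
  return key if open.nil?

  close = key.index('}', open + 1)
  return key if close.nil? || close == open + 1 # no '}' or empty '{}'

  key[(open + 1)...close]
end

def convert(key)
  crc16(hashtag(key)) % HASH_SLOTS # crc16 assumed to be defined elsewhere
end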
try_send already handles the case where the call is a blocking one specially, but try_delegate does not. This diff detects whether or not a ::RedisClient::ReadTimeoutError is for a blocking call at the source, and wraps it in a special subclass so we can differentiate it.
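One way to wrap the timeout at the source, as described; the class and method names here are illustrative rather than the PR's actual ones:

class RedisClient
  class Cluster
    # A subclass lets rescue sites tell "timed out because the command
    # blocks by design" apart from a genuine read timeout.
    ReadTimeoutErrorForBlockingCall = Class.new(::RedisClient::ReadTimeoutError)

    def blocking_call_on(client, timeout, *command)
      client.blocking_call(timeout, *command)
    rescue ::RedisClient::ReadTimeoutError => e
      raise ReadTimeoutErrorForBlockingCall, e.message
    end
  end
end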
This implements support for "pinning" a cluster client to a particular key slot. This allows the use of WATCH/MULTI/EXEC transactions on that node without any ambiguity. Because RedisClient also implements #with as "yield self" and ignores all arguments, this means that you can easily write transactions that are compatible with both clustered and non-clustered Redis (provided you use hashtags for your keys).
This does change the behaviour ever so slightly (see updated tests), but I believe for the better.
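To make that concrete, a usage sketch. The cluster setup and key names are illustrative; on a plain RedisClient, #with ignores its arguments and yields the client itself, so the block runs unchanged:

# Assuming a client built with the redis-cluster-client gem:
cluster = RedisClient.cluster(nodes: %w[redis://127.0.0.1:6379]).new_client

cluster.with(key: '{user1}') do |conn|
  # All keys share the {user1} hashtag, so everything stays on one slot.
  conn.multi(watch: ['{user1}:balance']) do |txn|
    txn.call('INCRBY', '{user1}:balance', -5)
    txn.call('INCR', '{user1}:withdrawals')
  end
end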
(Force-pushed from 660f693 to ead279b.)
@supercaracal just wondering if you'd had any further thoughts on this?
I apologize for my delayed response. Thank you for your feedback from the real world; it means a lot. Sorry for the several bugs; I think they should be fixed. I'd find it easier to merge separate, fine-grained pull requests where possible. As you said, cluster clients should handle redirection. But I feel that validating all keys of all commands on the client side goes too far from a simple implementation. I'd like to consider this with reference to the Envoy implementation.
Our client aims to be simple, and I'd like to find a compromise. I'm sorry for my bad English; I'm not good at it.
No, thank you for maintaining this library, and thank you for your detailed feedback on this issue so far.
To be clear, let's take this block of code as an example.
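A hypothetical example of the kind of block under discussion, with keys in two different slots inside a pinned section (key names are illustrative):

cluster.with(key: '{slot1}') do |conn|
  conn.call('SET', '{slot1}:a', '1') # OK: same slot as the pin
  conn.call('SET', '{slot2}:b', '2') # cross-slot access: the point of contention
end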
My current implementation in this PR will raise an error in that case. I prefer to do the check ahead of time here, and raise. However, we can always add this checking later on. So, let's initially merge this feature without proactively checking the slot, and we can perhaps revisit this discussion some time next year if we find, e.g., that people are raising issues about this behaviour?
Again, to be sure I understand you, here's some code. Let's imagine that this slot has been newly resharded, but we don't know it yet.
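A hypothetical setup for that scenario (names are illustrative): the slot owning {user1} has just migrated to another node, while this client's slot map is still stale.

cluster.with(key: '{user1}') do |conn|
  conn.call('WATCH', '{user1}:session') # the stale node replies MOVED here
  # ... rest of the transaction ...
end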
In my current implementation, the order of events is the following
What you are proposing is that the order of events should instead be:
Your way doesn't require that. One reason I think we might need the proxy implementation, though, is what happens if we have two different
The MOVED response is for
The envoy implementation seems to work by detecting transaction-starting commands (they mention MULTI, but we'd need to do WATCH as well), then waiting for the next command which specifies a key, and then sending commands to a connection based on that. I can have a go at doing it that way, although I think it'll look similar to #295 (although I'm sure I can do a better job of it now). The only bad thing about this approach is that you can't call block-based methods on it.
TL;DR - I'll have another go at that tomorrow and open a new PR.
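As a rough sketch of that envoy-style approach; every name here is illustrative, with extract_first_key and the router finders borrowed from this thread's naming:

class LazyTransactionRouter
  def initialize(router)
    @router = router
    @buffer = [] # transaction-opening commands seen before any keyed command
    @node = nil
  end

  def call(*command)
    if @node.nil?
      key = extract_first_key(command) # assumed helper
      if key.nil? || key.empty?
        @buffer << command # e.g. MULTI carries no key, so defer routing
        return
      end

      # The first keyed command decides the node; replay what we deferred.
      @node = @router.find_node(@router.find_node_key_by_key(key))
      @buffer.each { |buffered| @node.call(*buffered) }
      @buffer.clear
    end

    @node.call(*command)
  end
end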
No need to be sorry - your English is great, and way better than any other language I speak!
Thank you for your detailed description. I hadn't been able to consider nested calls with multiple client instances. Certainly, that corner case is hard to support with a simplified implementation. But I feel that corner cases like that would be better handled on the user side. I think it would be better to add a public method to the client for refreshing the state of the cluster. Safer is certainly better, but the one who knows the implementation in the block best is the user, so I think we might as well leave some responsibility up to the user side. I'll look into reference implementations in other languages.
I had a look at some of them and now I'm just sad. There are allegedly five official clients: https://redis.io/resources/clients/

Python

This one's easy: they just don't support MULTI at all in Redis cluster mode.

def multi(self):
    """ """
    raise RedisClusterException("method multi() is not implemented")

Java

Also very easy:

@Override
public Transaction multi() {
    throw new UnsupportedOperationException();
}

.NET

As far as I can tell, it doesn't really support cluster mode at all (there's a recently merged PR which purports to do so, but it doesn't look like it implements even the most basic routing support? redis/NRedisStack#170)

Go

The Go client has two implementations, depending on whether you're using … There's a method …

This is actually vulnerable to the same problem I identified with the nested calls to … In other respects, it kind of behaves like what you suggested, where the underlying single-node connection object is passed directly into the callback.

Nodejs

It seems like node.js does support transactions... But it doesn't look like WATCH is supported.
I guess of these, the Go one is clearly the best (and is the closest to what I'm proposing in this PR as well - although it doesn't proxy the underlying Redis connections, it just passes them straight through).
I appreciate your detailed investigation of the reference implementations for other clients. Most of them look like they have no elaborate implementation yet. It might be just because complicated use cases are few.
Yes, I would. I'd say that separating it like the following would make it easy to discuss and merge.
Hello @supercaracal!
This is a followup to our discussion on #294. It implements a new method, RedisClient::Cluster#with. This method yields a proxy object which supports all the things RedisClient::Cluster would normally support, but wraps them in a check that there is no cross-slot access. I checked in some documentation which explains how to use it.
The options it takes are:
- key: (mandatory) The key to hash to determine which slot to lock the yielded proxy to. Just passing a hashtag by itself is OK here.
- on_primary: (defaults to true) Whether to use the primary or (possibly) a replica to run commands on. I didn't document or test this because I'm not actually sure it's a good idea; on_primary: true will definitely run commands on the primary, but on_primary: false would run commands according to the configured replica affinity, which might still be the primary if you're using e.g. :primary_only or :random_with_primary. I can probably delete this for now if you don't think it'll be useful (for myself, we run with :primary_only affinity, so I don't need it).
- retry_count: (defaults to zero) Whether to retry the block in the event of node connectivity or slot rebalancing issues. Since retrying user-provided blocks is not guaranteed to be idempotent, I guess this should default to not retrying. (A small sketch of these options in use follows.)
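For concreteness, a hypothetical call exercising all three options (the key names are illustrative):

cluster.with(key: '{user1}', on_primary: true, retry_count: 1) do |conn|
  conn.call('INCR', '{user1}:counter')
end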
This goes along with a branch I have ready to go for redis-clustering as well: https://github.com/redis/redis-rb/compare/zendesk:ktsanaktsidis/handle_clustering_with. Once we're happy with how this looks, I can open up a PR for that as well. The clustering side of this essentially wraps #with so that the yielded proxy also responds to the redis-rb DSL.
What I have NOT done anything about in this PR is the existing implementation of RedisClient::Cluster#multi. I think our options are to delete it, or to re-implement it in terms of #with, by lazily waiting for the first command in the transaction and then calling #with for that node (I'm not 100% sure this will work though - we might have to reach into the privates of redis-client to e.g. call #checkin and #checkout on pooled instances outside of a block scope).
#with
interface.Anyway - keen to hear your thoughts, thanks again for working with me on this!