Conversation
```go
{
	Name:      "test_global_token_limit",
	UniqueKey: "account:12345",
	Algorithm: guber.Algorithm_TOKEN_BUCKET,
```
Instead of hardcoding the algorithm here you can do something like

```go
algorithms := []guber.Algorithm{guber.Algorithm_TOKEN_BUCKET, guber.Algorithm_LEAKY_BUCKET}
for _, algo := range algorithms {
	t.Run(algo.String(), func(t *testing.T) {
		// Send two hits that should be processed by the owner and the peer and deplete the remaining
		sendHit(algo, guber.Status_UNDER_LIMIT, 1)
	})
}
```

And you can remove the other test (assuming that everything else is the same)
I personally think it makes sense to keep the tests split for clarity, but it's a good point that we could merge them if we expect the results to always be the same.
Always avoid DRY in tests; functional tests should be clear and verbose, and should generally not change once written.
```go
//
// We cannot perform this for LEAKY_BUCKET as we don't know how much time or what other requests
// might have influenced the leak rate at the owning peer.
// (Maybe we should perform the leak calculation here?????)
```
As a newbie to this codebase, I read this comment as a sign that we are breaking the single responsibility principle and need to move this logic elsewhere. My thought was that we need a new API called PeekRateLimit that doesn't affect the hit count and only calculates the right status.
100% agree with this, as I had the exact same thought. However, I'm planning on removing GRPC so I decided to wait for a future PR.
I could be convinced otherwise!
I would avoid major refactors that are outside the context of this PR, which I consider to be a bugfix. A follow-up might make sense though.
I agree that sounds like a bit too much logic leaking into the peer. @thrawn01 thanks for taking this up. Changes make sense and lgtm as best as I can understand the flow.
philipgough left a comment
lgtm - thanks again :)
Sorry, I'm taking back my approval because we tested this and, at big scale, it doesn't work. We will send a PR soon with our findings. See #218
This PR is replaced by #219
Over Limit Behavior
The way it's supposed to work is that the owner of the rate limit will broadcast the computed result of the rate limit to other peers, and that is what will be returned to clients who submit hits to a peer. The owner of the rate limit should be setting the `OVER_LIMIT` status before broadcasting.

However, we don't store `OVER_LIMIT` in the cache (see this comment) because it is expected that every time the functions in `algorithms.go` are called, `OVER_LIMIT` will be calculated and set properly. Again, because of this, the owning peer must perform those calculations and set the `OVER_LIMIT` status before broadcasting to other peers.

As these wonderful contributors have noticed (@miparnisari and @philipgough), it is currently not doing this! As @miparnisari pointed out, I do not know why this has not been caught before now. (Probably because of the lack of testing around the global functionality 😥)
At one time, I had envisioned the broadcaster would only send hits to the peers, and each peer would then calculate the correct `OVER_LIMIT` status based on the number of hits received from the owning peer. However, that plan would require more logic in the peers for things like `RESET_REMAINING` or when a negative hit occurs, etc. Instead, I decided the owning peer would broadcast a `RateLimitResp` to each peer, which in turn would be returned to the client as the most up to date status of the rate limit.

It looks like I goofed when I made the transition, and decided to make a call to `getLocalRateLimit()` for each queued broadcast. As a consequence I had to set `rl.Hits = 0`, which short circuits the `OVER_LIMIT` calculation, so the broadcasted responses do not have the proper `OVER_LIMIT` set. I think the correct way to handle this is to have the broadcaster queue responses and send them without recalculating the current status before broadcasting to peers. (Included in this PR)

Also, I noted during testing that when using `TOKEN_BUCKET` the non-owning peer would respond with `UNDER_LIMIT` on the third hit (which is kinda correct, because it will only respond with what the owner has broadcast to it), but that isn't what most users would expect when using `TOKEN_BUCKET`. So I added some code for the peer to return `OVER_LIMIT` if the broadcast response has 0 remaining.

I think this works for `TOKEN_BUCKET` because the expiration time on the owner and peer are the same, such that when the bucket expires, both the peer and owner will correctly respond with 0 hits at the same time. However, for `LEAKY_BUCKET` the peer doesn't really know what tokens have leaked since we last checked... I guess we could have the peer calculate the leak... but that sounds icky... Maybe someone can convince me otherwise?