Fix intermittently failing TestIngesterTransfer #661
Merged
Things that are normal occurrences are just debugs.
This fixes a race condition in `TestIngesterTransfer`. I don't think it makes the actually-running-in-production situation any worse.
Contributor
|
I note this change leaves the code holding |
bboreham
approved these changes
Jan 18, 2018
Contributor
bboreham
left a comment
I think this change makes the sequence more correct in production - the new ingester will be active and own the tokens before the old starts to shut down.
The length of time it takes the old ingester to actually shut down is unpredictable, so events could have happened in either order before.
Contributor
|
If you happen to be changing this further, I'd say a log message at the end of |
Contributor
Author
|
Might conflict w/ #654 |
This fixes the intermittently failing `TestIngesterTransfer`.
The issue was that the `TransferChunks` method (called as an RPC by the departing ingester) would signal that it was complete (`SendAndClose`) before it claimed the ring and updated its state to active.
The test, however, runs a query immediately after the departing ingester has shut down.
Thus, there's a very small window after `Shutdown` has terminated but before `ClaimTokensFor` and `ChangeState` have run. When we run the query in this window, we get no results, because there are no ingesters that have the chunks we need.
I've fixed this by moving the call to `SendAndClose` to the end of `TransferChunks`. I think this is the right approach, but am not 100% sure.
An alternative would be to prevent the race in the test without changing the production code. To do this, we'd add the following snippet just after the call to `Shutdown()`:
It was writing the comment that convinced me this was a less good approach.
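To illustrate the ordering argument, here is a minimal, self-contained sketch. It is a toy model, not the Cortex code: `toyIngester`, `transferChunks`, `StateSeenAfterAck`, and the `done` channel are all hypothetical names, with the channel close standing in for `SendAndClose` and the state assignment standing in for `ClaimTokensFor` plus `ChangeState`. It only shows why acknowledging last closes the window described above.

```go
package main

import (
	"fmt"
	"sync"
)

// State is a hypothetical stand-in for an ingester's ring state.
type State int

const (
	Pending State = iota
	Active
)

// toyIngester is a hypothetical model of the joining ingester.
// It is not the real Cortex ingester type.
type toyIngester struct {
	mu    sync.Mutex
	state State
}

// State returns the current state under the lock.
func (i *toyIngester) State() State {
	i.mu.Lock()
	defer i.mu.Unlock()
	return i.state
}

// transferChunks models the receiving side of a transfer.
// Closing done plays the role of SendAndClose: it tells the
// departing ingester that the transfer is complete.
func (i *toyIngester) transferChunks(done chan<- struct{}) {
	// (Receiving the chunks themselves is omitted in this sketch.)

	// Become active first. This assignment stands in for
	// ClaimTokensFor followed by ChangeState.
	i.mu.Lock()
	i.state = Active
	i.mu.Unlock()

	// Acknowledge last, so the sender cannot finish shutting down
	// while this ingester is still pending.
	close(done)
}

// StateSeenAfterAck runs one transfer and reports the state that a
// caller observes immediately after the acknowledgement arrives.
func StateSeenAfterAck() State {
	ing := &toyIngester{}
	done := make(chan struct{})
	go ing.transferChunks(done)
	<-done // the "departing ingester" returns from Shutdown here
	return ing.State()
}

func main() {
	fmt.Println(StateSeenAfterAck() == Active) // prints true
}
```

If the `close(done)` call were moved above the state change, `StateSeenAfterAck` could return `Pending`, which corresponds to the query in the test finding no ingester that holds the chunks.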
The PR includes a commit that fixes some general, minor logging bugs I noticed while trying to debug the test failure.
I should point out that my snarky comment on #369 was wrong-headed: the test failure is not due to relying on the system clock, but rather to a more vanilla race condition.
Fixes #369