Stop early-return after majority success in distributor writes #732
Conversation
Force-pushed from b8e6b11 to 67ce380, then from 67ce380 to 2ac4b1f.
pkg/distributor/distributor.go
```go
}

func shortCircuit(err error, sampleTracker *sampleTracker, pushTracker *pushTracker) {
	if err != nil {
```
Can the error handling be moved up into sendSamples so that waitForAll and shortCircuit use the same error handler?
Probably, but there are some subtle logic differences between the two (returning after an error, vs. always incrementing sampleTracker.finished). We decided a little duplication was OK, since it allowed us to leave the previous logic intact. We can refactor it out if you think that is worthwhile, however.
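For illustration, here is a minimal sketch of the two handler shapes being discussed. Aside from the sampleTracker.finished counter mentioned above, the types, fields, and names are assumptions for this sketch, not the actual Cortex code, and concurrency control is omitted.

```go
package distributor

// Hypothetical, simplified trackers; only the finished counter comes from
// the discussion above, the rest is assumed for illustration.
type sampleTracker struct {
	finished  int // responses counted so far (success or failure)
	succeeded int // successful responses
}

type pushTracker struct {
	firstErr error // first error seen across the whole push, if any
}

// shortCircuit-style handling: return as soon as an error is seen, so
// finished only advances on success.
func shortCircuit(err error, s *sampleTracker, p *pushTracker) {
	if err != nil {
		if p.firstErr == nil {
			p.firstErr = err
		}
		return
	}
	s.succeeded++
	s.finished++
}

// waitForAll-style handling: always advance finished, even on error, so a
// caller can block until every ingester has replied.
func waitForAll(err error, s *sampleTracker, p *pushTracker) {
	s.finished++
	if err != nil {
		if p.firstErr == nil {
			p.firstErr = err
		}
		return
	}
	s.succeeded++
}
```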
I think I see what you mean; pushed another commit doing the refactor.
Looks plausible, but it would need to be re-done after #681. And I'm not sure it should be optional; as noted at #730, this behaviour means that we can't bear a single ingester failure with 3x replication. I think the code will be somewhat simpler without the early return. What sort of impact did you see when running it?
I don't quite understand why it means we can't bear a single ingester outage with a replication factor of 3, given that we have to see more than MaxFailures errors before a push errors out. If I recall correctly (@csmarchbanks, correct me if I'm way off), we tested it with a set of 3 ingesters, replication factor 3, scaled down to 2 pods, and watched the pushes continue as expected. We didn't test a single-ingester timeout, though, as I'm not sure how to cause that particular failure case.
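For reference, a tiny sketch of the quorum arithmetic described above; this is purely illustrative and not the actual ring code, but it shows how a push only errors out once failures exceed MaxFailures.

```go
package distributor

// pushSucceeds is a hypothetical helper illustrating the quorum check
// discussed above; with replication factor 3, one failed ingester is absorbed.
func pushSucceeds(replicationFactor, failures int) bool {
	maxFailures := replicationFactor / 2 // e.g. 3 -> 1, 5 -> 2
	return failures <= maxFailures       // fail only once failures exceed maxFailures
}
```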
Oh, I think I misunderstood - it seems you mean that currently (master) doesn't allow for a single failure with replication factor 3. I took your comment as "this PR introduces this bad behaviour". Oops. :-) The new code should alleviate this, yes?
What I meant to highlight was that this PR makes the bad behaviour optional.
I'm slowly getting caught up, my apologies. :-) Agreed, we can just yank the option to behave badly.
But it also introduces other bad behaviour, namely less tolerance of single failures, in particular single failures that cause a request to be delayed or never replied to. Those now delay the return, which presumably reduces throughput, increases buffer sizes and memory usage, etc.
What @rade said was why I made this optional. Here are some timing graphs for before/after the flag was turned on.
Not a fan of letting the administrator choose between two bad options. A slow-down is visible and recovers if transient; dropping data is silent and unrecoverable.
Surely the correct fix is to return after majority success, but don't cancel the downstream request. Or make ingesters ignore the cancel. Under what circumstances is it actually desirable to cancel an ingester write?
That sounds plausible. It's the single shared context that causes the cancellation. We would want to cancel after a reasonable timeout to avoid #672.
So it sounds like: make a new context for each request with a timeout of, say, 10s, and keep the returning logic intact?
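Roughly, that proposal could look like the sketch below. This is hedged: the 10s value comes from the comment above, while sendSamples, sendToIngester, and the surrounding structure are hypothetical stand-ins, not the code that ended up in #736.

```go
package distributor

import (
	"context"
	"time"
)

// sendSamples gives each ingester request its own context with a fixed
// timeout, detached from the caller's cancellation, so returning to the
// client after quorum does not cancel the remaining writes.
func sendSamples(parent context.Context, ingesters []string) {
	// parent is deliberately not used to derive the per-request contexts,
	// so the client returning early does not cancel in-flight writes.
	_ = parent
	for _, addr := range ingesters {
		go func(addr string) {
			// The timeout still bounds how long a slow ingester can
			// hold resources (cf. #672).
			ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
			defer cancel()
			_ = sendToIngester(ctx, addr) // hypothetical push call
		}(addr)
	}
	// ...quorum tracking and the early return to the client happen elsewhere...
}

// sendToIngester is a stand-in for the real gRPC push to one ingester.
func sendToIngester(ctx context.Context, addr string) error { return nil }
```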
If so, take a look at #736.
Closing in favor of #736.
Allows you to configure the distributor to wait for all ingester requests to complete before returning to the client. Since this will cause a slowdown, we made it a configuration option.
See issue: #730
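Wiring up such an option might look roughly like the sketch below; the flag name and config field are hypothetical stand-ins for illustration, not necessarily what this PR used.

```go
package distributor

import "flag"

// Config holds distributor options; only the illustrative field is shown.
type Config struct {
	WaitForAllIngesters bool
}

// RegisterFlags registers the option on a FlagSet; the flag name here is a
// hypothetical example.
func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
	f.BoolVar(&cfg.WaitForAllIngesters, "distributor.wait-for-all-ingesters", false,
		"Wait for every ingester write to finish before replying to the client, instead of returning after quorum success.")
}
```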