Read from all ingesters, introduce 2 rings, fix deregistration #43

Closed · juliusv wants to merge 1 commit from the unregister-order branch

Conversation

@juliusv (Contributor) commented Oct 10, 2016

Probably not related to the consul issues we're seeing, but an ingester
should unregister itself before shutting down, because it rejects new
samples during shutdown.

@tomwilkie (Contributor) commented Oct 10, 2016

This will break query correctness.

@juliusv (Contributor, Author) commented Oct 10, 2016

@tomwilkie Damn, good point. Options:

  • deregister the ingester only for ingestion, but keep it registered for queries (basically, two separate rings?)
  • keep it as it is and just let it return errors while it's shutting down? But we don't have logic in the distributor yet for sending data to multiple ingesters, so the samples will just get lost while an ingester is shutting down, which can take a while...

Any other ideas?

juliusv changed the title from "Unregister ingester *before* shutting down" to "Read from all ingesters, introduce 2 rings, fix deregistration" on Oct 11, 2016
@juliusv (Contributor, Author) commented Oct 11, 2016

@jml and I talked a long time over the general lifecycle problem today and concluded that there's no feasible way (that we could think of) to make queries work with a consistent hashing approach, as one would need to keep around an arbitrary number of old ring states to find all possible ingesters that may still have data for a given query (and then query all of them). We think the correct long-term solution would be to run a separate in-memory indexing service which would tell us which time series resides in which ingester, but building that (especially in a horizontally scalable way) will be a longer-term effort.

For the near term, we've decided that querying all ingesters and merging the results is the least-bad stop-gap. It comes with obvious downsides: if one ingester is broken/unreachable, the entire query fails. Also, queries will be somewhat slower (although ingesters which have no results for a given query will return an empty result really fast). So this approach will get worse the more ingesters we have (both reliability and speed).

Since we're now always querying all ingesters, that does give us some benefits though:

  • we could allow queries that don't include a metric name
  • we could shard by the entire labelset, achieving better load distribution over ingesters

I pushed that change into this PR.
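
To make the stop-gap concrete, here's a minimal Go sketch of the fan-out-and-merge idea. The ingesterClient interface and its Query signature are simplified assumptions for illustration, not the actual PR code:

package distributor

import "github.com/prometheus/common/model"

// ingesterClient is a hypothetical, simplified interface for talking to a
// single ingester; the real client would at least take label matchers too.
type ingesterClient interface {
	Query(from, through model.Time) (model.Matrix, error)
}

// queryAll fans a query out to every known ingester and merges the results
// by series fingerprint, since several ingesters may hold overlapping pieces
// of the same series.
func queryAll(ingesters []ingesterClient, from, through model.Time) (model.Matrix, error) {
	fpToSampleStream := map[model.Fingerprint]*model.SampleStream{}

	for _, ing := range ingesters {
		matrix, err := ing.Query(from, through)
		if err != nil {
			// One broken/unreachable ingester fails the whole query:
			// better to fail outright than to return incomplete results.
			return nil, err
		}
		for _, ss := range matrix {
			fp := ss.Metric.Fingerprint()
			if existing, ok := fpToSampleStream[fp]; ok {
				// Values are unsorted and may overlap; a real
				// implementation would sort and de-duplicate them.
				existing.Values = append(existing.Values, ss.Values...)
			} else {
				fpToSampleStream[fp] = ss
			}
		}
	}

	matrix := make(model.Matrix, 0, len(fpToSampleStream))
	for _, ss := range fpToSampleStream {
		matrix = append(matrix, ss)
	}
	return matrix, nil
}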

@rade commented Oct 11, 2016

if one ingester is broken/unreachable, the entire query fails.

Why? Why not just ignore it?

@jml (Contributor) left a comment:

I think there might be a bug.

// Allow some time for the last appends to this ingester to
// complete after unregistering (looking up an ingester and appending
// to it is not atomic).
time.Sleep(100 * time.Millisecond)

jml (Contributor):

Flag maybe?

juliusv (Contributor, Author):

Will do.

fpToSampleStream := map[model.Fingerprint]*model.SampleStream{}

// Fetch samples from all ingesters and group them by fingerprint (unsorted
// and with overlap).

jml (Contributor):

I had assumed you were going to do this in parallel. Any reason why not?

juliusv (Contributor, Author):

We can, but it's more complexity - it's probably not going to be really noticeable at current ingester scale. Reevaluate later?

jml (Contributor):

Later is fine. (Although I'll note that moving some of this to a separate function would perhaps reduce complexity).
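
For when we revisit this, a rough sketch of what the parallel fan-out could look like, reusing the hypothetical ingesterClient interface and model types from the earlier sketch (add "sync" to the imports); not part of this PR:

// queryAllParallel queries every ingester concurrently instead of one after
// another; the first error wins, since any failure aborts the whole query.
func queryAllParallel(ingesters []ingesterClient, from, through model.Time) (model.Matrix, error) {
	var (
		mtx              sync.Mutex // guards fpToSampleStream and firstErr
		wg               sync.WaitGroup
		firstErr         error
		fpToSampleStream = map[model.Fingerprint]*model.SampleStream{}
	)

	for _, ing := range ingesters {
		wg.Add(1)
		go func(ing ingesterClient) {
			defer wg.Done()
			matrix, err := ing.Query(from, through)

			mtx.Lock()
			defer mtx.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			for _, ss := range matrix {
				fp := ss.Metric.Fingerprint()
				if existing, ok := fpToSampleStream[fp]; ok {
					existing.Values = append(existing.Values, ss.Values...)
				} else {
					fpToSampleStream[fp] = ss
				}
			}
		}(ing)
	}
	wg.Wait()

	if firstErr != nil {
		return nil, firstErr
	}
	result := make(model.Matrix, 0, len(fpToSampleStream))
	for _, ss := range fpToSampleStream {
		result = append(result, ss)
	}
	return result, nil
}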

return err
matrix := make(model.Matrix, 0, len(fpToSampleStream))
for _, ss := range fpToSampleStream {
matrix = append(matrix, ss)

jml (Contributor):

I don't follow. How does matrix get returned to the user?

juliusv (Contributor, Author):

Thanks, this was meant to append to the result defined at the top. Pushing fix.

@jml (Contributor) commented Oct 11, 2016

if one ingester is broken/unreachable, the entire query fails.

Why? Why not just ignore it?

Because it's better to fail outright than to lie.

@juliusv (Contributor, Author) commented Oct 11, 2016

@rade If we fail to query one ingester, we don't know whether the results are complete, and silently returning incomplete results seems worse than failing.

@rade commented Oct 11, 2016

How does the querier recover from such an error?

@juliusv (Contributor, Author) commented Oct 11, 2016

How does the querier recover from such an error?

It doesn't and it can't, so it has to fail the entire query to the user. That's the problem with the current architecture. We don't know in which ingester the data we want to query lives (at least not after the ring has changed a couple of times), so we either need to query all of them or have an architecturally better way of finding out which ingesters we really need (that separate index service).

@juliusv (Contributor, Author) commented Oct 11, 2016

@jml As discussed, removed the sleep between deregistering and actually shutting down for now, for simplicity.

@rade commented Oct 11, 2016

It doesn't and it can't, so it has to fail the entire query to the user.

I don't get it. The querier has to get the list of all ingesters from somewhere. That list can always be out of date by the time the querier performs the query - there may be fewer or more ingesters in existence at that time. If query correctness depends on querying all ingesters, then either case should result in a query failure. But it seems like only the "fewer" case would cause an error in the proposed design.

@juliusv (Contributor, Author) commented Oct 11, 2016

@rade With an index service, this could be avoided in the orderly shutdown case: when an ingester shuts down, it (roughly sketched in code after this list):

  • deregisters itself for appends in the index service
  • waits a bit for in-flight appends to finish
  • flushes out its data to AWS
  • deregisters itself for queries in the index service
  • waits a bit for in-flight queries to finish
  • shuts down completely
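
Very roughly, in code. Everything here is hypothetical: the index-service client, the method names, the Ingester stand-in type, and the wait durations (add "time" to the imports); none of it exists yet:

// indexServiceClient is a hypothetical client for the future index service.
type indexServiceClient interface {
	DeregisterForAppends(ingesterID string) error
	DeregisterForQueries(ingesterID string) error
}

// Ingester here is just a stand-in with an ID and a way to flush its
// in-memory chunks; the real type looks different.
type Ingester struct {
	id string
}

func (i *Ingester) flushAllChunks() error {
	// Placeholder: flush all in-memory chunks to the chunk store (AWS).
	return nil
}

// shutdown sketches the two-phase deregistration described in the list above.
func (i *Ingester) shutdown(index indexServiceClient, drainWait time.Duration) error {
	// 1. Deregister for appends, so distributors stop sending samples here.
	if err := index.DeregisterForAppends(i.id); err != nil {
		return err
	}
	// 2. Wait a bit for in-flight appends to finish.
	time.Sleep(drainWait)

	// 3. Flush all in-memory data out to AWS.
	if err := i.flushAllChunks(); err != nil {
		return err
	}

	// 4. Deregister for queries, so queriers stop asking this ingester.
	if err := index.DeregisterForQueries(i.id); err != nil {
		return err
	}
	// 5. Wait a bit for in-flight queries to finish, then shut down completely.
	time.Sleep(drainWait)
	return nil
}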

In the case of a crash, of course, you're screwed for a short while between the crash of the ingester and the time when the index service removes the ingester from its index (due to lack of heartbeats or similar). To solve that correctly without returning errors or incorrect query results to the user, you need to spread ingested data redundantly over multiple ingesters and then try each of the redundant ingesters in turn, without failing if only one of them is down.
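
A small sketch of that "try each redundant ingester in turn" idea, again reusing the hypothetical ingesterClient interface and model types from the earlier sketch (how replicas get chosen is out of scope here):

// queryReplicas queries the ingesters that (hypothetically) hold replicas of
// the same data and succeeds as long as at least one of them answers.
func queryReplicas(replicas []ingesterClient, from, through model.Time) (model.Matrix, error) {
	var lastErr error
	for _, ing := range replicas {
		matrix, err := ing.Query(from, through)
		if err == nil {
			return matrix, nil
		}
		lastErr = err // remember the failure, but try the next replica
	}
	// Only fail if every replica was unreachable.
	return nil, lastErr
}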

@rade commented Oct 11, 2016

I was not questioning the eventual fix, but the supposed temporary fix here. In particular, I am questioning why "an ingester got removed" should be treated as an error when "an ingester got added" is not.

@juliusv (Contributor, Author) commented Oct 11, 2016

@rade Ah right, sorry. But a completely new ingester would realistically not have a chance to accumulate any significant data between the first samples being sent to it and the first queries hitting it. The race condition there would only be a couple of milliseconds (and we could even register for queries first and only a second later for appends, if this was a problem), so from the user's point of view it just looks like the samples haven't quite made it to storage yet, which is fine. On the other hand, if an ingester with hours' worth of data goes away and we can't query its data anymore, we have a big problem and lots of (potentially all) missing data.

@rade commented Oct 11, 2016

if an ingester with hours' worth of data goes away and we can't query its data anymore, we have a big problem and lots of (potentially all) missing data.

Aren't we redundantly sending data to multiple ingesters, precisely to counter that?

Also...

{The querier} doesn't and it can't {recover}, so it has to fail the entire query to the user.

So this query will fail forever?

@jml (Contributor) commented Oct 11, 2016

if an ingester with hours' worth of data goes away and we can't query its data anymore, we have a big problem and lots of (potentially all) missing data.

Aren't we redundantly sending data to multiple ingesters, precisely to counter that?

Not yet.

{The querier} doesn't and it can't {recover}, so it has to fail the entire query to the user.

So this query will fail forever?

No. The user can retry, and then the result might be successful.

@jml (Contributor) left a comment:

LGTM. (Sorry for delay in pushing the button).

@rade commented Oct 11, 2016

The user can retry, and then the result might be successful.

It will never succeed after an ingester has failed though, will it?

@jml (Contributor) commented Oct 11, 2016

Depends on the failure.

  • requests from distributor to ingester can fail for many reasons (e.g. intermittent network failure, race conditions in code, overload)
  • when they fail, the distributor has no way of completing the query in the face of that error, because it's not clear what it means or why it happened, or how long it will last for, etc.
  • therefore the distributor should abort the query

Retrying might result in success depending on the kind of error. We could do this automatically but that often leads to cascading failures.

@rade commented Oct 11, 2016

Retrying might result in success depending on the kind of error

Yes, of course. My question was specifically about an actual ingester failure. I.e. if a query failed because an ingester failed catastrophically (i.e. without removing itself from consul), then that query will never succeed on retry, at least not until some external intervention cleans up consul - true or false? And since with this PR every query is sent to all ingesters, the catastrophic failure of a single ingester breaks all queries until external intervention.

We could do this automatically but that often leads to cascading failures.

which is why I didn't suggest that ;)

@jml (Contributor) commented Oct 12, 2016

We spoke today with @tomwilkie, who mentioned that the original plan was to have a field in the ring for tracking token (ingester) state, which was meant to go in https://github.com/weaveworks/prism/blob/master/ring/model.go#L33. This contrasts with the two-ring approach.

The other element is that replication is the solution to our problems. I'm going to start making some notes on that in the ingester lifecycle design doc.

@jml (Contributor) commented Oct 17, 2016

Superseded by #49?

@juliusv (Contributor, Author) commented Oct 17, 2016

Yes, thanks!

juliusv closed this on Oct 17, 2016
tomwilkie deleted the unregister-order branch on November 5, 2016 at 22:44