-
Notifications
You must be signed in to change notification settings - Fork 816
Same series may end up in chunks using different fingerprint names #1820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This is kind-of opposite to #717 Cortex already dedupes individual metrics, and the series index is based off the sha of the labels, so can you say a bit more about what more needs to be done (without me having to read a 1600-line PR)? |
If Cortex doesn't use fingerprints from Chunk IDs, then this fix isn't necessary. (That would be That linked Loki PR primarily fixes problem with hash collisions, which Loki didn't deal with before (unlike Cortex). Once we started using remapped fingerprints to handle collisions, we also started to write those remapped fingerprints into Chunk IDs. I was told that Loki uses fingerprints from Chunk IDs, so we now make sure to use original (not remapped) fingerprint into Chunk ID. From #717 (which I actually have on my TODO list to look at and possibly fix):
This assumption is wrong. (But this issue is about opposite implication: different fingerprints => different series, which is also wrong.) |
I can see one such usage at cortex/pkg/chunk/series_store.go Line 239 in 77a09cc
where fingerprints are used to detect duplicate chunks. |
Is the problem that you can make a query like |
Thinking about it bit more, using the fix from Loki, we may actually drop valid data, as we already know that we have different series using the same fingerprint, so we don't want to deduplicate based on original (= colliding) fingerprint. Fixing #717 would probably avoid this concern completely. |
I didn't know Cortex already handled fingerprint remapping. The problem that I imagine can happen with remapping fingerprint is when series are being replicated, different ingesters could refer same series with different fingerprints(since its not necessary all ingesters would see same fingerprint collisions) and in turn flush chunks with different IDs with same data. Also the way collisions are handled in Cortex can make different series share the same fingerprint since colliding fingerprints are remapped to have a fingerprint from reserved range i.e 1 to 1<<20. Should we consider dropping fingerprints and using seriesID(which is sha256 of labels) everywhere for sake of uniquely identifying series? |
Looking at the usages of cortex/pkg/chunk/chunk_store_utils.go Lines 63 to 65 in a3eaf06
Someone could do the same not knowing how reliable chunk.Fingerprint is.
|
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
Chunk IDs contain fingerprint of the series. However, due to fingerprint remapping that happens because of fingerprint collisions, same series written by different ingesters may end up in chunks with different fingerprint in the IDs. Spotted by @sandlis, in Loki we fix this in grafana/loki#1247. I can fix it in Cortex too, if we think it's a problem worth fixing.
The text was updated successfully, but these errors were encountered: