Same series may end up in chunks using different fingerprint names #1820

pstibrany · 2019-11-14T07:53:58Z

Chunk IDs contain the fingerprint of the series. However, because fingerprints are remapped when collisions occur, the same series written by different ingesters may end up in chunks with different fingerprints in their IDs. Spotted by @sandlis; in Loki we fixed this in grafana/loki#1247. I can fix it in Cortex too, if we think it's a problem worth fixing.

bboreham · 2019-11-14T09:04:54Z

This is kind of the opposite of #717.

Cortex already dedupes individual metrics, and the series index is based off the sha of the labels, so can you say a bit more about what more needs to be done (without me having to read a 1600-line PR)?

pstibrany · 2019-11-14T09:25:44Z

If Cortex doesn't use fingerprints from Chunk IDs, then this fix isn't necessary. (That would be Fingerprint field in chunk.Chunk struct)

The linked Loki PR primarily fixes a problem with hash collisions, which Loki didn't deal with before (unlike Cortex). Once we started using remapped fingerprints to handle collisions, we also started to write those remapped fingerprints into Chunk IDs. I was told that Loki uses fingerprints from Chunk IDs, so we now make sure to write the original (not remapped) fingerprint into the Chunk ID.

From #717 (which I actually have on my TODO list to look at and possibly fix):

> The chunk store fetch code assumes that fingerprints uniquely identify a timeseries; this is fairly likely when they are looking at a single metric, but we still could get clashes.

This assumption is wrong. (But this issue is about the opposite implication: different fingerprints => different series, which is also wrong.)

pstibrany · 2019-11-14T09:32:33Z

> If Cortex doesn't use fingerprints from Chunk IDs, then this fix isn't necessary. (That would be Fingerprint field in chunk.Chunk struct)

I can see one such usage at cortex/pkg/chunk/series_store.go, line 239 in 77a09cc:

    filtered, keys := filterChunksByUniqueFingerprint(filtered)

where fingerprints are used to detect duplicate chunks.

bboreham · 2019-11-14T09:40:47Z

Is the problem that you can make a query like up and receive a matrix with two rows for the same series (name + labels)?
It would be good to create a unit test, or even an integration test, that showed the problem.

pstibrany · 2019-11-14T09:45:26Z

Thinking about it a bit more: if we used the fix from Loki, we might actually drop valid data. We already know that we have different series using the same fingerprint, so we don't want to deduplicate based on the original (= colliding) fingerprint.

Fixing #717 would probably avoid this concern completely.
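To make the concern concrete, here is a minimal sketch of deduplicating by fingerprint versus by label set. The `chunk` type and both helper functions are hypothetical simplifications for illustration, not the actual Cortex code, and the colliding fingerprint value is assumed rather than computed.

```go
package main

import "fmt"

// chunk is a hypothetical, simplified stand-in for chunk.Chunk.
type chunk struct {
	Fingerprint uint64 // hash of the labels; distinct series may collide
	Labels      string // canonical label string identifying the series
}

// dedupeByFingerprint keeps the first chunk seen for each fingerprint.
func dedupeByFingerprint(chunks []chunk) []chunk {
	seen := map[uint64]struct{}{}
	var out []chunk
	for _, c := range chunks {
		if _, ok := seen[c.Fingerprint]; ok {
			continue // treated as a duplicate, even if the labels differ
		}
		seen[c.Fingerprint] = struct{}{}
		out = append(out, c)
	}
	return out
}

// dedupeByLabels keeps the first chunk seen for each label set.
func dedupeByLabels(chunks []chunk) []chunk {
	seen := map[string]struct{}{}
	var out []chunk
	for _, c := range chunks {
		if _, ok := seen[c.Labels]; ok {
			continue
		}
		seen[c.Labels] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	// Two different series that are assumed to collide on fingerprint 42.
	chunks := []chunk{
		{Fingerprint: 42, Labels: `up{job="a"}`},
		{Fingerprint: 42, Labels: `up{job="b"}`},
	}
	fmt.Println(len(dedupeByFingerprint(chunks)), len(dedupeByLabels(chunks))) // prints "1 2"
}
```

With the original (colliding) fingerprint as the key, the second series is silently dropped; keyed on the label set, both survive.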

sandeepsukhani · 2019-11-14T10:29:32Z

I didn't know Cortex already handled fingerprint remapping. The problem I imagine can happen with remapped fingerprints is that, when series are replicated, different ingesters could refer to the same series by different fingerprints (since not all ingesters necessarily see the same fingerprint collisions) and in turn flush chunks with different IDs but the same data.

Also, the way Cortex handles collisions can make different series share the same fingerprint, since colliding fingerprints are remapped to a fingerprint from the reserved range, i.e. 1 to 1<<20.
This makes fingerprints unreliable and I can't imagine what use-case they would have.
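Both effects can be sketched with a toy model. The `toyMapper` type below is a hypothetical, much-simplified model of per-ingester remapping, not the real Cortex mapper; the raw fingerprint values and series names are made up, and the only detail taken from the discussion is the reserved range 1 to 1<<20.

```go
package main

import "fmt"

// maxMappedFP is the top of the reserved range mentioned above.
const maxMappedFP = 1 << 20

// toyMapper is a hypothetical model of one ingester's remapping state.
type toyMapper struct {
	owner  map[uint64]string // raw fingerprint -> series that first claimed it
	mapped map[string]uint64 // series -> remapped fingerprint
	next   uint64            // last fingerprint handed out from the reserved range
}

func newToyMapper() *toyMapper {
	return &toyMapper{owner: map[uint64]string{}, mapped: map[string]uint64{}}
}

// mapFP returns the fingerprint this ingester would use for the series.
func (m *toyMapper) mapFP(raw uint64, series string) uint64 {
	if fp, ok := m.mapped[series]; ok {
		return fp // already remapped earlier
	}
	first, ok := m.owner[raw]
	if !ok {
		m.owner[raw] = series // first series seen with this raw fingerprint
		return raw
	}
	if first == series {
		return raw
	}
	// Collision: hand out the next fingerprint from the reserved range.
	m.next++
	if m.next > maxMappedFP {
		panic("reserved fingerprint range exhausted")
	}
	m.mapped[series] = m.next
	return m.next
}

func main() {
	const rawAB, rawCD = uint64(0xABCDEF0), uint64(0xFEDCBA0) // assumed collisions
	i1, i2 := newToyMapper(), newToyMapper()

	i1.mapFP(rawAB, "A")
	b1 := i1.mapFP(rawAB, "B") // ingester 1 saw A first, so B is remapped
	b2 := i2.mapFP(rawAB, "B") // ingester 2 never saw A, so B keeps its raw value
	fmt.Println(b1 != b2)      // same series, different fingerprints

	i2.mapFP(rawCD, "C")
	d2 := i2.mapFP(rawCD, "D") // ingester 2 remaps D into the reserved range
	fmt.Println(b1 == d2)      // different series, same fingerprint across ingesters
}
```

In this model each ingester allocates from the reserved range independently, which is why neither direction of the "fingerprint identifies series" assumption holds across ingesters.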

Should we consider dropping fingerprints and using seriesID (which is the sha256 of the labels) everywhere, for the sake of uniquely identifying series?
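As a rough illustration of that idea, a series ID derived from a SHA-256 of the sorted labels could look like the sketch below. The `label` type, the `seriesID` function, and the 0xff separator encoding are assumptions made for this example; the exact encoding Cortex uses for its series index may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// label is a hypothetical name/value pair.
type label struct{ Name, Value string }

// seriesID hashes the labels in sorted order, so the result does not depend
// on the order in which labels are supplied or on any per-ingester state.
func seriesID(labels []label) string {
	sorted := append([]label(nil), labels...) // copy; do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	h := sha256.New()
	for _, l := range sorted {
		h.Write([]byte(l.Name))
		h.Write([]byte{0xff}) // separator byte that cannot occur in valid UTF-8
		h.Write([]byte(l.Value))
		h.Write([]byte{0xff})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := seriesID([]label{{"__name__", "up"}, {"job", "a"}})
	b := seriesID([]label{{"job", "a"}, {"__name__", "up"}})
	fmt.Println(a == b) // prints "true"
}
```

Unlike a remapped fingerprint, such an ID is a pure function of the labels, so every ingester would compute the same value for the same series.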

sandeepsukhani · 2019-11-14T11:43:46Z

Looking at the usages of chunk.Fingerprint, I see a problem only in cortex/pkg/chunk/chunk_store_utils.go, lines 63 to 65 in a3eaf06:

    if _, ok := uniqueFp[chunk.Fingerprint]; ok {
    	continue
    }

which is used in store.LabelNamesForMetricName. Someone else could do the same without knowing how reliable chunk.Fingerprint is.

stale · 2020-02-03T10:56:31Z

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Feb 3, 2020

stale bot closed this as completed Feb 18, 2020
