This is how I understand the suggestion: please correct me as necessary.
Currently we index from instance+bucket+metric-name+label-name directly to chunk. The cardinality of this index can be massive, because it contains an entry for every chunk of every time series, multiplied by the replication factor.
Instead we could do two hops: instance+metric-name+label-name to time series, then time-series+bucket to chunk. The cardinality of the first lookup would be the number of time series; the second would be the number of chunks times the replication factor.
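The two schemes can be sketched as key layouts. This is an illustrative model in Python, not Cortex's actual Go schema; the key formats and function names are assumptions made up for the sketch.

```python
# Hypothetical key layouts contrasting the current one-hop index
# with the proposed two-hop index. Formats are illustrative only.

def current_index_key(instance, bucket, metric, label):
    # One hop: instance+bucket+metric+label points straight at chunks.
    # One entry per chunk per series, times the replication factor.
    return f"{instance}:{bucket}:{metric}:{label}"

def series_index_key(instance, metric, label):
    # Hop 1: label lookup resolves to a series ID. Cardinality is the
    # number of time series, independent of how many chunks each has.
    return f"{instance}:{metric}:{label}"

def chunk_index_key(series_id, bucket):
    # Hop 2: series+bucket resolves to that series' chunks.
    return f"{series_id}:{bucket}"
```

A query would first scan hop-1 keys to find matching series, then issue hop-2 lookups per series per bucket, which is where the suspected extra latency for small result sets comes from.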
Ingesters could cache some of the first index, so they know they don't need to re-write it. (Overwriting is harmless, just wasteful).
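The ingester-side cache could be as simple as a set of already-written keys, since overwriting is harmless and the cache only needs to suppress redundant writes. A minimal sketch, assuming a hypothetical key-value `backend` with a `write(key, value)` method:

```python
class DictBackend:
    """Illustrative in-memory stand-in for the index store."""
    def __init__(self):
        self.store = {}

    def write(self, key, value):
        self.store[key] = value


class SeriesIndexWriter:
    """Sketch of an ingester caching which series-index entries it has
    already written, so it can skip re-writing them. Losing the cache is
    safe: the worst case is a wasteful but harmless overwrite."""

    def __init__(self, backend):
        self.backend = backend
        self.written = set()  # keys known to already be in the index

    def maybe_write(self, key, value):
        if key in self.written:
            return False  # skip the redundant write
        self.backend.write(key, value)
        self.written.add(key)
        return True
```

Because correctness never depends on the cache, it can be bounded (e.g. an LRU) and evict freely.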
I suspect this will make queries slower for cases where the current implementation returns a small number of index entries, but faster where a lot of index entries are returned.
We could try to reduce hot-spot writing to the same index key (in the current scheme, a key like 123:d12345:container_cpu_usage_seconds_total:namespace, for example, will be very popular).
We could include sum, count, max, min for each chunk in the index table, and use this for aggregate queries.
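Storing per-chunk aggregates in the index would let some aggregate queries be answered without fetching chunk data at all. A sketch of what such an entry might carry; the field set follows the sentence above, but the structure is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ChunkIndexEntry:
    # Hypothetical index entry carrying per-chunk aggregates alongside
    # the chunk reference. sum/count/max/min summarize the samples in
    # the chunk, so whole-chunk aggregations never touch the chunk body.
    chunk_id: str
    sum: float
    count: int
    max: float
    min: float


def total_sum(entries):
    # e.g. a sum over a range fully covered by these chunks can be
    # computed from the index entries alone.
    return sum(e.sum for e in entries)
```

Chunks only partially inside the query range would still need to be fetched, so this helps most for long ranges spanning many whole chunks.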
My main motivation for doing this is that we could reduce the amount of sorting needed in the chunk store, and perhaps even make it completely streaming. If this allows us to overlap chunk fetches with computation, it could be a big win for very long queries.
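The streaming idea rests on each series' chunks arriving already ordered by time: the chunk store can then merge the per-series streams lazily instead of sorting everything up front, consuming chunks as they are fetched. A minimal sketch using Python's standard lazy k-way merge:

```python
import heapq

def streaming_chunks(per_series_chunks):
    # Illustrative: each input is an iterator of (start_time, chunk_id)
    # tuples, already sorted by start_time within its series. heapq.merge
    # yields chunks in global time order without materializing or sorting
    # the full set, so computation can overlap with fetching.
    return heapq.merge(*per_series_chunks)
```

Example: merging two per-series streams yields a single time-ordered stream.

```python
series_a = iter([(1, "a1"), (3, "a2")])
series_b = iter([(2, "b1")])
merged = list(streaming_chunks([series_a, series_b]))
```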
The suggestion above was originally noted by @tomwilkie at #607 (comment).