This is how I understand the suggestion: please correct me as necessary.
Currently we index from instance+bucket+metric-name+label-name directly to chunk. The cardinality of this index can be massive, because it contains an entry for every chunk of every time series, multiplied by the replication factor.
Instead we could do two hops: instance+metric-name+label-name to time series, then time-series+bucket to chunk. The cardinality of the first lookup would be the number of time series; the second would be the number of chunks times the replication factor.
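The two schemes can be sketched as key layouts. This is an illustrative model in Python, not Cortex's actual Go schema; the key formats and function names are assumptions made up for the sketch.

```python
# Hypothetical key layouts contrasting the current one-hop index
# with the proposed two-hop index. Formats are illustrative only.

def current_index_key(instance, bucket, metric, label):
    # One hop: instance+bucket+metric+label points straight at chunks.
    # One entry per chunk per series, times the replication factor.
    return f"{instance}:{bucket}:{metric}:{label}"

def series_index_key(instance, metric, label):
    # Hop 1: label lookup resolves to a series ID. Cardinality is the
    # number of time series, independent of how many chunks each has.
    return f"{instance}:{metric}:{label}"

def chunk_index_key(series_id, bucket):
    # Hop 2: series+bucket resolves to that series' chunks.
    return f"{series_id}:{bucket}"
```

A query would first scan hop-1 keys to find matching series, then issue hop-2 lookups per series per bucket, which is where the suspected extra latency for small result sets comes from.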
Ingesters could cache some of the first index, so they know they don't need to re-write it. (Overwriting is harmless, just wasteful).
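The ingester-side cache could be as simple as a set of already-written keys, since overwriting is harmless and the cache only needs to suppress redundant writes. A minimal sketch, assuming a hypothetical key-value `backend` with a `write(key, value)` method:

```python
class DictBackend:
    """Illustrative in-memory stand-in for the index store."""
    def __init__(self):
        self.store = {}

    def write(self, key, value):
        self.store[key] = value


class SeriesIndexWriter:
    """Sketch of an ingester caching which series-index entries it has
    already written, so it can skip re-writing them. Losing the cache is
    safe: the worst case is a wasteful but harmless overwrite."""

    def __init__(self, backend):
        self.backend = backend
        self.written = set()  # keys known to already be in the index

    def maybe_write(self, key, value):
        if key in self.written:
            return False  # skip the redundant write
        self.backend.write(key, value)
        self.written.add(key)
        return True
```

Because correctness never depends on the cache, it can be bounded (e.g. an LRU) and evict freely.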
I suspect this will make queries slower for cases where the current implementation returns a small number of index entries, but faster where a lot of index entries are returned.
We could try to reduce hot-spot writing to the same index key (in the current scheme, a key like 123:d12345:container_cpu_usage_seconds_total:namespace, for example, will be very popular).
We could include sum, count, max, min for each chunk in the index table, and use this for aggregate queries.
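Storing per-chunk aggregates in the index would let some aggregate queries be answered without fetching chunk data at all. A sketch of what such an entry might carry; the field set follows the sentence above, but the structure is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ChunkIndexEntry:
    # Hypothetical index entry carrying per-chunk aggregates alongside
    # the chunk reference. sum/count/max/min summarize the samples in
    # the chunk, so whole-chunk aggregations never touch the chunk body.
    chunk_id: str
    sum: float
    count: int
    max: float
    min: float


def total_sum(entries):
    # e.g. a sum over a range fully covered by these chunks can be
    # computed from the index entries alone.
    return sum(e.sum for e in entries)
```

Chunks only partially inside the query range would still need to be fetched, so this helps most for long ranges spanning many whole chunks.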
My main motivation for doing this is that we could reduce the amount of sorting needed in the chunk store, and perhaps even make it completely streaming. If this allows us to overlap chunk fetches with computation, it could be a big win for very long queries.
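The streaming idea rests on each series' chunks arriving already ordered by time: the chunk store can then merge the per-series streams lazily instead of sorting everything up front, consuming chunks as they are fetched. A minimal sketch using Python's standard lazy k-way merge:

```python
import heapq

def streaming_chunks(per_series_chunks):
    # Illustrative: each input is an iterator of (start_time, chunk_id)
    # tuples, already sorted by start_time within its series. heapq.merge
    # yields chunks in global time order without materializing or sorting
    # the full set, so computation can overlap with fetching.
    return heapq.merge(*per_series_chunks)
```

Example: merging two per-series streams yields a single time-ordered stream.

```python
series_a = iter([(1, "a1"), (3, "a2")])
series_b = iter([(2, "b1")])
merged = list(streaming_chunks([series_a, series_b]))
```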
The suggestion above was originally noted by @tomwilkie at #607 (comment).