Reduce index size by indexing labels->timeseries #718

Closed
bboreham opened this issue Feb 19, 2018 · 4 comments

@bboreham (Contributor)

As noted by @tomwilkie at #607 (comment)

This is how I understand the suggestion: please correct me as necessary.

Currently we index from instance+bucket+metric-name+label-name to chunk. The cardinality of this index can be massive because it contains every chunk for every timeseries, multiplied by the replication factor.

Instead we could do two hops: instance+metric-name+label-name to time-series, then time-series+bucket to chunk. The cardinality of the first lookup would be the number of timeseries, and of the second would be the number of chunks times the replication factor.
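To make the two-hop scheme concrete, here is a minimal sketch in Go. The key layouts and names (`flatIndex`, `seriesIndex`, `chunkIndex`) are illustrative assumptions, not Cortex's actual row formats:

```go
package main

import "fmt"

// Current scheme (hypothetical layout): one flat index.
//   instance:bucket:metric:label=value -> chunk IDs
var flatIndex = map[string][]string{
	"123:d12345:cpu:namespace=default": {"chunk-1", "chunk-2"},
}

// Proposed scheme (hypothetical layout): two hops.
//   instance:metric:label=value -> series IDs
//   seriesID:bucket             -> chunk IDs
var seriesIndex = map[string][]string{
	"123:cpu:namespace=default": {"series-A"},
}
var chunkIndex = map[string][]string{
	"series-A:d12345": {"chunk-1", "chunk-2"},
}

// lookupTwoHop resolves a label matcher to chunks via the series index:
// first find the matching series, then the chunks for each series in
// the requested time bucket.
func lookupTwoHop(labelKey, bucket string) []string {
	var chunks []string
	for _, series := range seriesIndex[labelKey] {
		chunks = append(chunks, chunkIndex[series+":"+bucket]...)
	}
	return chunks
}

func main() {
	fmt.Println(flatIndex["123:d12345:cpu:namespace=default"])
	fmt.Println(lookupTwoHop("123:cpu:namespace=default", "d12345"))
}
```

The first map's cardinality grows with series count only; the per-bucket, per-replica growth is confined to the second map.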

Ingesters could cache some of the first index, so they know they don't need to re-write it. (Overwriting is harmless, just wasteful).
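The ingester-side cache could be as simple as a set of already-written entry keys; a miss just means a harmless duplicate write. A minimal sketch, with hypothetical names:

```go
package main

import "fmt"

// indexWriteCache remembers which series-index entries this ingester
// has already written, so it can skip rewriting them. Losing the cache
// is safe: overwriting an entry is harmless, just wasteful.
type indexWriteCache struct {
	written map[string]struct{}
}

func newIndexWriteCache() *indexWriteCache {
	return &indexWriteCache{written: make(map[string]struct{})}
}

// shouldWrite reports whether the entry still needs writing, and
// records it as written either way.
func (c *indexWriteCache) shouldWrite(entryKey string) bool {
	if _, ok := c.written[entryKey]; ok {
		return false
	}
	c.written[entryKey] = struct{}{}
	return true
}

func main() {
	c := newIndexWriteCache()
	fmt.Println(c.shouldWrite("123:cpu:namespace=default")) // first write goes through
	fmt.Println(c.shouldWrite("123:cpu:namespace=default")) // repeat is skipped
}
```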

I suspect this will make queries slower for cases where the current implementation returns a small number of index entries, but faster where a lot of index entries are returned.

@bboreham (Contributor, Author)

While we're in there...

We could try to reduce hot-spot writing to the same index key (in the current scheme a key like `123:d12345:container_cpu_usage_seconds_total:namespace`, for example, will be very popular).

We could include sum, count, max, min for each chunk in the index table, and use this for aggregate queries.
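For example, an aggregate like `avg` over a range could then be answered from index entries alone, without fetching chunk data. A minimal sketch, with a hypothetical `ChunkSummary` shape:

```go
package main

import "fmt"

// ChunkSummary is a hypothetical per-chunk aggregate stored alongside
// the index entry; field names are illustrative.
type ChunkSummary struct {
	Sum, Max, Min float64
	Count         int
}

// avgFromSummaries computes an average across chunks using only the
// per-chunk sums and counts from the index.
func avgFromSummaries(summaries []ChunkSummary) float64 {
	var sum float64
	var count int
	for _, s := range summaries {
		sum += s.Sum
		count += s.Count
	}
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

func main() {
	entries := []ChunkSummary{
		{Sum: 10, Count: 5, Max: 4, Min: 0},
		{Sum: 30, Count: 5, Max: 9, Min: 2},
	}
	fmt.Println(avgFromSummaries(entries)) // (10+30)/(5+5) = 4
}
```

`max` and `min` compose the same way; only aggregations that decompose per chunk (sum, count, max, min, and averages derived from sum/count) can be served like this.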

@tomwilkie (Contributor)

My main desire for doing this is that we could reduce the amount of sorting needed in the chunk store, and perhaps even make it completely streaming. If this allows us to overlap chunk fetches with computation, it could be a big win for very long queries.
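The overlap idea can be sketched as a small pipeline: a goroutine fetches chunks and sends them on a channel while the consumer computes over each chunk as it arrives, assuming the index already yields chunks in series order so no global sort is needed. Names here are illustrative:

```go
package main

import "fmt"

// fetchChunks starts fetching chunk data in the background, sending
// each chunk as soon as it is available. In a real store each send
// would follow a remote read.
func fetchChunks(ids []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, id := range ids {
			out <- "data-for-" + id
		}
	}()
	return out
}

func main() {
	// The consumer processes each chunk while later fetches are
	// still in flight, instead of waiting for the full sorted set.
	processed := 0
	for chunk := range fetchChunks([]string{"chunk-1", "chunk-2", "chunk-3"}) {
		_ = chunk // computation would happen here
		processed++
	}
	fmt.Println(processed) // 3
}
```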

@tomwilkie (Contributor)

Some initial comments here: #433

@bboreham (Contributor, Author)

Sorry, I hadn't spotted the previous issue. I'll close this as a duplicate.
