Don't query high cardinality labels. #886

Closed

Conversation

tomwilkie
Contributor

@tomwilkie tomwilkie commented Jul 16, 2018

Fixes #884

This change limits the cardinality of the labels a query may look up. If a label exceeds the limit, the query can still work, but it must include another selector with lower cardinality.

Notably, after this change, queries on two high-cardinality labels that would have resulted in a small number of series will fail.

Builds on seriesStores, so includes #875.

This will allow us to vary the store implementation over time, and not just the schema.
This will unblock the new bigtable storage adapter (using columns instead of rows), and allow us to more easily implement the iterative intersections and indexing of series instead of chunks.

Signed-off-by: Tom Wilkie <[email protected]>
…r this schema.

I tried to adapt the original chunk store to support this style of indexing. It was easy on the write path, but the read path became even more of a rat's nest. So I factored out the common bits as best I could and made a new chunk store: the seriesStore.

Signed-off-by: Tom Wilkie <[email protected]>
@tomwilkie tomwilkie changed the title from "[WIP] Don't query high cardinality labels." to "Don't query high cardinality labels." on Jul 16, 2018
@tomwilkie
Contributor Author

After some testing, this improved latency for those particular queries mentioned in #884 by >2x.

var schemas = []struct {
name string
fn func(cfg SchemaConfig) Schema
schemaFn schemaFactory
storeFn storeFactory
Contributor

I was expecting this member to come in as part of the refactor to multiple stores.

var schemas = []struct {
name string
fn func(cfg SchemaConfig) Schema
requireMetricName bool
Contributor

I expected storeFn to arrive in this commit, even though they would all be the same.

"time"
)

// heapCache is a simple string -> interface{} cache which uses a heap to manage
Contributor

Do we really need to write another cache?

)

// heapCache is a simple string -> interface{} cache which uses a heap to manage
// evictions. O(log N) inserts, O(n) get and update.
Contributor

It looks like it should be O(1) get - is that a typo or am I missing something?

@tomwilkie
Contributor Author

Going to roll this into #875 as I'm struggling to keep track of all my open PRs.

@tomwilkie tomwilkie closed this Jul 25, 2018
@tomwilkie tomwilkie deleted the high-cardinality-matchers branch July 25, 2018 13:46
2 participants