Description
(Please don't put any discussion / investigation on this ticket; it's just for gathering all the ideas.)
Target:
- < 100ms at the 99th percentile
- measured at authfe
- for up to week-long queries, with 5 selectors of 3 k=v matchers each
Architecture
- add a long-lived cache for DynamoDB rows which we know won't change ("closed" buckets) (Cache index entries that will never change for a long time #964) (sketched after this list)
- parallelise queries across "workers" by time range, for long queries (Parallelise queries across "workers" by time range, for long queries #963) (sketched after this list)
- pipeline DynamoDB pagination & chunk fetch (Pipeline index lookups, intersections and chunk fetches #962)
- add a very short-lived cache for DynamoDB rows (i.e. 3s) (Cache index queries for up to 15mins #947)
- log each query with stats (Log per-query statistics #135)
- add in-process chunk cache (Diskcache: querier-local SSD backed chunk cache #685)
- parallelise chunk fetch (Parallel chunk fetching from DynamoDB #603)
- write extra index entries for equality matchers
- do memcache write-backs asynchronously (sketched after this list)
- consider using gogoprotobuf, which claims to be faster
- use gRPC from authfe <-> cortex to minimise tcp connection setup time (HTTP over gRPC: encode HTTP requests as gRPC requests #235)
- increase the size of the memcaches; they are currently puny (https://github.com/weaveworks/service-conf/commit/94ce8a698310e3f7407f2fca71436728d696425f)
- separate query service (Separate out query service #194)
- parallelisation of ingester / chunk store fetch (Cortex reads are slow #132) (Read ingesters and chunk store in parallel #150)
- filter chunks by time range before we fetch them (Cortex reads are slow #132) (Don't fetch chunks outside our timerange #149)
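
Below is a minimal sketch of the "closed bucket" cache idea (#964). All names here (`ClosedBucketCache`, `IndexRow`, `Bucket`) are hypothetical; the only assumption is that index rows are bucketed by time and stop changing once the bucket's end time has passed:

```go
package index

import (
	"sync"
	"time"
)

// Bucket identifies one time-bucketed slice of the DynamoDB index.
type Bucket struct {
	HashKey string
	End     time.Time // rows in this bucket stop changing once End has passed
}

// IndexRow is a placeholder for a DynamoDB index row.
type IndexRow struct {
	RangeKey string
	Value    []byte
}

// ClosedBucketCache keeps index rows indefinitely once their bucket has
// closed, i.e. once no writer can add new entries to it.
type ClosedBucketCache struct {
	mtx  sync.RWMutex
	rows map[string][]IndexRow
}

func NewClosedBucketCache() *ClosedBucketCache {
	return &ClosedBucketCache{rows: map[string][]IndexRow{}}
}

func (c *ClosedBucketCache) Get(b Bucket) ([]IndexRow, bool) {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	rows, ok := c.rows[b.HashKey]
	return rows, ok
}

// Put stores rows only if the bucket is already closed; callers fall back to
// the short-lived cache (or no cache) for the still-open bucket.
func (c *ClosedBucketCache) Put(b Bucket, rows []IndexRow, now time.Time) {
	if now.Before(b.End) {
		return // bucket still open: rows may change, don't cache long-term
	}
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.rows[b.HashKey] = rows
}
```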
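And a sketch of splitting a long query across workers by time range (#963). The 24h shard size, the `queryShard` callback and the `Chunk` placeholder are illustrative assumptions, not what the linked issue prescribes:

```go
package chunkstore

import (
	"context"
	"sync"
	"time"

	"golang.org/x/sync/errgroup"
)

// Chunk is a stand-in for the chunk store's chunk type.
type Chunk struct{}

// queryByTimeShards splits [from, through) into fixed-size shards, runs
// queryShard for each shard concurrently, and merges the results.
func queryByTimeShards(ctx context.Context, from, through time.Time,
	queryShard func(context.Context, time.Time, time.Time) ([]Chunk, error)) ([]Chunk, error) {

	const shardSize = 24 * time.Hour // assumed shard size

	g, ctx := errgroup.WithContext(ctx)
	var (
		mtx    sync.Mutex
		result []Chunk
	)
	for start := from; start.Before(through); start = start.Add(shardSize) {
		start, end := start, minTime(start.Add(shardSize), through)
		g.Go(func() error {
			chunks, err := queryShard(ctx, start, end)
			if err != nil {
				return err
			}
			mtx.Lock()
			result = append(result, chunks...)
			mtx.Unlock()
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return result, nil
}

func minTime(a, b time.Time) time.Time {
	if a.Before(b) {
		return a
	}
	return b
}
```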
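Finally, a sketch of the asynchronous memcache write-back idea: a bounded queue plus background workers, so cache population never adds latency to the query path. The `Backend` interface and the drop-when-full policy are assumptions:

```go
package cache

// Backend is whatever synchronous cache client we already have (e.g. memcache).
type Backend interface {
	Set(key string, value []byte) error
}

type writeReq struct {
	key   string
	value []byte
}

// asyncWriter pushes cache writes through a bounded queue so write-backs
// never block the caller; if the queue is full the write is simply dropped
// (the cache is best-effort anyway).
type asyncWriter struct {
	queue chan writeReq
}

func newAsyncWriter(backend Backend, queueLen, workers int) *asyncWriter {
	w := &asyncWriter{queue: make(chan writeReq, queueLen)}
	for i := 0; i < workers; i++ {
		go func() {
			for req := range w.queue {
				_ = backend.Set(req.key, req.value) // errors are best-effort
			}
		}()
	}
	return w
}

// Set enqueues the write and returns immediately.
func (w *asyncWriter) Set(key string, value []byte) {
	select {
	case w.queue <- writeReq{key, value}:
	default: // queue full: drop rather than block the query path
	}
}
```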
Chunk iterators:
- lazily merge / iterate chunks (Don't convert chunks to matrixes and then merges them... #713) (sketched after this list)
- optimise merge (Optimize merging of many small lists of samples #81)
- save partial results to cache (Save chunks to cache if GetChunks returns partial result #605)
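
A sketch of the lazy merge idea (#713): rather than converting every chunk to a matrix and then merging matrices, keep a min-heap of per-chunk iterators and emit samples in timestamp order on demand. The `Sample`/`Iterator` types are placeholders for whatever the chunk layer already exposes, and duplicate timestamps from overlapping chunks would still need deduping by the caller:

```go
package iterators

import "container/heap"

// Sample is a placeholder for a single timestamped value.
type Sample struct {
	TimestampMs int64
	Value       float64
}

// Iterator yields samples in timestamp order, e.g. from one chunk.
type Iterator interface {
	Next() bool // advance; false when exhausted
	At() Sample // current sample, valid after Next() returned true
}

// mergeIterator lazily merges many sorted iterators with a min-heap, so the
// full matrix is never materialised in memory.
type mergeIterator struct {
	h   iterHeap
	cur Sample
}

func NewMergeIterator(its []Iterator) *mergeIterator {
	m := &mergeIterator{}
	for _, it := range its {
		if it.Next() {
			m.h = append(m.h, it)
		}
	}
	heap.Init(&m.h)
	return m
}

func (m *mergeIterator) Next() bool {
	if len(m.h) == 0 {
		return false
	}
	it := m.h[0]
	m.cur = it.At()
	if it.Next() {
		heap.Fix(&m.h, 0) // re-position the advanced iterator
	} else {
		heap.Pop(&m.h) // iterator exhausted
	}
	return true
}

func (m *mergeIterator) At() Sample { return m.cur }

// iterHeap orders iterators by the timestamp of their current sample.
type iterHeap []Iterator

func (h iterHeap) Len() int            { return len(h) }
func (h iterHeap) Less(i, j int) bool  { return h[i].At().TimestampMs < h[j].At().TimestampMs }
func (h iterHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *iterHeap) Push(x interface{}) { *h = append(*h, x.(Iterator)) }
func (h *iterHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}
```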
Index optimisations (idea: minimise index reads per query):
- We could notice when we're doing overlapping queries (for instance, fetching all chunk ids for a metric vs some subset) and do the filtering client side; e.g.
  `(count(max(node_cpu{job="monitoring/prom-node-exporter"}) by (cpu, node)) by (node) - sum(irate(node_cpu{job="monitoring/prom-node-exporter",mode="idle"}[1m])) by (node)) / count(max(node_cpu{job="monitoring/prom-node-exporter"}) by (cpu, node)) by (node) * 100`
  fetches `node_cpu{job="monitoring/prom-node-exporter"}` and `node_cpu{job="monitoring/prom-node-exporter",mode="idle"}`, and the latter could be filtered client side (Extend Prometheus storage interface to do multiple queries in parallel, and use that to dedupe index lookups. #967) (a sketch of the client-side filtering appears after this list)
- We could consider not doing parallel queries for the same hash-key, as this could overload a shard
- figure out if one can do `foo{bar=~"bar,*", bar!="bazzer"}`, and if so optimise it (Optimise queries like `foo{bar=~"bar,*", bar!="bazzer"}` to only query the index once #966)
- filter inequality expressions in DynamoDB (similar to Filter equality matchers on the dynamo side #402, but for not-equals) (Push regex matching into the NOSQL store where possible #965)
- We could consider keeping metrics of how many chunks match certain `k=v` combinations, then when presented with an intersection operation only fetch the smallest `k=v` combinations - ie `__name__=foo` will match lots of chunks, as will `instance=bar`, but `job=baz` might match fewer. Either way we don't necessarily need to query all `k=v` and intersect; once we get to a certain size we could just fetch the chunks (which contain the full metric) and filter. (Don't query very high cardinality labels #884) (a sketch of this appears after this list)
- see if we can filter equality matchers on the server (dynamo) side (Filter equality matchers on the server (dynamo) side #398)
- perhaps include chunk prefix in search criteria, for short recent query optimisations
- Include chunk start in rangeKey, and use that during fetches to minimise amount of data fetched (Include chunk end in rangeKey #298)
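
A sketch of the client-side filtering idea from the overlapping-queries bullet (#967): if one selector's matchers are a subset of another's, run the index lookup once for the broader selector and derive the narrower result by filtering locally. Types and names are hypothetical, and only equality matchers are handled:

```go
package querier

// matcher is a simplified equality matcher; real queries also have regex and
// negative matchers.
type matcher struct{ Name, Value string }

// selector is the set of matchers for one vector selector in the query.
type selector []matcher

// subsumes reports whether broad's matchers are a subset of narrow's, i.e.
// every series matched by narrow is also matched by broad. If so, we can do
// the index lookup for broad once and filter narrow's result client side
// instead of issuing a second DynamoDB query.
func subsumes(broad, narrow selector) bool {
	for _, bm := range broad {
		found := false
		for _, nm := range narrow {
			if bm == nm {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// filterClientSide keeps only the series from the broad lookup that also
// satisfy the narrow selector's matchers.
func filterClientSide(series []map[string]string, narrow selector) []map[string]string {
	var out []map[string]string
	for _, labels := range series {
		ok := true
		for _, m := range narrow {
			if labels[m.Name] != m.Value {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, labels)
		}
	}
	return out
}
```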
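And a sketch of the cardinality-aware planning idea (#884): order matchers by estimated selectivity, stop issuing index lookups once the candidate set is small enough, and apply the remaining matchers by filtering the fetched chunks. The estimate source, threshold and types are all assumptions:

```go
package queryplan

import "sort"

// labelPair is one k=v matcher from the query.
type labelPair struct{ Name, Value string }

// Below this many candidate chunks, fetching and filtering is assumed cheaper
// than more index reads and intersections.
const maxChunksToJustFetch = 1000

// planIndexLookups decides which matchers to resolve via the index and which
// to leave for post-fetch filtering, given a rough per-pair cardinality
// estimate (e.g. maintained as metrics).
func planIndexLookups(matchers []labelPair, estimate func(labelPair) int) (lookup, filterOnly []labelPair) {
	// Start with the most selective matchers.
	sort.Slice(matchers, func(i, j int) bool {
		return estimate(matchers[i]) < estimate(matchers[j])
	})

	expected := int(^uint(0) >> 1) // current upper bound on candidate chunks
	for i, m := range matchers {
		if i > 0 && expected <= maxChunksToJustFetch {
			// Candidate set is already small: skip index reads for the
			// remaining high-cardinality pairs (e.g. __name__=foo or
			// instance=bar) and filter the fetched chunks instead.
			filterOnly = append(filterOnly, matchers[i:]...)
			break
		}
		lookup = append(lookup, m)
		if e := estimate(m); e < expected {
			expected = e
		}
	}
	return lookup, filterOnly
}
```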