Description
(Please don't put any discussion / investigation on this ticket; it's just for gathering all the ideas.)
Target:
- < 100ms at the 99th percentile
- measured at authfe
- for up to week-long queries, with 5 selectors of 3 k=v matchers each
Architecture
- add a long-lived cache for DynamoDB rows which we know won't change ("closed" buckets) (Cache index entries that will never change for a long time #964) (sketched after this list)
- parallelise queries across "workers" by time range, for long queries (Parallelise queries across "workers" by time range, for long queries #963) (sketched after this list)
- pipeline DynamoDB pagination & chunk fetch (Pipeline index lookups, intersections and chunk fetches #962)
- add a very short-lived cache for DynamoDB rows (i.e. 3s) (Cache index queries for up to 15mins #947)
- log each query with stats (Log per-query statistics #135)
- add in-process chunk cache (Diskcache: querier-local SSD backed chunk cache #685)
- parallelise chunk fetch (Parallel chunk fetching from DynamoDB #603)
- write extra index entries for equality matchers
- do memcache write-backs asynchronously (sketched after this list)
- consider using gogoprotobuf, which claims to be faster
- use gRPC from authfe <-> cortex to minimise tcp connection setup time (HTTP over gRPC: encode HTTP requests as gRPC requests #235)
- increase the size of the memcaches; they are currently puny (https://github.com/weaveworks/service-conf/commit/94ce8a698310e3f7407f2fca71436728d696425f)
- separate query service (Separate out query service #194)
- parallelisation of ingester / chunk store fetch (Cortex reads are slow #132) (Read ingesters and chunk store in parallel #150)
- filter chunks by time range before we fetch them (Cortex reads are slow #132) (Don't fetch chunks outside our timerange #149)
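
Below is a minimal sketch of the "closed bucket" cache idea (#964). All names here (`ClosedBucketCache`, `IndexRow`, `Bucket`) are hypothetical; the only assumption is that index rows are bucketed by time and stop changing once the bucket's end time has passed:

```go
package index

import (
	"sync"
	"time"
)

// Bucket identifies one time-bucketed slice of the DynamoDB index.
type Bucket struct {
	HashKey string
	End     time.Time // rows in this bucket stop changing once End has passed
}

// IndexRow is a placeholder for a DynamoDB index row.
type IndexRow struct {
	RangeKey string
	Value    []byte
}

// ClosedBucketCache keeps index rows indefinitely once their bucket has
// closed, i.e. once no writer can add new entries to it.
type ClosedBucketCache struct {
	mtx  sync.RWMutex
	rows map[string][]IndexRow
}

func NewClosedBucketCache() *ClosedBucketCache {
	return &ClosedBucketCache{rows: map[string][]IndexRow{}}
}

func (c *ClosedBucketCache) Get(b Bucket) ([]IndexRow, bool) {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	rows, ok := c.rows[b.HashKey]
	return rows, ok
}

// Put stores rows only if the bucket is already closed; callers fall back to
// the short-lived cache (or no cache) for the still-open bucket.
func (c *ClosedBucketCache) Put(b Bucket, rows []IndexRow, now time.Time) {
	if now.Before(b.End) {
		return // bucket still open: rows may change, don't cache long-term
	}
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.rows[b.HashKey] = rows
}
```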
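And a sketch of splitting a long query across workers by time range (#963). The 24h shard size, the `queryShard` callback and the `Chunk` placeholder are illustrative assumptions, not what the linked issue prescribes:

```go
package chunkstore

import (
	"context"
	"sync"
	"time"

	"golang.org/x/sync/errgroup"
)

// Chunk is a stand-in for the chunk store's chunk type.
type Chunk struct{}

// queryByTimeShards splits [from, through) into fixed-size shards, runs
// queryShard for each shard concurrently, and merges the results.
func queryByTimeShards(ctx context.Context, from, through time.Time,
	queryShard func(context.Context, time.Time, time.Time) ([]Chunk, error)) ([]Chunk, error) {

	const shardSize = 24 * time.Hour // assumed shard size

	g, ctx := errgroup.WithContext(ctx)
	var (
		mtx    sync.Mutex
		result []Chunk
	)
	for start := from; start.Before(through); start = start.Add(shardSize) {
		start, end := start, minTime(start.Add(shardSize), through)
		g.Go(func() error {
			chunks, err := queryShard(ctx, start, end)
			if err != nil {
				return err
			}
			mtx.Lock()
			result = append(result, chunks...)
			mtx.Unlock()
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return result, nil
}

func minTime(a, b time.Time) time.Time {
	if a.Before(b) {
		return a
	}
	return b
}
```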
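Finally, a sketch of the asynchronous memcache write-back idea: a bounded queue plus background workers, so cache population never adds latency to the query path. The `Backend` interface and the drop-when-full policy are assumptions:

```go
package cache

// Backend is whatever synchronous cache client we already have (e.g. memcache).
type Backend interface {
	Set(key string, value []byte) error
}

type writeReq struct {
	key   string
	value []byte
}

// asyncWriter pushes cache writes through a bounded queue so write-backs
// never block the caller; if the queue is full the write is simply dropped
// (the cache is best-effort anyway).
type asyncWriter struct {
	queue chan writeReq
}

func newAsyncWriter(backend Backend, queueLen, workers int) *asyncWriter {
	w := &asyncWriter{queue: make(chan writeReq, queueLen)}
	for i := 0; i < workers; i++ {
		go func() {
			for req := range w.queue {
				_ = backend.Set(req.key, req.value) // errors are best-effort
			}
		}()
	}
	return w
}

// Set enqueues the write and returns immediately.
func (w *asyncWriter) Set(key string, value []byte) {
	select {
	case w.queue <- writeReq{key, value}:
	default: // queue full: drop rather than block the query path
	}
}
```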
Chunk iterators:
- lazily merge / iterate chunks (Don't convert chunks to matrixes and then merges them... #713) (sketched after this list)
- optimise merge (Optimize merging of many small lists of samples #81)
- save partial results to cache (Save chunks to cache if GetChunks returns partial result #605)
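
A sketch of the lazy merge idea (#713): rather than converting every chunk to a matrix and then merging matrices, keep a min-heap of per-chunk iterators and emit samples in timestamp order on demand. The `Sample`/`Iterator` types are placeholders for whatever the chunk layer already exposes, and duplicate timestamps from overlapping chunks would still need deduping by the caller:

```go
package iterators

import "container/heap"

// Sample is a placeholder for a single timestamped value.
type Sample struct {
	TimestampMs int64
	Value       float64
}

// Iterator yields samples in timestamp order, e.g. from one chunk.
type Iterator interface {
	Next() bool // advance; false when exhausted
	At() Sample // current sample, valid after Next() returned true
}

// mergeIterator lazily merges many sorted iterators with a min-heap, so the
// full matrix is never materialised in memory.
type mergeIterator struct {
	h   iterHeap
	cur Sample
}

func NewMergeIterator(its []Iterator) *mergeIterator {
	m := &mergeIterator{}
	for _, it := range its {
		if it.Next() {
			m.h = append(m.h, it)
		}
	}
	heap.Init(&m.h)
	return m
}

func (m *mergeIterator) Next() bool {
	if len(m.h) == 0 {
		return false
	}
	it := m.h[0]
	m.cur = it.At()
	if it.Next() {
		heap.Fix(&m.h, 0) // re-position the advanced iterator
	} else {
		heap.Pop(&m.h) // iterator exhausted
	}
	return true
}

func (m *mergeIterator) At() Sample { return m.cur }

// iterHeap orders iterators by the timestamp of their current sample.
type iterHeap []Iterator

func (h iterHeap) Len() int            { return len(h) }
func (h iterHeap) Less(i, j int) bool  { return h[i].At().TimestampMs < h[j].At().TimestampMs }
func (h iterHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *iterHeap) Push(x interface{}) { *h = append(*h, x.(Iterator)) }
func (h *iterHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}
```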
Index optimisations (idea: minimise index reads per query):
- We could notice when we're doing overlapping queries (for instance, fetching all chunk ids for a metric vs some subset) and do the filtering client side; e.g.
  `(count(max(node_cpu{job="monitoring/prom-node-exporter"}) by (cpu, node)) by (node) - sum(irate(node_cpu{job="monitoring/prom-node-exporter",mode="idle"}[1m])) by (node)) / count(max(node_cpu{job="monitoring/prom-node-exporter"}) by (cpu, node)) by (node) * 100`
  fetches `node_cpu{job="monitoring/prom-node-exporter"}` and `node_cpu{job="monitoring/prom-node-exporter",mode="idle"}`, and the latter could be filtered client side (Extend Prometheus storage interface to do multiple queries in parallel, and use that to dedupe index lookups. #967) (a sketch of the client-side filtering appears after this list)
- We could consider not doing parallel queries for the same hash-key, as this could overload a shard
- figure out if one can do `foo{bar=~"bar,*", bar!="bazzer"}`, and if so optimise it (Optimise queries like `foo{bar=~"bar,*", bar!="bazzer"}` to only query the index once #966)
- filter inequality expressions in DynamoDB (similar to Filter equality matchers on the dynamo side #402, but for not-equals) (Push regex matching into the NOSQL store where possible #965)
- We could consider keeping metrics of how many chunks match certain `k=v` combinations, then when presented with an intersection operation only fetch the smallest `k=v` combinations - ie `__name__=foo` will match lots of chunks, as will `instance=bar`, but `job=baz` might match fewer. Either way we don't necessarily need to query all `k=v` and intersect; once we get to a certain size we could just fetch the chunks (which contain the full metric) and filter. (Don't query very high cardinality labels #884) (a sketch of this appears after this list)
- see if we can filter equality matchers on the server (dynamo) side (Filter equality matchers on the server (dynamo) side #398)
- perhaps include chunk prefix in search criteria, for short recent query optimisations
- Include chunk start in rangeKey, and use that during fetches to minimise amount of data fetched (Include chunk end in rangeKey #298)
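
A sketch of the client-side filtering idea from the overlapping-queries bullet (#967): if one selector's matchers are a subset of another's, run the index lookup once for the broader selector and derive the narrower result by filtering locally. Types and names are hypothetical, and only equality matchers are handled:

```go
package querier

// matcher is a simplified equality matcher; real queries also have regex and
// negative matchers.
type matcher struct{ Name, Value string }

// selector is the set of matchers for one vector selector in the query.
type selector []matcher

// subsumes reports whether broad's matchers are a subset of narrow's, i.e.
// every series matched by narrow is also matched by broad. If so, we can do
// the index lookup for broad once and filter narrow's result client side
// instead of issuing a second DynamoDB query.
func subsumes(broad, narrow selector) bool {
	for _, bm := range broad {
		found := false
		for _, nm := range narrow {
			if bm == nm {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// filterClientSide keeps only the series from the broad lookup that also
// satisfy the narrow selector's matchers.
func filterClientSide(series []map[string]string, narrow selector) []map[string]string {
	var out []map[string]string
	for _, labels := range series {
		ok := true
		for _, m := range narrow {
			if labels[m.Name] != m.Value {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, labels)
		}
	}
	return out
}
```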
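And a sketch of the cardinality-aware planning idea (#884): order matchers by estimated selectivity, stop issuing index lookups once the candidate set is small enough, and apply the remaining matchers by filtering the fetched chunks. The estimate source, threshold and types are all assumptions:

```go
package queryplan

import "sort"

// labelPair is one k=v matcher from the query.
type labelPair struct{ Name, Value string }

// Below this many candidate chunks, fetching and filtering is assumed cheaper
// than more index reads and intersections.
const maxChunksToJustFetch = 1000

// planIndexLookups decides which matchers to resolve via the index and which
// to leave for post-fetch filtering, given a rough per-pair cardinality
// estimate (e.g. maintained as metrics).
func planIndexLookups(matchers []labelPair, estimate func(labelPair) int) (lookup, filterOnly []labelPair) {
	// Start with the most selective matchers.
	sort.Slice(matchers, func(i, j int) bool {
		return estimate(matchers[i]) < estimate(matchers[j])
	})

	expected := int(^uint(0) >> 1) // current upper bound on candidate chunks
	for i, m := range matchers {
		if i > 0 && expected <= maxChunksToJustFetch {
			// Candidate set is already small: skip index reads for the
			// remaining high-cardinality pairs (e.g. __name__=foo or
			// instance=bar) and filter the fetched chunks instead.
			filterOnly = append(filterOnly, matchers[i:]...)
			break
		}
		lookup = append(lookup, m)
		if e := estimate(m); e < expected {
			expected = e
		}
	}
	return lookup, filterOnly
}
```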