Commit c2f29fd

Add TSDB index lookup planning proposal
This proposal introduces extension points for TSDB index lookups that allow different execution strategies to address the problem of inefficient index lookup usage.

Key features:
- Extension points via LookupPlanner and LookupPlan interfaces
- Simple rule-based implementation example (ScanEmptyMatchersLookupPlanner)
- Enables downstream projects like Mimir, Thanos, and Cortex to implement custom optimization approaches
- Builds on scan matchers foundation from PR #16835

The proposal focuses on providing interfaces rather than prescribing specific optimization strategies, allowing flexibility for different deployment scenarios and use cases.

Signed-off-by: Dimitar Dimitrov <[email protected]>
1 parent 9f74f25 commit c2f29fd

Lines changed: 124 additions & 0 deletions
# TSDB Index Lookup Planning

* **Owners:**
  * `@dimitarvdimitrov`

* **Implementation Status:** `Partially implemented`

* **Related Issues and PRs:**
  * [PR #16835: tsdb index: introduce scan matchers](https://github.com/prometheus/prometheus/pull/16835)
  * [Mimir issue #11916: TSDB index lookup planning](https://github.com/grafana/mimir/issues/11916)

* **Other docs or links:**
  * [Store-gateway optimization blog post](https://grafana.com/blog/2023/08/21/less-is-more-how-grafana-mimir-queries-run-faster-and-more-cost-efficiently-with-fewer-indexes/)
  * [Prometheus fast regexp label matcher](https://github.com/grafana/mimir-prometheus/blob/main/model/labels/regexp.go)
  * [Access Path Selection in a Relational Database Management System](https://15799.courses.cs.cmu.edu/spring2025/papers/02-systemr/selinger-sigmod1979.pdf)

> TL;DR: This proposal introduces extension points for TSDB index lookups that allow different execution strategies to address the problem of inefficient index lookup usage. The goal is to provide interfaces that enable downstream projects to implement custom optimization approaches for their specific use cases.

## Why

Prometheus' current index lookup approach creates performance bottlenecks in high-cardinality environments. Two major inefficiencies exist:

1. **Broad matcher inefficiency**: Wide matchers like `namespace != ""` select massive numbers of series, creating significant memory overhead for minimal filtering benefit.
2. **Expensive regex evaluation**: Non-optimizable regex matchers against high-cardinality labels create CPU bottlenecks.

Real-world profiling across high-cardinality Mimir deployments shows 34% of CPU time spent on string matching and 20% on posting list iteration. These patterns appear consistently in high-cardinality environments and significantly affect total cost of ownership.

### Pitfalls of the current solution

The current naive approach to index lookups has specific problems:

**Example 1: Broad matcher inefficiency**
- Query with 5 matchers, including `namespace != ""`
- Selects the union of all series with any namespace value
- In a 2M series block: 2M series × 8 bytes = 16MB (roughly the equivalent of 16,000 XOR chunks)
- Other matchers (`job`, `pod`, `container`, metric name) are typically more selective
- Results in massive memory overhead for minimal filtering benefit

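The arithmetic in Example 1 can be reproduced with a small sketch. It assumes postings are held as sorted lists of 8-byte series references; the function names `intersect` and `postingsBytes` are hypothetical and not part of the TSDB API. The broad list costs 16MB to materialize, yet intersecting it with a selective list returns the selective list unchanged.

```go
package main

import "fmt"

// intersect returns the references present in both sorted lists.
func intersect(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// postingsBytes estimates the size of a postings list, assuming one
// 8-byte reference per series (the figure used in Example 1).
func postingsBytes(numSeries int) int { return numSeries * 8 }

func main() {
	const seriesInBlock = 2_000_000

	// Postings for namespace != "": every series in the block.
	broad := make([]uint64, seriesInBlock)
	for i := range broad {
		broad[i] = uint64(i)
	}
	// Postings already narrowed down by the more selective matchers.
	selective := []uint64{10, 500_000, 1_999_999}

	result := intersect(selective, broad)
	fmt.Printf("broad postings list: %d bytes\n", postingsBytes(len(broad)))
	fmt.Printf("series before broad matcher: %d, after: %d\n", len(selective), len(result))
}
```
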
**Example 2: Expensive regex evaluation**
- Single TSDB block: 1.8M series
- One label with 220,000 distinct values
- Non-optimizable regex against that high-cardinality label
- Runs the regex against all ~220K values to select 2-10 series
- Shows up as a double-digit CPU percentage in profiles, with massive allocation impact

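To make the cost asymmetry in Example 2 concrete, the sketch below counts regex evaluations in two situations: resolving the matcher against every distinct label value, and applying it only to the label values of series that other matchers already selected. It uses Go's standard `regexp` package rather than the Prometheus fast regexp matcher, and the helper `matchValues`, the pattern, and the label values are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchValues runs pattern against every value and reports which values
// matched and how many regex evaluations that took.
func matchValues(pattern string, values []string) (matched []string, evals int, err error) {
	// Anchor the pattern, since Prometheus regex matchers are fully anchored.
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return nil, 0, err
	}
	for _, v := range values {
		evals++
		if re.MatchString(v) {
			matched = append(matched, v)
		}
	}
	return matched, evals, nil
}

func main() {
	const pattern = ".*-4[27]"

	// Index-resolved: the regex runs against every distinct label value.
	all := make([]string, 220_000)
	for i := range all {
		all[i] = fmt.Sprintf("pod-%d", i)
	}
	_, indexEvals, _ := matchValues(pattern, all)

	// Series-resolved: other matchers already narrowed the selection, so the
	// regex only runs against the label value of each candidate series.
	candidates := []string{"pod-42", "pod-100", "pod-47"}
	kept, scanEvals, _ := matchValues(pattern, candidates)

	fmt.Printf("index: %d evaluations, scan: %d evaluations, kept %v\n",
		indexEvals, scanEvals, kept)
}
```
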
## Goals

* Provide extension points for TSDB index lookups that allow alternative execution strategies
* Enable downstream projects to implement custom optimization approaches for their specific use cases
* Support experimentation with different planning algorithms and storage characteristics
* Allow flexibility in addressing index lookup inefficiencies without changing core TSDB behavior

### Audience

This change primarily targets:
- High-cardinality Prometheus deployments (>1M series)
- Downstream projects like Mimir, Thanos, and Cortex that need different optimization strategies

## Non-Goals

* Replace existing regex optimizations
* Change the core TSDB storage format
* Provide immediate performance improvements without statistics collection
* Improve `/api/v1/labels` and `/api/v1/label/{}/values` requests

## How

### Core Approach

Building on the scan matchers foundation from [PR #16835](https://github.com/prometheus/prometheus/pull/16835), this proposal introduces a planning phase that:

1. Allows different execution strategies for each query
2. Partitions matchers into index-resolved vs series-resolved categories
3. Executes with lazy evaluation according to the chosen plan

The approach mirrors techniques used by database query planners when choosing between index scans and sequential scans.

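The three steps can be sketched end to end on a toy in-memory block. Everything here is an assumption made for illustration: the `matcher` and `block` types, the equality-only matching, and in particular the threshold rule in `plan` (defer any matcher whose postings list is longer than `maxPostings`), which stands in for whatever strategy a real planner would use.

```go
package main

import "fmt"

// matcher is a simplified, equality-only stand-in for a label matcher.
type matcher struct{ name, value string }

// block is a toy TSDB block: label sets indexed by series ref, plus an
// inverted index from label name/value pairs to series refs.
type block struct {
	series   []map[string]string
	postings map[matcher][]int
}

func newBlock(series []map[string]string) *block {
	b := &block{series: series, postings: map[matcher][]int{}}
	for ref, ls := range series {
		for n, v := range ls {
			m := matcher{n, v}
			b.postings[m] = append(b.postings[m], ref)
		}
	}
	return b
}

// plan (steps 1 and 2) defers matchers whose postings list is longer than
// maxPostings to the scan phase. The threshold rule is purely illustrative.
func (b *block) plan(ms []matcher, maxPostings int) (index, scan []matcher) {
	for _, m := range ms {
		if len(b.postings[m]) > maxPostings {
			scan = append(scan, m)
		} else {
			index = append(index, m)
		}
	}
	return index, scan
}

// lookup resolves the index matchers by intersecting their postings lists.
func (b *block) lookup(index []matcher) []int {
	refs := make([]int, len(b.series))
	for i := range refs {
		refs[i] = i // start from all series
	}
	for _, m := range index {
		refs = intersect(refs, b.postings[m])
	}
	return refs
}

// intersect keeps the refs of a that also appear in c, preserving order.
func intersect(a, c []int) []int {
	in := make(map[int]bool, len(c))
	for _, r := range c {
		in[r] = true
	}
	out := []int{}
	for _, r := range a {
		if in[r] {
			out = append(out, r)
		}
	}
	return out
}

// scan (step 3) returns an iterator that applies the deferred matchers
// lazily, only to series that survived the index lookup.
func (b *block) scan(refs []int, deferred []matcher) func() (int, bool) {
	return func() (int, bool) {
	next:
		for len(refs) > 0 {
			ref := refs[0]
			refs = refs[1:]
			for _, m := range deferred {
				if b.series[ref][m.name] != m.value {
					continue next
				}
			}
			return ref, true
		}
		return 0, false
	}
}

func main() {
	b := newBlock([]map[string]string{
		{"namespace": "prod", "job": "api"},
		{"namespace": "prod", "job": "db"},
		{"namespace": "prod", "job": "api"},
	})
	index, deferred := b.plan([]matcher{{"namespace", "prod"}, {"job", "api"}}, 2)
	next := b.scan(b.lookup(index), deferred)
	for ref, ok := next(); ok; ref, ok = next() {
		fmt.Println("selected series ref:", ref)
	}
}
```

In this toy block the broad `namespace` matcher is deferred to the scan phase, so only the selective `job` postings list is read from the index, and the deferred matcher is checked against two series instead of three.
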
### Interface Design

Introduce core planning interfaces that allow downstream projects to implement their own strategies:

```go
// LookupPlanner plans how to execute index lookups by deciding which matchers
// to apply during index lookup versus after series retrieval.
type LookupPlanner interface {
	PlanIndexLookup(ctx context.Context, plan LookupPlan, minT, maxT int64) (LookupPlan, error)
}

// LookupPlan represents the decision of which matchers to apply during
// index lookup versus during series scanning.
type LookupPlan interface {
	// ScanMatchers returns matchers that should be applied during series scanning.
	ScanMatchers() []*labels.Matcher
	// IndexMatchers returns matchers that should be applied during index lookup.
	IndexMatchers() []*labels.Matcher
}
```

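As a rough illustration of how a downstream project might satisfy these interfaces, the sketch below defines a plan backed by two slices and a planner that moves every matcher back to the index phase, which corresponds to the behavior without planning. To stay runnable with only the standard library, it redeclares the interfaces against a local `Matcher` stand-in instead of `labels.Matcher`. The names `staticPlan`, `indexOnlyPlanner`, and `countAfterPlanning` are hypothetical, and the sketch assumes the incoming plan already carries all of the query's matchers.

```go
package main

import (
	"context"
	"fmt"
)

// Matcher is a local stand-in for labels.Matcher (assumption for this sketch).
type Matcher struct{ Name, Value string }

type LookupPlanner interface {
	PlanIndexLookup(ctx context.Context, plan LookupPlan, minT, maxT int64) (LookupPlan, error)
}

type LookupPlan interface {
	ScanMatchers() []*Matcher
	IndexMatchers() []*Matcher
}

// staticPlan is a hypothetical LookupPlan that holds two fixed slices.
type staticPlan struct{ index, scan []*Matcher }

func (p staticPlan) ScanMatchers() []*Matcher  { return p.scan }
func (p staticPlan) IndexMatchers() []*Matcher { return p.index }

// indexOnlyPlanner is a hypothetical planner that resolves every matcher on
// the index, ignoring the time range.
type indexOnlyPlanner struct{}

func (indexOnlyPlanner) PlanIndexLookup(ctx context.Context, in LookupPlan, _, _ int64) (LookupPlan, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	index := append([]*Matcher{}, in.IndexMatchers()...)
	index = append(index, in.ScanMatchers()...)
	return staticPlan{index: index}, nil
}

// Compile-time checks that the sketch types satisfy the interfaces.
var (
	_ LookupPlan    = staticPlan{}
	_ LookupPlanner = indexOnlyPlanner{}
)

// countAfterPlanning runs the planner on an initial plan and reports how
// many matchers ended up in each phase.
func countAfterPlanning(p LookupPlanner, initial LookupPlan) (index, scan int, err error) {
	out, err := p.PlanIndexLookup(context.Background(), initial, 0, 0)
	if err != nil {
		return 0, 0, err
	}
	return len(out.IndexMatchers()), len(out.ScanMatchers()), nil
}

func main() {
	initial := staticPlan{
		index: []*Matcher{{Name: "job", Value: "api"}},
		scan:  []*Matcher{{Name: "namespace", Value: "prod"}},
	}
	index, scan, err := countAfterPlanning(indexOnlyPlanner{}, initial)
	fmt.Println("index matchers:", index, "scan matchers:", scan, "err:", err)
}
```

Because `PlanIndexLookup` both accepts and returns a `LookupPlan`, planners written this way could in principle be chained, each one refining the plan produced by the previous one.
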
### Simple Rule-Based Implementation

As a concrete example, [PR #16835](https://github.com/prometheus/prometheus/pull/16835) introduces a `ScanEmptyMatchersLookupPlanner` that implements a simple rule-based approach. This planner identifies matchers that are expensive to apply on the inverted index and usually don't filter any data, deferring them to scan matchers instead.

The rules are:
- `{label=""}` - converted to scan matcher (expensive index lookup, minimal filtering)
- `{label=~".+"}` - converted to scan matcher (expensive regex evaluation, broad selection)
- `{label=~".*"}` - removed entirely (matches everything, including unset values)

This demonstrates how the interface can be used to implement straightforward optimizations without requiring complex cost models or statistics collection. Such simple rule-based planners can provide immediate benefits for well-understood inefficient patterns while serving as building blocks for more sophisticated approaches.

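The three rules listed above can be expressed as a single partitioning function. This is a minimal sketch and not the code from PR #16835: `Matcher` and `MatchType` are local stand-ins modeled on the `labels` package, `applyRules` is a hypothetical name, and the sketch does not handle the case where every matcher ends up deferred and nothing is left for the index lookup.

```go
package main

import "fmt"

// MatchType and Matcher are local stand-ins for their counterparts in the
// Prometheus labels package (assumption for this sketch).
type MatchType int

const (
	MatchEqual MatchType = iota
	MatchNotEqual
	MatchRegexp
	MatchNotRegexp
)

type Matcher struct {
	Type  MatchType
	Name  string
	Value string
}

// applyRules partitions matchers into index and scan matchers following the
// three rules described in the text.
func applyRules(ms []Matcher) (index, scan []Matcher) {
	for _, m := range ms {
		switch {
		case m.Type == MatchRegexp && m.Value == ".*":
			// {label=~".*"} matches everything, so it is dropped.
		case m.Type == MatchEqual && m.Value == "":
			// {label=""} is deferred to the scan phase.
			scan = append(scan, m)
		case m.Type == MatchRegexp && m.Value == ".+":
			// {label=~".+"} is deferred to the scan phase.
			scan = append(scan, m)
		default:
			index = append(index, m)
		}
	}
	return index, scan
}

func main() {
	index, scan := applyRules([]Matcher{
		{MatchEqual, "__name__", "up"},
		{MatchEqual, "pod", ""},
		{MatchRegexp, "namespace", ".+"},
		{MatchRegexp, "job", ".*"},
	})
	fmt.Println("index matchers:", index)
	fmt.Println("scan matchers:", scan)
}
```
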
## Alternatives

1. **Improve existing regex optimizations**: Continue optimizing the current approach with better regex compilation and caching. This approach has diminishing returns and doesn't address broad matcher inefficiency.

2. **Always use sequential scans**: Skip index lookups entirely and scan all series. This would be simpler but would hurt performance for selective queries.

3. **Static rule-based approach**: Use fixed rules instead of cost-based planning. This would be simpler to implement but usually misses the nuances that a cost model with cardinality estimations can capture. That said, the current `PostingsForMatchers` implementation already contains some heuristics of this kind that work in all cases.

The proposed approach provides the flexibility to adapt to different workload characteristics while maintaining compatibility with existing optimizations.

## Action Plan

* [ ] Add scan matchers to querier code
* [ ] Implement basic `LookupPlanner` interface with simple heuristics
* [ ] Validate approach with real-world high-cardinality workloads
