@JonasKunz
Contributor

@JonasKunz JonasKunz commented Jan 8, 2026

Removes a bottleneck in the ES|QL rate implementation which was found via profiling:

[image: profiling results]

I benchmarked the rate function locally on a high-cardinality data set (2k series) with fine-grained time buckets.
This means that the aggregation has many groups, and each group contains only a small to medium amount of data.

The profiling showed that the main bottleneck for this use-case was in fact the memory accounting for the CircuitBreaker: We allocate multiple arrays for each group, which in turn means that we are hammering the underlying atomic counter with concurrent updates.

This PR fixes this by making sure that we use the LocalCircuitBreaker here instead: This circuit breaker pre-allocates larger chunks of memory from the atomic counter, causing the number of updates to be significantly reduced.
In my local benchmarks, this reduced the response time for the query by about 30%.
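
To illustrate the idea of reserving memory in chunks, here is a minimal, self-contained sketch. It is not the Elasticsearch `LocalCircuitBreaker` implementation: the class names `SharedBreaker` and `LocalBreaker`, the chunk size, and the `updates` counter are all hypothetical and exist only to show why fewer updates reach the shared atomic counter.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of chunked memory accounting; not the Elasticsearch classes.
final class LocalBreakerSketch {

    /** Shared accounting: every call updates an atomic counter that all threads contend on. */
    static final class SharedBreaker {
        private final AtomicLong used = new AtomicLong();
        private final AtomicLong updates = new AtomicLong(); // illustration only: counts calls
        private final long limit;

        SharedBreaker(long limit) {
            this.limit = limit;
        }

        void add(long bytes) {
            updates.incrementAndGet();
            long newUsed = used.addAndGet(bytes);
            if (bytes > 0 && newUsed > limit) {
                used.addAndGet(-bytes); // roll back the failed reservation
                throw new IllegalStateException("memory limit exceeded");
            }
        }

        long used() {
            return used.get();
        }

        long updates() {
            return updates.get();
        }
    }

    /** Single-threaded wrapper that reserves memory from the shared breaker in larger chunks. */
    static final class LocalBreaker implements AutoCloseable {
        private final SharedBreaker parent;
        private final long chunkSize;
        private long reserved;    // bytes currently reserved from the parent
        private long usedLocally; // bytes handed out to local allocations

        LocalBreaker(SharedBreaker parent, long chunkSize) {
            this.parent = parent;
            this.chunkSize = chunkSize;
        }

        void add(long bytes) {
            long needed = usedLocally + bytes;
            if (needed > reserved) {
                // Only touch the shared counter when the local reservation is exhausted.
                long toReserve = Math.max(chunkSize, needed - reserved);
                parent.add(toReserve);
                reserved += toReserve;
            }
            usedLocally = needed;
        }

        void release(long bytes) {
            usedLocally -= bytes; // keep the reservation so later allocations can reuse it
        }

        @Override
        public void close() {
            parent.add(-reserved); // hand everything back in a single update
            reserved = 0;
            usedLocally = 0;
        }
    }

    public static void main(String[] args) {
        SharedBreaker shared = new SharedBreaker(1L << 30);
        try (LocalBreaker local = new LocalBreaker(shared, 64 * 1024)) {
            for (int i = 0; i < 10_000; i++) {
                local.add(64); // many small per-group allocations
            }
            System.out.println("shared updates: " + shared.updates());
        }
    }
}
```

In this sketch, with a 64 KiB chunk, 10,000 allocations of 64 bytes each reach the shared counter only 10 times instead of 10,000 times. The trade-off is that the local breaker may hold memory it is not currently using until it is closed.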

Unfortunately, those results don't translate exactly to our high-cardinality CI benchmarks: those benchmarks also have a high number of groups, but much more data per group, so this problem has a lower impact. For that reason, we only see a 3-5% improvement on CI for the high-cardinality benchmarks.

@elasticsearchmachine elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team v9.4.0 labels Jan 8, 2026
@JonasKunz
Contributor Author

Buildkite benchmark this with tsdb-metricsgen-240m-highcardinality please

@elasticmachine
Collaborator

elasticmachine commented Jan 8, 2026

💚 Build Succeeded

This build ran two tsdb-metricsgen-240m-highcardinality benchmarks to evaluate performance impact of this PR.

@JonasKunz JonasKunz added :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL >non-issue labels Jan 9, 2026
@JonasKunz JonasKunz marked this pull request as ready for review January 9, 2026 12:21
@JonasKunz JonasKunz requested review from dnhatn and kkrik-es January 9, 2026 12:21
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Member

@dnhatn dnhatn left a comment

I have two comments, but the changes look great. Thank you for the investigation @JonasKunz!

      this.dateFactor = isDateNanos ? 1_000_000_000.0 : 1000.0;
      try {
-         this.reducedStates = driverContext.bigArrays().newObjectArray(256);
+         this.reducedStates = bigArrays.newObjectArray(256);
Member

We should release localCircuitBreakerService if creating reducedStates fails.
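
A minimal sketch of the release-on-failure pattern this comment asks for, assuming a constructor that acquires one resource and then allocates a second. The names `TrackedResource` and `GroupingState`, and the simulated failure flag, are hypothetical stand-ins and not the actual classes in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: release an already-acquired resource if a later allocation fails.
final class ReleaseOnFailureSketch {

    /** Stand-in for a releasable resource; records its name when closed. */
    static final class TrackedResource implements AutoCloseable {
        private final String name;
        private final List<String> closedLog;

        TrackedResource(String name, List<String> closedLog) {
            this.name = name;
            this.closedLog = closedLog;
        }

        @Override
        public void close() {
            closedLog.add(name);
        }
    }

    /** Stand-in for a grouping state that owns a breaker service and an array of states. */
    static final class GroupingState implements AutoCloseable {
        private final TrackedResource breakerService;
        private final TrackedResource reducedStates;

        GroupingState(List<String> closedLog, boolean failAllocation) {
            this.breakerService = new TrackedResource("breakerService", closedLog);
            boolean success = false;
            try {
                if (failAllocation) {
                    // Simulates the second allocation tripping the breaker.
                    throw new IllegalStateException("simulated allocation failure");
                }
                this.reducedStates = new TrackedResource("reducedStates", closedLog);
                success = true;
            } finally {
                if (success == false) {
                    // The constructor is failing, so nobody else can release this for us.
                    breakerService.close();
                }
            }
        }

        @Override
        public void close() {
            reducedStates.close();
            breakerService.close();
        }
    }

    public static void main(String[] args) {
        List<String> closedLog = new ArrayList<>();
        try {
            new GroupingState(closedLog, true);
        } catch (IllegalStateException e) {
            System.out.println("released after failure: " + closedLog);
        }
    }
}
```

The `success` flag keeps the cleanup in one place and still lets the original exception propagate to the caller.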

this.driverContext = driverContext;
this.bigArrays = driverContext.bigArrays();
localCircuitBreakerService = new LocalCircuitBreaker.SingletonService(
driverContext.bigArrays().breakerService(),
Member

I think we should pass the actual settings here, but we can do that in a follow-up.

Contributor Author

I wanted to, but I looked at DriverContext and it didn't expose the Settings unless I missed it.
If it's easy, I think we can fix it on this PR.
Do you have a suggestion for the best way to gain access to the real Settings here?

Member

I think we are currently missing some wiring. We can move these settings to PlannerSettings and pass PlannerSettings to DriverContext. Let's address this in a follow-up.

@JonasKunz JonasKunz changed the title from "POC: Improve ES|QL rate performance for high-cardinality using local circuit breaker" to "Improve ES|QL rate performance for high-cardinality using local circuit breaker" Jan 12, 2026
Labels

external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL Team:StorageEngine v9.4.0

4 participants