Skip to content

[Serve] Exclude early partial period in timeseries aggregation#61403

Draft
abrarsheikh wants to merge 3 commits intomasterfrom
autoscaling-metrics-agg-fix
Draft

[Serve] Exclude early partial period in timeseries aggregation#61403
abrarsheikh wants to merge 3 commits intomasterfrom
autoscaling-metrics-agg-fix

Conversation

@abrarsheikh
Copy link
Contributor

@abrarsheikh abrarsheikh commented Feb 28, 2026

Problem: When merging timeseries from multiple replicas/handles, series that start later are implicitly 0 before their first data point. That undercounts the total and biases aggregations (especially mean) downward.

Solution: Start the aggregation window at the timestamp when all series have at least one point (aligned_start = max(first_timestamp) across series). This removes the early partial period where some series are implicitly 0.

Problem: When merging timeseries from replicas and handles with LOCF, the same request can appear in both because they report at different times (e.g. replica: 4 running, handle: 2 queued). That double-counts requests and inflates the total, biasing aggregations (especially mean) upward and causing over-scaling.

Solution: Cap the aggregated value by the current snapshot total (sum of the latest value from each series). By the time the autoscaler runs, the handle has usually updated (e.g. 0 queued), so the cap is 4 + 0 = 4, and the inflated value (e.g. 4.31) is reduced to the true total.

Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh added the go add ONLY when ready to merge, run all tests label Feb 28, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable improvement to the timeseries aggregation logic by excluding the initial partial period where data from some series is missing, which prevents undercounting and biased aggregations. The implementation is clear and includes a good test case. I've found one potential issue with a boundary condition that could cause the fix to not apply in some cases. Please see my detailed comment.

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: abrar <abrar@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant