Commit a58d6e0

Arthur Leung authored and committed
[serve] Transform replica level metrics to AutoScalingContext constructor args (ray-project#57202)

## Changes

1. Wire up the AutoScalingContext constructor args to make metrics readable in the custom AutoScalingPolicy function.
2. Dropped `requests_per_replica` since it's expensive to compute.
3. Renamed `queued_requests` to `total_queued_requests` for consistency with `total_num_requests`.
4. Added `total_running_requests`.
5. Added tests asserting the new fields are populated correctly.
6. Ran the custom metrics tests with `RAY_SERVE_AGGREGATE_METRICS_AT_CONTROLLER` set to both 0 and 1.
7. Updated docs.

---

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung+github@gmail.com>
Co-authored-by: abrar <abrar@anyscale.com>
Co-authored-by: Arthur Leung <arcyleung@gmail.com>
Parent: 54207a4
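The commit message lists the metric fields now passed into the AutoScalingContext (`total_num_requests`, `total_queued_requests`, `total_running_requests`). A minimal sketch of a custom policy consuming them, assuming those attribute names and a `capacity_adjusted_min_replicas` bound (the real AutoScalingContext API may differ):

```python
from typing import Any, Dict, Tuple


def sketch_autoscaling_policy(ctx: Any) -> Tuple[int, Dict[str, Any]]:
    """Hypothetical policy using the fields named in the commit message.

    `total_queued_requests`, `total_running_requests`, and
    `total_num_requests` come from the commit message;
    `capacity_adjusted_min_replicas` is an assumed counterpart of the
    `capacity_adjusted_max_replicas` shown in the diffs below.
    """
    if ctx.total_queued_requests > ctx.total_running_requests:
        # Queued work dominates running work: scale up by one replica.
        target = ctx.current_num_replicas + 1
    elif ctx.total_num_requests == 0:
        # Fully idle: scale down by one replica.
        target = ctx.current_num_replicas - 1
    else:
        target = ctx.current_num_replicas
    # Clamp to the configured replica bounds.
    target = max(
        ctx.capacity_adjusted_min_replicas,
        min(ctx.capacity_adjusted_max_replicas, target),
    )
    # Policies return (target_num_replicas, extra_state).
    return target, {}
```

A fake context object is enough to exercise the logic locally before wiring the function into `autoscaling_config`.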

File tree

11 files changed: +380 −121 lines

doc/source/serve/advanced-guides/advanced-autoscaling.md

Lines changed: 0 additions & 2 deletions

```diff
@@ -668,5 +668,3 @@ In your policy, access custom metrics via:
 * **`ctx.raw_metrics[metric_name]`** — A mapping of replica IDs to lists of raw metric values.
   The number of data points stored for each replica depends on the [`look_back_period_s`](../api/doc/ray.serve.config.AutoscalingConfig.look_back_period_s.rst) (the sliding window size) and [`metrics_interval_s`](../api/doc/ray.serve.config.AutoscalingConfig.metrics_interval_s.rst) (the metric recording interval).
 * **`ctx.aggregated_metrics[metric_name]`** — A time-weighted average computed from the raw metric values for each replica.
-
-> Today, aggregation is a time-weighted average. In future releases, additional aggregation options may be supported.
```
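The doc text above says `ctx.aggregated_metrics` is a time-weighted average of each replica's raw values. A minimal sketch of such an aggregation, assuming raw samples arrive as `(timestamp, value)` pairs (Serve's internal representation is not specified here and may differ):

```python
from typing import List, Tuple


def time_weighted_average(samples: List[Tuple[float, float]]) -> float:
    """Average (timestamp, value) samples, weighting each value by the
    time that elapses until the next sample. Illustrative only; not
    Ray Serve's actual aggregation code."""
    if not samples:
        return 0.0
    if len(samples) == 1:
        # A single sample has no duration to weight by.
        return samples[0][1]
    duration = samples[-1][0] - samples[0][0]
    total = 0.0
    for (t0, value), (t1, _) in zip(samples, samples[1:]):
        # Each value holds from its own timestamp until the next one.
        total += value * (t1 - t0)
    return total / duration
```

With this weighting, a value that holds for longer contributes proportionally more than a brief spike, which is why it differs from a plain mean of the raw list.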

doc/source/serve/doc_code/autoscaling_policy.py

Lines changed: 4 additions & 2 deletions

```diff
@@ -29,8 +29,10 @@ def custom_metrics_autoscaling_policy(
 ) -> tuple[int, Dict[str, Any]]:
     cpu_usage_metric = ctx.aggregated_metrics.get("cpu_usage", {})
     memory_usage_metric = ctx.aggregated_metrics.get("memory_usage", {})
-    max_cpu_usage = max(cpu_usage_metric.values())
-    max_memory_usage = max(memory_usage_metric.values())
+    max_cpu_usage = list(cpu_usage_metric.values())[-1] if cpu_usage_metric else 0
+    max_memory_usage = (
+        list(memory_usage_metric.values())[-1] if memory_usage_metric else 0
+    )
 
     if max_cpu_usage > 80 or max_memory_usage > 85:
         return min(ctx.capacity_adjusted_max_replicas, ctx.current_num_replicas + 1), {}
```
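The change above replaces `max()` over the metric values with a guarded lookup. A small illustration of why the guard matters: before any replica has reported the custom metric, the `.get(..., {})` lookup yields an empty dict, and `max()` on an empty sequence raises `ValueError` (the dict literals below are stand-ins for `ctx.aggregated_metrics` entries):

```python
cpu_usage_metric = {}  # no replicas have reported "cpu_usage" yet

try:
    max(cpu_usage_metric.values())
except ValueError as exc:
    # max() refuses an empty sequence.
    print(f"max() on empty metrics: {exc}")

# The guarded pattern from the diff degrades gracefully to 0 instead:
max_cpu_usage = list(cpu_usage_metric.values())[-1] if cpu_usage_metric else 0
print(max_cpu_usage)  # 0
```

Note the new expression also changes semantics when replicas have reported: it takes the last replica's value rather than the maximum across replicas.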

doc/source/serve/doc_code/custom_metrics_autoscaling.py

Lines changed: 8 additions & 7 deletions

```diff
@@ -9,6 +9,7 @@
     autoscaling_config={
         "min_replicas": 1,
         "max_replicas": 5,
+        "metrics_interval_s": 0.1,
         "policy": {
             "policy_function": "autoscaling_policy:custom_metrics_autoscaling_policy"
         },
@@ -21,7 +22,7 @@ def __init__(self):
         self.memory_usage = 60.0
 
     def __call__(self) -> str:
-        time.sleep(0.1)
+        time.sleep(0.5)
         self.cpu_usage = min(100, self.cpu_usage + 5)
         self.memory_usage = min(100, self.memory_usage + 3)
         return "Hello, world!"
@@ -39,10 +40,10 @@ def record_autoscaling_stats(self) -> Dict[str, float]:
 app = CustomMetricsDeployment.bind()
 # __serve_example_end__
 
-# TODO: uncomment after autoscaling context is populated with all metrics
-# if __name__ == "__main__":
-#     import requests  # noqa
+if __name__ == "__main__":
+    import requests  # noqa
 
-#     serve.run(app)
-#     resp = requests.get("http://localhost:8000/")
-#     assert resp.text == "Hello, world!"
+    serve.run(app)
+    for _ in range(10):
+        resp = requests.get("http://localhost:8000/")
+        assert resp.text == "Hello, world!"
```
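The hunk context above shows the deployment exposing its gauges through `record_autoscaling_stats`. A self-contained sketch of that reporting pattern, with a plain class standing in for the Serve deployment (the class name and initial values here are illustrative, mirroring the `__init__` shown in the diff):

```python
from typing import Dict


class ReplicaSketch:
    """Stand-in for the CustomMetricsDeployment in the diff above:
    the replica tracks its own gauges and exposes them via
    record_autoscaling_stats. Not Ray Serve's actual class."""

    def __init__(self) -> None:
        self.cpu_usage = 50.0
        self.memory_usage = 60.0

    def record_autoscaling_stats(self) -> Dict[str, float]:
        # Serve polls this method every metrics_interval_s; the returned
        # dict feeds ctx.raw_metrics / ctx.aggregated_metrics under
        # these metric names for the custom policy to read.
        return {"cpu_usage": self.cpu_usage, "memory_usage": self.memory_usage}
```

Returning a flat `Dict[str, float]` matches the method signature shown in the hunk header.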

doc/source/serve/doc_code/scheduled_batch_processing.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -12,6 +12,7 @@
         policy=AutoscalingPolicy(
             policy_function="autoscaling_policy:scheduled_batch_processing_policy"
         ),
+        metrics_interval_s=0.1,
     ),
     max_ongoing_requests=3,
 )
```
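Both doc-code diffs set `metrics_interval_s=0.1`. Rough arithmetic for how that interacts with the sliding window described in the advanced-autoscaling guide, assuming Ray Serve's documented default `look_back_period_s` of 30.0 seconds (an assumption; check the AutoscalingConfig reference for the current default):

```python
# Assumed default sliding-window size; not stated in this commit.
look_back_period_s = 30.0
# Recording interval set by the diffs above.
metrics_interval_s = 0.1

# Approximate number of raw data points retained per replica:
# one sample per interval, kept for the length of the window.
points_per_replica = int(look_back_period_s / metrics_interval_s)
print(points_per_replica)  # 300
```

This is the quantity the guide refers to when it says the number of stored data points per replica depends on `look_back_period_s` and `metrics_interval_s`.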
