[Serve][2/N] Add deployment-level autoscaling snapshot and event summarizer #56225
abrarsheikh merged 98 commits into ray-project:master
Conversation
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Code Review
This pull request introduces valuable observability features for autoscaling in Ray Serve by adding structured JSON logs for autoscaling snapshots. The implementation is solid, with a new ServeEventSummarizer to handle log formatting and throttling, and new methods in AutoscalingState to provide the necessary data.
My review includes a few suggestions for improvement:
- A high-severity issue where a hardcoded policy name is used in `ScalingDecision` objects, which should be corrected to use the dynamically determined policy name.
- A medium-severity issue in the logging utility where missing timestamps are replaced with the current time, which could be misleading.
- A medium-severity suggestion to refactor duplicated logic for accessing configuration values to improve code maintainability.
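The throttling that the summarizer performs can be pictured as a small change-or-interval gate: a payload is emitted only when its content changes or a minimum interval has elapsed. The following is an illustrative sketch only; the class and field names are hypothetical and not the PR's actual API:

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class ThrottledSummarizer:
    """Sketch of log throttling: emit only on change or after interval_s."""

    interval_s: float = 30.0
    # key -> (last emitted signature, monotonic timestamp of last emit)
    _last_emit: Dict[Any, Tuple[Any, float]] = field(default_factory=dict)

    def should_emit(self, key: Any, signature: Any) -> bool:
        now = time.monotonic()
        prev = self._last_emit.get(key)
        if prev is not None:
            prev_sig, prev_ts = prev
            # Suppress duplicates that arrive before the interval elapses.
            if prev_sig == signature and now - prev_ts < self.interval_s:
                return False
        self._last_emit[key] = (signature, now)
        return True
```

With a per-deployment key, an unchanged snapshot is logged at most once per interval, while any change is logged immediately.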
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
abrarsheikh left a comment
My main feedback on this PR is that we are creating many intermediate free-form dictionaries, and it is not clear to me why we need them all. More importantly, they create future ambiguity about what each dictionary is supposed to contain, which makes the code harder to maintain. The code could be reorganized to use typed objects for functions that need to return large dictionaries.
…except and unused func(note_once_per_interval) Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
…er, and add constant Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
…remove unnecessary getattr Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Thanks for the contribution @nadongjun! Have you thought about how this would change/work with application-level autoscaling, which is in flight (#56149)? When application-level autoscaling is enabled, a deployment does not autoscale by itself, so that may change how users should interpret the logs.
As feedback for the PR, I would recommend packaging the various autoscaling-relevant values into objects and passing those objects around. It's somewhat difficult to track all the different variables and where they come from, which makes the code a bit harder to parse.
- Rename get_observability_snapshot → get_snapshot for clarity
- Rename proposed_replicas → target_replicas across snapshot flow
- Return last_metrics_age_s=None when no metrics; map to "unknown" in summarizer
- Flatten replicas_allowed{min,max} into top-level min, max in snapshot payload
- Move look_back_period_s to top-level for consistency
- Rename DecisionSummary → AutoscalingDecisionSummary for clarity
- Replace tuple-based SnapshotSignature with typed dataclass
- Use DeploymentID directly as dedupe key instead of (app_name, dep_name)
- Inline snapshot computation in controller; remove _compute_snapshot_inputs
- Push scaling_status formatting into log_deployment_snapshot
- Update tests to validate new payload shape (min/max, no replicas_allowed)
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
- Standardize payload to return 'timestamp_s' for snapshots.
- Return metrics health as last_metrics_age_s.
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@abrarsheikh @akyang-anyscale Thanks for the detailed review! @akyang-anyscale That’s a fair point. The serve_autoscaling_snapshot log format currently only covers deployment-level autoscaling. Once application-level autoscaling is added, we’ll log deployment-level and application-level snapshots separately. I’ve already switched to typed dataclasses (e.g., DeploymentSnapshot, AutoscalingDecisionSummary) so the controller passes structured objects instead of dicts. I’ll do the same for application-level autoscaling to keep things consistent.
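The move from free-form dicts to typed objects might look roughly like the sketch below. The field set is inferred from the example payload in the PR description; the exact names (e.g. `min_replicas` vs. a flattened `min`) and the serialization helper are assumptions, not the final code:

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class AutoscalingDecisionSummary:
    """One recent scaling decision, summarized for the snapshot log."""

    timestamp_s: float
    from_replicas: int
    to_replicas: int
    reason: str


@dataclass(frozen=True)
class DeploymentSnapshot:
    """Typed snapshot passed around instead of a free-form dict."""

    app: str
    deployment: str
    current_replicas: int
    target_replicas: int
    min_replicas: int
    max_replicas: int
    policy: str
    look_back_period_s: float
    queued_requests: float
    total_requests: float
    last_metrics_age_s: Optional[float]  # None when no metrics have arrived yet
    decisions: List[AutoscalingDecisionSummary] = field(default_factory=list)

    def to_payload(self) -> Dict[str, Any]:
        payload = asdict(self)
        # Map a missing metrics age to "unknown" at log time, per the review.
        if payload["last_metrics_age_s"] is None:
            payload["last_metrics_age_s"] = "unknown"
        return payload
```

A typed object like this documents exactly what each snapshot contains and lets type checkers catch mismatched fields, addressing the free-form-dict concern above.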
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
return total_requests

def get_deployment_snapshot(self, curr_target_num_replicas: int) -> Dict[str, Any]:
get_deployment_snapshot is an expensive operation to perform on every control-loop iteration, because it calls get_total_num_requests, which loops over replicas and handles. These are expensive operations for a large cluster. Second, it calls self.get_decision_num_replicas, which internally executes the autoscaling policy, which can also be expensive.
I suggest instead constructing the DeploymentAutoscalingSnapshot object every time get_decision_num_replicas runs and storing it on the class object. Then get_deployment_snapshot simply returns the cached DeploymentAutoscalingSnapshot object.
Good call, I’ve applied this. Now the snapshot is constructed once during get_decision_num_replicas() and cached, and get_deployment_snapshot() just returns the cached object.
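The cache-on-decision pattern suggested above can be sketched as follows. This is a toy stand-in for the real class, with a plain dict in place of DeploymentAutoscalingSnapshot and a placeholder policy:

```python
from typing import Any, Dict, Optional


class DeploymentAutoscalingState:
    """Minimal sketch: build the snapshot during the decision, cache it."""

    def __init__(self) -> None:
        self._cached_snapshot: Optional[Dict[str, Any]] = None

    def get_decision_num_replicas(self, curr_target: int) -> int:
        # The expensive work (metrics aggregation, policy evaluation)
        # happens here anyway, so the snapshot is built while the
        # inputs are already in hand.
        decision = self._run_policy(curr_target)
        self._cached_snapshot = {"current": curr_target, "target": decision}
        return decision

    def get_deployment_snapshot(self) -> Optional[Dict[str, Any]]:
        # Cheap: just return whatever the last decision cached.
        return self._cached_snapshot

    def _run_policy(self, curr_target: int) -> int:
        return curr_target  # placeholder for the real autoscaling policy
```

The snapshot getter becomes O(1) per control-loop tick, and it returns None until the first decision has run, which callers must handle.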
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
cc @abrarsheikh
ongoing_requests=float(ctx.total_num_requests),
metrics_health=metrics_health,
errors=errors,
decisions=decisions_summary,
why do we need decisions inside DeploymentSnapshot?
self._autoscaling_logger.info(
    "", extra={"type": "deployment", "snapshot": payload}
)
payload is already JSON-serializable because of model_dump, and type should be part of the deployment_snapshot object in my opinion.
The extra argument to logger.info is used in a non-traditional way here, IMO.
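For context on why this use of `extra` is non-traditional: `extra` attaches attributes to the LogRecord, and a custom formatter must know to pick them up and serialize them; the message string itself is left empty. A minimal standalone illustration (the formatter name and the `serve_autoscaling_snapshot` prefix mirror the PR's log format, but this is not the PR's actual logging setup):

```python
import json
import logging


class SnapshotJSONFormatter(logging.Formatter):
    """Render the 'snapshot' dict attached via `extra` as one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        # `extra={"snapshot": ...}` becomes an attribute on the record.
        snapshot = getattr(record, "snapshot", None)
        return "serve_autoscaling_snapshot " + json.dumps(snapshot)


handler = logging.StreamHandler()
handler.setFormatter(SnapshotJSONFormatter())
logger = logging.getLogger("serve_autoscaling_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The message argument is empty; all content rides in `extra`.
logger.info("", extra={"snapshot": {"app": "default", "deployment": "worker"}})
```

This coupling between the call site and the formatter is the reviewer's point: without the matching formatter, the log line would render as an empty message.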
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Co-authored-by: Abrar Sheikh <abrar2002as@gmail.com> Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Bug: App-level policies bypass snapshot creation entirely
When applications use app-level autoscaling policies (has_policy() returns True), the ApplicationAutoscalingState.get_decision_num_replicas method calls apply_bounds() directly and returns without ever invoking DeploymentAutoscalingState.get_decision_num_replicas(). The new snapshot creation logic (recording to _decision_history and populating _cached_deployment_snapshot) exists only in the deployment-level method. As a result, deployments under app-level policies will always have _cached_deployment_snapshot remain None, and get_deployment_snapshot() will return None. The controller's _emit_deployment_autoscaling_snapshots silently skips these deployments, making the new observability feature completely non-functional for app-level policy configurations.
See python/ray/serve/_private/autoscaling_state.py, lines 877 to 887 at commit 8835412.
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
for (
    app_name,
    dep_name,
    details,
    autoscaling_config,
) in self._autoscaling_enabled_deployments_cache:
We should batch-write all deployments at once; this can be slow for applications with thousands of deployments.
I updated the controller to batch autoscaling snapshot logs into a single write per loop, instead of writing once per deployment.
However, in extreme cases where an application has thousands of deployments, writing one huge payload at once could be slow. Should we add a CHUNK_SIZE to emit snapshots in chunks of N to handle this case?
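The chunked-emission idea could be sketched like this; the function name, the `chunk_size` default, and the `snapshots` payload key are hypothetical:

```python
from typing import Any, List


def emit_snapshots_in_chunks(
    logger: Any, snapshots: List[dict], chunk_size: int = 100
) -> None:
    """Batch snapshot payloads into at most ceil(n / chunk_size) log writes,
    bounding the size of any single write for very large applications."""
    for i in range(0, len(snapshots), chunk_size):
        chunk = snapshots[i : i + chunk_size]
        logger.info("", extra={"snapshots": chunk})
```

This keeps the per-loop write count at one for typical applications while capping payload size in the thousands-of-deployments case.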
…init Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Bug: App-level policies skip deployment snapshot creation
When using an app-level autoscaling policy (has_policy() returns True), the code path in ApplicationAutoscalingState.get_decision_num_replicas (lines 842-876) directly calls the app-level policy and returns decisions without calling DeploymentAutoscalingState.get_decision_num_replicas(). The _cached_deployment_snapshot is only populated inside DeploymentAutoscalingState.get_decision_num_replicas() (lines 265-268), which is only called when using deployment-level policies (line 880). As a result, get_deployment_snapshot() returns None for deployments using app-level policies, causing _emit_deployment_autoscaling_snapshots to silently skip these deployments without logging any snapshot data.
See python/ray/serve/_private/autoscaling_state.py, lines 841 to 876 and lines 262 to 268 at commit 6e1105a.
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
tests are failing
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Fixed the failing tests!
…arizer (#56225)

## Why are these changes needed?

This PR introduces deployment-level autoscaling observability in Serve. The controller now emits a single, structured JSON log line (serve_autoscaling_snapshot) per autoscaling-enabled deployment each control-loop tick. This avoids recomputation in the controller call sites and provides a stable, machine-parsable surface for tooling and debugging.

#### Changed

- Add get_observability_snapshot in AutoscalingState and a manager wrapper to generate compact snapshots (replica counts, queued/total requests, metric freshness).
- Add ServeEventSummarizer to build payloads, reduce duplicate logs, and summarize recent scaling decisions.

#### Example log (single line)

Logs can be found in controller log files, e.g. `/tmp/ray/session_2025-09-03_21-12-01_095657_13385/logs/serve/controller_13474.log`.

```
serve_autoscaling_snapshot {"ts":"2025-09-04T06:12:11Z","app":"default","deployment":"worker","current_replicas":2,"target_replicas":2,"replicas_allowed":{"min":1,"max":8},"scaling_status":"stable","policy":"default","metrics":{"look_back_period_s":10.0,"queued_requests":0.0,"total_requests":0.0},"metrics_health":"ok","errors":[],"decisions":[{"ts":"2025-09-04T06:12:11Z","from":0,"to":2,"reason":"current=0, proposed=2"},{"ts":"2025-09-04T06:12:11Z","from":2,"to":2,"reason":"current=2, proposed=2"}]}
```

#### Follow-ups

- Expose the same snapshot data via `serve status -v` and CLI/SDK surfaces.
- Aggregate per-app snapshots and external scaler history.

## Related issue number

#55834

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [x] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Co-authored-by: akyang-anyscale <alexyang@anyscale.com>
Co-authored-by: Abrar Sheikh <abrar2002as@gmail.com>