
[Serve][2/N] Add deployment-level autoscaling snapshot and event summarizer #56225

Merged
abrarsheikh merged 98 commits into ray-project:master from nadongjun:serve-obsv-deployment
Dec 17, 2025
Conversation

@nadongjun
Contributor

Why are these changes needed?

This PR introduces deployment-level autoscaling observability in Serve. The controller now emits a single, structured JSON log line (serve_autoscaling_snapshot) per autoscaling-enabled deployment each control-loop tick.

This avoids recomputation in the controller call sites and provides a stable, machine-parsable surface for tooling and debugging.

Changed

  • Add get_observability_snapshot in AutoscalingState and manager wrapper to generate compact snapshots (replica counts, queued/total requests, metric freshness).
  • Add ServeEventSummarizer to build payloads, reduce duplicate logs, and summarize recent scaling decisions.

Example log (single line):

Logs can be found in controller log files, e.g. /tmp/ray/session_2025-09-03_21-12-01_095657_13385/logs/serve/controller_13474.log.

serve_autoscaling_snapshot {"ts":"2025-09-04T06:12:11Z","app":"default","deployment":"worker","current_replicas":2,"target_replicas":2,"replicas_allowed":{"min":1,"max":8},"scaling_status":"stable","policy":"default","metrics":{"look_back_period_s":10.0,"queued_requests":0.0,"total_requests":0.0},"metrics_health":"ok","errors":[],"decisions":[{"ts":"2025-09-04T06:12:11Z","from":0,"to":2,"reason":"current=0, proposed=2"},{"ts":"2025-09-04T06:12:11Z","from":2,"to":2,"reason":"current=2, proposed=2"}]}
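Since each snapshot is a fixed prefix followed by a JSON object, tooling can consume it with a few lines of standard-library code. A minimal sketch (`parse_snapshot_line` is a hypothetical helper for illustration, not part of Serve):

```python
import json

def parse_snapshot_line(line: str) -> dict:
    """Split the 'serve_autoscaling_snapshot' prefix from its JSON payload."""
    prefix, _, payload = line.partition(" ")
    assert prefix == "serve_autoscaling_snapshot", "not a snapshot line"
    return json.loads(payload)

line = (
    'serve_autoscaling_snapshot '
    '{"app":"default","deployment":"worker",'
    '"current_replicas":2,"target_replicas":2}'
)
snapshot = parse_snapshot_line(line)
print(snapshot["deployment"], snapshot["target_replicas"])  # worker 2
```

Because `partition` splits only on the first space, payloads whose string fields contain spaces (e.g. the `reason` values above) parse correctly.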

Follow-ups

  • Expose the same snapshot data via serve status -v and CLI/SDK surfaces.
  • Aggregate per-app snapshots and external scaler history.

Related issue number

#55834

Checks

  • I've signed off every commit (using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@nadongjun nadongjun requested a review from a team as a code owner September 4, 2025 06:49
@nadongjun nadongjun changed the title [Serve][1/N] Add deployment-level autoscaling snapshot and event summarizer [Serve][2/N] Add deployment-level autoscaling snapshot and event summarizer Sep 4, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces valuable observability features for autoscaling in Ray Serve by adding structured JSON logs for autoscaling snapshots. The implementation is solid, with a new ServeEventSummarizer to handle log formatting and throttling, and new methods in AutoscalingState to provide the necessary data.

My review includes a few suggestions for improvement:

  • A high-severity issue where a hardcoded policy name is used in ScalingDecision objects, which should be corrected to use the dynamically determined policy name.
  • A medium-severity issue in the logging utility where missing timestamps are replaced with the current time, which could be misleading.
  • A medium-severity suggestion to refactor duplicated logic for accessing configuration values to improve code maintainability.

@ray-gardener ray-gardener bot added serve Ray Serve Related Issue community-contribution Contributed by the community labels Sep 4, 2025
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Contributor

@abrarsheikh abrarsheikh left a comment


My main feedback on this PR is that we're creating many intermediate free-form dictionaries. It's not clear to me why we need them all, and more importantly they create ambiguity about what each dictionary is supposed to contain, which makes the code harder to maintain. The code could be reorganized to use typed objects in functions that need to return large dictionaries.
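The typed-object suggestion above could look roughly like this. The class and field names are illustrative (loosely echoing the DeploymentSnapshot/AutoscalingDecisionSummary names discussed later in the thread), not the actual Serve implementation:

```python
from dataclasses import dataclass, asdict
from typing import List

@dataclass(frozen=True)
class AutoscalingDecisionSummary:
    timestamp_s: float
    from_replicas: int  # 'from' is a Python keyword, hence from_replicas
    to_replicas: int
    reason: str

@dataclass(frozen=True)
class DeploymentSnapshot:
    app: str
    deployment: str
    current_replicas: int
    target_replicas: int
    decisions: List[AutoscalingDecisionSummary]

    def to_payload(self) -> dict:
        # Single place that defines the serialized shape; asdict recurses
        # into the nested decision dataclasses.
        return asdict(self)

snap = DeploymentSnapshot(
    app="default", deployment="worker",
    current_replicas=2, target_replicas=2,
    decisions=[AutoscalingDecisionSummary(0.0, 0, 2, "current=0, proposed=2")],
)
print(snap.to_payload()["target_replicas"])  # 2
```

Compared to free-form dicts, the dataclass makes the payload's shape discoverable from the type and lets linters catch missing or misspelled fields.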

…except and unused func(note_once_per_interval)

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
…er, and add constant

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
…remove unnecessary getattr

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Contributor

@akyang-anyscale akyang-anyscale left a comment


Thanks for the contribution @nadongjun! Have you thought about how this would change/work with application-level autoscaling, which is in flight: #56149? When application-level autoscaling is enabled, a deployment does not autoscale by itself, so that may change how users should interpret the logs.

As feedback on the PR, I would recommend packaging the various autoscaling-relevant values into objects and passing those objects around. It's somewhat difficult to track all the different variables and where they come from, which makes the code a bit harder to parse.

- Rename get_observability_snapshot → get_snapshot for clarity
- Rename proposed_replicas → target_replicas across snapshot flow
- Return last_metrics_age_s=None when no metrics; map to "unknown" in summarizer
- Flatten replicas_allowed{min,max} into top-level min, max in snapshot payload
- Move look_back_period_s to top-level for consistency
- Rename DecisionSummary → AutoscalingDecisionSummary for clarity
- Replace tuple-based SnapshotSignature with typed dataclass
- Use DeploymentID directly as dedupe key instead of (app_name, dep_name)
- Inline snapshot computation in controller; remove _compute_snapshot_inputs
- Push scaling_status formatting into log_deployment_snapshot
- Update tests to validate new payload shape (min/max, no replicas_allowed)

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
- Standardize payload to return 'timestamp_s' for snapshots.

- Return metrics health as last_metrics_age_s

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@nadongjun
Contributor Author

@abrarsheikh @akyang-anyscale Thanks for the detailed review!

@akyang-anyscale That’s a fair point. serve_autoscaling_snapshot log format currently only covers deployment-level autoscaling. Once application-level autoscaling is added, we’ll log deployment and application-level snapshots separately.

I’ve already switched to typed dataclasses (e.g., DeploymentSnapshot, AutoscalingDecisionSummary) so the controller passes structured objects instead of dicts. I’ll do the same for application-level autoscaling to keep things consistent.

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

return total_requests

def get_deployment_snapshot(self, curr_target_num_replicas: int) -> Dict[str, Any]:
Contributor


get_deployment_snapshot is an expensive operation to perform on every control-loop iteration because it calls get_total_num_requests, which loops over replicas and handles. These are expensive operations for a large cluster. Second, it calls self.get_decision_num_replicas, which internally executes the autoscaling policy, and that can also be expensive.

I suggest instead constructing the DeploymentAutoscalingSnapshot object every time get_decision_num_replicas runs and storing it on the class object. Then get_deployment_snapshot simply returns the cached DeploymentAutoscalingSnapshot object.

Contributor Author


Good call, I’ve applied this. Now the snapshot is constructed once during get_decision_num_replicas() and cached, and get_deployment_snapshot() just returns the cached object.
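The caching pattern agreed on here can be sketched as a stand-alone class. This is illustrative only: `_run_policy` is a stand-in for the real (expensive) policy execution, and the snapshot is a plain dict rather than the actual DeploymentAutoscalingSnapshot type:

```python
from typing import Optional

class DeploymentAutoscalingStateSketch:
    """Compute the snapshot once per decision and cache it (illustrative)."""

    def __init__(self) -> None:
        self._cached_snapshot: Optional[dict] = None

    def get_decision_num_replicas(self, curr_target: int) -> int:
        # Expensive work happens here exactly once per control-loop tick.
        decision = self._run_policy(curr_target)
        # Build the snapshot from values already in hand -- no recomputation.
        self._cached_snapshot = {
            "current_replicas": curr_target,
            "target_replicas": decision,
        }
        return decision

    def get_deployment_snapshot(self) -> Optional[dict]:
        # Cheap: just return what the last decision cached.
        return self._cached_snapshot

    def _run_policy(self, curr_target: int) -> int:
        # Stand-in for the real autoscaling policy.
        return max(curr_target, 1)
```

One consequence of this design (flagged by a bot review below): any code path that skips `get_decision_num_replicas` leaves the cache `None`, so every decision path must populate it.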

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@akyang-anyscale
Contributor

cc @abrarsheikh

ongoing_requests=float(ctx.total_num_requests),
metrics_health=metrics_health,
errors=errors,
decisions=decisions_summary,
Contributor


why do we need decisions inside DeploymentSnapshot?

Comment on lines 448 to 450
self._autoscaling_logger.info(
"", extra={"type": "deployment", "snapshot": payload}
)
Contributor


The payload is already JSON because of model_dump, and type should be part of the deployment_snapshot object, in my opinion.

The extra argument to logger.info is used in a non-traditional way here, IMO.
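For context on the `extra` mechanism being debated: `extra` attaches attributes to the LogRecord, which a custom formatter can then serialize. A minimal self-contained sketch (the formatter class, logger name, and payload are hypothetical, not Serve's actual logging setup):

```python
import io
import json
import logging

class SnapshotFormatter(logging.Formatter):
    """Render records carrying a 'snapshot' attribute as one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = getattr(record, "snapshot", None)
        return "serve_autoscaling_snapshot " + json.dumps(payload)

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(SnapshotFormatter())
logger = logging.getLogger("autoscaling_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The message is empty; all structured data rides on the record via `extra`.
logger.info("", extra={"snapshot": {"deployment": "worker", "target_replicas": 2}})
print(stream.getvalue().strip())
# serve_autoscaling_snapshot {"deployment": "worker", "target_replicas": 2}
```

The reviewer's point stands: this couples the log format to a side channel on the record, whereas serializing a typed snapshot object directly into the message keeps the payload's shape in one place.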

nadongjun and others added 2 commits December 4, 2025 10:17
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Co-authored-by: Abrar Sheikh <abrar2002as@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

@cursor cursor bot left a comment


Bug: App-level policies bypass snapshot creation entirely

When applications use app-level autoscaling policies (has_policy() returns True), the ApplicationAutoscalingState.get_decision_num_replicas method calls apply_bounds() directly and returns without ever invoking DeploymentAutoscalingState.get_decision_num_replicas(). The new snapshot creation logic (recording to _decision_history and populating _cached_deployment_snapshot) exists only in the deployment-level method. As a result, deployments under app-level policies will always have _cached_deployment_snapshot remain None, and get_deployment_snapshot() will return None. The controller's _emit_deployment_autoscaling_snapshots silently skips these deployments, making the new observability feature completely non-functional for app-level policy configurations.

python/ray/serve/_private/autoscaling_state.py#L877-L887

return {
deployment_id: (
self._deployment_autoscaling_states[deployment_id].apply_bounds(
num_replicas
)
if not _skip_bound_check
else num_replicas
)
for deployment_id, num_replicas in decisions.items()
}



Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Comment on lines +433 to +438
for (
app_name,
dep_name,
details,
autoscaling_config,
) in self._autoscaling_enabled_deployments_cache:
Contributor


We should batch-write all deployments at once; writing once per deployment can be slow for applications with thousands of deployments.

Contributor Author


I updated the controller to batch autoscaling snapshot logs into a single write per loop, instead of writing once per deployment.

However, in extreme cases where an application has thousands of deployments, writing one huge payload at once could be slow. Should we add a CHUNK_SIZE to emit snapshots in chunks of N to handle this case?
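The chunking idea could be as simple as slicing the snapshot list into fixed-size batches before each write. A minimal sketch, assuming a chunk size of 1000 (the constant name and the dict payloads are illustrative):

```python
from typing import Iterator, List

SNAPSHOT_CHUNK_SIZE = 1000  # assumed value, not a real Serve constant

def chunked(snapshots: List[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Yield snapshots in fixed-size batches so no single log write grows unbounded."""
    for i in range(0, len(snapshots), chunk_size):
        yield snapshots[i : i + chunk_size]

snapshots = [{"deployment": f"d{i}"} for i in range(2500)]
batches = list(chunked(snapshots, SNAPSHOT_CHUNK_SIZE))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Each batch would then be serialized and written as one log line, bounding both the per-write payload size and the number of writes per control-loop tick.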

…init

Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>

@cursor cursor bot left a comment


Bug: App-level policies skip deployment snapshot creation

When using an app-level autoscaling policy (has_policy() returns True), the code path in ApplicationAutoscalingState.get_decision_num_replicas (lines 842-876) directly calls the app-level policy and returns decisions without calling DeploymentAutoscalingState.get_decision_num_replicas(). The _cached_deployment_snapshot is only populated inside DeploymentAutoscalingState.get_decision_num_replicas() (lines 265-268), which is only called when using deployment-level policies (line 880). As a result, get_deployment_snapshot() returns None for deployments using app-level policies, causing _emit_deployment_autoscaling_snapshots to silently skip these deployments without logging any snapshot data.

python/ray/serve/_private/autoscaling_state.py#L841-L876

"""
if self.has_policy():
# Using app-level policy
autoscaling_contexts = {
deployment_id: state.get_autoscaling_context(
deployment_to_target_num_replicas[deployment_id]
)
for deployment_id, state in self._deployment_autoscaling_states.items()
}
# Policy returns {deployment_name -> decision}
decisions, self._policy_state = self._policy(autoscaling_contexts)
assert (
type(decisions) is dict
), "Autoscaling policy must return a dictionary of deployment_name -> decision_num_replicas"
# assert that deployment_id is in decisions is valid
for deployment_id in decisions.keys():
assert (
deployment_id in self._deployment_autoscaling_states
), f"Deployment {deployment_id} is not registered"
assert (
deployment_id in deployment_to_target_num_replicas
), f"Deployment {deployment_id} is invalid"
return {
deployment_id: (
self._deployment_autoscaling_states[deployment_id].apply_bounds(
num_replicas
)
if not _skip_bound_check
else num_replicas
)
for deployment_id, num_replicas in decisions.items()
}

python/ray/serve/_private/autoscaling_state.py#L262-L268

decision_num_replicas = self.apply_bounds(decision_num_replicas)
self._cached_deployment_snapshot = self._create_deployment_snapshot(
ctx=autoscaling_context,
target_replicas=decision_num_replicas,
)



Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@abrarsheikh
Contributor

tests are failing

@nadongjun
Contributor Author

tests are failing

Fixed the failing tests!

@abrarsheikh abrarsheikh merged commit f297c98 into ray-project:master Dec 17, 2025
6 checks passed
cszhu pushed a commit that referenced this pull request Dec 17, 2025
zzchun pushed a commit to zzchun/ray that referenced this pull request Dec 18, 2025
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026