add docs for post API #57698
Merged
Commits (19):

- `ec8fb2e` add docs for post API (harshit-anyscale)
- `1c22056` add docs for post API (harshit-anyscale)
- `63f95c7` Refactor: Move external scaling webhook docs to advanced guides (cursoragent)
- `ce8c21b` review changes (harshit-anyscale)
- `021fc91` merge master (harshit-anyscale)
- `a400c24` Merge branch 'master' of github.com:ray-project/ray into add-docs-for… (harshit-anyscale)
- `d41ff5f` review changes (harshit-anyscale)
- `19dd017` review changes (harshit-anyscale)
- `b497cd5` review changes (harshit-anyscale)
- `1a7dcd4` review changes (harshit-anyscale)
- `cc625d6` fix broken links (harshit-anyscale)
- `63c4d0d` review changes (harshit-anyscale)
- `75a506b` Refactor: Improve external scaling API documentation (cursoragent)
- `b69bcf6` review changes (harshit-anyscale)
- `3b220ea` merge master (harshit-anyscale)
- `3a354ab` fix docs builder (harshit-anyscale)
- `567c922` fix docs builder (harshit-anyscale)
- `84ef92c` fix docs builder (harshit-anyscale)
- `2bfdde7` fix tests (harshit-anyscale)
`doc/source/serve/advanced-guides/external-scaling-webhook.md` (new file: 149 additions, 0 deletions)

(serve-external-scale-webhook)=

# External Scaling Webhook

Ray Serve exposes a REST API endpoint that you can use to dynamically scale your deployments from outside the Ray cluster. This endpoint gives you the flexibility to implement custom scaling logic based on any metrics or signals you choose, such as external monitoring systems, business metrics, or predictive models.

## Overview

The external scaling webhook provides programmatic control over the number of replicas for any deployment in your Ray Serve application. Unlike Ray Serve's built-in autoscaling, which scales based on queue depth and ongoing requests, this webhook lets you scale based on any external criteria you define.

## Prerequisites

Before you can use the external scaling webhook, you must enable it in your application configuration and obtain an authentication token.

### Enable external scaler

Set `external_scaler_enabled: true` in your application configuration:

```yaml
applications:
  - name: my-app
    import_path: my_module:app
    external_scaler_enabled: true
    deployments:
      - name: my-deployment
        num_replicas: 1
```

:::{warning}
External scaling and built-in autoscaling are mutually exclusive. You can't use both for the same application.

- If you set `external_scaler_enabled: true`, you **must not** configure `autoscaling_config` on any deployment in that application.
- If you configure `autoscaling_config` on any deployment, you **must not** set `external_scaler_enabled: true` for the application.

Attempting to use both results in an error.
:::

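Because a mixed configuration fails, you might want to catch it before deploying. The following minimal sketch is not part of Ray Serve: `check_scaling_config` is a hypothetical helper that walks an already-parsed config (for example, the YAML above loaded into a dict) and flags any application that sets both `external_scaler_enabled: true` and an `autoscaling_config`. It assumes the `applications`/`deployments` layout shown above and only sees what the config file contains, so autoscaling settings defined in your Python code aren't detected.

```python
from typing import Any, Dict, List


def check_scaling_config(config: Dict[str, Any]) -> List[str]:
    """Return error messages for applications that mix external scaling
    with built-in autoscaling. Hypothetical helper, not a Ray Serve API."""
    errors: List[str] = []
    for app in config.get("applications", []):
        # Only applications that opt into the external scaler can conflict.
        if not app.get("external_scaler_enabled", False):
            continue
        for deployment in app.get("deployments", []):
            if deployment.get("autoscaling_config") is not None:
                errors.append(
                    f"application {app.get('name')!r}: deployment "
                    f"{deployment.get('name')!r} sets autoscaling_config "
                    "while external_scaler_enabled is true"
                )
    return errors


# Example config in the same shape as the YAML above, with one conflict.
config = {
    "applications": [
        {
            "name": "my-app",
            "import_path": "my_module:app",
            "external_scaler_enabled": True,
            "deployments": [
                {"name": "my-deployment", "num_replicas": 1},
                {"name": "other", "autoscaling_config": {"min_replicas": 1}},
            ],
        }
    ]
}

for message in check_scaling_config(config):
    print(message)
```
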
### Get authentication token

The external scaling webhook requires authentication using a bearer token. You can obtain this token from the Ray Dashboard UI:

1. Open the Ray Dashboard in your browser (typically at `http://localhost:8265`).
2. Navigate to the Serve section.
3. Find and copy the authentication token for your application.

## API endpoint

The webhook is available at the following endpoint:

```
POST /api/v1/applications/{application_name}/deployments/{deployment_name}/scale
```

**Path Parameters:**

- `application_name`: The name of your Serve application.
- `deployment_name`: The name of the deployment you want to scale.

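Both names go directly into the URL path. As a small sketch, assuming you want to percent-encode each name so spaces or other special characters don't alter the path, the hypothetical `build_scale_url` helper below assembles the endpoint. `base_url` is an assumption: it must be the address that serves this API in your cluster (the predictive example in this guide defaults to `http://localhost:8000`).

```python
from urllib.parse import quote


def build_scale_url(base_url: str, application_name: str, deployment_name: str) -> str:
    """Build the scale endpoint URL. Hypothetical helper for illustration."""
    # safe="" also percent-encodes "/", so a name can't add extra path segments.
    app = quote(application_name, safe="")
    deployment = quote(deployment_name, safe="")
    return (
        f"{base_url.rstrip('/')}/api/v1/applications/{app}"
        f"/deployments/{deployment}/scale"
    )


print(build_scale_url("http://localhost:8000", "my-app", "my-deployment"))
```
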
**Headers:**

- `Authorization` (required): Bearer token for authentication. Format: `Bearer <token>`.
- `Content-Type` (required): Must be `application/json`.

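To keep the token out of source code, you can read it from the environment and build both required headers in one place. This is a minimal sketch: the `RAY_SERVE_SCALE_TOKEN` variable name and the `build_headers` helper are hypothetical, not something Ray Serve reads or provides.

```python
import os
from typing import Dict, Optional


def build_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Return the two headers the scale endpoint requires.

    Falls back to the hypothetical RAY_SERVE_SCALE_TOKEN environment variable
    when no token is passed explicitly.
    """
    token = token or os.environ.get("RAY_SERVE_SCALE_TOKEN")
    if not token:
        raise ValueError("No auth token: pass one or set RAY_SERVE_SCALE_TOKEN")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

For example, `build_headers("<token copied from the dashboard>")` returns a dict you can pass as `headers=` to your HTTP client.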
**Request Body:**

The following example shows the request body structure:

```json
{
  "target_num_replicas": 5
}
```

The request body must conform to the [`ScaleDeploymentRequest`](https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.ScaleDeploymentRequest.html) schema:

- `target_num_replicas` (integer, required): The target number of replicas for the deployment. Must be a non-negative integer.

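Checking the value on the client side can give a clearer error than a rejected request. The sketch below uses a hypothetical `make_scale_body` helper that enforces the constraint stated above (a non-negative integer) and serializes the JSON body. The server still validates against `ScaleDeploymentRequest`, so treat this as a convenience check only.

```python
import json


def make_scale_body(target_num_replicas: int) -> str:
    """Serialize a scale request body after a basic client-side check."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(target_num_replicas, bool) or not isinstance(target_num_replicas, int):
        raise TypeError("target_num_replicas must be an integer")
    if target_num_replicas < 0:
        raise ValueError("target_num_replicas must be non-negative")
    return json.dumps({"target_num_replicas": target_num_replicas})


print(make_scale_body(5))  # {"target_num_replicas": 5}
```
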
## Example - Predictive scaling

Implement predictive scaling based on historical patterns or forecasts. For instance, you can preemptively scale up before anticipated traffic spikes:

```python
from datetime import datetime

import requests


def predictive_scale(
    application_name: str,
    deployment_name: str,
    auth_token: str,
    serve_endpoint: str = "http://localhost:8000",
) -> bool:
    """Scale based on time of day and historical patterns."""
    hour = datetime.now().hour

    # Define scaling profile based on historical traffic patterns
    if 9 <= hour < 17:  # Business hours
        target_replicas = 10
    elif 17 <= hour < 22:  # Evening peak
        target_replicas = 15
    else:  # Off-peak hours
        target_replicas = 3

    url = (
        f"{serve_endpoint}/api/v1/applications/{application_name}"
        f"/deployments/{deployment_name}/scale"
    )

    headers = {
        "Authorization": f"Bearer {auth_token}",
        "Content-Type": "application/json",
    }

    # A timeout keeps an unreachable endpoint from blocking the caller indefinitely.
    response = requests.post(
        url,
        headers=headers,
        json={"target_num_replicas": target_replicas},
        timeout=10,
    )

    # True if the scale request succeeded.
    return response.status_code == 200
```

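If you'd rather not depend on `requests`, the same call works with Python's standard library. The sketch below separates building the request from sending it, so you can inspect the request before it goes out. The function names are hypothetical, and `base_url` must point at whatever address serves this API in your cluster (the example above defaults to `http://localhost:8000`).

```python
import json
import urllib.error
import urllib.request


def make_scale_request(
    base_url: str,
    application_name: str,
    deployment_name: str,
    auth_token: str,
    target_num_replicas: int,
) -> urllib.request.Request:
    """Build (but don't send) a POST request to the scale endpoint."""
    url = (
        f"{base_url.rstrip('/')}/api/v1/applications/{application_name}"
        f"/deployments/{deployment_name}/scale"
    )
    body = json.dumps({"target_num_replicas": target_num_replicas}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
    )


def send_scale_request(request: urllib.request.Request, timeout: float = 10.0) -> bool:
    """Send the request; return True on HTTP 200, False on HTTP errors or connection failures."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        # Error status from the server; its body may explain the failure.
        print(f"Scale request failed: {err.code} {err.read().decode(errors='replace')}")
        return False
    except urllib.error.URLError as err:
        # Connection-level failure (DNS lookup, refused connection, and so on).
        print(f"Could not reach the scale endpoint: {err.reason}")
        return False
```

For example: `send_scale_request(make_scale_request("http://localhost:8000", "my-app", "my-deployment", token, 5))`. `HTTPError` is caught before `URLError` because it's a subclass.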
## Use cases

The external scaling webhook is useful in scenarios where you need custom scaling logic beyond what Ray Serve's built-in autoscaling provides.

### Custom metric-based scaling

Scale your deployments based on business or application metrics that Ray Serve doesn't track automatically:

- Metrics from external monitoring systems such as Prometheus, Datadog, or CloudWatch.
- Database query latencies or connection pool sizes.
- Cost metrics to optimize for budget constraints.

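One common way to turn an external metric into a replica count is target tracking: divide the observed load by the load one replica should handle, round up, and clamp to bounds. The sketch below shows only that calculation. Fetching the metric (for example, from Prometheus) is left out, and `per_replica_capacity`, `min_replicas`, and `max_replicas` are hypothetical tuning values you'd choose yourself. Send the result as `target_num_replicas`.

```python
import math


def replicas_for_metric(
    metric_value: float,
    per_replica_capacity: float,
    min_replicas: int = 1,
    max_replicas: int = 20,
) -> int:
    """Target tracking: enough replicas that each handles at most
    per_replica_capacity units of the metric, clamped to [min, max]."""
    if per_replica_capacity <= 0:
        raise ValueError("per_replica_capacity must be positive")
    needed = math.ceil(metric_value / per_replica_capacity)
    return max(min_replicas, min(max_replicas, needed))


# For example, 450 pending jobs in an external queue, about 50 jobs per replica.
print(replicas_for_metric(450, 50))  # 9
```
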
### Predictive and scheduled scaling

Implement predictive scaling based on historical patterns or business schedules:

- Preemptive scaling before anticipated traffic spikes (such as daily or weekly patterns).
- Event-driven scaling for known traffic events (such as sales, launches, or scheduled batch jobs).
- Time-of-day-based scaling profiles for predictable workloads.

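Event-driven scaling often comes down to a calendar of known windows. As a sketch under simple assumptions, the hypothetical `replicas_at` function below takes a baseline and a list of `ScalingWindow` entries (the dates are made up) and returns the highest replica count among the windows active at a given time, so it only ever raises capacity above the baseline. You could call it on a schedule and send the result to the scale endpoint, as the predictive example above does with its time-of-day profile.

```python
from datetime import datetime
from typing import List, NamedTuple


class ScalingWindow(NamedTuple):
    """A known event during which a deployment needs extra capacity."""

    start: datetime
    end: datetime  # exclusive
    replicas: int


def replicas_at(now: datetime, baseline: int, windows: List[ScalingWindow]) -> int:
    """Return the largest replica count among active windows, or the baseline."""
    active = [w.replicas for w in windows if w.start <= now < w.end]
    return max([baseline, *active])


# Hypothetical product launch from 08:00 to 20:00.
launch = ScalingWindow(datetime(2025, 11, 28, 8), datetime(2025, 11, 28, 20), 25)
print(replicas_at(datetime(2025, 11, 28, 12), baseline=3, windows=[launch]))  # 25
```
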
### Manual and operational control

Direct control over replica counts for operational scenarios:

- Manual scaling for load testing or performance testing.
- Cost optimization by scaling down during off-peak hours or weekends.
- Development and staging environment management.

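For manual scaling during load tests or in staging, a small command-line wrapper can be handy. This sketch is hypothetical: the arguments, the `--base-url` default (copied from the predictive example), and the `RAY_SERVE_SCALE_TOKEN` environment variable are assumptions, not Ray Serve conventions. `--dry-run` prints the request instead of sending it.

```python
import argparse
import json
import os
import urllib.request
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments for a hypothetical manual scaling script."""
    parser = argparse.ArgumentParser(
        description="Manually set the replica count of a Ray Serve deployment."
    )
    parser.add_argument("application_name")
    parser.add_argument("deployment_name")
    parser.add_argument("target_num_replicas", type=int)
    parser.add_argument("--base-url", default="http://localhost:8000",
                        help="Address that serves the scale API (assumption).")
    parser.add_argument("--dry-run", action="store_true",
                        help="Print the request instead of sending it.")
    args = parser.parse_args(argv)
    if args.target_num_replicas < 0:
        parser.error("target_num_replicas must be non-negative")
    return args


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    url = (
        f"{args.base_url.rstrip('/')}/api/v1/applications/{args.application_name}"
        f"/deployments/{args.deployment_name}/scale"
    )
    body = json.dumps({"target_num_replicas": args.target_num_replicas})
    if args.dry_run:
        print(f"POST {url} {body}")
        return 0
    # Hypothetical environment variable holding the token from the dashboard.
    token = os.environ["RAY_SERVE_SCALE_TOKEN"]
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    # urlopen raises urllib.error.HTTPError for error status codes.
    with urllib.request.urlopen(request, timeout=10) as response:
        return 0 if response.status == 200 else 1
```

To run it as a script (saved as, say, `scale.py`), add `if __name__ == "__main__": raise SystemExit(main())` at the bottom and call `python scale.py my-app my-deployment 4 --dry-run`.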