[Serve][2/n] add batching metrics #59232

Merged: abrarsheikh merged 5 commits into master from 59218-abrar-batch on Dec 15, 2025

@abrarsheikh abrarsheikh commented Dec 7, 2025

fixes #59218

Performance Delta

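from typing import List

from ray import serve


@serve.deployment(max_ongoing_requests=1000)
class MyDeployment:
    # Group up to 10 concurrent calls into one invocation, waiting at most 1 s.
    @serve.batch(max_batch_size=10, batch_wait_timeout_s=1)
    async def handle_batch(self, requests: List[int]) -> List[int]:
        # Return one result per input, in the same order.
        return [request + 1 for request in requests]

    async def __call__(self) -> int:
        # Each caller passes a single item and gets back its single result.
        return await self.handle_batch(1)


app = MyDeployment.bind()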
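
For readers less familiar with how `max_batch_size` and `batch_wait_timeout_s` interact, here is a rough stdlib-only sketch of the idea: wait for the first request, then keep collecting until the batch is full or the timeout expires. This is not Ray Serve's implementation; `collect_batch` and `demo` are hypothetical names used only for illustration, and the timeout is shortened to 0.05 s so the demo finishes quickly.

```python
import asyncio
from typing import List


async def collect_batch(
    queue: "asyncio.Queue[int]", max_batch_size: int, batch_wait_timeout_s: float
) -> List[int]:
    """Wait for one item, then gather more until the batch is full or time runs out."""
    batch = [await queue.get()]  # block until at least one request arrives
    loop = asyncio.get_running_loop()
    deadline = loop.time() + batch_wait_timeout_s  # clock starts at the first item
    while len(batch) < max_batch_size:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break  # timeout expired with a partial batch
    return batch


async def demo() -> List[List[int]]:
    queue: asyncio.Queue = asyncio.Queue()
    for i in range(13):  # 13 pending requests: one full batch plus a partial one
        queue.put_nowait(i)
    first = await collect_batch(queue, max_batch_size=10, batch_wait_timeout_s=0.05)
    second = await collect_batch(queue, max_batch_size=10, batch_wait_timeout_s=0.05)
    return [first, second]


batches = asyncio.run(demo())
print([len(b) for b in batches])  # [10, 3]
```

The first batch fills immediately, while the second waits out the timeout and is returned partially full. That partial-batch case is the kind of behavior wait-time and utilization metrics are meant to surface.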

Setup: `ray start --head --metrics-export-port=8080`, then `serve run batch_test:app` (the snippet above saved as `batch_test.py`).

Load: Locust with 100 users.

Metric | With Change | Master | Δ (Master – With Change)
-- | -- | -- | --
Requests | 32,033 | 33,541 | +1,508
Fails | 0 | 0 | 0
Median (ms) | 170 | 170 | 0
95%ile (ms) | 240 | 240 | 0
99%ile (ms) | 280 | 270 | -10
Average (ms) | 172.98 | 171.87 | -1.11
Min (ms) | 70 | 84 | +14
Max (ms) | 352 | 365 | +13
Average size (bytes) | 1 | 1 | 0
Current RPS | 581.9 | 604.1 | +22.2
Current Failures/s | 0 | 0 | 0
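
Since the Δ column is Master minus With Change, the sign is easy to misread. The small sketch below recomputes a few of the deltas from the table and adds a relative figure, which is not part of the original numbers. The `delta` and `relative_pct` helpers are hypothetical names, and the values come from a single run, so small differences may be noise.

```python
# (with_change, master) pairs copied from the table above
results = {
    "Requests": (32033, 33541),
    "99%ile (ms)": (280, 270),
    "Average (ms)": (172.98, 171.87),
    "Current RPS": (581.9, 604.1),
}


def delta(with_change: float, master: float) -> float:
    """Absolute difference, using the table's convention: Master minus With Change."""
    return round(master - with_change, 2)


def relative_pct(with_change: float, master: float) -> float:
    """The same difference expressed as a percentage of the master value."""
    return round(100 * (master - with_change) / master, 2)


summary = {
    name: (delta(w, m), relative_pct(w, m)) for name, (w, m) in results.items()
}
for name, (d, pct) in summary.items():
    print(f"{name}: delta={d:+}, relative={pct:+}%")
```

Read this way, a positive delta for Requests and RPS means master handled slightly more traffic (about 4.5% more requests and 3.7% higher RPS in this run), while the negative latency deltas mean master was marginally faster at the 99th percentile and on average.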

Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh added the `go` label (add ONLY when ready to merge, run all tests) on Dec 7, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces metrics for batching functionality in Ray Serve. New constants for histogram buckets are added, and the batching logic is updated to record metrics for wait time, execution time, queue length, utilization, and processed batches. A new test is also added to verify these metrics.

The changes are well-implemented. I have a couple of suggestions for the new test file:

  • Move local imports to the top of the file to follow PEP 8 guidelines.
  • Enhance the test to also verify the sum of the batch utilization metric, which would make the test more robust.
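
To illustrate the second suggestion, here is a minimal stdlib sketch of what checking a histogram's sum could look like. It assumes utilization is defined as `batch_size / max_batch_size` expressed as a percentage, which may not match the PR's exact definition. The `Histogram` class, the bucket boundaries, and `batch_utilization_percent` are all hypothetical stand-ins, not Ray Serve APIs.

```python
from bisect import bisect_left
from typing import List


class Histogram:
    """Tiny stand-in for a metrics histogram: per-bucket counts plus sum and count."""

    def __init__(self, boundaries: List[float]) -> None:
        self.boundaries = sorted(boundaries)
        # One extra bucket at the end for values above the last boundary.
        self.bucket_counts = [0] * (len(self.boundaries) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # bisect_left places a value equal to a boundary in that boundary's bucket.
        self.bucket_counts[bisect_left(self.boundaries, value)] += 1
        self.sum += value
        self.count += 1


def batch_utilization_percent(batch_size: int, max_batch_size: int) -> float:
    """Assumed definition: how full the batch was, as a percentage."""
    return 100.0 * batch_size / max_batch_size


# Hypothetical bucket boundaries; two full batches and one half-full batch.
utilization = Histogram(boundaries=[25, 50, 75, 100])
for batch_size in (10, 10, 5):
    utilization.observe(batch_utilization_percent(batch_size, max_batch_size=10))

# A test that checks only the count would pass even if the recorded values
# were wrong; asserting on the sum catches that.
assert utilization.count == 3
assert utilization.sum == 250.0
```

Checking the sum alongside the count ties the assertion to the recorded values themselves, which is the robustness the review comment is asking for.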

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh marked this pull request as ready for review December 13, 2025 01:44
@abrarsheikh abrarsheikh requested review from a team as code owners December 13, 2025 01:44
@ray-gardener (bot) added the `serve` (Ray Serve Related Issue) and `observability` (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling) labels on Dec 13, 2025
Signed-off-by: abrar <abrar@anyscale.com>
@harshit-anyscale harshit-anyscale left a comment

lgtm

@abrarsheikh abrarsheikh merged commit 8b3003b into master Dec 15, 2025
6 checks passed
@abrarsheikh abrarsheikh deleted the 59218-abrar-batch branch December 15, 2025 18:33
kriyanshii pushed a commit to kriyanshii/ray that referenced this pull request Dec 16, 2025
Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: kriyanshii <kriyanshishah06@gmail.com>
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025
Signed-off-by: abrar <abrar@anyscale.com>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
Labels

  • go: add ONLY when ready to merge, run all tests
  • observability: Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling
  • serve: Ray Serve Related Issue

Development

Successfully merging this pull request may close these issues.

[Serve] add debugging metrics to ray serve

2 participants