[Serve] Fix replicas hanging forever when requests are stuck draining in direct ingress mode by abrarsheikh · Pull Request #60754 · ray-project/ray

abrarsheikh · 2026-02-04T20:34:05Z

- In direct ingress mode, replicas waiting for requests to drain were never force-killed, so they hung indefinitely if requests got stuck.
- Replicas are now force-killed after `max(graceful_shutdown_timeout_s, RAY_SERVE_DIRECT_INGRESS_MIN_DRAINING_PERIOD_S)`.

Before

| Scenario | Behavior |
|----------|----------|
| Requests drain normally | Wait forever for 30s min drain period |
| Requests stuck | Hang forever |

After

| Scenario | Behavior |
|----------|----------|
| Requests drain normally | Force-kill after `max(timeout, 30s)` |
| Requests stuck | Force-kill after `max(timeout, 30s)` |
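
As a rough illustration of the `max(timeout, 30s)` rule in the table above, the following sketch computes the force-kill timeout for two deployments. The helper name `force_kill_timeout_s` is hypothetical and not taken from the Ray source, and the 30-second default is assumed from the "30s" shown in the tables.

```python
# Hypothetical helper for illustration only; not part of the Ray Serve API.
# The 30 s default is an assumption taken from the "30s" in the PR tables.
ASSUMED_MIN_DRAINING_PERIOD_S = 30.0


def force_kill_timeout_s(
    graceful_shutdown_timeout_s: float,
    min_draining_period_s: float = ASSUMED_MIN_DRAINING_PERIOD_S,
) -> float:
    """Return how long a stopping replica may live before it is force-killed."""
    return max(graceful_shutdown_timeout_s, min_draining_period_s)


# A short graceful timeout is raised to the minimum draining period.
print(force_kill_timeout_s(20.0))   # 30.0
# A long graceful timeout wins over the minimum draining period.
print(force_kill_timeout_s(120.0))  # 120.0
```

Under these assumptions, a replica with a stuck request and a 20-second graceful timeout would be killed roughly 30 seconds after it was asked to stop, rather than hanging.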

Test Plan

python/ray/serve/tests/unit/test_deployment_state.py
python/ray/serve/tests/test_direct_ingress.py

Cursor Bugbot reviewed your changes and found no issues for commit 6460607

Signed-off-by: abrar <abrar@anyscale.com>

gemini-code-assist

Code Review

This pull request addresses an issue where replicas in direct ingress mode could hang indefinitely if requests were stuck during the draining process. The fix ensures that replicas are now forcefully killed after a timeout, which is calculated as the maximum of the deployment's graceful_shutdown_timeout_s and a new minimum draining period constant for direct ingress. This change moves the timeout enforcement logic to the controller, simplifying the replica's shutdown process. Additionally, the RAY_SERVE_DISABLE_SHUTTING_DOWN_INGRESS_REPLICAS_FORCEFULLY feature flag has been removed, which further simplifies the code and makes the force-kill behavior more robust. The changes are well-structured and improve the overall code quality.
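
The minimum draining period mentioned in the review could be read from the environment along the following lines. This is a sketch under stated assumptions: that `RAY_SERVE_DIRECT_INGRESS_MIN_DRAINING_PERIOD_S` is an environment variable holding a number of seconds, and that it defaults to 30. The function `read_min_draining_period_s` is hypothetical and does not reflect how `constants.py` actually parses the value.

```python
import os
from typing import Mapping

ENV_VAR = "RAY_SERVE_DIRECT_INGRESS_MIN_DRAINING_PERIOD_S"
ASSUMED_DEFAULT_S = 30.0  # assumption based on the "30s" in the PR description


def read_min_draining_period_s(env: Mapping[str, str] = os.environ) -> float:
    """Parse the minimum draining period in seconds (hypothetical helper)."""
    raw = env.get(ENV_VAR)
    if raw is None:
        return ASSUMED_DEFAULT_S
    value = float(raw)
    if value < 0:
        raise ValueError(f"{ENV_VAR} must be non-negative, got {value}")
    return value


# Passing a plain dict keeps the example independent of the real environment.
print(read_min_draining_period_s({}))               # 30.0
print(read_min_draining_period_s({ENV_VAR: "45"}))  # 45.0
```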

python/ray/serve/_private/replica.py

python/ray/serve/tests/test_direct_ingress.py

python/ray/serve/_private/deployment_state.py

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

python/ray/serve/_private/deployment_state.py

harshit-anyscale

We should also document the behavior (i.e., which one takes priority) when both MIN_DRAINING_PERIOD and RAY_SERVE_FORCE_STOP_UNHEALTHY_REPLICAS are set by the user.

python/ray/serve/_private/replica.py

python/ray/serve/_private/deployment_state.py

abrarsheikh · 2026-02-05T17:14:41Z

> We should also document the behavior (i.e., which one takes priority) when both MIN_DRAINING_PERIOD and RAY_SERVE_FORCE_STOP_UNHEALTHY_REPLICAS are set by the user.

Good point. For now I am going to add a comment in the code, since DI (direct ingress) is not public.

python/ray/serve/_private/constants.py

eicherseiji

My understanding of this PR: Makes force_stop unconditional and updates _shutdown_deadline to enforce RAY_SERVE_DIRECT_INGRESS_MIN_DRAINING_PERIOD_S within the check_and_update_replicas's check_stopped function to guarantee replicas are stopped, even while draining.
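
The controller-side behavior summarized above can be sketched roughly as follows. The class `StoppingReplica` and its fields are hypothetical stand-ins, not the real types in `deployment_state.py`; only the names `check_stopped`, `force_stop`, and the idea of a shutdown deadline come from the discussion. The 30-second default is assumed, and the force-kill is modeled as completing immediately, which a real actor kill would not.

```python
from dataclasses import dataclass


@dataclass
class StoppingReplica:
    """Hypothetical controller-side record of a replica that was asked to stop."""

    stop_requested_at_s: float
    graceful_shutdown_timeout_s: float
    min_draining_period_s: float = 30.0  # assumed default
    force_stopped: bool = False

    @property
    def shutdown_deadline_s(self) -> float:
        # The deadline honors both the graceful timeout and the minimum draining period.
        return self.stop_requested_at_s + max(
            self.graceful_shutdown_timeout_s, self.min_draining_period_s
        )

    def force_stop(self) -> None:
        # Stand-in for killing the replica actor.
        self.force_stopped = True

    def check_stopped(self, now_s: float, actor_exited: bool) -> bool:
        """Return True once the replica is gone, force-stopping it past the deadline."""
        if actor_exited:
            return True
        if now_s >= self.shutdown_deadline_s:
            # Unconditional: no feature flag can disable this path in the sketch.
            self.force_stop()
            return True
        return False


replica = StoppingReplica(stop_requested_at_s=0.0, graceful_shutdown_timeout_s=20.0)
print(replica.check_stopped(now_s=10.0, actor_exited=False))  # False
print(replica.check_stopped(now_s=30.0, actor_exited=False))  # True
print(replica.force_stopped)                                  # True
```

Because the deadline check lives in the controller's polling path, a replica whose requests never finish draining is still stopped once the deadline passes.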

[Serve] force kill ingress replica after min draining period

a50ee78

Signed-off-by: abrar <abrar@anyscale.com>

abrarsheikh requested a review from a team as a code owner February 4, 2026 20:34

gemini-code-assist bot reviewed Feb 4, 2026

add test

d490787

Signed-off-by: abrar <abrar@anyscale.com>

abrarsheikh added the "go" label (add ONLY when ready to merge, run all tests) Feb 4, 2026

abrarsheikh requested review from akyang-anyscale and eicherseiji February 4, 2026 20:39

cursor bot reviewed Feb 4, 2026

python/ray/serve/_private/replica.py (resolved)

python/ray/serve/tests/test_direct_ingress.py (outdated, resolved)

python/ray/serve/_private/deployment_state.py (resolved)

python/ray/serve/_private/deployment_state.py (resolved)

make replica wait timeout

24fb5e5

Signed-off-by: abrar <abrar@anyscale.com>

cursor bot reviewed Feb 5, 2026

python/ray/serve/_private/deployment_state.py (outdated, resolved)

abrarsheikh added 2 commits February 5, 2026 00:24

fix doc

6460607

Signed-off-by: abrar <abrar@anyscale.com>

wait for replicas to be cleaned

ff57cee

Signed-off-by: abrar <abrar@anyscale.com>

harshit-anyscale reviewed Feb 5, 2026

python/ray/serve/_private/replica.py (resolved)

python/ray/serve/_private/deployment_state.py (resolved)

check if user code is initialized

d9d1027

Signed-off-by: abrar <abrar@anyscale.com>

harshit-anyscale reviewed Feb 5, 2026

python/ray/serve/_private/constants.py (outdated, resolved)

fix comment

1be851b

Signed-off-by: abrar <abrar@anyscale.com>

harshit-anyscale approved these changes Feb 5, 2026

eicherseiji approved these changes Feb 5, 2026

akyang-anyscale approved these changes Feb 5, 2026

abrarsheikh merged commit e7fa2e4 into master Feb 5, 2026
6 checks passed

abrarsheikh deleted the 60748-abrar-stopping branch February 5, 2026 20:11

abrarsheikh merged 7 commits into master from 60748-abrar-stopping
