[Serve] Fix replicas hanging forever when requests are stuck draining in direct ingress mode#60754

Merged
abrarsheikh merged 7 commits into master from 60748-abrar-stopping
Feb 5, 2026



@abrarsheikh abrarsheikh commented Feb 4, 2026

  • In direct ingress mode, replicas waiting for requests to drain were never force-killed, causing them to hang indefinitely if requests got stuck
  • Replicas are now force-killed after max(graceful_shutdown_timeout_s, RAY_SERVE_DIRECT_INGRESS_MIN_DRAINING_PERIOD_S)
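
The force-kill timeout described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `force_kill_timeout_s`, the `env` parameter, and the 30-second default (inferred from the "30s" in the tables below) are assumptions.

```python
import os

# Assumed default, inferred from the "30s" mentioned in this PR's tables.
DEFAULT_MIN_DRAINING_PERIOD_S = 30.0


def force_kill_timeout_s(graceful_shutdown_timeout_s, env=None):
    """Return how long to wait before force-killing a draining replica.

    Hypothetical helper: takes the larger of the deployment's graceful
    shutdown timeout and the direct ingress minimum draining period.
    """
    env = os.environ if env is None else env
    min_draining_period_s = float(
        env.get(
            "RAY_SERVE_DIRECT_INGRESS_MIN_DRAINING_PERIOD_S",
            DEFAULT_MIN_DRAINING_PERIOD_S,
        )
    )
    return max(graceful_shutdown_timeout_s, min_draining_period_s)


# A short graceful timeout is raised to the minimum draining period.
print(force_kill_timeout_s(20.0, env={}))  # 30.0
# A long graceful timeout wins over the minimum draining period.
print(force_kill_timeout_s(60.0, env={}))  # 60.0
```

Under these assumptions the deadline is never shorter than the minimum draining period, and it is always finite, which is what removes the "hang forever" case.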

Before

| Scenario | Behavior |
|----------|----------|
| Requests drain normally | Wait forever for 30s min drain period |
| Requests stuck | Hang forever |

After

| Scenario | Behavior |
|----------|----------|
| Requests drain normally | Force-kill after `max(timeout, 30s)` |
| Requests stuck | Force-kill after `max(timeout, 30s)` |

Test Plan

  • python/ray/serve/tests/unit/test_deployment_state.py
  • python/ray/serve/tests/test_direct_ingress.py
Cursor Bugbot reviewed your changes and found no issues for commit 6460607

Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh requested a review from a team as a code owner February 4, 2026 20:34

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an issue where replicas in direct ingress mode could hang indefinitely if requests were stuck during the draining process. The fix ensures that replicas are now forcefully killed after a timeout, which is calculated as the maximum of the deployment's graceful_shutdown_timeout_s and a new minimum draining period constant for direct ingress. This change moves the timeout enforcement logic to the controller, simplifying the replica's shutdown process. Additionally, the RAY_SERVE_DISABLE_SHUTTING_DOWN_INGRESS_REPLICAS_FORCEFULLY feature flag has been removed, which further simplifies the code and makes the force-kill behavior more robust. The changes are well-structured and improve the overall code quality.

Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh added the "go" label (add ONLY when ready to merge, run all tests) Feb 4, 2026
Signed-off-by: abrar <abrar@anyscale.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: abrar <abrar@anyscale.com>

@harshit-anyscale harshit-anyscale left a comment


We should also document the behavior (for example, which setting takes priority) when both RAY_SERVE_DIRECT_INGRESS_MIN_DRAINING_PERIOD_S and RAY_SERVE_FORCE_STOP_UNHEALTHY_REPLICAS are set by the user.

@abrarsheikh

> we should also document the behavior (like who will take priority) in the scenario when MIN_DRAINING_PERIOD and RAY_SERVE_FORCE_STOP_UNHEALTHY_REPLICAS, both are set by the users.

Good point. For now I am going to add a comment in the code, since DI (direct ingress) is not public.

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: abrar <abrar@anyscale.com>

@eicherseiji eicherseiji left a comment


My understanding of this PR: it makes force_stop unconditional and updates _shutdown_deadline to enforce RAY_SERVE_DIRECT_INGRESS_MIN_DRAINING_PERIOD_S within check_and_update_replicas's check_stopped function, guaranteeing that replicas are stopped even while draining.
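
The mechanism summarized in this comment can be sketched roughly as below. This is an illustrative model under stated assumptions, not the Ray Serve implementation: the class `StoppingReplica`, its constructor arguments, and the injectable `clock` are hypothetical, and only the names `check_stopped`, `force_stop`, and `_shutdown_deadline` are taken from the comment.

```python
import time


class StoppingReplica:
    """Hypothetical controller-side view of a replica that is shutting down."""

    def __init__(
        self,
        graceful_shutdown_timeout_s,
        min_draining_period_s,
        is_actor_stopped,
        force_stop,
        clock=time.monotonic,
    ):
        self._is_actor_stopped = is_actor_stopped  # callable returning bool
        self._force_stop = force_stop              # callable that kills the actor
        self._clock = clock
        # The deadline honors both the graceful timeout and the minimum drain period.
        self._shutdown_deadline = clock() + max(
            graceful_shutdown_timeout_s, min_draining_period_s
        )

    def check_stopped(self):
        """Return True once the replica has exited; force-stop it past the deadline."""
        if self._is_actor_stopped():
            return True
        if self._clock() >= self._shutdown_deadline:
            # Unconditional: applies even if requests are still draining.
            self._force_stop()
        return False
```

In this sketch the caller is assumed to invoke `check_stopped` on every control loop iteration, so a replica with stuck requests is force-stopped on the first iteration after the deadline and reported as stopped on a later one.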

@abrarsheikh abrarsheikh merged commit e7fa2e4 into master Feb 5, 2026
6 checks passed
@abrarsheikh abrarsheikh deleted the 60748-abrar-stopping branch February 5, 2026 20:11
abrarsheikh added a commit that referenced this pull request Feb 5, 2026
… in direct ingress mode (#60754)

- In direct ingress mode, replicas waiting for requests to drain were
never force-killed, causing them to hang indefinitely if requests got
stuck
- Replicas are now force-killed after `max(graceful_shutdown_timeout_s,
RAY_SERVE_DIRECT_INGRESS_MIN_DRAINING_PERIOD_S)`

## Before
| Scenario | Behavior |
|----------|----------|
| Requests drain normally | Wait forever for 30s min drain period |
| Requests stuck | Hang forever |

## After
| Scenario | Behavior |
|----------|----------|
| Requests drain normally | Force-kill after `max(timeout, 30s)` |
| Requests stuck | Force-kill after `max(timeout, 30s)` |

## Test Plan
- [x] `python/ray/serve/tests/unit/test_deployment_state.py`
- [x] `python/ray/serve/tests/test_direct_ingress.py`



---------

Signed-off-by: abrar <abrar@anyscale.com>
tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
… in direct ingress mode (ray-project#60754)

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: tiennguyentony <46289799+tiennguyentony@users.noreply.github.com>
tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
… in direct ingress mode (ray-project#60754)


Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: tiennguyentony <46289799+tiennguyentony@users.noreply.github.com>
tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
… in direct ingress mode (ray-project#60754)


Signed-off-by: abrar <abrar@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
… in direct ingress mode (#60754)

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
… in direct ingress mode (#60754)

Signed-off-by: abrar <abrar@anyscale.com>
MuhammadSaif700 pushed a commit to MuhammadSaif700/ray that referenced this pull request Feb 17, 2026
… in direct ingress mode (ray-project#60754)

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: Muhammad Saif <2024BBIT200@student.Uet.edu.pk>
Kunchd pushed a commit to Kunchd/ray that referenced this pull request Feb 17, 2026
… in direct ingress mode (ray-project#60754)

Signed-off-by: abrar <abrar@anyscale.com>
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
… in direct ingress mode (ray-project#60754)

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: Adel Nour <ans9868@nyu.edu>
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
… in direct ingress mode (ray-project#60754)

Signed-off-by: abrar <abrar@anyscale.com>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
… in direct ingress mode (ray-project#60754)

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
… in direct ingress mode (ray-project#60754)

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>