[skyrl-train] Support older vllm versions till 0.9.2#671
SumanthRH merged 4 commits into NovaSky-AI:main
Conversation
> In some cases, you may encounter "illegal memory access" errors with vLLM >= 0.10.0: https://github.com/vllm-project/vllm/issues/23814. Currently, we recommend downgrading to vLLM 0.9.2 as a workaround.
> With SkyRL, this can be done with the following overrides:
A todo here is to mention the exact commit from which this override is supported. Will be done after merging the PR.
/gemini review
Code Review
This pull request adds support for older versions of vLLM, specifically down to version 0.9.2. The changes introduce version-aware logic to handle API differences, update the CI pipeline to include tests for the older vLLM version, and add relevant troubleshooting documentation. My review focuses on improving the maintainability of the CI script, correcting a broken link in the documentation, and suggesting refactoring to reduce code duplication in the Python source.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
CharlieFRuan
left a comment
LGTM! This is quite useful (e.g. in clusters where you need to build things from source, you might prefer to run older versions). Left two comments just for my own understanding.
# What does this PR do?

Adds support for older vllm versions. With these changes, I am able to use older vllm versions as follows:

```bash
uv run --isolated --extra vllm --extra dev \
  --with vllm==0.9.2 --with transformers==4.53.0 --with torch==2.7.0 \
  --with "flash-attn@https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl" \
  -- pytest -s -vvv tests/gpu/gpu_ci/test_engine_generation.py::test_token_based_generation -m "vllm"
```

Will add docs for this in a future PR.

---------

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>