Use CUDA 12.4 as default for release and nightly wheels #12098
Signed-off-by: mgoin <michael@neuralmagic.com>
tlrmchlsmth
left a comment
Agree! There are several pieces we pick up going from 12.1 -> 12.4 (Lovelace FP8 kernels, 2:4 sparse kernels, some CUDA graph stuff).
+1 to this as well
  commands:
    - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
-   - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.1.0 --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT --target vllm-openai --progress plain ."
+   - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.4.0 --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT --target vllm-openai --progress plain ."
Do both "Build wheel - CUDA 12.4" and "Build wheel - CUDA 12.1" build with CUDA 12.4?
Isn't this the release image? I have not changed the "Build wheel - CUDA 12.1" case. Why shouldn't this also be 12.4?
youkaichao
left a comment
Please note that in .buildkite/upload-wheels.sh we do not upload CUDA 11.8 wheels, but we will upload CUDA 12 wheels.
If you build for both 12.1 and 12.4, make sure only one version is uploaded, to avoid an upload race condition.
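To make the race concern concrete, here is a minimal sketch of an upload filter that lets only one CUDA 12 build through. It is not the logic in .buildkite/upload-wheels.sh: the `+cuXYZ` filename tag, the function names, and the example filenames are all assumptions made for illustration.

```python
import re
from typing import List, Optional

# Assumption: non-default builds carry a "+cuXYZ" local version tag in the filename.
_CUDA_TAG = re.compile(r"\+cu(\d+)")


def cuda_tag(wheel_name: str) -> Optional[str]:
    """Return the CUDA tag (e.g. "118", "121") found in a wheel filename, or None."""
    match = _CUDA_TAG.search(wheel_name)
    return match.group(1) if match else None


def select_wheels_for_upload(wheels: List[str], default_cuda: str = "124") -> List[str]:
    """Keep only wheels that belong to the default CUDA build.

    Untagged wheels are treated as the default build. Wheels tagged with any
    other CUDA version are skipped, so two CUDA 12 builds can never be
    uploaded to the same destination.
    """
    selected = []
    for name in wheels:
        tag = cuda_tag(name)
        if tag is None or tag == default_cuda:
            selected.append(name)
    return selected


# Hypothetical filenames, for illustration only.
example_wheels = [
    "vllm-1.0.0+cu118-cp38-abi3-manylinux1_x86_64.whl",
    "vllm-1.0.0+cu121-cp38-abi3-manylinux1_x86_64.whl",
    "vllm-1.0.0-cp38-abi3-manylinux1_x86_64.whl",
]
uploadable = select_wheels_for_upload(example_wheels)
```

Filtering on an explicit allow-list (the default tag) rather than a deny-list (only `cu118`) means a newly added build variant is skipped by default instead of silently colliding with the main wheel.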
khluu
left a comment
Triggered a test run here: https://buildkite.com/vllm/release/builds/2643
and maybe it's time we stop building cu118 wheels?
I thought that wheels built that don't match the [code reference: Line 49 in 54cacf0]. So I think that would prevent wheel collision. It would be nice to have explicit wheel versions like this to make it more transparent what is available.
@mgoin the upload wheel script only checks if cu118 is present 🤕 (vllm/.buildkite/upload-wheels.sh, Line 50 in b5b57e3)
Looks like PyTorch has 11.8, 12.1, and 12.4. So to confirm, we will publish 12.4 to PyPI, still build for 12.1 and 11.8 for nightly, then push them to artifacts?
Signed-off-by: mgoin <michael@neuralmagic.com>
@khluu could you help give this a test again?
Signed-off-by: mgoin <michael@neuralmagic.com>
Validated with CUTLASS sparsity support, since we need at least CUDA 12.2 on Hopper (vllm/csrc/sparse/cutlass/sparse_scaled_mm_entry.cu, Lines 8 to 10 in b382a7f)
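As a rough illustration of why the 12.1 build misses these kernels, the version requirement can be sketched as a simple comparison. Only the 12.2 minimum comes from the comment above; the function names are hypothetical, and the real gate lives in the referenced .cu file rather than in Python.

```python
from typing import Tuple

# Minimum toolkit version stated in the comment above; the rest is illustrative.
MIN_CUDA_FOR_HOPPER_SPARSE = (12, 2)


def parse_cuda_version(version: str) -> Tuple[int, int]:
    """Parse a "major.minor[.patch]" string such as "12.4.0" into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def meets_hopper_sparse_requirement(cuda_version: str) -> bool:
    """Return True if the toolkit version is at least 12.2."""
    return parse_cuda_version(cuda_version) >= MIN_CUDA_FOR_HOPPER_SPARSE


old_default_ok = meets_hopper_sparse_requirement("12.1.0")
new_default_ok = meets_hopper_sparse_requirement("12.4.0")
```

Comparing integer tuples instead of raw strings avoids ordering mistakes such as "12.10" sorting before "12.2".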
…#12098) Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
I think it is time to match PyTorch's default CUDA version. We should still keep wheels built with 12.1 around.