[Bugfix] Temporarily disable gptq_bitblas on ROCm #17411
Merged
In v0.8.5, the following lines of code replace `gptq` with `gptq_bitblas`:

vllm/vllm/config.py, lines 795 to 810 in ba41cc9
After the replacement, a check is introduced to verify whether the current platform supports the specified quantization method:

vllm/vllm/config.py, line 828 in ba41cc9
However, `gptq_bitblas` is not included in the list of supported methods for ROCm, so vLLM throws an exception and exits:

vllm/vllm/platforms/rocm.py, lines 132 to 135 in ba41cc9
Because of this check, `gptq_bitblas` cannot work on the ROCm platform in this version of vLLM: any user attempting to use `gptq_bitblas` on ROCm will hit this error, and the program will exit. In other words, it appears that no one has actually tested or successfully run `gptq_bitblas` on ROCm. Given that, I believe we should temporarily disable `gptq_bitblas` for ROCm users. The feature can be re-enabled once a developer successfully tests and runs `gptq_bitblas` on ROCm.

This PR primarily targets users on the ROCm platform who rely on GPTQ quantization. After updating from v0.8.4 to v0.8.5, these users will find that vLLM fails to start.
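The failure path described above, and the fix this PR applies, can be sketched roughly as follows. This is an illustrative simplification, not vLLM's actual code: the function and variable names (`resolve_quantization`, `verify_quantization`, `ROCM_SUPPORTED_QUANTIZATION`) are made up for this example, and the supported-methods list is abbreviated.

```python
# Illustrative sketch of the v0.8.5 behavior (names are hypothetical,
# not vLLM's exact identifiers).

# Stand-in for the ROCm platform's supported-quantization list in
# vllm/vllm/platforms/rocm.py; note that "gptq_bitblas" is absent.
ROCM_SUPPORTED_QUANTIZATION = ["awq", "gptq", "fp8"]


def verify_quantization(method: str) -> None:
    """Raise if the (possibly auto-upgraded) method is unsupported on ROCm."""
    if method not in ROCM_SUPPORTED_QUANTIZATION:
        raise ValueError(
            f"{method} quantization is currently not supported on ROCm.")


def resolve_quantization(hf_quant_method: str, is_rocm: bool) -> str:
    """Pick the quantization backend for a GPTQ checkpoint.

    In v0.8.5 the upgrade to gptq_bitblas happens unconditionally, which
    then fails the ROCm check above. This PR's fix amounts to skipping
    the upgrade on ROCm, as modeled by the `is_rocm` guard here.
    """
    if hf_quant_method == "gptq" and not is_rocm:
        return "gptq_bitblas"
    return hf_quant_method


# With the fix, a GPTQ model on ROCm keeps the plain "gptq" backend,
# which passes the platform check instead of raising.
method = resolve_quantization("gptq", is_rocm=True)
verify_quantization(method)
```

Without the `is_rocm` guard, `resolve_quantization` would return `gptq_bitblas`, and `verify_quantization` would raise, which is exactly the startup failure reported in #17410.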
FIX #17410