[CI/Build] Bump flashinfer to v0.6.10 (#41711)
arpera wants to merge 3 commits into vllm-project:main
Conversation
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
Hi @arpera, the pre-commit checks have failed. Please run:
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Code Review
This pull request updates the FlashInfer version to 0.6.10 across the project's Docker configurations and dependency files. It also introduces conditional logic in the Dockerfile and setup.py to include the [cu13] extra for flashinfer-python when CUDA 13 is detected, facilitating support for SM100 GDN kernels. I have no feedback to provide.
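The version gating described above can be sketched as a small helper. This is a hypothetical sketch, not vLLM's actual setup.py logic: the function name, the constant, and the assumption that the [cu13] extra applies only on CUDA 13+ are all illustrative.

```python
# Hypothetical sketch of selecting the flashinfer-python requirement
# based on the detected CUDA major version. Names and the pinned
# version are illustrative, not vLLM's actual setup.py code.
FLASHINFER_VERSION = "0.6.10"

def flashinfer_requirement(cuda_version: str) -> str:
    """Return the requirement string for a CUDA version given as
    'major.minor', e.g. '12.8' or '13.0'."""
    major = int(cuda_version.split(".")[0])
    # Only CUDA 13+ environments get the [cu13] extra.
    extra = "[cu13]" if major >= 13 else ""
    return f"flashinfer-python{extra}=={FLASHINFER_VERSION}"

print(flashinfer_requirement("12.8"))  # flashinfer-python==0.6.10
print(flashinfer_requirement("13.0"))  # flashinfer-python[cu13]==0.6.10
```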
FYI: 0.6.9 update - #40998
Yes, I have seen this PR #40998, thanks. It wasn't finished, so I think v0.6.10 now makes more sense.
I would also like to point out that in this PR, in addition to directly integrating the new FI version v0.6.8, I made a small fix that wasn't accounted for in vLLM when integrating previous FI versions. There is also a small discussion of this issue in the comments: 1, 2. Since I don't have much experience managing build dependencies in vLLM, I'd be happy to get suggestions for a more correct way to handle this in vLLM.
I am noticing some potential numeric issues with the newer FlashInfer versions. Specifically, the generation lengths for GPQA with DSv4 are significantly longer with the new versions than before (Claude suggests the model is stuck in a self-doubt loop). I am still investigating the issue, but I wanted to flag it. It may be worth doing some more eval studies before merging this.
Do I understand correctly that if I have an environment with cu13 and do
Yes, you understand correctly.
@arpera With more investigation, I think the issue I was hitting was not related to the newer FlashInfer versions (but to something else). I tested the v0.6.10 GPQA eval with DeepSeek v4, and it looks good. I have no more concerns about upgrading.
Purpose
Bump FlashInfer from v0.6.8.post1 to v0.6.10. Add the flashinfer-python[cu13] extra for cu13 users.
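For cu13 users, the resulting install spec would look roughly like the following; a minimal shell sketch, where the uv command line and the pinned version are illustrative assumptions, not a verbatim line from this PR.

```shell
# Illustrative: the requirement spec a CUDA 13 user would install.
# Quoting matters, since the brackets in "[cu13]" are shell glob characters.
pip_spec='flashinfer-python[cu13]==0.6.10'
echo uv pip install "$pip_spec"
```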