[CI/Build] Bump flashinfer to v0.6.10#41711

Open
arpera wants to merge 3 commits into vllm-project:main from arpera:bump-flashinfer-0.6.10

Conversation

@arpera (Contributor) commented May 5, 2026

Purpose

  • Bump FlashInfer from v0.6.8.post1 to v0.6.10.
  • Adjust installation to use flashinfer-python[cu13] extra for cu13 users.
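The conditional installation described above can be sketched roughly as follows. This is a hypothetical shell helper, not the PR's actual Dockerfile/setup.py change; the function name and the version-detection approach are assumptions for illustration:

```shell
# Hypothetical sketch (not the PR's actual logic): pick the pip target
# for FlashInfer based on the CUDA major version.
pick_flashinfer_target() {
    cuda_version="$1"            # e.g. "12.8" or "13.0"
    major="${cuda_version%%.*}"  # keep only the major component
    if [ "$major" -ge 13 ]; then
        # CUDA 13: request the extra so nvidia-cutlass-dsl[cu13] is pulled in
        echo "flashinfer-python[cu13]==0.6.10"
    else
        echo "flashinfer-python==0.6.10"
    fi
}

pick_flashinfer_target "13.0"   # prints flashinfer-python[cu13]==0.6.10
```

The point of gating on the major version is that the `[cu13]` extra only exists for (and only matters on) CUDA 13 environments; older CUDA installs keep the plain package.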

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
@claude (Bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify (Bot) commented May 5, 2026

Hi @arpera, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@gemini-code-assist (Bot) left a comment


Code Review

This pull request updates the FlashInfer version to 0.6.10 across the project's Docker configurations and dependency files. It also introduces conditional logic in the Dockerfile and setup.py to include the [cu13] extra for flashinfer-python when CUDA 13 is detected, facilitating support for SM100 GDN kernels. I have no feedback to provide.

@pavanimajety pavanimajety added the ready-run-all-tests Trigger CI with all tests for wide-ranging PRs label May 5, 2026
@pavanimajety (Collaborator) commented

FYI: 0.6.9 update - #40998

@arpera (Contributor, Author) commented May 5, 2026

Yes, I've seen PR #40998, thanks. It wasn't finished, so I think v0.6.10 makes more sense now.

@arpera (Contributor, Author) commented May 5, 2026

I would also like to point out that in addition to directly integrating the new FI version v0.6.10, this PR makes a small fix that wasn't accounted for in vLLM when previous FI versions were integrated.
Specifically, I added installation of the flashinfer-python[cu13] extra for users who have cu13 installed. This is necessary because, without the extra, FlashInfer does not install nvidia-cutlass-dsl[cu13] by default, which is required in particular for the FI Blackwell GDN implementation whose support I'm currently trying to add in #40717.

There is also a small discussion about this issue in comments: 1, 2.

Since I don't have much experience managing build dependencies in vLLM, I'd be happy to get suggestions for a more correct way to handle this in vLLM.

@wzhao18 (Contributor) commented May 5, 2026

I am noticing some potential numeric issues with the newer FlashInfer versions. Specifically, the generation length for GPQA with DSv4 is significantly longer with the new versions than before (Claude suggests the model gets stuck in a self-doubt loop).

I am still investigating the issue, but I wanted to flag it. It may be worth doing some more eval studies before merging this.

@pavanimajety pavanimajety removed the ready-run-all-tests Trigger CI with all tests for wide-ranging PRs label May 5, 2026
@vadiklyutiy (Collaborator) commented

Do I understand correctly that if you have an environment with cu13 and run pip install flashinfer-python, it doesn't install everything, and users have to additionally run pip install flashinfer-python[cu13]?

@arpera (Contributor, Author) commented May 6, 2026

Yes, that's right.

@wzhao18 (Contributor) commented May 6, 2026

@arpera With more investigation, I think the issue I was hitting was not related to the newer FlashInfer versions but to something else. I tested the v0.6.10 GPQA eval with DeepSeek V4, and it looks good. I have no more concerns about upgrading.

