use block size 128 and 256-bit loading #3289
Conversation
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Code Review
This pull request updates the MXFP8 quantization kernels to process either 8 or 16 elements per thread, depending on the CUDA version. Key changes include the introduction of the MxFp8OutT type alias and an overload of fp32_vec_to_e4m3 that handles both uint64_t and uint4 output formats. Additionally, the block size for invokeMxFP8Quantization is now fixed at 128. There were no review comments to assess, so I have no further feedback.
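To illustrate the two output widths the review mentions: 8 quantized e4m3 bytes fit in a uint64_t (64 bits), while 16 fit in a uint4 (128 bits), which is why fp32_vec_to_e4m3 needs an overload for each. Below is a hedged, host-side C++ sketch of the idea; the encoder is a simplified software reference for OCP FP8 E4M3 (bias 7, max finite 448, no infinities), not FlashInfer's actual device intrinsics, and it rounds half away from zero rather than round-to-nearest-even.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Simplified software reference for fp32 -> e4m3 (OCP FP8: 1 sign, 4 exp, 3 mant,
// bias 7, max finite 448, 0x7F = NaN). Illustrative only; real kernels use
// hardware conversion instructions and round-to-nearest-even.
uint8_t fp32_to_e4m3(float x) {
    uint8_t sign = std::signbit(x) ? 0x80 : 0x00;
    float a = std::fabs(x);
    if (std::isnan(a)) return sign | 0x7F;
    if (a > 448.0f) return sign | 0x7E;          // saturate to max finite 448
    if (a == 0.0f) return sign;
    int e;
    float m = std::frexp(a, &e);                 // a = m * 2^e, m in [0.5, 1)
    int exp = e - 1;                             // exponent with mantissa in [1, 2)
    if (exp < -6) {                              // subnormal range: units of 2^-9
        int mant = (int)std::lround(a * 512.0f);
        return sign | (uint8_t)mant;             // mant == 8 rolls into min normal
    }
    int mant = (int)std::lround((m * 2.0f - 1.0f) * 8.0f);  // 3-bit mantissa
    if (mant == 8) { mant = 0; exp += 1; }                  // rounding carried over
    return sign | (uint8_t)(((exp + 7) << 3) | mant);
}

// 8 elements per thread: pack 8 e4m3 bytes into one 64-bit store,
// mirroring the uint64_t overload described in the review.
uint64_t pack8_e4m3(const float* v) {
    uint64_t out = 0;
    for (int i = 0; i < 8; ++i)
        out |= (uint64_t)fp32_to_e4m3(v[i]) << (8 * i);
    return out;
}

// 16 elements per thread: two 64-bit halves, i.e. one 128-bit (uint4-sized) store.
struct Packed128 { uint64_t lo, hi; };
Packed128 pack16_e4m3(const float* v) {
    return Packed128{pack8_e4m3(v), pack8_e4m3(v + 8)};
}
```

For example, 1.0f encodes as 0x38 (exponent field 7, mantissa 0), so packing eight ones yields 0x3838383838383838.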
📌 Description
Apply the optimizations from the (per-token) FP4 quantization kernel to the MXFP8 quantization kernel: fix the block size at 128 and use 256-bit loads.
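The launch arithmetic behind these two optimizations can be sketched as follows. This is a hypothetical host-side helper, not FlashInfer's actual API: a 256-bit load moves 32 bytes per instruction, so for a 16-bit input type (half/bfloat16) one load covers 16 elements, versus 8 elements with narrower loads; combined with the fixed block size of 128, each block then covers 2048 or 1024 elements.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative launch-geometry helpers (names are assumptions, not FlashInfer's).
constexpr int kBlockSize = 128;  // the PR fixes the thread-block size at 128

// With 256-bit loads, a thread reads 32 bytes of 16-bit input at once,
// i.e. 16 elements; otherwise it processes 8 elements per thread.
constexpr int elemsPerThread(bool use256BitLoad) {
    return use256BitLoad ? 16 : 8;
}

// Grid size: ceil-divide the element count by the per-block coverage.
constexpr int64_t numBlocks(int64_t numElems, bool use256BitLoad) {
    const int64_t perBlock =
        (int64_t)kBlockSize * elemsPerThread(use256BitLoad);
    return (numElems + perBlock - 1) / perBlock;
}
```

With 4096 elements, 256-bit loads halve the grid from 4 blocks to 2, since each block's coverage doubles from 1024 to 2048 elements.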
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- I have installed pre-commit by running pip install pre-commit (or used your preferred method).
- I have installed the hooks with pre-commit install.
- I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

🧪 Tests
- All tests are passing (unittest, etc.).

Reviewer Notes