[WIP] FP8 scaledMM with DeepSeek-style dequantization #453
TODO
Some background -
Recently, an FP8 ScaledMM was added to cutlass, but it doesn't currently satisfy DeepSeek's requirements for B-matrix dequantization/scaling. It shares its implementation with the cutlass mixed-dtype GEMM.
The original plan was to combine the source code of the two implementations using compile-time-evaluated conditionals, but due to some IGC bugs they are kept separate for now.
Both of those implementations are currently quite slow due to an IGC bug. Since I reused/copy-pasted the A-scaling code from there, the scaled MM in this PR is also slow at the moment.
A lot of code in this PR is duplicated and will be refactored later.
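For reference, here is a minimal NumPy sketch of the dequantization math this PR targets, assuming the DeepSeek-V3-style recipe: A carries one scale per token per 128-wide K group (`a_scale: [M, K/128]`) and B carries one scale per 128x128 block (`b_scale: [K/128, N/128]`). The function name `scaled_mm_ref` and the exact scale shapes are illustrative assumptions, not this PR's API:

```python
import numpy as np

BLK = 128  # group/block width assumed from the DeepSeek-style recipe

def scaled_mm_ref(a_fp8, b_fp8, a_scale, b_scale):
    """Reference (unoptimized) blockwise-scaled matmul.

    a_fp8:   [M, K] dequantized-to-float FP8 values of A
    b_fp8:   [K, N] dequantized-to-float FP8 values of B
    a_scale: [M, K // BLK] one scale per token per 128-wide K group
    b_scale: [K // BLK, N // BLK] one scale per 128x128 block of B
    """
    M, K = a_fp8.shape
    _, N = b_fp8.shape
    out = np.zeros((M, N), dtype=np.float32)
    # Accumulate one 128-deep K slice at a time; within each slice both
    # the A and B scales are constant, so they can be applied per slice.
    for kb in range(K // BLK):
        a_blk = a_fp8[:, kb * BLK:(kb + 1) * BLK].astype(np.float32)
        b_blk = b_fp8[kb * BLK:(kb + 1) * BLK, :].astype(np.float32)
        partial = a_blk @ b_blk  # [M, N] partial product for this K slice
        for nb in range(N // BLK):
            cols = slice(nb * BLK, (nb + 1) * BLK)
            # a_scale broadcasts per row; b_scale is one scalar per block.
            out[:, cols] += partial[:, cols] * a_scale[:, kb:kb + 1] * b_scale[kb, nb]
    return out
```

The key difference from the existing cutlass FP8 ScaledMM is the `b_scale[kb, nb]` term: B is scaled per 128x128 block rather than with a single per-tensor (or per-column) scale, which is why the existing kernel can't be reused as-is.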