Add INT8 and INT4 support to P2P benchmark. #918


Closed
caogao wants to merge 1 commit

Conversation

caogao (Contributor) commented Feb 8, 2022

Summary:
Add two options:

  1. default, which transfers the quantized tensor directly;
  2. --include_quantization, which starts with an FP16 tensor, quantizes, transfers, and finally dequantizes back to an FP16 tensor (sketched below).

Also, add an option to sweep through data types and shapes.

Caveat: INT4 dequantization is not numerically correct, but it is added as a proxy for performance measurement.
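
For illustration, a minimal sketch of what the two measurement paths could look like; this is not the actual benchmark code. It assumes two visible CUDA devices, uses a plain symmetric INT8 quantization as a stand-in for the benchmark's real INT8/INT4 kernels, and the helper names (`p2p_quantized_only`, `p2p_include_quantization`) are hypothetical:

```python
# Hypothetical sketch of the two P2P measurement paths; not the FBGEMM benchmark code.
# Assumes two visible CUDA devices.
import time
import torch


def p2p_quantized_only(num_bytes: int, src: int = 0, dst: int = 1) -> float:
    """Default path: copy an already-quantized INT8 tensor from src to dst."""
    q = torch.randint(-128, 128, (num_bytes,), dtype=torch.int8, device=f"cuda:{src}")
    torch.cuda.synchronize(src)
    torch.cuda.synchronize(dst)
    t0 = time.perf_counter()
    q.to(f"cuda:{dst}", non_blocking=True)
    torch.cuda.synchronize(src)
    torch.cuda.synchronize(dst)
    return time.perf_counter() - t0


def p2p_include_quantization(num_elems: int, src: int = 0, dst: int = 1) -> float:
    """--include_quantization path: FP16 -> quantize -> transfer -> dequantize -> FP16."""
    x = torch.randn(num_elems, dtype=torch.float16, device=f"cuda:{src}")
    torch.cuda.synchronize(src)
    torch.cuda.synchronize(dst)
    t0 = time.perf_counter()
    # Simple symmetric INT8 quantization stands in for the benchmark's real kernels.
    scale = x.abs().max().to(torch.float32).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x.to(torch.float32) / scale), -128, 127).to(torch.int8)
    q_dst = q.to(f"cuda:{dst}", non_blocking=True)
    _ = q_dst.to(torch.float16) * scale.to(f"cuda:{dst}").to(torch.float16)
    torch.cuda.synchronize(src)
    torch.cuda.synchronize(dst)
    return time.perf_counter() - t0
```

The sweep option presumably just loops such a routine over the supported data types (FP16, INT8, INT4) and a list of tensor shapes.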

Reviewed By: brad-mengchi

Differential Revision: D31098854

fbshipit-source-id: f7dd6ec6d57967aec9593108b8a9ecf52c947d4c
facebook-github-bot (Contributor) commented:
This pull request was exported from Phabricator. Differential Revision: D31098854

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Summary:
X-link: pytorch#3841

Pull Request resolved: facebookresearch/FBGEMM#918

QK norm applies L2 norm, not RMS norm, so we just use k_norm instead of k_rms_norm.

Reviewed By: jasonjk-park, Aya-ZIbra

Differential Revision: D71268903

fbshipit-source-id: aa5ad2ea795a718843d6c15a9dee03e9b332b860
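
For context on the distinction drawn in that commit message, here is a minimal sketch (hypothetical, not taken from the FBGEMM sources) of L2 normalization versus RMS normalization of a key tensor along its last dimension:

```python
# Hypothetical illustration of L2 norm vs. RMS norm; not FBGEMM code.
import torch


def l2_norm(k: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # L2 norm: divide by the Euclidean length, so each vector has unit L2 norm.
    return k / torch.linalg.vector_norm(k, ord=2, dim=-1, keepdim=True).clamp(min=eps)


def rms_norm(k: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMS norm: divide by the root-mean-square of the elements (L2 norm / sqrt(d)),
    # usually followed by a learned per-channel scale, omitted here.
    return k / torch.sqrt(k.pow(2).mean(dim=-1, keepdim=True) + eps)
```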