[AutoRound] Support w8a8 scheme in auto-round and add example #2150

mengniwang95 · 2025-12-18T12:53:22Z

SUMMARY:

Support w8a8 static FP8 quantization in the AutoRound modifier
Support w8a8 dynamic FP8 quantization in the AutoRound modifier
Add Llama4 w8a8 static and dynamic FP8 quantization examples

Note: this PR depends on the auto-round PR intel/auto-round#1161.
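
The static and dynamic schemes differ mainly in where the activation scale comes from. The sketch below is a minimal pure-Python illustration, not the AutoRound or llm-compressor implementation. It assumes common FP8 conventions: the E4M3 format with a largest finite magnitude of 448, one frozen per-tensor scale for static activations, and one scale per token for dynamic activations. The function names are hypothetical, and the code only computes scales and saturates values; it does not round to the actual FP8 grid.

```python
# Hypothetical illustration of static vs. dynamic FP8 activation scaling.
# Assumption: float8 E4M3, whose largest finite magnitude is 448.
FP8_E4M3_MAX = 448.0
EPS = 1e-12  # avoids a zero scale for all-zero inputs


def static_scale(calibration_batches):
    """Static: a single per-tensor scale, fixed once calibration is done."""
    amax = max(abs(v) for batch in calibration_batches for row in batch for v in row)
    return max(amax, EPS) / FP8_E4M3_MAX


def dynamic_scales(batch):
    """Dynamic: one scale per token (row), recomputed for every input."""
    return [max(max(abs(v) for v in row), EPS) / FP8_E4M3_MAX for row in batch]


def scale_and_saturate(row, scale):
    """Divide by the scale and clamp to the FP8 range (no FP8 rounding here)."""
    return [min(max(v / scale, -FP8_E4M3_MAX), FP8_E4M3_MAX) for v in row]


if __name__ == "__main__":
    calibration = [[[1.0, -2.0], [0.5, 4.48]]]
    s = static_scale(calibration)
    token = [8.96, -1.0]
    # The 8.96 activation exceeds everything seen during calibration, so it
    # saturates under the static scale; under its own dynamic scale it maps
    # to the edge of the FP8 range instead of being clipped.
    print(scale_and_saturate(token, s))
    print(scale_and_saturate(token, dynamic_scales([token])[0]))
```

The trade-off this shows: a static scale costs nothing at inference time but can clip outliers that calibration never saw, while a dynamic scale adapts to each token at the cost of an extra reduction per forward pass.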

TEST PLAN:
Run the unit tests in ./tests/llmcompressor/transformers/autoround/test_autoround_oneshot.py

Signed-off-by: Mengni Wang <[email protected]>

gemini-code-assist · 2025-12-18T12:53:40Z

Summary of Changes

Hello @mengniwang95, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request extends the AutoRoundModifier with support for both static and dynamic w8a8 FP8 quantization schemes. It also adds Llama4 examples that demonstrate both schemes, giving users more choice of precision settings when quantizing with AutoRound.

Highlights

W8A8 Static FP8 Quantization Support: The AutoRound modifier now supports w8a8 static FP8 quantization, expanding its capabilities for model compression.
W8A8 Dynamic FP8 Quantization Support: The AutoRound modifier now supports w8a8 dynamic FP8 quantization, offering more flexibility in how models are compressed.
Llama4 FP8 Quantization Examples: New examples have been added to demonstrate w8a8 static and dynamic FP8 quantization specifically for Llama4 models, providing practical guidance for users.

github-actions · 2025-12-18T12:53:51Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

gemini-code-assist

Code Review

This pull request adds support for w8a8 static and dynamic fp8 quantization to the AutoRound modifier, including new examples and tests. The core logic changes in base.py are significant. My review identified a few issues: a copy-paste error in an example file, a couple of bugs in the new activation quantization logic within base.py, and a removed assertion in a test file that weakens test coverage. Addressing these points will improve the correctness and clarity of the implementation.

src/llmcompressor/modifiers/autoround/base.py

examples/autoround/quantization_w8a8_fp8/llama4_static_quant_example.py

src/llmcompressor/modifiers/autoround/base.py

tests/llmcompressor/transformers/autoround/test_autoround_oneshot.py

Signed-off-by: Wang, Mengni <[email protected]>

src/llmcompressor/modifiers/autoround/base.py

tests/llmcompressor/transformers/autoround/test_autoround_oneshot.py

Signed-off-by: Mengni Wang <[email protected]>

xin3he · 2025-12-25T06:23:05Z

src/llmcompressor/modifiers/autoround/base.py

+            act_dynamic = activation_args.dynamic
+            act_group_size = activation_args.group_size
+            act_symmetric = activation_args.symmetric
+            act_bits = activation_args.num_bits


How about using act_dynamic = getattr(activation_args, "dynamic", None)?

There are default values in QuantizationArgs for each parameter. If we switch to getattr, I think all similar code should be changed the same way to keep it consistent.
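
The trade-off in this exchange can be shown with a small sketch. It uses a hypothetical dataclass, ActArgsStandIn, in place of the real QuantizationArgs from compressed-tensors; its field defaults are assumptions chosen for illustration. When every field has a default, direct attribute access and getattr(..., None) return the same values. They only diverge for an object that lacks the attribute entirely, such as None.

```python
from dataclasses import dataclass
from typing import Optional

FIELDS = ("dynamic", "group_size", "symmetric", "num_bits")


@dataclass
class ActArgsStandIn:
    """Hypothetical stand-in for a quantization-args object; defaults are illustrative."""
    num_bits: int = 8
    symmetric: bool = True
    group_size: Optional[int] = None
    dynamic: bool = False


def read_direct(activation_args):
    # Style used in the diff: plain attribute access (raises AttributeError on None).
    return (
        activation_args.dynamic,
        activation_args.group_size,
        activation_args.symmetric,
        activation_args.num_bits,
    )


def read_with_getattr(activation_args):
    # Suggested style: a missing attribute falls back to None instead of raising.
    return tuple(getattr(activation_args, name, None) for name in FIELDS)


if __name__ == "__main__":
    args = ActArgsStandIn(dynamic=True)
    print(read_direct(args) == read_with_getattr(args))
    print(read_with_getattr(None))
```

Because the stand-in populates every field, the getattr fallback never triggers for it. This is consistent with the reply's point that QuantizationArgs defaults already cover each parameter.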

Support more auto-round scheme and add example

4ec7c84

Signed-off-by: Mengni Wang <[email protected]>

mengniwang95 changed the title ~~[AutoRound] Support more auto-round scheme and add example~~ [AutoRound] Support w8a8 scheme in auto-round and add example Dec 18, 2025

gemini-code-assist bot reviewed Dec 18, 2025

mengniwang95 added 10 commits December 18, 2025 20:57

Update base.py

cc7284c

Signed-off-by: Wang, Mengni <[email protected]>

Update base.py

f493883

Signed-off-by: Wang, Mengni <[email protected]>

Update test_autoround_oneshot.py

c0ecb08

Signed-off-by: Wang, Mengni <[email protected]>

Update llama4_static_quant_example.py

a77be82

Signed-off-by: Wang, Mengni <[email protected]>

Update llama4_dynamic_quant_example.py

2963647

Signed-off-by: Wang, Mengni <[email protected]>

Update base.py

0c378ed

Signed-off-by: Wang, Mengni <[email protected]>

Update base.py

c4b97c1

Signed-off-by: Wang, Mengni <[email protected]>

Update base.py

792554a

Signed-off-by: Wang, Mengni <[email protected]>

Update llama4_dynamic_quant_example.py

7a18267

Signed-off-by: Wang, Mengni <[email protected]>

Update llama4_static_quant_example.py

64dff8e

Signed-off-by: Wang, Mengni <[email protected]>

dsikka added the autoround label (For any PR / issue related to autoround support) Dec 19, 2025

Merge branch 'main' into llama4

a6981f7

dsikka marked this pull request as ready for review December 19, 2025 01:21

dsikka added the fp8 label (For any issue / PR related to FP8 support) Dec 19, 2025

Update base.py

1282ba2

Signed-off-by: Wang, Mengni <[email protected]>

mengniwang95 mentioned this pull request Dec 19, 2025

Support w8a8 scheme in auto-round and add example intel/auto-round#1173 (Open)

yiliu30 reviewed Dec 19, 2025

refine code

868af0d

Signed-off-by: Mengni Wang <[email protected]>

xin3he reviewed Dec 25, 2025

mengniwang95 Dec 29, 2025

4 participants
