Labels: enhancement, good first issue, good follow-up issue
Description
Weight-only quantization support for GPT-OSS has recently been merged via "[CPU] Linearize gpt_oss model and add example to quantize it to w4a8" from @isharif168 🔥.
There are a couple of follow-ups that will make this work even better and more expansive:
- Refactor `convert_model_for_quantization_gptoss` to use the `MoECalibrationModule` interface (see the sketch after this list)
- Add support for the `calibrate_all_experts` option in the forward definition
- Test the forward function with modifiers such as GPTQ or AWQ (see the example run after this list)
- Add an example of how to run the quantized model in vLLM, with reference to any necessary patches (see the snippet after this list)
- Potential code cleanup/simplification, described in comments here
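For context on the first two items, the intent of `calibrate_all_experts` is that during calibration every expert sees the calibration tokens, not only the ones the router selects, while the block's output stays unchanged. The snippet below is a minimal sketch of what such a forward could look like; the wrapper class, attribute names (`router`, `experts`), and constructor signature are assumptions for illustration and the actual `MoECalibrationModule` interface in llm-compressor may differ.

```python
# A minimal sketch only; the real MoECalibrationModule interface in
# llm-compressor may use different base-class hooks and attribute names.
import torch


class GptOssMoECalibrationSketch(torch.nn.Module):
    """Hypothetical calibration wrapper around a linearized GPT-OSS MoE block."""

    def __init__(self, router: torch.nn.Linear, experts: torch.nn.ModuleList,
                 top_k: int = 4, calibrate_all_experts: bool = True):
        super().__init__()
        self.router = router                # assumed: Linear(hidden, num_experts)
        self.experts = experts              # assumed: linearized expert MLPs
        self.top_k = top_k
        self.calibrate_all_experts = calibrate_all_experts

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # [batch, seq, num_experts] routing probabilities
        routing_weights = torch.softmax(self.router(hidden_states), dim=-1)
        top_w, top_idx = routing_weights.topk(self.top_k, dim=-1)

        # Keep the model's normal top-k output: unselected experts get weight 0.
        combine = torch.zeros_like(routing_weights).scatter(-1, top_idx, top_w)

        output = torch.zeros_like(hidden_states)
        for i, expert in enumerate(self.experts):
            selected = bool((top_idx == i).any())
            if self.calibrate_all_experts or selected:
                # Running the expert lets quantization observers record its
                # activation statistics even when the router did not pick it;
                # a combine weight of 0 keeps the block output unchanged.
                output = output + expert(hidden_states) * combine[..., i : i + 1]
        return output
```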
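For the GPTQ/AWQ testing item, a run along these lines should exercise the new forward path. This follows the usual llm-compressor `oneshot` flow; the model id, calibration dataset, scheme, and ignore patterns are placeholders rather than a validated recipe for GPT-OSS.

```python
# Sketch of a GPTQ oneshot run against the linearized GPT-OSS model.
# Model id, dataset, scheme, and ignore list are placeholders, not a tested recipe.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",                         # or the W4A8 scheme from the merged example
    ignore=["lm_head", "re:.*router.*"],    # router layers likely stay unquantized
)

oneshot(
    model="openai/gpt-oss-20b",             # placeholder model id
    dataset="open_platypus",                # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=256,
    output_dir="gpt-oss-20b-gptq-w4a16",
)
```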
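For the vLLM example item, loading the compressed checkpoint would typically look like the snippet below (standard vLLM offline inference API); whether additional patches are needed for the GPT-OSS MoE layers is exactly what the requested example should document, and the model path here is a placeholder.

```python
# Sketch of loading and running the quantized checkpoint in vLLM.
# The model path is a placeholder; extra patches may be needed per this issue.
from vllm import LLM, SamplingParams

llm = LLM(model="gpt-oss-20b-gptq-w4a16")   # placeholder local path / HF id
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["Briefly explain mixture-of-experts models."], params)
print(outputs[0].outputs[0].text)
```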