[GPT-OSS] Expanded Support for Activation Quantization #2159

@kylesayrs

Description

Weight-only quantization support for GPT-OSS was recently merged via the PR "[CPU] Linearize gpt_oss model and add example to quantize it to w4a8" from @isharif168 🔥.

There are a few follow-ups that would make this support more robust and complete:

  1. Refactor convert_model_for_quantization_gptoss to use the MoECalibrationModule interface
  2. Add support for the calibrate_all_experts option in the forward definition
  3. Test the forward function with modifiers such as GPTQ or AWQ
  4. Add an example of how to run the quantized model in vLLM (with reference to any necessary patches)
  5. Potential code cleanup/simplification, described in comments here
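To illustrate what item 2 is asking for, here is a minimal, framework-free sketch of the `calibrate_all_experts` idea. The class and method names below are illustrative toys, not llm-compressor's actual API: during calibration, every expert runs on every token so each expert's quantization observers see data, while the layer output is still computed only from the router-selected experts.

```python
class ToyExpert:
    """Stand-in for an expert MLP; `calls` stands in for observer statistics."""

    def __init__(self, scale):
        self.scale = scale
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x * self.scale


class ToyMoELayer:
    def __init__(self, experts, top_k=1):
        self.experts = experts
        self.top_k = top_k

    def route(self, token):
        # Trivial deterministic "router": pick an expert by token parity.
        return [token % len(self.experts)]

    def forward(self, tokens, calibrate_all_experts=False):
        outputs = []
        for tok in tokens:
            selected = self.route(tok)
            if calibrate_all_experts:
                # Run every expert so all of them get calibrated, but keep
                # only the routed experts' results for the actual output.
                results = {i: exp(tok) for i, exp in enumerate(self.experts)}
            else:
                results = {i: self.experts[i](tok) for i in selected}
            outputs.append(sum(results[i] for i in selected))
        return outputs


layer = ToyMoELayer([ToyExpert(1), ToyExpert(10)])
out = layer.forward([1, 2, 3, 4], calibrate_all_experts=True)
# Every expert saw all four tokens, yet the output matches normal routing.
```

The key invariant, which a real implementation in the GPT-OSS forward definition would need to preserve, is that enabling `calibrate_all_experts` changes only which experts *observe* inputs, never the numerical output of the layer.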

Metadata

Labels

enhancement (New feature or request), good first issue (A good first issue for users wanting to contribute), good follow-up issue (A good issue for users with some familiarity of the codebase)
