Labels: enhancement, good first issue, good follow-up issue
Description
Weight-only quantization support for GPT-OSS has recently been merged via "[CPU] Linearize gpt_oss model and add example to quantize it to w4a8" from @isharif168 🔥.
There are a couple of follow-ups that will make this work even better and more expansive:
- Refactor `convert_model_for_quantization_gptoss` to use the `MoECalibrationModule` interface (see the sketch after this list)
- Add support for the `calibrate_all_experts` option in the forward definition
- Test the forward function with modifiers such as GPTQ or AWQ (see the example run after this list)
- Add an example of how to run the quantized model in vLLM, with reference to any necessary patches (see the snippet after this list)
- Potential code cleanup/simplification, described in comments here
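For context on the first two items, the intent of `calibrate_all_experts` is that during calibration every expert sees the calibration tokens, not only the ones the router selects, while the block's output stays unchanged. The snippet below is a minimal sketch of what such a forward could look like; the wrapper class, attribute names (`router`, `experts`), and constructor signature are assumptions for illustration and the actual `MoECalibrationModule` interface in llm-compressor may differ.

```python
# A minimal sketch only; the real MoECalibrationModule interface in
# llm-compressor may use different base-class hooks and attribute names.
import torch


class GptOssMoECalibrationSketch(torch.nn.Module):
    """Hypothetical calibration wrapper around a linearized GPT-OSS MoE block."""

    def __init__(self, router: torch.nn.Linear, experts: torch.nn.ModuleList,
                 top_k: int = 4, calibrate_all_experts: bool = True):
        super().__init__()
        self.router = router                # assumed: Linear(hidden, num_experts)
        self.experts = experts              # assumed: linearized expert MLPs
        self.top_k = top_k
        self.calibrate_all_experts = calibrate_all_experts

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # [batch, seq, num_experts] routing probabilities
        routing_weights = torch.softmax(self.router(hidden_states), dim=-1)
        top_w, top_idx = routing_weights.topk(self.top_k, dim=-1)

        # Keep the model's normal top-k output: unselected experts get weight 0.
        combine = torch.zeros_like(routing_weights).scatter(-1, top_idx, top_w)

        output = torch.zeros_like(hidden_states)
        for i, expert in enumerate(self.experts):
            selected = bool((top_idx == i).any())
            if self.calibrate_all_experts or selected:
                # Running the expert lets quantization observers record its
                # activation statistics even when the router did not pick it;
                # a combine weight of 0 keeps the block output unchanged.
                output = output + expert(hidden_states) * combine[..., i : i + 1]
        return output
```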
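For the GPTQ/AWQ testing item, a run along these lines should exercise the new forward path. This follows the usual llm-compressor `oneshot` flow; the model id, calibration dataset, scheme, and ignore patterns are placeholders rather than a validated recipe for GPT-OSS.

```python
# Sketch of a GPTQ oneshot run against the linearized GPT-OSS model.
# Model id, dataset, scheme, and ignore list are placeholders, not a tested recipe.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",                         # or the W4A8 scheme from the merged example
    ignore=["lm_head", "re:.*router.*"],    # router layers likely stay unquantized
)

oneshot(
    model="openai/gpt-oss-20b",             # placeholder model id
    dataset="open_platypus",                # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=256,
    output_dir="gpt-oss-20b-gptq-w4a16",
)
```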
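For the vLLM example item, loading the compressed checkpoint would typically look like the snippet below (standard vLLM offline inference API); whether additional patches are needed for the GPT-OSS MoE layers is exactly what the requested example should document, and the model path here is a placeholder.

```python
# Sketch of loading and running the quantized checkpoint in vLLM.
# The model path is a placeholder; extra patches may be needed per this issue.
from vllm import LLM, SamplingParams

llm = LLM(model="gpt-oss-20b-gptq-w4a16")   # placeholder local path / HF id
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["Briefly explain mixture-of-experts models."], params)
print(outputs[0].outputs[0].text)
```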