🚀 The feature, motivation and pitch
Thanks for such excellent work!
We hope to leverage the fast inference capabilities of vLLM to evaluate the accuracy of quantized models. However, vLLM does not currently support custom quantization schemes, and we would appreciate it if functionality to register custom quantization configs could be provided.
The usage would be as follows:
```python
from vllm.model_executor.layers.quantization import QuantizationConfig
from vllm.model_executor.layers.quantization import register_quantization_config


@register_quantization_config("customize")
class CustomizeQuantizationConfig(QuantizationConfig):
    """Customize quantization config."""
```
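To make the request concrete, here is a minimal, self-contained sketch of how such a registration mechanism could work: a module-level registry dict plus a class decorator. All names below (`_QUANTIZATION_REGISTRY`, `get_quantization_config`, and the stand-in `QuantizationConfig` base class) are hypothetical illustrations, not vLLM's actual internals:

```python
# Hypothetical sketch of a quantization-config registry.
# vLLM's real implementation may differ; names here are illustrative only.
from typing import Callable, Dict, Type


class QuantizationConfig:
    """Stand-in for vLLM's QuantizationConfig base class."""


_QUANTIZATION_REGISTRY: Dict[str, Type[QuantizationConfig]] = {}


def register_quantization_config(
    name: str,
) -> Callable[[Type[QuantizationConfig]], Type[QuantizationConfig]]:
    """Class decorator that registers a config class under `name`."""

    def decorator(cls: Type[QuantizationConfig]) -> Type[QuantizationConfig]:
        if name in _QUANTIZATION_REGISTRY:
            raise ValueError(f"Quantization scheme {name!r} is already registered")
        if not issubclass(cls, QuantizationConfig):
            raise TypeError("Registered class must subclass QuantizationConfig")
        _QUANTIZATION_REGISTRY[name] = cls
        return cls

    return decorator


def get_quantization_config(name: str) -> Type[QuantizationConfig]:
    """Look up a registered config class by name (e.g. at engine init)."""
    try:
        return _QUANTIZATION_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown quantization scheme: {name!r}") from None


@register_quantization_config("customize")
class CustomizeQuantizationConfig(QuantizationConfig):
    """Customize quantization config."""
```

With a registry like this, passing the scheme name (e.g. `quantization="customize"`) at model-load time could resolve to the user's config class without any changes to vLLM's built-in scheme list.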
Alternatives
No response
Additional context
No response
Before submitting a new issue...