Understanding 8da4w #430
Comments
This probably depends on the specific backend. For 8da4w, we've tested it with the ExecuTorch runtime (XNNPack backend), which I believe does the computation directly in the integer bit widths (8-bit activation x 4-bit weight). @jerryzh168 can probably confirm this and help answer the other questions.
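To make the "computation in the int bitwidths directly" point concrete, here is a rough plain-Python sketch. It is not the XNNPack kernel or torchao code. The helpers `quantize_row`, `linear_int` and `linear_fake_quant` are hypothetical names, and it assumes symmetric quantization with per-token scales for activations and per-output-channel scales for weights (the real 8da4w scheme may use group-wise weight scales and zero points). It compares an integer-accumulating matmul with the dequantize-then-float path that the reference PyTorch module appears to take.

```python
def quantize_row(row, n_bits):
    """Symmetric quantization of one row: returns (integer codes, float scale)."""
    qmax = 2 ** (n_bits - 1) - 1
    qmin = -qmax - 1
    max_abs = max(abs(v) for v in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(qmin, min(qmax, round(v / scale))) for v in row]
    return codes, scale


def linear_int(x, w):
    """y = x @ w.T with integer accumulation, rescaled once per output element."""
    xq = [quantize_row(r, 8) for r in x]  # int8 codes, one scale per token
    wq = [quantize_row(r, 4) for r in w]  # int4 codes, one scale per output channel
    return [
        [sum(a * b for a, b in zip(xc, wc)) * xs * ws for wc, ws in wq]
        for xc, xs in xq
    ]


def linear_fake_quant(x, w):
    """Same math, but dequantize to float first and multiply in float."""
    xd = [[c * s for c in codes] for codes, s in (quantize_row(r, 8) for r in x)]
    wd = [[c * s for c in codes] for codes, s in (quantize_row(r, 4) for r in w)]
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in wd] for xr in xd]


x = [[0.5, -1.27, 0.003], [10.0, 2.5, -7.5]]   # 2 tokens, 3 features
w = [[0.7, -0.2, 0.1], [-0.35, 0.05, 0.6]]     # 2 output channels, 3 features
y_int = linear_int(x, w)
y_fq = linear_fake_quant(x, w)
```

Under these assumptions the two paths agree up to floating-point rounding. A float reference implementation can therefore model the numerics, while a backend is free to run the integer version for speed.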
It's true that we will need to use integer compute to speed things up; that's what we are doing in our [...]. But specifically for [...]
Closing since the question is answered. Feel free to reach out with more questions.
Hi there,
I'm new to quantization. From my understanding, "8da4w" means that the weights are pre-quantized to 4 bits, and the activations are quantized to 8 bits at runtime. Following this, the GEMM (General Matrix Multiply) operation between weights and activations is computed in the `int8` data type. Do I have this correct?
However, I'm confused by the code for `Int8DynActInt4WeightQuantizer`. The `forward` method of `Int8DynActInt4WeightLinear` calls a method named `per_token_dynamic_quant`, which can be found here. In this method, the input is first quantized to `int8` and then immediately converted back to its original data type without further processing. I don't understand the purpose of this function. Furthermore, I have launched a program using `Int8DynActInt4WeightQuantizer` and observed the data types of `x` and `w_dq` in the method `linear_forward_8da4w`, which can be found here: they are both `float32`. This seems to contradict my understanding of the computations involved in "8da4w".
I realize that I'm likely missing some fundamental aspects of dynamic quantization. Could anyone kindly clarify this process for me?
Thank you!
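For readers with the same question, the quantize-then-dequantize step described above can be sketched in plain Python. This is a minimal illustration, not the torchao implementation. The helper names `quantize_row` and `fake_quantize` are hypothetical, and a simple symmetric per-token scheme is assumed (the real function may use an asymmetric mapping with a zero point). It shows why the result still has a float type even though every value has been snapped to one of at most 256 levels per token.

```python
def quantize_row(row, n_bits=8):
    """Symmetric quantization of one token: returns (integer codes, float scale)."""
    qmax = 2 ** (n_bits - 1) - 1
    qmin = -qmax - 1
    max_abs = max(abs(v) for v in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(qmin, min(qmax, round(v / scale))) for v in row]
    return codes, scale


def fake_quantize(x, n_bits=8):
    """Quantize each row to integers, then map straight back to floats."""
    out = []
    for row in x:
        codes, scale = quantize_row(row, n_bits)
        out.append([c * scale for c in codes])  # float again, but on the int grid
    return out


x = [[0.5, -1.27, 0.003], [10.0, 2.5, -7.5]]  # 2 tokens, 3 features
x_fq = fake_quantize(x)
```

The output keeps the input's float type, which would be consistent with seeing `float32` for `x`. The rounding error of int8 quantization is already baked into the values, though: the small entry `0.003` in the first token collapses to `0.0`.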