Very large discrepancy in the quantized model's output compared to the original model when quantizing on CPU #1335
@jerryzh168 Sorry, I changed a few things during testing and left out the line where the model was quantized on the CPU. Basically, the model's output when quantizing on the CPU is significantly different from its output when quantizing on the GPU. I don't think it's related to #1117, because the difference is the same when the CPU-quantized model is executed on the CPU itself, rather than quantized on the CPU and executed on the GPU.
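To put a number on "significantly different" versus "very small errors", one option is to compare each quantized model's output against the float model's output using a signal-to-quantization-noise ratio (SQNR). This is a minimal sketch; `sqnr_db` is a hypothetical helper written for illustration, not a torchao API.

```python
import torch

def sqnr_db(reference: torch.Tensor, candidate: torch.Tensor) -> float:
    # Hypothetical helper: SQNR in dB, 20 * log10(||ref|| / ||ref - cand||).
    noise = torch.linalg.vector_norm(reference - candidate)
    signal = torch.linalg.vector_norm(reference)
    if noise == 0:
        # Identical outputs: no quantization noise at all.
        return float("inf")
    return (20 * torch.log10(signal / noise)).item()
```

Higher is better: identical outputs give infinity, while two unrelated outputs of similar scale land near or below 0 dB, so a CPU-versus-GPU comparison with this metric separates "slightly noisy" from "wrong".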
How do you get the `cpu_quant_model`?
@jerryzh168 Is this expected behavior on CPU, or a bug?
I think nothing is actually quantized in the example code; you can check this by printing the model. The reason it's not quantized is that I fixed the above issue. With a simpler example, you can repro the CPU error with the following code:
You will be able to see:
@jerryzh168 What version of
Mine is probably a recent torch and torchao; I'm not exactly sure, but I think you should be able to repro with the nightly builds of both. Can you paste the exact code you use and the errors? Is the code in #1335 (comment) up to date?
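Since the discussion depends on which torch and torchao builds were used, a small stdlib-only sketch like the one below can be pasted into a report. The `report_versions` helper is hypothetical; PyTorch also ships `python -m torch.utils.collect_env`, which prints a fuller environment summary.

```python
import importlib.metadata
import platform

def report_versions(packages=("torch", "torchao")):
    # Hypothetical helper: collect the Python version plus installed package versions.
    info = {"python": platform.python_version()}
    for pkg in packages:
        try:
            info[pkg] = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            info[pkg] = "not installed"
    return info

for name, ver in report_versions().items():
    print(f"{name}: {ver}")
```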
Just realized the real reason: in the snippet, the two models actually had two different sets of weights.

```python
# Initialized the model without the same state_dict
model = TestModel()
cpu_quant_model = TestModel()
```
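A minimal sketch of the fix, under stated assumptions: `TestModel` below is a hypothetical two-layer stand-in (the thread never shows its definition), and `fake_quantize_int8_weights_` is a hypothetical helper that round-trips each `nn.Linear` weight through symmetric per-channel int8, standing in for the real quantization call (for example torchao's). The key point is to derive the quantized model from the reference model with `copy.deepcopy(model)` (or `load_state_dict(model.state_dict())`) so both start from identical weights; comparing two independently initialized `TestModel()` instances measures initialization differences, not quantization error.

```python
import copy
import torch
import torch.nn as nn

class TestModel(nn.Module):
    # Hypothetical stand-in; the real TestModel is not shown in the thread.
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(64, 128)
        self.linear2 = nn.Linear(128, 32)

    def forward(self, x):
        return self.linear2(torch.relu(self.linear1(x)))

def fake_quantize_int8_weights_(model: nn.Module) -> nn.Module:
    # Hypothetical helper: symmetric per-output-channel int8 round-trip of every
    # Linear weight, in place. Not a torchao API.
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, nn.Linear):
                w = m.weight
                scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
                m.weight.copy_(torch.clamp(torch.round(w / scale), -127, 127) * scale)
    return model

torch.manual_seed(0)
model = TestModel()

# Buggy setup from the thread: a second TestModel() gets its own random init.
unrelated_quant_model = fake_quantize_int8_weights_(TestModel())

# Fix: quantize a copy that starts from exactly the same weights as `model`.
cpu_quant_model = fake_quantize_int8_weights_(copy.deepcopy(model))

x = torch.randn(8, 64)
with torch.no_grad():
    ref = model(x)
    err_same = (ref - cpu_quant_model(x)).abs().max().item()
    err_unrelated = (ref - unrelated_quant_model(x)).abs().max().item()

print(f"max abs error, same init:      {err_same:.6f}")
print(f"max abs error, different init: {err_unrelated:.6f}")
```

With identical starting weights, the remaining error reflects only the int8 rounding; with a fresh `TestModel()`, the error is dominated by the different random initialization.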
Quantization on GPU works as expected with very small errors, but on CPU there seems to be a problem with the quantized model's output. Here is the code to replicate the problem.