Added weight compression for Dolly 2.0 #1319
Conversation
@eaidova @MaximProshin @AlexKoff88 please review
notebooks/240-dolly-2-instruction-following/240-dolly-2-instruction-following.ipynb
```python
compressed_model_path = Path(f'{model_path}_compressed') / 'openvino_model.xml'

def compress_model(model):
```
I thought we planned to use optimum-intel to compress the model.
I haven't found any API for weight compression in optimum-intel, only quantization. @AlexKoff88, is there such an API?
@nikita-savelyevv, it's the same API. You can find an example here: https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/254-llm-chatbot/254-llm-chatbot.ipynb
Waiting for huggingface/optimum-intel#415 to be merged
The PR has been merged, so you can now use the functionality you were waiting for.
Now blocked by CVS-121154.
Edit: decided to add compression with a workaround. After CVS-121154 is fixed, we will need to enable compression by default and remove the workaround code.
MaximProshin left a comment:
I executed it on my i7 laptop with 16 GB of RAM. Initially I ran the non-compressed version, which was really slow at the demo stage and then even got stuck (I guess due to a lack of RAM). The compressed version worked really well, which demonstrates a real improvement for end users.
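A back-of-the-envelope estimate shows why the compressed version fits a 16 GB laptop while the non-compressed one struggles. Assuming roughly 2.8 billion parameters (the approximate size of dolly-v2-3b; treat the figure as illustrative):

```python
# Rough memory estimate for model weights alone (illustrative numbers;
# runtime overhead, activations, and KV cache are not counted).
params = 2.8e9  # assumed parameter count, roughly dolly-v2-3b

fp32_gb = params * 4 / 1024**3  # 4 bytes per FP32 weight
int8_gb = params * 1 / 1024**3  # 1 byte per INT8 weight

print(f"FP32: {fp32_gb:.1f} GiB, INT8: {int8_gb:.1f} GiB")
```

On this estimate the FP32 weights alone approach 10.5 GiB, leaving little headroom on a 16 GB machine, while INT8 weights need about a quarter of that.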
> "In this tutorial, we consider how to run an instruction-following text generation pipeline using Dolly 2.0 and OpenVINO. We will use a pre-trained model from the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library. To simplify the user experience, the [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library is used to convert the models to OpenVINO™ IR format.
>
> The tutorial consists of the following steps:"
The comment looks strange to me from an end user's perspective, as 2023.2 is not released yet. It would be better to say that in the 2023.1.0 release weight compression is supported only on CPU, and GPU support will be added later; it is recommended to disable weight compression for GPU.
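The suggestion above amounts to gating compression on the target device. A minimal sketch of that logic (the function name and widget wiring are hypothetical, not the notebook's actual code):

```python
# Hypothetical device gate for weight compression, reflecting the review
# suggestion: in the 2023.1.0 release compression works only on CPU.

def should_compress(device: str, user_choice: bool = True) -> bool:
    """Enable weight compression only on CPU, honoring the user's widget choice."""
    if device.strip().upper() != "CPU":
        return False  # GPU support is expected in a later release
    return user_choice

# e.g. feed the notebook's device dropdown and compression checkbox values in:
# compress = should_compress(device_widget.value, compress_widget.value)
```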
Done
Added weight compression for Dolly 2.0: enabled compression by default, while keeping a widget for enabling/disabling it.