Added weight compression for Dolly 2.0 #1319
Conversation
@eaidova @MaximProshin @AlexKoff88 please review
notebooks/240-dolly-2-instruction-following/240-dolly-2-instruction-following.ipynb
```python
compressed_model_path = Path(f'{model_path}_compressed') / 'openvino_model.xml'

def compress_model(model):
```
I thought we planned to use optimum-intel to compress the model.
I haven't found any API for weight compression in optimum-intel, only quantization. @AlexKoff88, is there such an API?
@nikita-savelyevv, it's the same API. You can find an example here: https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/254-llm-chatbot/254-llm-chatbot.ipynb
Waiting for huggingface/optimum-intel#415 to be merged
The PR has been merged, so you can now use the functionality you were waiting for.
Now blocked by CVS-121154.
Edit: decided to add compression with a workaround. After CVS-121154 is fixed, we will need to enable compression by default and remove the workaround code.
MaximProshin left a comment:
I executed it on my i7 laptop with 16 GB of RAM. Initially I ran the non-compressed version, which was really slow at the demo stage and then even got stuck (I guess due to a lack of RAM). The compressed version worked really well, which demonstrates a real improvement for end users.
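A back-of-the-envelope estimate shows why the compressed version fits a 16 GB laptop while the non-compressed one struggles. Assuming roughly 2.8 billion parameters (the approximate size of dolly-v2-3b; treat the figure as illustrative):

```python
# Rough memory estimate for model weights alone (illustrative numbers;
# runtime overhead, activations, and KV cache are not counted).
params = 2.8e9  # assumed parameter count, roughly dolly-v2-3b

fp32_gb = params * 4 / 1024**3  # 4 bytes per FP32 weight
int8_gb = params * 1 / 1024**3  # 1 byte per INT8 weight

print(f"FP32: {fp32_gb:.1f} GiB, INT8: {int8_gb:.1f} GiB")
```

On this estimate the FP32 weights alone approach 10.5 GiB, leaving little headroom on a 16 GB machine, while INT8 weights need about a quarter of that.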
> "In this tutorial, we consider how to run an instruction-following text generation pipeline using Dolly 2.0 and OpenVINO. We will use a pre-trained model from the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library. To simplify the user experience, the [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library is used to convert the models to OpenVINO™ IR format.
>
> The tutorial consists of the following steps:"
The comment looks strange to me from an end user's perspective, as 2023.2 is not released yet. It would be better to say that in the 2023.1.0 release weight compression is supported only on CPU, and GPU support will be added later; it is recommended to disable weight compression for GPU.
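The suggestion above amounts to gating compression on the target device. A minimal sketch of that logic (the function name and widget wiring are hypothetical, not the notebook's actual code):

```python
# Hypothetical device gate for weight compression, reflecting the review
# suggestion: in the 2023.1.0 release compression works only on CPU.

def should_compress(device: str, user_choice: bool = True) -> bool:
    """Enable weight compression only on CPU, honoring the user's widget choice."""
    if device.strip().upper() != "CPU":
        return False  # GPU support is expected in a later release
    return user_choice

# e.g. feed the notebook's device dropdown and compression checkbox values in:
# compress = should_compress(device_widget.value, compress_widget.value)
```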
Done
Added weight compression for Dolly 2.0: enabled compression by default, while keeping a widget for enabling/disabling it.