
Conversation

@nikita-savelyevv
Collaborator

Added weight compression for Dolly 2.0:

  • Performance boost is about 2x
  • No prediction quality degradation based on the sample prompts provided in the Gradio demo

Compression is enabled by default, but a widget is still provided to enable/disable it.

@review-notebook-app

Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

@nikita-savelyevv
Collaborator Author

nikita-savelyevv commented Sep 14, 2023

@eaidova @MaximProshin @AlexKoff88 please review

compressed_model_path = Path(f'{model_path}_compressed') / 'openvino_model.xml'

def compress_model(model):
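For context, a helper along these lines could implement the compression step with NNCF's `compress_weights` API. This is a sketch of one possible body, not the notebook's actual implementation: the output path mirrors the snippet above, but the function signature (taking a path rather than a model object) and everything inside it are assumptions.

```python
from pathlib import Path

def compressed_model_path(model_path: str) -> Path:
    # Mirror the snippet's naming: "<model_path>_compressed/openvino_model.xml"
    return Path(f"{model_path}_compressed") / "openvino_model.xml"

def compress_model(model_path: str) -> Path:
    # Heavy imports are kept local so the path helper above can be used
    # even where OpenVINO/NNCF are not installed.
    import nncf
    import openvino as ov

    core = ov.Core()
    model = core.read_model(Path(model_path) / "openvino_model.xml")
    out = compressed_model_path(model_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # Weight-only INT8 compression; activations stay in floating point.
    ov.save_model(nncf.compress_weights(model), out)
    return out
```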
Contributor

I thought we planned to use optimum-intel to compress the model.

Collaborator Author

I haven't found any API for weight compression in optimum-intel, only quantization. @AlexKoff88 is there such an API?


Collaborator Author

Waiting for huggingface/optimum-intel#415 to be merged

Contributor

The PR has been merged so you can use the functionality you were waiting for.
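For reference, current optimum-intel lets you request 8-bit weight compression at load time via `load_in_8bit`; whether that is the exact surface the merged PR exposed is an assumption here. A small sketch, with the `export_kwargs` helper being hypothetical:

```python
def export_kwargs(compress: bool) -> dict:
    # load_in_8bit requests NNCF 8-bit weight compression during export
    # (assumption: this maps onto the functionality added by the merged PR).
    return {"export": True, "load_in_8bit": compress}

# Usage (requires optimum-intel; shown for illustration only):
# from optimum.intel import OVModelForCausalLM
# model = OVModelForCausalLM.from_pretrained("databricks/dolly-v2-3b", **export_kwargs(True))
```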

Collaborator Author
@nikita-savelyevv Sep 22, 2023

Now blocked by CVS-121154.
Edit:
Decided to add compression with a workaround.

After CVS-121154 is fixed, we will need to enable compression by default and remove the workaround code.

@nikita-savelyevv marked this pull request as draft September 15, 2023 08:28
@nikita-savelyevv marked this pull request as ready for review September 22, 2023 09:09
Contributor
@MaximProshin left a comment

I executed it on my i7 laptop with 16 GB RAM. Initially I ran the non-compressed version, which was really slow at the demo stage and then even got stuck (I guess due to lack of RAM). The compressed version worked really well, which demonstrates a real improvement for end users.

In this tutorial, we consider how to run an instruction-following text generation pipeline using Dolly 2.0 and OpenVINO. We will use a pre-trained model from the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library. To simplify the user experience, the [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library is used to convert the models to OpenVINO™ IR format.

The tutorial consists of the following steps:
Contributor
@eaidova Sep 25, 2023

The comment looks strange for end users, in my opinion, since 2023.2 is not released yet. It would be better to say that in the 2023.1.0 release weight compression is supported only on CPU, and GPU support will be added later. It is recommended to disable weight compression for GPU.
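The device restriction described here could be encoded in a small guard. The `should_compress` helper below is hypothetical and only illustrates the stated rule (compress on CPU; on GPU only from a release after 2023.1):

```python
def should_compress(device: str, ov_version: str) -> bool:
    # Per the review: in the 2023.1.0 release, weight compression is
    # supported on CPU only; GPU support is expected in a later release.
    major, minor = (int(x) for x in ov_version.split(".")[:2])
    if device.upper().startswith("GPU"):
        return (major, minor) > (2023, 1)
    return True
```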



Collaborator Author

Done

@eaidova merged commit 5049aaf into openvinotoolkit:main Sep 25, 2023