Conversation

@l-bat (Contributor) commented Aug 24, 2023

What does this PR do?

Introduced data-free 8-bit weight compression for OVBaseModel and OVBaseDecoderModel.
The corresponding NNCF PR, openvinotoolkit/nncf#2059, has been merged.
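
For illustration, a minimal end-to-end sketch, assuming the integration ultimately calls `nncf.compress_weights` on the exported OpenVINO IR; the model id, file paths, and workflow below are placeholders, not the exact Optimum Intel entry point added by this PR:

```python
# Hypothetical sketch: export a model with Optimum Intel, then apply NNCF
# data-free int8 weight compression to the resulting ov.Model.
# Model id and paths are illustrative; the entry point exposed by this PR may differ.
import nncf
import openvino.runtime as ov
from optimum.intel import OVModelForCausalLM

model_id = "databricks/dolly-v2-3b"  # one of the models from the results table below

# Export the Transformers checkpoint to OpenVINO IR without quantization.
ov_model = OVModelForCausalLM.from_pretrained(model_id, export=True)
ov_model.save_pretrained("dolly-v2-3b-ov")

# Data-free 8-bit weight compression: no calibration dataset is required.
core = ov.Core()
fp32_ir = core.read_model("dolly-v2-3b-ov/openvino_model.xml")
int8_ir = nncf.compress_weights(fp32_ir)
ov.serialize(int8_ir, "dolly-v2-3b-int8/openvino_model.xml")
```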

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev commented Aug 29, 2023

The documentation is not available anymore as the PR was closed or merged.

alexsu52 pushed a commit to openvinotoolkit/nncf that referenced this pull request Sep 1, 2023
### Changes

Extended the data-free int8 weight compression algorithm to the OpenVINO backend.

Example (WeightsModel):

![image](https://github.com/openvinotoolkit/nncf/assets/22346860/02138cce-290a-40aa-b997-f83815400a6c)
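
For reference, a minimal sketch (not taken from the PR) of invoking the extended algorithm on a hand-built OpenVINO model; the tiny MatMul graph below is an illustrative stand-in for the WeightsModel shown above:

```python
# Illustrative only: build a tiny ov.Model with FP32 weight constants and
# compress its weights with the OpenVINO backend of nncf.compress_weights.
import numpy as np
import nncf
import openvino.runtime as ov
from openvino.runtime import opset9 as opset

inp = opset.parameter([1, 16], dtype=np.float32, name="input")
weights = opset.constant(np.random.rand(16, 32).astype(np.float32))
matmul = opset.matmul(inp, weights, transpose_a=False, transpose_b=False)
model = ov.Model([matmul], [inp], "TinyWeightsModel")

# Data-free int8 weight compression: FP32 weight constants are replaced with
# int8 constants plus a decompression subgraph, as in the graph image above.
compressed = nncf.compress_weights(model)
print(sorted({op.get_type_name() for op in compressed.get_ops()}))
```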

PR to Optimum Intel: huggingface/optimum-intel#415

### Reason for changes

Reduce the footprint and improve the performance of large models in which the weights are much larger than the activations.

### Related tickets

117412

### Tests

`tests/openvino/native/quantization/test_weights_compression.py`
Swin Transformer support verified.

Results (task: lambada_openai):

| Model | Metric | Value | | Stderr |
|------|------|-----:|---|-----:|
| dolly-v2-3b_original | ppl | 5.0144 | ± | 0.1510 |
| dolly-v2-3b_original | acc | 0.6297 | ± | 0.0067 |
| dolly-v2-3b_compressed | ppl | 4.9868 | ± | 0.1498 |
| dolly-v2-3b_compressed | acc | 0.6313 | ± | 0.0067 |
| Llama-2-7b-chat-hf_original | ppl | 3.2788 | ± | 0.0866 |
| Llama-2-7b-chat-hf_original | acc | 0.7058 | ± | 0.0063 |
| Llama-2-7b-chat-hf_compressed | ppl | 3.2856 | ± | 0.0869 |
| Llama-2-7b-chat-hf_compressed | acc | 0.7054 | ± | 0.0064 |
l-bat added a commit to l-bat/nncf that referenced this pull request Sep 1, 2023
…kit#2059)

@AlexKoff88 (Contributor)

I am OK with the PR, but the failed tests look strange.

@echarlaix (Collaborator) left a comment

LGTM, thanks for the addition @l-bat

@l-bat force-pushed the lt/ov_compress_weights branch from 2338c7f to 6cd4a85 on September 20, 2023 at 09:56
@l-bat (Contributor, Author) commented Sep 20, 2023

@echarlaix there are some failing tests that are not related to my changes.

Comment on lines +149 to +150
(OVModelForSequenceClassification, "hf-internal-testing/tiny-random-bert", 70, 35),
(OVModelForCausalLM, "hf-internal-testing/tiny-random-gpt2", 45, 22),
Collaborator

What is the difference in the quantization applied to the two models (depending on whether it is a PyTorch or an OpenVINO model)?

@echarlaix (Collaborator) left a comment

Thanks for iterating on it. The PR looks good, so I will merge; I also added a question on the changes applied in 6cd4a85.

@echarlaix echarlaix merged commit 673484b into huggingface:main Sep 20, 2023