
Conversation

@dmitry-gorokhov commented May 22, 2024

Details:

  • Improved LLM latency for models with BF16 weights (by extending the compressed inner-product (IP) kernel to the BF16 data type)
  • Improved compilation time for models with BF16 weights on AVX2 systems (by enabling the JIT reorder for the BF16 data type)

OneDNN fork PR: openvinotoolkit/oneDNN#250

Tickets:

@dmitry-gorokhov self-assigned this May 22, 2024
@dmitry-gorokhov requested review from a team as code owners May 22, 2024 13:45
@github-actions bot added the category: CPU (OpenVINO CPU plugin) label May 22, 2024
@dmitry-gorokhov (Author) commented:

@usstq could you please review?

@dmitry-gorokhov added this to the 2024.3 milestone Jun 3, 2024
@usstq (Contributor) left a comment:


LGTM, except one comment here:
openvinotoolkit/oneDNN#250 (comment)

@slyalin (Contributor) commented Jun 19, 2024

Are we going to merge it for the next release?

@dmitry-gorokhov force-pushed the feature/bf16_weights_compression branch from 55908c4 to ea73407 on June 26, 2024 07:37
@maxnick added this pull request to the merge queue Jul 1, 2024
Merged via the queue into openvinotoolkit:master with commit 1d7daae Jul 1, 2024
@maxnick deleted the feature/bf16_weights_compression branch July 1, 2024 15:00
4 participants