
Conversation

@usstq
Contributor

@usstq usstq commented Mar 23, 2023

Details:

  • Replace enforceBF16 in config with inferencePrecision
  • Add f16 support in the CPU config property INFERENCE_PRECISION_HINT (see the usage sketch after this list)
  • Add f16 support in jit_convert_truncation_emitter, jit_convert_saturation_emitter, jit_load_emitter & jit_store_emitter
  • Bring f16-related oneDNN implementations back
  • Add f16 support in conv/deconv/eltwise/fullyconnect/matmul/mvn/pad/pooling/softmax/subgraph
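
For reference, a minimal usage sketch of the hint this PR extends (assuming the public OpenVINO 2.0 C++ API; the model path is a placeholder):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path
    // Request f16 execution precision on CPU; per-node support is decided by the plugin.
    auto compiled = core.compile_model(model, "CPU",
                                       ov::hint::inference_precision(ov::element::f16));
    ov::InferRequest request = compiled.create_infer_request();
    return 0;
}
```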

oneDNN fork PR

openvinotoolkit/oneDNN#197

Tickets:

@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Mar 23, 2023
@github-actions github-actions bot added the category: LP transformations OpenVINO Low Precision transformations label Mar 31, 2023
@github-actions github-actions bot removed the category: LP transformations OpenVINO Low Precision transformations label Apr 25, 2023
Signed-off-by: HU Yuan2 <[email protected]>
@akladiev
Collaborator

This PR will be closed in 2 weeks in case of no activity.

@akladiev akladiev removed the Stale label May 16, 2023
@yuxu42
Contributor

yuxu42 commented May 23, 2023

@usstq can we proceed with the PR review?

@usstq usstq marked this pull request as ready for review May 24, 2023 08:35
@usstq usstq requested review from a team as code owners May 24, 2023 08:35
@usstq usstq changed the title enable f16 inference precision [CPU] enable f16 inference precision May 24, 2023
@wenjiew

wenjiew commented May 27, 2023

Is this PR ready for review? Thanks!

@usstq
Contributor Author

usstq commented May 29, 2023

@wenjiew Yes, I think it's ready!

@yuxu42
Contributor

yuxu42 commented May 29, 2023

@luo-cheng2021 @tiger100256-hu Could you please review? Thanks!

Contributor

@luo-cheng2021 luo-cheng2021 left a comment


LGTM. We'd better link the ticket about adding test cases to cover the fp16 functionality.

@usstq
Copy link
Contributor Author

usstq commented Jun 21, 2023

I found some performance regressions after the rebase; debugging...

uni_vpslld(vmm_src, vmm_src, 16);
break;
case Precision::FP16:
assert(mayiuse(x64::avx512_core_fp16));


BTW, from my understanding conversion instructions like vcvtph2ps don't actually require the avx512_core_fp16 ISA, but only F16C + AVX512VL/AVX512F (or pure AVX2), which is available on all modern Intel CPUs. Given that, we can relax the ISA limitation for all operations that use only FP32<->FP16 conversion and keep the real math in FP32 (like Eltwise, MVN, Interpolate, etc.). By doing that we could already enable FP16 tests for such layers in precommit.
What do you think? Sounds worth adding in a follow-up PR.
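
For illustration only (not this PR's emitter code), a sketch of such a conversion path that needs just F16C/AVX512F while the math stays in FP32:

```cpp
#include <immintrin.h>
#include <cstdint>

// Widen 8 packed fp16 values to fp32 (vcvtph2ps, F16C) and narrow them back
// (vcvtps2ph); no AVX512-FP16 instruction is involved.
__m256 load8_f16_as_f32(const uint16_t* src) {
    __m128i raw = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    return _mm256_cvtph_ps(raw);
}

void store8_f32_as_f16(uint16_t* dst, __m256 v) {
    __m128i half = _mm256_cvtps_ph(v, _MM_FROUND_TO_NEAREST_INT);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), half);
}
```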

Contributor Author


OK, let me remove these asserts; tests can be added in a follow-up PR.

Contributor Author

@usstq usstq Jun 21, 2023


BTW, one exception is vcvtss2sh/vcvtsh2ss used in load_scalar/store_scalar, which require AVX512-FP16. Can we change them to use vcvtps2ph/vcvtph2ps instead? That may pollute the higher bits of xmm_src; is it safe?


For Eltwise it is safe.
Not sure about the load/store emitters. Would ask @chenhu-wang to comment.

Contributor Author


OK, that's enough; the load/store emitters are actually already using the vector versions vcvtps2ph/vcvtph2ps with a mask to handle variable-length load/store.
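
A rough illustration of that masked approach (assuming AVX512F plus AVX512BW/VL are available; this is not the emitter's actual code):

```cpp
#include <immintrin.h>
#include <cstdint>

// Load n (n <= 16) fp16 elements under a tail mask and widen them to fp32,
// so no scalar vcvtsh2ss (AVX512-FP16) is required.
__m512 load_tail_f16_as_f32(const uint16_t* src, int n) {
    __mmask16 k = static_cast<__mmask16>((1u << n) - 1);  // valid-lane mask
    __m256i raw = _mm256_maskz_loadu_epi16(k, src);       // AVX512BW + AVX512VL
    return _mm512_cvtph_ps(raw);                           // vcvtph2ps, AVX512F
}
```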


You are still using the avx512_core_fp16 check which, from my understanding, is available on SPR only. In other words, avx512_core_fp16 is not equal to f16c + avx512f + avx512vl. So to enable single-layer tests in precommit we need to relax the ISA limitation.

Contributor Author


Yes, indeed, some avx512_core_fp16 checks are still there; sorry, I didn't remove them completely. We can do that in the single-layer tests PR.
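
As an illustration of the relaxation being discussed, an emitter-side predicate could look roughly like this (a sketch using oneDNN's mayiuse/cpu_isa_traits names; not code from this PR):

```cpp
#include <cpu/x64/cpu_isa_traits.hpp>

// Accept any ISA that provides the f16 <-> f32 conversions instead of
// requiring the SPR-only avx512_core_fp16 extension.
inline bool can_emit_f16_convert() {
    using namespace dnnl::impl::cpu::x64;
    return mayiuse(avx512_core)   // EVEX vcvtph2ps/vcvtps2ph
        || mayiuse(avx2);         // VEX F16C path on AVX2-class CPUs (assumption)
}
```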

Contributor

@chenhu-wang chenhu-wang Jun 29, 2023


Just a reminder that SSE4 does not support f16<->f32 conversion instructions. Instead of throwing exceptions or asserting, there should be some alignment to fall back to f32, either at the property (precision hint) reading stage or in the createSupportedPrimitive() stage of the nodes.
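
A minimal sketch of that kind of fallback at the precision-hint reading stage (a hypothetical helper, not the plugin's actual code):

```cpp
#include <openvino/core/type/element_type.hpp>
#include <cpu/x64/cpu_isa_traits.hpp>

// If the machine has no f16 <-> f32 conversion support (e.g. SSE4-only),
// degrade the requested inference precision to f32 instead of failing later.
inline ov::element::Type effective_inference_precision(ov::element::Type hint) {
    using namespace dnnl::impl::cpu::x64;
    if (hint == ov::element::f16 && !mayiuse(avx2))  // AVX2 used as a proxy for F16C (assumption)
        return ov::element::f32;
    return hint;
}
```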

@dmitry-gorokhov

@usstq Could you please create a PR with the corresponding changes in the oneDNN fork? I would also ask you to have only one commit for the FP16 enabling there.

@dmitry-gorokhov

@usstq We also need to create a ticket for enabling FP16 single-layer tests (for Convolutions, MatMuls, etc.) once GNR becomes available.

@dmitry-gorokhov

@usstq Please also check the binary size impact. We need to understand the change caused by the FP16 instances.

@usstq
Contributor Author

usstq commented Jun 21, 2023

@usstq We also need to create a ticket for enabling FP16 single-layer tests (for Convolutions, MatMuls, etc.) once GNR becomes available.

OK, will do.

@usstq Could you please create a PR with the corresponding changes in the oneDNN fork? I would also ask you to have only one commit for the FP16 enabling there.

Done: openvinotoolkit/oneDNN#197

@usstq Please also check the binary size impact. We need to understand the change caused by the FP16 instances.

libopenvino_intel_cpu_plugin.so has increased from 47813192 to 48566856 bytes, i.e. by 736 KB, a relative increase of 1.58%.

@usstq
Contributor Author

usstq commented Jun 21, 2023

@dmitry-gorokhov I have fixed the fp16 brgconv issue and validated it locally, and the recent review comments have also been addressed. Please review again, thanks!

@usstq
Contributor Author

usstq commented Jun 21, 2023

@dmitry-gorokhov I just found that the MHA node will throw an exception when fp16 is enforced. Should I change its behaviour to fall back to FP32 automatically?


void MHA::initSupportedPrimitiveDescriptors() {
    if (!supportedPrimitiveDescriptors.empty())
        return;

    for (auto idx : {0, 1, 2, 3}) {
        inputPrecisions.push_back(getOriginalInputPrecisionAtPort(idx));
        if (!one_of(inputPrecisions[idx], Precision::FP32, Precision::BF16, Precision::I8))
            THROW_ERROR << "doesn't support " << inputPrecisions[idx].name() << " precision on " << idx <<  " input port";
    }

@dmitry-gorokhov

@dmitry-gorokhov I just found that the MHA node will throw an exception when fp16 is enforced. Should I change its behaviour to fall back to FP32 automatically?


void MHA::initSupportedPrimitiveDescriptors() {
    if (!supportedPrimitiveDescriptors.empty())
        return;

    for (auto idx : {0, 1, 2, 3}) {
        inputPrecisions.push_back(getOriginalInputPrecisionAtPort(idx));
        if (!one_of(inputPrecisions[idx], Precision::FP32, Precision::BF16, Precision::I8))
            THROW_ERROR << "doesn't support " << inputPrecisions[idx].name() << " precision on " << idx <<  " input port";
    }

Yes. MHA behavior should be updated.
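
A minimal sketch of what that fallback could look like inside MHA::initSupportedPrimitiveDescriptors() (illustrative only; the merged change may differ):

```cpp
    for (auto idx : {0, 1, 2, 3}) {
        auto prec = getOriginalInputPrecisionAtPort(idx);
        // Instead of throwing on FP16 (or any other unsupported precision),
        // fall back to FP32 and let reorders handle the conversion.
        if (!one_of(prec, Precision::FP32, Precision::BF16, Precision::I8))
            prec = Precision::FP32;
        inputPrecisions.push_back(prec);
    }
```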

@usstq
Contributor Author

usstq commented Jun 21, 2023

@dmitry-gorokhov MHA's behavior has been changed and the avx512_fp16 assertions are completely removed. I validated the following models on a local machine using FP16 inference precision and found no regression in accuracy or performance:

  • bert-large-uncased-whole-word-masking-squad-onnx
  • resnet_v1.5_50
  • distilbert-base-uncased-sst-2
  • vit-base-16-224


@dmitry-gorokhov dmitry-gorokhov left a comment


Merging the PR since the major functionality is completed.
There are two follow-ups:

  1. Enable FP16 single-layer tests on HW with AVX512 support.
  2. Clarify the missing functionality with oneDNN.

@dmitry-gorokhov dmitry-gorokhov merged commit 54bb74b into openvinotoolkit:master Jun 22, 2023
@usstq
Contributor Author

usstq commented Jun 22, 2023

Merging the PR since the major functionality is completed. There are two follow-ups:

  1. Enable FP16 single-layer tests on HW with AVX512 support.
  2. Clarify the missing functionality with oneDNN.

OK, no problem, I will follow up on these tasks.

github-merge-queue bot pushed a commit that referenced this pull request Jul 1, 2024
### Details:
 - *#16500 (comment)*
 - *add test case for conv dconv fullconnect matmul mvn pad pooling subgraph softmax*

### Tickets:
 - *CVS-110112*

---------

Signed-off-by: HU Yuan2 <[email protected]>
AsyaPronina pushed a commit to AsyaPronina/openvino that referenced this pull request Jul 1, 2024
### Details:
 - *openvinotoolkit#16500 (comment)*
 - *add test case for conv dconv fullconnect matmul mvn pad pooling subgraph softmax*

### Tickets:
 - *CVS-110112*

---------

Signed-off-by: HU Yuan2 <[email protected]>