[CPU] enable f16 inference precision #16500
Conversation
Signed-off-by: HU Yuan2 <[email protected]>
This PR will be closed in 2 weeks in case of no activity.
Signed-off-by: HU Yuan2 <[email protected]>
@usstq can we go on with the PR review?
Is this PR ready for review? Thanks!
@wenjiew Yes, I think it's ready!
@luo-cheng2021 @tiger100256-hu Could you please review? Thanks!
luo-cheng2021
left a comment
LGTM. We'd better link the ticket about adding test cases to cover the FP16 functionality.
I found some performance regression after rebasing; debugging...
        uni_vpslld(vmm_src, vmm_src, 16);
        break;
    case Precision::FP16:
        assert(mayiuse(x64::avx512_core_fp16));
BTW, from my understanding, conversion instructions like vcvtph2ps don't actually require the avx512_core_fp16 ISA, just F16C + AVX512VL/AVX512F (or plain AVX2), which is available on all modern Intel CPUs. Given that, we can relax the ISA limitation for all operations that use only FP32<->FP16 conversion and keep the real math in FP32 (like Eltwise, MVN, Interpolate, etc.). By doing that we can enable FP16 tests for such layers in precommit already now.
What do you think? Sounds worth adding in a follow-up PR.
OK, let me remove these asserts; tests can be added in a follow-up PR.
BTW, one exception is vcvtss2sh/vcvtsh2ss used in load_scalar/store_scalar, which require AVX512-FP16. Can we change them to use vcvtps2ph/vcvtph2ps instead? That may pollute the higher bits in xmm_src; is it safe?
For Eltwise it is safe.
Not sure about the load/store emitters. Would ask @chenhu-wang to comment.
OK, that's enough; the load/store emitters actually already use the vector versions vcvtps2ph/vcvtph2ps with a mask to handle variable-length load/store.
You are still using the avx512_core_fp16 check which, from my understanding, is available on SPR only. In other words, avx512_core_fp16 is not equal to f16c + avx512f + avx512vl. So to enable single-layer tests in precommit we need to relax the ISA limitation.
Yes, indeed, some avx512_core_fp16 checks are still there; sorry I didn't remove them completely. We can do that in the single-layer tests PR.
Just a reminder that SSE4 does not support f16<->f32 convert instructions. Instead of reporting exceptions or asserting, there should be some alignment to fall back to f32, either at the property (precision hint) reading stage or in the createSupportedPrimitive() stage of the nodes.
@usstq Could you please create a PR with the corresponding changes in the oneDNN fork? I would also ask you to have only one commit for FP16 enabling there.
@usstq We also need to create a ticket for FP16 single-layer tests (for Convolutions, Matmuls, etc.) enabling activities once GNR becomes available.
@usstq Please also check the binary size impact. We need to understand the change caused by the FP16 instances.
OK, will do.
Done: openvinotoolkit/oneDNN#197
libopenvino_intel_cpu_plugin.so has increased from 47813192 to 48566856 bytes, i.e. by 736KB, a relative increase of 1.58%.
@dmitry-gorokhov I have fixed the fp16 brgconv issue and validated it locally, and the recent review comments have also been addressed; please review again. Thanks!
@dmitry-gorokhov I just found that the MHA node will throw an exception when FP16 is enforced. Should I change its behaviour to fall back to FP32 automatically?
Yes. MHA behavior should be updated.
@dmitry-gorokhov MHA's behavior is changed and the avx512_fp16 assertions are completely removed. I validated the following models on a local machine using FP16 inference precision and found no regression in accuracy & performance:
dmitry-gorokhov
left a comment
Merging the PR as the major functionality is completed.
There are two follow-ups:
- Enable FP16 single-layer tests on HW with AVX512 support.
- Clarify the lacking functionality with oneDNN.
OK, no problem, will follow up on these tasks.
### Details:
- *#16500 (comment)*
- *add test case for conv dconv fullconnect matmul mvn pad pooling subgraph softmax*

### Tickets:
- *CVS-110112*

Signed-off-by: HU Yuan2 <[email protected]>
Details:
- replace enforceBF16 in config with inferencePrecision
- FP16 support in jit_convert_truncation_emitter, jit_convert_saturation_emitter, jit_load_emitter & jit_store_emitter
- oneDNN fork PR: openvinotoolkit/oneDNN#197

Tickets: