[CPU] Reduce node supports fp16 precision #18227
Conversation
@dmitry-gorokhov Hi Dmitry, could you please take a look?

@chenhu-wang @maxnick Could you please review? Thanks!
Force-pushed from 4c6b4c5 to 9427afb
```cpp
if (!InferenceEngine::with_cpu_x86_fp16()) {
    // Skip fp16 tests for platforms that don't support fp16 precision
    retVector.emplace_back(R"(.*INFERENCE_PRECISION_HINT=(F|f)16.*)");
}
```
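For context, the skip list works by matching each pattern against the full test name. A minimal standalone sketch of that matching, using a hypothetical test name (the real names are generated by the test framework):

```cpp
// Sketch only: shows how the skip pattern above filters tests whose names
// carry INFERENCE_PRECISION_HINT=f16. The test name below is hypothetical.
#include <iostream>
#include <regex>
#include <string>

int main() {
    const std::regex skipPattern(R"(.*INFERENCE_PRECISION_HINT=(F|f)16.*)");
    const std::string testName =
        "smoke_Reduce_CPU/ReduceCPULayerTest.CompareWithRefs/"
        "INFERENCE_PRECISION_HINT=f16";  // hypothetical name for illustration
    std::cout << (std::regex_search(testName, skipPattern) ? "skipped" : "run")
              << std::endl;
}
```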
You added with_cpu_x86_fp16() to detect avx512_fp16 capability, and use it here:
https://github.com/openvinotoolkit/openvino/pull/18227/files#diff-2ec2dbc37b12dc344e3d56d476bebb0a795777886906b5598ec2c91873404c0dR76
- There is no reason to limit f16 support to avx512_fp16. On avx2 it can be supported directly; without avx2 it should still be supported, with an internal fallback to f32.
- The function name is ambiguous; maybe we can remove it, since it is not needed.
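As an aside, a runtime check for avx512_fp16 can be sketched outside the plugin as well. This is only an illustration assuming a GCC/Clang toolchain; `__builtin_cpu_supports("avx512fp16")` needs a recent compiler (GCC 12+), and OpenVINO itself detects ISA features through its own internal layer, not this builtin:

```cpp
// Minimal sketch of an avx512_fp16 runtime check; not the plugin's mechanism.
#include <iostream>

bool cpu_has_avx512_fp16() {
#if defined(__GNUC__) && !defined(_MSC_VER)
    // Requires GCC 12+ / recent Clang; feature string is an assumption here.
    return __builtin_cpu_supports("avx512fp16");
#else
    return false;  // conservatively report no native fp16 on other toolchains
#endif
}

int main() {
    std::cout << "native fp16: " << std::boolalpha << cpu_has_avx512_fp16() << "\n";
}
```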
Agreed and applied. Thanks Chenhu! To make this happen, the config here is also revised to require only avx2 for fp16. Please feel free to comment if there are any concerns.
cc @dmitry-gorokhov @usstq
For the change in the common part, should we have a performance test? f16 does not always outperform f32: Reduce may be fine since its computational complexity is low, but I am not sure about other compute-bound nodes and shapes, since precision-conversion overhead is introduced. Given that not all nodes support f16, the f16 hint actually means mixed-precision execution.
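A minimal sketch of the kind of A/B measurement this asks for, using the OpenVINO 2.0 C++ API. The model path and iteration count are placeholders, and inputs are left at their default-allocated tensors for brevity:

```cpp
// Sketch: compare average latency under fp32 vs fp16 inference precision
// on the CPU plugin. "model.xml" is a placeholder path.
#include <chrono>
#include <iostream>
#include <string>
#include <openvino/openvino.hpp>

static double avg_latency_ms(ov::Core& core, const std::string& path,
                             ov::element::Type prec) {
    auto compiled = core.compile_model(path, "CPU",
                                       ov::hint::inference_precision(prec));
    auto request = compiled.create_infer_request();
    request.infer();  // warm-up run
    const int iters = 100;  // arbitrary for this sketch
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i)
        request.infer();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count() / iters;
}

int main() {
    ov::Core core;
    std::cout << "f32: " << avg_latency_ms(core, "model.xml", ov::element::f32) << " ms\n";
    std::cout << "f16: " << avg_latency_ms(core, "model.xml", ov::element::f16) << " ms\n";
}
```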
According to our discussion, for now we only activate FP16 test cases for platforms with AVX512_FP16 instructions. FP16 test cases for AVX2 will be activated after the follow-ups in #16500 (comment) are applied. Thanks Chenhu!
Resolved review threads:
- src/plugins/intel_cpu/tests/functional/shared_tests_instances/skip_tests_config.cpp
- src/plugins/intel_cpu/tests/functional/single_layer_tests/instances/x64/reduce.cpp
- src/plugins/intel_cpu/tests/functional/single_layer_tests/classes/reduce.cpp
Force-pushed from 58ae9cc to d16b8b4
Force-pushed from 7eec8d5 to bd90799
Details:
Tickets: