
arm unified elempack optimization for groupnorm #4080

Merged
nihui merged 12 commits into Tencent:master from mmyyy22:groupnorm_arm on Jun 4, 2025

Conversation

mmyyy22 (Contributor) commented on Jul 23, 2022

  • add groupnorm_arm optimization (a brief recap of the operation follows below)
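
For context (standard background on the operation, not text from the PR): group normalization splits the channels into groups and normalizes each group independently over its channels and spatial positions,

$$y = \gamma_c \cdot \frac{x - \mu_g}{\sqrt{\sigma_g^2 + \epsilon}} + \beta_c$$

where $\mu_g$ and $\sigma_g^2$ are the mean and variance over all elements of group $g$, and the per-channel affine parameters $\gamma_c$, $\beta_c$ are applied only when the layer's affine flag is set.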

codecov-commenter commented on Jul 24, 2022

Codecov Report

Attention: Patch coverage is 99.29245% with 3 lines in your changes missing coverage. Please review.

Project coverage is 95.70%. Comparing base (1d84e98) to head (e7e770e).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/layer/arm/groupnorm_arm.cpp | 99.20% | 2 Missing ⚠️ |
| src/layer/arm/groupnorm_arm_asimdhp.cpp | 99.41% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #4080    +/-   ##
========================================
  Coverage   95.70%   95.70%            
========================================
  Files         827      829     +2     
  Lines      269904   270328   +424     
========================================
+ Hits       258301   258722   +421     
- Misses      11603    11606     +3     

☔ View full report in Codecov by Sentry.

nihui (Member) commented on Jul 25, 2022

Hi, the test failure indicates that your code produces a wrong result under some conditions. Please investigate.

github-actions bot added the arm label on Jun 4, 2025
tencent-adm (Member)

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ nihui
❌ mmyyy22

github-actions bot added the test label on Jun 4, 2025
nihui changed the title from Groupnorm_arm to arm unified elempack optimization for groupnorm on Jun 4, 2025
nihui requested a review from Copilot on June 4, 2025 11:06
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces an optimized group normalization implementation for ARM with unified elempack support, enhancing performance in FP16 calculations.

  • Updated test logging to include additional dimensions and the affine flag.
  • Added a new implementation (groupnorm_arm_asimdhp.cpp) using ARM NEON intrinsics for FP16 arithmetic (see the elempack background note after this list).
  • Introduced header declarations in groupnorm_arm.h for the new methods.
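
Background, an assumption about ncnn's layout conventions rather than text from this PR: elempack is ncnn's SIMD packing factor (elempack=4 stores four channels interleaved as float32x4 lanes). Since group-norm statistics are computed over an entire group, a group's packed data can be reduced as one flat run of floats, which is what lets a single "unified" kernel serve pack1/pack4/pack8. A minimal sketch of such a packing-agnostic reduction (the `group_mean` helper and its contiguity assumption are hypothetical):

```cpp
#include <arm_neon.h>

// Minimal sketch (not the PR's actual code): mean over one group's data,
// valid for any elempack because the group's floats are reduced as one
// contiguous run -- lane membership does not matter for a group-wide mean.
// Assumes the group's channels are stored contiguously.
static float group_mean(const float* ptr, int total /* channels * size * elempack */)
{
    float32x4_t vsum = vdupq_n_f32(0.f);
    int i = 0;
    for (; i + 3 < total; i += 4)
        vsum = vaddq_f32(vsum, vld1q_f32(ptr + i));
    float sum = vaddvq_f32(vsum); // horizontal add (AArch64)
    for (; i < total; i++)
        sum += ptr[i];
    return sum / total;
}
```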

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| tests/test_groupnorm.cpp | Updated test log formatting to print additional tensor dimensions and the affine flag. |
| src/layer/arm/groupnorm_arm_asimdhp.cpp | New optimized FP16 group normalization implementation using ARM NEON intrinsics. |
| src/layer/arm/groupnorm_arm.h | Header declarations for the ARM-optimized groupnorm functions. |
Comments suppressed due to low confidence (1)

src/layer/arm/groupnorm_arm_asimdhp.cpp:25

  • Consider adding a function-level comment for groupnorm_fp16s to explain its purpose, algorithm, and parameter assumptions. This will improve code readability and maintainability.
`static void groupnorm_fp16s(__fp16* ptr, const float* gamma_ptr, const float* beta_ptr, float eps, int channels, int size, int elempack, size_t cstep)`
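
For illustration, here is what a documented scalar reference for this signature might look like. This is a sketch only, assuming fp16 storage with fp32 accumulation and statistics over the whole group; the loop structure and the elempack==1 restriction are assumptions, and the PR's real code vectorizes this with NEON:

```cpp
// Scalar reference sketch (an illustration, NOT the PR's NEON code):
// fp16 storage, fp32 accumulation. Requires an ARM toolchain with __fp16.
#include <math.h>
#include <stddef.h>

static void groupnorm_fp16s_ref(__fp16* ptr, const float* gamma_ptr, const float* beta_ptr,
                                float eps, int channels, int size, int elempack, size_t cstep)
{
    (void)elempack; // sketch assumes elempack == 1; packed gamma/beta indexing differs

    // pass 1: mean over the whole group (all channels, all elements)
    float sum = 0.f;
    for (int q = 0; q < channels; q++)
    {
        const __fp16* p = ptr + q * cstep;
        for (int i = 0; i < size; i++)
            sum += (float)p[i];
    }
    const float mean = sum / (channels * size);

    // pass 2: variance over the whole group
    float sqsum = 0.f;
    for (int q = 0; q < channels; q++)
    {
        const __fp16* p = ptr + q * cstep;
        for (int i = 0; i < size; i++)
        {
            const float v = (float)p[i] - mean;
            sqsum += v * v;
        }
    }
    const float scale = 1.f / sqrtf(sqsum / (channels * size) + eps);

    // pass 3: normalize in place, folding the optional per-channel affine
    // transform into a single multiply-add: y = x * a + b
    for (int q = 0; q < channels; q++)
    {
        __fp16* p = ptr + q * cstep;
        float a = scale;
        float b = -mean * scale;
        if (gamma_ptr && beta_ptr)
        {
            a = gamma_ptr[q] * scale;
            b = beta_ptr[q] - gamma_ptr[q] * mean * scale;
        }
        for (int i = 0; i < size; i++)
            p[i] = (__fp16)((float)p[i] * a + b);
    }
}
```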

nihui merged commit 78b2e68 into Tencent:master on Jun 4, 2025
94 of 98 checks passed
nihui (Member) commented on Jun 4, 2025

Thanks for your contribution!

github-actions bot commented on Jun 4, 2025

The binary size change of libncnn.so (bytes)

| architecture | base size | pr size | difference |
|---|---|---|---|
| x86_64 | 16495480 | 16495480 | 0 😘 |
| armhf | 7369732 | 7380016 | +10284 ⚠️ |
| aarch64 | 10775832 | 10777936 | +2104 ⚠️ |
