
Variable thread count for multi-threaded GEMMs #1316


Merged (3 commits) on Oct 4, 2017

Conversation

timmoon10
Contributor

When performing smallish GEMMs (M~1000, N~100, K~100) on a 36-core system, I've found that I can achieve a significant speedup by reducing the number of OpenMP threads. It turns out that multi-threading is disabled if (m < nthreads * SWITCH_RATIO) || (n < nthreads * SWITCH_RATIO), so we can keep parallel execution, with no algorithmic changes, by reducing nthreads to min(m / SWITCH_RATIO, n / SWITCH_RATIO, nthreads). This pull request applies that clamp automatically and also removes some unused code that dates back to GotoBLAS.
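The clamp described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function names `clamp_gemm_threads` and `would_disable_threading` are hypothetical, and the value given to `SWITCH_RATIO` here is an assumption chosen only so the example compiles (the real value comes from the OpenBLAS build configuration). The lower bound of one thread is also an assumption added to keep the result usable for very small matrices.

```c
/* Assumed value for illustration only; the real SWITCH_RATIO is
 * defined by the OpenBLAS sources for the target architecture. */
#define SWITCH_RATIO 4

static long min_long(long a, long b) { return a < b ? a : b; }

/* The condition quoted in the description: threading is skipped
 * when either dimension is too small for the requested threads. */
int would_disable_threading(long m, long n, long nthreads) {
    return (m < nthreads * SWITCH_RATIO) || (n < nthreads * SWITCH_RATIO);
}

/* Hypothetical helper: reduce nthreads to
 * min(m / SWITCH_RATIO, n / SWITCH_RATIO, nthreads),
 * never going below one thread (an assumption of this sketch). */
long clamp_gemm_threads(long m, long n, long nthreads) {
    long limit = min_long(m / SWITCH_RATIO, n / SWITCH_RATIO);
    long t = min_long(limit, nthreads);
    return t < 1 ? 1 : t;
}
```

Because the division rounds down, the clamped count `t` satisfies `t * SWITCH_RATIO <= m` and `t * SWITCH_RATIO <= n` whenever both dimensions are at least `SWITCH_RATIO`, so the disabling condition no longer triggers for the reduced thread count.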

@martin-frbg
Collaborator

Looks good to me, but could you please provide some performance figures, just to illustrate how significant "significant" is? (And if I read your patch correctly, it should not cause a slowdown on less well-endowed systems?)

m, n, and k can be set to arbitrary constants. A and B matrices can be transposed independently.
@timmoon10
Contributor Author

Here are some scaling studies on a 36-core system (2 x Intel 18-core Xeon E5-2695 v4, gcc 4.9.3, TARGET=HASWELL USE_OPENMP=1 NUM_THREADS=36 INTERFACE64=0):

[Figures: surface_gpgpu_sgemm, surface_gpgpu_dgemm, surface_gpgpu_cgemm, surface_gpgpu_zgemm]

@timmoon10
Contributor Author

Here are some experiments on an 8-core system (Intel Core i7, default build for OSX):

[Figures: osx_sgemm, osx_dgemm, osx_cgemm, osx_zgemm]

@brada4
Contributor

brada4 commented Sep 28, 2017

There is another threading threshold in interface/gemm.c.
The graph going down probably reflects data moving between the L3 caches of different cores.

@brada4
Contributor

brada4 commented Oct 1, 2017

I forgot to mention that you are on the right path: the fixed threading threshold, as it currently stands, is a great impairment to, say, OpenCV/SGEMM.


3 participants