@bartoldeman
Contributor

(created using eb --new-pr)

…8.0-GCC-7.3.0-2.30.eb, ScaLAPACK-2.0.2-gompi-2018b-BLIS-0.3.2.eb
@bartoldeman
Contributor Author

bartoldeman commented Jul 25, 2018

@bartoldeman
Contributor Author

HPL test on a single Sandy Bridge node (theoretical peak: ~333 Gflops), using 16 cores and ~50% of memory (32G out of 64G), run as mpirun --bind-to core -n 16 xhpl with OMP_NUM_THREADS=1:

goblf (BLIS)

T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR01C2R4       64000   168     4     4             636.00              2.748e+02

foss (OpenBLAS)

T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR01C2R4       64000   168     4     4             610.12              2.864e+02

gomkl (MKL)

T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR01C2R4       64000   168     4     4             570.33              3.064e+02

Will test on Skylake (AVX-512) later.
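
As a rough check of the "~50% of memory" figure for the Sandy Bridge run above: HPL factorizes a dense N x N double-precision matrix, so its footprint is about 8 * N^2 bytes. The sketch below assumes decimal gigabytes and a 64 GB node; the helper name is made up for illustration.

```python
def hpl_matrix_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Bytes needed for the dense N x N double-precision matrix used by HPL."""
    return n * n * bytes_per_element


n = 64000                      # problem size N from the HPL output above
node_memory_gb = 64            # assumption: "64G" read as decimal gigabytes
matrix_gb = hpl_matrix_bytes(n) / 1e9
fraction = matrix_gb / node_memory_gb

print(f"N={n}: {matrix_gb:.1f} GB, {fraction:.0%} of {node_memory_gb} GB")
```

With N=64000 this comes out at roughly half of the node's memory, matching the description of the run.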

@bartoldeman
Contributor Author

@akesandgren mentioned I should vary NB, so I will post some more results for Sandy Bridge once those are finished.

And for Skylake (chip: Xeon Platinum 8160 with 48 cores, peak: 2150 Gflops at 1.4 GHz AVX-512):

test.blis:WR01C2R4       96000   192     6     8             394.19              1.496e+03
test.blis:WR01C2R4       96000   384     6     8             352.61              1.673e+03
test.mkl:WR01C2R4       96000   192     6     8             297.83              1.980e+03
test.mkl:WR01C2R4       96000   384     6     8             304.93              1.934e+03
test.openblas:WR01C2R4       96000   192     6     8             458.44              1.287e+03
test.openblas:WR01C2R4       96000   384     6     8             475.34              1.241e+03

So here, as expected, BLIS outperforms OpenBLAS, since OpenBLAS does not support AVX-512 yet for double precision (it is approaching its AVX2 peak of 1382 Gflops at 1.8 GHz, though). I'll do another run with Intel's own mplinpack that is part of MKL for comparison.
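
The peak figures quoted above are consistent with cores x clock x flops per cycle. The sketch below assumes 32 double-precision flops per cycle per core with AVX-512 and 16 with AVX2 (two FMA units each); those per-cycle values are not stated in this thread, they are assumptions that happen to reproduce the 2150 and 1382 Gflops numbers. The function names are illustrative only.

```python
def peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in Gflops: cores x clock (GHz) x flops per cycle."""
    return cores * ghz * flops_per_cycle


def efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of the theoretical peak that a run achieved."""
    return measured_gflops / peak


# Assumed double-precision flops per cycle per core (not from the thread).
AVX512_DP = 32  # 2 FMA units x 8 doubles x 2 flops
AVX2_DP = 16    # 2 FMA units x 4 doubles x 2 flops

peak_avx512 = peak_gflops(48, 1.4, AVX512_DP)  # about 2150 Gflops
peak_avx2 = peak_gflops(48, 1.8, AVX2_DP)      # about 1382 Gflops

# Measured Gflops copied from the Skylake results above.
results = {
    ("blis", 192): 1496.0,
    ("blis", 384): 1673.0,
    ("mkl", 192): 1980.0,
    ("mkl", 384): 1934.0,
    ("openblas", 192): 1287.0,
    ("openblas", 384): 1241.0,
}

for (lib, nb), gflops in results.items():
    print(f"{lib:9s} NB={nb}: {efficiency(gflops, peak_avx512):.1%} of AVX-512 peak")

best_openblas = results[("openblas", 192)]
print(f"openblas NB=192: {efficiency(best_openblas, peak_avx2):.1%} of AVX2 peak")
```

Measured against the AVX2 peak rather than the AVX-512 peak, the OpenBLAS result looks much closer to what the hardware allows, which is the point made in the comment.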

@bartoldeman
Contributor Author

Closing and reopening to trigger travis.

@bartoldeman bartoldeman reopened this Aug 28, 2018
@bartoldeman
Contributor Author

Hmm that failure looks overly strict unless I am missing something. @boegel ?

@boegel
Member

boegel commented Aug 29, 2018

@bartoldeman We tried to keep things synced across different toolchains from the same generation, but of course that doesn't make sense for ScaLAPACK, so we'll need to add an exception to test_dep_versions_per_toolchain_generation in test/easyconfigs/easyconfigs.py (in a smart way, though, to still make sure only one version of ScaLAPACK is used in a toolchain generation).

@bartoldeman
Contributor Author

@boegel thanks for confirming, I'll work on the test

@boegel
Member

boegel commented Aug 29, 2018

@bartoldeman Based on the benchmarking you did, do you feel we should consider switching away from OpenBLAS in favor of BLIS in future generations of foss*?

In any case, we should open an issue on that and collect some info there, since there's obviously more to it than just performance (e.g. can BLIS be considered stable/complete yet, etc.)?

@bartoldeman
Contributor Author

I'm not sure. Certainly BLIS outperforms OpenBLAS on avx512, but on other platforms it's not the case. BLIS certainly looks complete but I am not sure how fast it is for other functions (non-DGEMM).

Also, it looks like by the time foss 2019a is out, OpenBLAS will be fast again on avx512, given recent contributions by @fenrus75 (see https://github.com/xianyi/OpenBLAS/pulls?utf8=%E2%9C%93&q=author%3Afenrus75 )

@boegel
Member

boegel commented Aug 29, 2018

@bartoldeman Well, ok, but it would still be useful to have a dedicated issue on that, to have a central place to track pros & cons to take into account.

@bartoldeman
Contributor Author

@boegel agreed about the issue.

@akesandgren
Contributor

Rekicking Travis due to changes in EasyBuild testing

@akesandgren
Contributor

Travis kicked

@akesandgren akesandgren reopened this Aug 28, 2019
@easybuilders easybuilders deleted a comment from boegelbot Oct 16, 2020
@easybuilders easybuilders deleted a comment from boegelbot Oct 16, 2020
@bartoldeman
Contributor Author

ah the failing test reminds me the old issue is still there:

found 2 variants of 'ScaLAPACK' dependency in easyconfigs using '2018b' toolchain generation
* version: 2.0.2; versionsuffix: -BLIS-0.3.2 as dep for set(['goblf-2018b.eb'])
* version: 2.0.2; versionsuffix: -OpenBLAS-0.3.1 as dep for set(['foss-2018b.eb', 'fosscuda-2018b.eb'])
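
To illustrate what kind of check produces a message like the one above, here is a minimal sketch that groups dependency variants by (version, versionsuffix) and reports any dependency with more than one variant. It is not the actual code in test/easyconfigs/easyconfigs.py; the function name and the input format are assumptions made for this example.

```python
from collections import defaultdict


def find_dep_variants(deps):
    """Return dependencies that occur with more than one (version, versionsuffix).

    deps: iterable of (easyconfig, dep_name, version, versionsuffix) tuples,
    all taken from easyconfigs of one toolchain generation (hypothetical format).
    """
    variants = defaultdict(lambda: defaultdict(set))
    for easyconfig, name, version, versionsuffix in deps:
        variants[name][(version, versionsuffix)].add(easyconfig)
    # Only dependencies with two or more distinct variants are a problem.
    return {name: dict(v) for name, v in variants.items() if len(v) > 1}


# Data taken from the test failure message above.
deps = [
    ("goblf-2018b.eb", "ScaLAPACK", "2.0.2", "-BLIS-0.3.2"),
    ("foss-2018b.eb", "ScaLAPACK", "2.0.2", "-OpenBLAS-0.3.1"),
    ("fosscuda-2018b.eb", "ScaLAPACK", "2.0.2", "-OpenBLAS-0.3.1"),
]

conflicts = find_dep_variants(deps)
for name, found in conflicts.items():
    print(f"found {len(found)} variants of '{name}' dependency")
    for (version, versionsuffix), easyconfigs in found.items():
        print(f"* version: {version}; versionsuffix: {versionsuffix} "
              f"as dep for {sorted(easyconfigs)}")
```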

@easybuilders easybuilders deleted a comment from boegelbot Dec 11, 2020
@boegel
Member

boegel commented Dec 11, 2020

@bartoldeman Does it make sense to pursue this, now that we have gobff (see #11761)?

@boegel boegel added this to the 4.x milestone Dec 11, 2020
@bartoldeman
Contributor Author

bartoldeman commented Dec 11, 2020

It's just the underlying testsuite issue that's blocking it, would be good to sort that out.
I'll have a look at how this was resolved with gobff.

For ScaLAPACK in BLIS-based vs OpenBLAS-based toolchains, we allow a dependency
on a particular version, as long as that's indicated by the versionsuffix
as used in goblf toolchain. This is the proper fix.
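
One way to read the commit message above: a ScaLAPACK variant is tolerated when its versionsuffix names the BLIS backend and only goblf toolchain easyconfigs depend on it, so that exactly one "regular" variant remains per toolchain generation. The sketch below encodes that reading; it is a hypothetical illustration, not the code that was merged, and the function name and the "-BLIS-" / "goblf-" prefix checks are assumptions.

```python
def drop_allowed_variants(dep_name, variants):
    """Remove variants covered by the (assumed) ScaLAPACK exception.

    variants: dict mapping (version, versionsuffix) to the set of easyconfig
    filenames that depend on that variant.
    """
    # Assumption: only ScaLAPACK gets this exception.
    if dep_name != "ScaLAPACK":
        return dict(variants)

    remaining = {}
    for (version, versionsuffix), easyconfigs in variants.items():
        # Tolerate a variant only if the versionsuffix marks it as BLIS-based
        # and every easyconfig depending on it is a goblf toolchain easyconfig.
        blis_only = versionsuffix.startswith("-BLIS-") and all(
            ec.startswith("goblf-") for ec in easyconfigs
        )
        if not blis_only:
            remaining[(version, versionsuffix)] = easyconfigs
    return remaining


variants = {
    ("2.0.2", "-BLIS-0.3.2"): {"goblf-2018b.eb"},
    ("2.0.2", "-OpenBLAS-0.3.1"): {"foss-2018b.eb", "fosscuda-2018b.eb"},
}

remaining = drop_allowed_variants("ScaLAPACK", variants)
print(f"{len(remaining)} variant(s) left after applying the exception")
```

A check along these lines still fails if two different OpenBLAS-based ScaLAPACK variants show up in the same generation, which is the "smart" behaviour asked for earlier in the thread.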
@bartoldeman
Contributor Author

@boegelbot please test @ generoso

@boegelbot
Collaborator

@bartoldeman: Request for testing this PR well received on generoso

PR test command 'EB_PR=6615 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_6615 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12275

Test results coming soon (I hope)...

Details

- notification for comment with ID 743330521 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@easybuilders easybuilders deleted a comment from boegelbot Dec 11, 2020
@easybuilders easybuilders deleted a comment from boegelbot Dec 11, 2020
Member

@boegel boegel left a comment

lgtm

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 5 out of 5 (4 easyconfigs in total)
generoso-c1-s-1 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/165b0b347990936a048e9e90237c7e7c for a full test report.

@boegel
Member

boegel commented Dec 11, 2020

Test report by @boegel
SUCCESS
Build succeeded for 5 out of 5 (4 easyconfigs in total)
node2707.swalot.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz (haswell), Python 3.6.8
See https://gist.github.com/85dcdb1f2aab653a1aa0dd539a59fab3 for a full test report.

@boegel
Member

boegel commented Dec 11, 2020

Test report by @boegel
SUCCESS
Build succeeded for 5 out of 5 (4 easyconfigs in total)
node3129.skitty.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/b4212841035882bee7760b90ce52d767 for a full test report.

@boegel
Member

boegel commented Dec 11, 2020

Test report by @boegel
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node3409.kirlia.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (cascadelake), Python 3.6.8
See https://gist.github.com/e6e95fd6a57e5f2b5ae6e1ef25697bb3 for a full test report.

@boegel
Member

boegel commented Dec 11, 2020

Test report by @boegel
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node3504.doduo.os - Linux RHEL 8.2, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/52022634fb5bc184e736a53bd979594b for a full test report.

@boegel
Member

boegel commented Dec 11, 2020

Going in, thanks @bartoldeman!

@boegel boegel merged commit 2b7cd12 into easybuilders:develop Dec 11, 2020
@boegel boegel modified the milestones: 4.x, next release (4.3.3?) Dec 11, 2020
