{devel}[foss/2020b] PyTorch v1.8.1 w/ Python 3.8.6 #12347
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| name = 'PyTorch' | ||
| version = '1.8.0' | ||
|
|
||
| homepage = 'https://pytorch.org/' | ||
| description = """Tensors and Dynamic neural networks in Python with strong GPU acceleration. | ||
| PyTorch is a deep learning framework that puts Python first.""" | ||
|
|
||
| toolchain = {'name': 'foss', 'version': '2020b'} | ||
|
|
||
| sources = [{ | ||
| 'filename': '%(name)s-%(version)s.tar.gz', | ||
| 'git_config': { | ||
| 'url': 'https://github.com/pytorch', | ||
| 'repo_name': 'pytorch', | ||
| 'tag': 'v%(version)s', | ||
| 'recursive': True, | ||
| }, | ||
| }] | ||
| patches = [ | ||
| 'PyTorch-1.6.0_fix-test-dataloader-fixed-affinity.patch', | ||
| 'PyTorch-1.7.0_avoid-nan-in-test-torch.patch', | ||
| 'PyTorch-1.7.0_increase-distributed-test-timeout.patch', | ||
| 'PyTorch-1.7.0_disable-dev-shm-test.patch', | ||
| ] | ||
| checksums = [ | ||
| None, # can't add proper SHA256 checksum, because source tarball is created locally after recursive 'git clone' | ||
| # PyTorch-1.6.0_fix-test-dataloader-fixed-affinity.patch | ||
| 'a4208a46cd2098744daaba96cebb96cd91166f8fc616924315e05974bad80c67', | ||
| 'b899aa94d9e60f11ee75a706563312ccefa9cf432756c470caa8e623991c8f18', # PyTorch-1.7.0_avoid-nan-in-test-torch.patch | ||
| # PyTorch-1.7.0_increase-distributed-test-timeout.patch | ||
| '95abb468a35451fbd0f864ca843f6ad15ff8bfb909c3fd580f65859b26c9691c', | ||
| '622cb1eaeadc06e13128a862d9946bcc1f1edd3d02b259c56a9aecc4d5406b8a', # PyTorch-1.7.0_disable-dev-shm-test.patch | ||
| ] | ||
|
|
||
| osdependencies = [OS_PKG_IBVERBS_DEV] | ||
|
|
||
| builddependencies = [ | ||
| ('CMake', '3.18.4'), | ||
| ('hypothesis', '5.41.5'), | ||
| ] | ||
|
|
||
| dependencies = [ | ||
| ('Ninja', '1.10.1'), # Required for JIT compilation of C++ extensions | ||
| ('Python', '3.8.6'), | ||
| ('protobuf', '3.14.0'), | ||
| ('protobuf-python', '3.14.0'), | ||
| ('pybind11', '2.6.0'), | ||
| ('SciPy-bundle', '2020.11'), | ||
| ('typing-extensions', '3.7.4.3'), | ||
| ('PyYAML', '5.3.1'), | ||
| ('MPFR', '4.1.0'), | ||
| ('GMP', '6.2.0'), | ||
| ('numactl', '2.0.13'), | ||
| ('FFmpeg', '4.3.1'), | ||
| ('Pillow', '8.0.1'), | ||
| ] | ||
|
|
||
| excluded_tests = { | ||
| '': [ | ||
| # Tests from this suite time out often. The process group backend is deprecated anyway | ||
| 'distributed/rpc/test_process_group_agent', | ||
| # Potentially problematic save/load issue with test_lstm on only some machines. Tell users to verify save&load! | ||
| # https://github.com/pytorch/pytorch/issues/43209 | ||
| 'test_quantization', | ||
|
Member
@branfosj Did you check whether we still see failures? I can test on our Cascade Lake system where I saw issues with this earlier (cf. pytorch/pytorch#43209)
Member
Author
I've not yet checked that. I'll run a test on our Cascade Lake where we run that test - though I do not know if we saw the issue you saw or not.
Member
Author
I see the same failure with PyTorch 1.7.1 and 1.8.0 on our Cascade Lake.
Member
Author
This test failure still occurs when I build with MKL.
Were you using a full-metal Cascade Lake machine, or were you using a VM on it?
Contributor
I can confirm that there are issues when optimizing for a Cascade Lake machine, e.g. tensorflow/tensorflow#47179
Thanks a lot for the info! I had used gcc/g++ 9.3, but that TensorFlow issue you posted also seems quite relevant. I can try testing with a more recent version of gcc, although gcc 9.3 was released in March 2020.
Contributor
FWIW: 2019b uses GCC 8.3.0, 2020a (IIRC) 9.3.0 (which solves the TF issue for us). As it is a toolchain generation, it might also be related to dependencies being updated, so maybe not only the compiler, but that is the best bet as it looks like a misoptimization. |
||
| ] | ||
| } | ||
|
|
||
| runtest = 'cd test && PYTHONUNBUFFERED=1 %(python)s run_test.py --verbose %(excluded_tests)s' | ||
|
|
||
| sanity_check_commands = ["python -c 'import caffe2.python'"] | ||
| tests = ['PyTorch-check-cpp-extension.py'] | ||
|
|
||
| moduleclass = 'devel' | ||
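For readers unfamiliar with the `%(excluded_tests)s` template in `runtest`, here is a minimal sketch of how the `excluded_tests` dict might be expanded into the test command. It assumes that the empty-string key applies to every architecture and that the selected tests are handed to `run_test.py` through its `--exclude` option. The helper name `format_excluded_tests` and the `arch` argument are hypothetical; this is not the easyblock's actual code.

```python
def format_excluded_tests(excluded_tests, arch):
    """Build the option string that replaces %(excluded_tests)s.

    Hypothetical helper: assumes the '' key means "all architectures"
    and that any other key names one specific architecture.
    """
    selected = []
    for key, tests in excluded_tests.items():
        if key == '' or key == arch:
            selected.extend(tests)
    if not selected:
        # No exclusions: add nothing to the command line.
        return ''
    return '--exclude ' + ' '.join(selected)


# Values taken from the easyconfig above.
excluded = {
    '': [
        'distributed/rpc/test_process_group_agent',
        'test_quantization',
    ],
}

options = format_excluded_tests(excluded, 'x86_64')
command = 'cd test && PYTHONUNBUFFERED=1 python run_test.py --verbose ' + options
print(command)
```

Keeping the exclusions in a dict keyed by architecture would let a test be skipped on one platform only, without touching the `runtest` line itself.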
@branfosj Any concerns here w.r.t. reproducibility? Or are the submodules "locked" to a particular commit anyway?
They are all locked to specific commits - see https://github.com/pytorch/pytorch/tree/v1.8.0/third_party and subdirectories. The only issue we'd have is if PyTorch reused the tag - then we'd get a different download (with, potentially, a different set of items in the `third_party` directory).
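One way to guard against a reused tag would be to record the commit the tag pointed to and compare it later. The sketch below parses text in the format printed by `git ls-remote --tags <url>` and checks a tag against a pinned commit. The function names are hypothetical, and the SHA values in the sample are placeholders, not the real commits behind `v1.8.0`.

```python
def resolve_tag(ls_remote_output, tag):
    """Return the commit a tag points to, given `git ls-remote --tags` output.

    For annotated tags git also prints a peeled entry ending in '^{}',
    which holds the commit itself; prefer it when it is present.
    """
    direct = None
    peeled = None
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(None, 1)
        ref = ref.strip()
        if ref == 'refs/tags/%s^{}' % tag:
            peeled = sha
        elif ref == 'refs/tags/%s' % tag:
            direct = sha
    return peeled or direct


def tag_matches(ls_remote_output, tag, expected_commit):
    """True if the tag still resolves to the commit recorded earlier."""
    return resolve_tag(ls_remote_output, tag) == expected_commit


# Placeholder output: these SHAs are made up for illustration only.
SAMPLE = (
    "1111111111111111111111111111111111111111\trefs/tags/v1.8.0\n"
    "2222222222222222222222222222222222222222\trefs/tags/v1.8.0^{}\n"
)

print(tag_matches(SAMPLE, 'v1.8.0', '2' * 40))
```

In practice the input would come from running `git ls-remote --tags` against the repository, and the expected commit would have to be stored somewhere alongside the easyconfig, which the easyconfig in this PR does not do.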
Hm, a downside of this I see is that `--fetch` likely doesn't work, i.e. a full offline install fails, or does EB handle that? Also no checksums...
BTW: There is a script in framework to create the sources list out of a valid git checkout (must have `git submodule update` done).
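To illustrate the idea of deriving a sources list from a checkout, the sketch below parses text in the format printed by `git submodule status --recursive` into (path, commit) entries. This is not the framework script mentioned above; the function name is hypothetical, and the paths and SHAs in the sample are placeholders rather than real PyTorch submodules.

```python
def parse_submodule_status(status_output):
    """Turn `git submodule status --recursive` output into a list of dicts.

    Each line is a one-character state (' ', '-', '+' or 'U'), the commit
    SHA, the submodule path and, optionally, a description in parentheses.
    Assumes submodule paths contain no whitespace.
    """
    entries = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        if line[0] in ' -+U':
            state, fields = line[0], line[1:].split()
        else:
            # Leading space was stripped somewhere; treat as "in sync".
            state, fields = ' ', line.split()
        entries.append({
            'path': fields[1],
            'commit': fields[0],
            # '-' means the submodule has not been initialized yet,
            # i.e. `git submodule update` still needs to be run.
            'initialized': state != '-',
        })
    return entries


# Placeholder output: paths and SHAs are made up for illustration only.
SAMPLE = (
    " aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa third_party/example_a (v1.0)\n"
    "-bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb third_party/example_b\n"
)

for entry in parse_submodule_status(SAMPLE):
    print(entry['path'], entry['commit'], entry['initialized'])
```

Each entry carries a pinned commit, so a list like this could in principle be turned into individually checksummed source tarballs instead of one locally created archive.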
`--fetch` works (so long as you do not hit easybuilders/easybuild-framework#3619).