{ai}[foss/2022a] PyTorch v2.1.2 w/ CUDA 11.7.0 #21690

Flamefire · 2024-10-17T14:27:34Z

(created using eb --new-pr)

Requires:

{tools}[GCCcore/10.3.0 - 14.2.0] unittest-xml-reporting v3.1.0, lxml v5.3.0, libxslt v1.1.42 #22205

Flamefire · 2024-10-17T20:23:48Z

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
n1069 - Linux RHEL 8.9 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (icelake), Python 3.8.17
See https://gist.github.com/Flamefire/7d92c53ec4ff113f15aca3227ffb7a70 for a full test report.

Flamefire · 2024-10-18T04:18:15Z

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
i8006 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 555.42.06, Python 3.8.17
See https://gist.github.com/Flamefire/594b505ba13b42102dd4bc1314217979 for a full test report.

Flamefire · 2024-10-26T14:36:11Z

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
ml6 - Linux AlmaLinux 8.7 (Stone Smilodon), POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 530.30.02, Python 3.8.13
See https://gist.github.com/Flamefire/ce36733fa874d4f4dea9cdbb110423d0 for a full test report.

github-actions · 2024-11-19T15:45:50Z

Updated software `PyTorch-2.1.2-foss-2022a-CUDA-11.7.0.eb`

Diff against PyTorch-2.3.0-foss-2023b.eb

easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b.eb

diff --git a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b.eb b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2022a-CUDA-11.7.0.eb
index 308397336a..515a5b6c8b 100644
--- a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0-foss-2023b.eb
+++ b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2022a-CUDA-11.7.0.eb
@@ -1,16 +1,18 @@
 name = 'PyTorch'
-version = '2.3.0'
+version = '2.1.2'
+versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://pytorch.org/'
 description = """Tensors and Dynamic neural networks in Python with strong GPU acceleration.
 PyTorch is a deep learning framework that puts Python first."""
 
-toolchain = {'name': 'foss', 'version': '2023b'}
+toolchain = {'name': 'foss', 'version': '2022a'}
 
 source_urls = [GITHUB_RELEASE]
 sources = ['%(namelower)s-v%(version)s.tar.gz']
 patches = [
     'PyTorch-1.7.0_disable-dev-shm-test.patch',
+    'PyTorch-1.11.1_skip-test_init_from_local_shards.patch',
     'PyTorch-1.12.1_add-hypothesis-suppression.patch',
     'PyTorch-1.12.1_fix-test_cpp_extensions_jit.patch',
     'PyTorch-1.12.1_fix-TestTorch.test_to.patch',
@@ -22,34 +24,51 @@ patches = [
     'PyTorch-1.13.1_skip-tests-without-fbgemm.patch',
     'PyTorch-2.0.1_avoid-test_quantization-failures.patch',
     'PyTorch-2.0.1_fix-skip-decorators.patch',
+    'PyTorch-2.0.1_fix-ub-in-inductor-codegen.patch',
     'PyTorch-2.0.1_fix-vsx-loadu.patch',
+    'PyTorch-2.0.1_no-cuda-stubs-rpath.patch',
     'PyTorch-2.0.1_skip-failing-gradtest.patch',
     'PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch',
     'PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch',
+    'PyTorch-2.1.0_disable-cudnn-tf32-for-too-strict-tests.patch',
+    'PyTorch-2.1.0_fix-bufferoverflow-in-oneDNN.patch',
+    'PyTorch-2.1.0_fix-validationError-output-test.patch',
     'PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch',
     'PyTorch-2.1.0_increase-tolerance-functorch-test_vmapvjpvjp.patch',
+    'PyTorch-2.1.0_remove-sparse-csr-nnz-overflow-test.patch',
     'PyTorch-2.1.0_remove-test-requiring-online-access.patch',
     'PyTorch-2.1.0_skip-diff-test-on-ppc.patch',
     'PyTorch-2.1.0_skip-dynamo-test_predispatch.patch',
     'PyTorch-2.1.0_skip-test_jvp_linalg_det_singular.patch',
+    'PyTorch-2.1.0_skip-test_linear_fp32-without-MKL.patch',
+    'PyTorch-2.1.0_skip-test_wrap_bad.patch',
+    'PyTorch-2.1.2_add-cuda-skip-markers.patch',
+    'PyTorch-2.1.2_fix-conj-mismatch-test-failures.patch',
+    'PyTorch-2.1.2_fix-device-mesh-check.patch',
+    'PyTorch-2.1.2_fix-fsdp-tp-integration-test.patch',
+    ('PyTorch-2.1.2_fix-kineto-cuda-11.7-compat.patch', 'third_party/kineto'),
+    'PyTorch-2.1.2_fix-test_cuda-non-x86.patch',
+    'PyTorch-2.1.2_fix-test_extension_backend-without-vectorization.patch',
+    'PyTorch-2.1.2_fix-test_memory_profiler.patch',
+    'PyTorch-2.1.2_fix-test_parallelize_api.patch',
+    'PyTorch-2.1.2_fix-test-tp_random_state.patch',
+    'PyTorch-2.1.2_fix-test_torchinductor-rounding.patch',
+    'PyTorch-2.1.2_fix-vsx-vector-abs.patch',
+    'PyTorch-2.1.2_fix-vsx-vector-div.patch',
+    'PyTorch-2.1.2_fix-with_temp_dir-decorator.patch',
+    'PyTorch-2.1.2_fix-wrong-device-mesh-size-in-tests.patch',
+    'PyTorch-2.1.2_relax-cuda-tolerances.patch',
+    'PyTorch-2.1.2_remove-nccl-backend-default-without-gpus.patch',
     'PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch',
+    'PyTorch-2.1.2_skip-failing-test_dtensor_ops-subtests.patch',
+    'PyTorch-2.1.2_skip-test_fsdp_tp_checkpoint_integration.patch',
     'PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch',
-    'PyTorch-2.3.0_disable_test_linear_package_if_no_half_types_are_available.patch',
-    'PyTorch-2.3.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch',
-    'PyTorch-2.3.0_skip-test_init_from_local_shards.patch',
-    'PyTorch-2.3.0_no-cuda-stubs-rpath.patch',
-    'PyTorch-2.3.0_disable-gcc12-warning.patch',
-    'PyTorch-2.3.0_fix-test_extension_backend-without-vectorization.patch',
-    'PyTorch-2.3.0_disable_tests_which_need_network_download.patch',
-    'PyTorch-2.3.0_avoid_caffe2_test_cpp_jit.patch',
-    'PyTorch-2.3.0_fix_missing_masked_load_for_int_type.patch',
-    'PyTorch-2.3.0_skip_test_var_mean_differentiable.patch',
-    'PyTorch-2.3.0_skip_test_sdpa_nn_functional_scaled_dot_product_attention_cpu.patch',
-    'PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch',
 ]
 checksums = [
-    {'pytorch-v2.3.0.tar.gz': '69579513b26261bbab32e13b7efc99ad287fcf3103087f2d4fdf1adacd25316f'},
+    {'pytorch-v2.1.2.tar.gz': '85effbcce037bffa290aea775c9a4bad5f769cb229583450c40055501ee1acd7'},
     {'PyTorch-1.7.0_disable-dev-shm-test.patch': '622cb1eaeadc06e13128a862d9946bcc1f1edd3d02b259c56a9aecc4d5406b8a'},
+    {'PyTorch-1.11.1_skip-test_init_from_local_shards.patch':
+     '4aeb1b0bc863d4801b0095cbce69f8794066748f0df27c6aaaf729c5ecba04b7'},
     {'PyTorch-1.12.1_add-hypothesis-suppression.patch':
      'e71ffb94ebe69f580fa70e0de84017058325fdff944866d6bd03463626edc32c'},
     {'PyTorch-1.12.1_fix-test_cpp_extensions_jit.patch':
@@ -69,16 +88,27 @@ checksums = [
     {'PyTorch-2.0.1_avoid-test_quantization-failures.patch':
      '02e3f47e4ed1d7d6077e26f1ae50073dc2b20426269930b505f4aefe5d2f33cd'},
     {'PyTorch-2.0.1_fix-skip-decorators.patch': '2039012cef45446065e1a2097839fe20bb29fe3c1dcc926c3695ebf29832e920'},
+    {'PyTorch-2.0.1_fix-ub-in-inductor-codegen.patch':
+     '1b37194f55ae678f3657b8728dfb896c18ffe8babe90987ce468c4fa9274f357'},
     {'PyTorch-2.0.1_fix-vsx-loadu.patch': 'a0ffa61da2d47c6acd09aaf6d4791e527d8919a6f4f1aa7ed38454cdcadb1f72'},
+    {'PyTorch-2.0.1_no-cuda-stubs-rpath.patch': '8902e58a762240f24cdbf0182e99ccdfc2a93492869352fcb4ca0ec7e407f83a'},
     {'PyTorch-2.0.1_skip-failing-gradtest.patch': '8030bdec6ba49b057ab232d19a7f1a5e542e47e2ec340653a246ec9ed59f8bc1'},
     {'PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch':
      '7047862abc1abaff62954da59700f36d4f39fcf83167a638183b1b7f8fec78ae'},
     {'PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch':
      '166c134573a95230e39b9ea09ece3ad8072f39d370c9a88fb2a1e24f6aaac2b5'},
+    {'PyTorch-2.1.0_disable-cudnn-tf32-for-too-strict-tests.patch':
+     'd895018ebdfd46e65d9f7645444a3b4c5bbfe3d533a08db559a04be34e01e478'},
+    {'PyTorch-2.1.0_fix-bufferoverflow-in-oneDNN.patch':
+     'b15b1291a3c37bf6a4982cfbb3483f693acb46a67bc0912b383fd98baf540ccf'},
+    {'PyTorch-2.1.0_fix-validationError-output-test.patch':
+     '7eba0942afb121ed92fac30d1529447d892a89eb3d53c565f8e9d480e95f692b'},
     {'PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch':
      '3793b4b878be1abe7791efcbd534774b87862cfe7dc4774ca8729b6cabb39e7e'},
     {'PyTorch-2.1.0_increase-tolerance-functorch-test_vmapvjpvjp.patch':
      'aef38adf1210d0c5455e91d7c7a9d9e5caad3ae568301e0ba9fc204309438e7b'},
+    {'PyTorch-2.1.0_remove-sparse-csr-nnz-overflow-test.patch':
+     '0ac36411e76506b3354c85a8a1260987f66af947ee52ffc64230aee1fa02ea8b'},
     {'PyTorch-2.1.0_remove-test-requiring-online-access.patch':
      '35184b8c5a1b10f79e511cc25db3b8a5585a5d58b5d1aa25dd3d250200b14fd7'},
     {'PyTorch-2.1.0_skip-diff-test-on-ppc.patch': '394157dbe565ffcbc1821cd63d05930957412156cc01e949ef3d3524176a1dda'},
@@ -86,68 +116,82 @@ checksums = [
      '6298daf9ddaa8542850eee9ea005f28594ab65b1f87af43d8aeca1579a8c4354'},
     {'PyTorch-2.1.0_skip-test_jvp_linalg_det_singular.patch':
      '5229ca88a71db7667a90ddc0b809b2c817698bd6e9c5aaabd73d3173cf9b99fe'},
+    {'PyTorch-2.1.0_skip-test_linear_fp32-without-MKL.patch':
+     '5dcc79883b6e3ec0a281a8e110db5e0a5880de843bb05653589891f16473ead5'},
+    {'PyTorch-2.1.0_skip-test_wrap_bad.patch': 'b8583125ee94e553b6f77c4ab4bfa812b89416175dc7e9b7390919f3b485cb63'},
+    {'PyTorch-2.1.2_add-cuda-skip-markers.patch': 'd007d6d0cdb533e7d01f503e9055218760123a67c1841c57585385144be18c9a'},
+    {'PyTorch-2.1.2_fix-conj-mismatch-test-failures.patch':
+     'c164357efa4ce88095376e590ba508fc1daa87161e1e59544eda56daac7f2847'},
+    {'PyTorch-2.1.2_fix-device-mesh-check.patch': 'c0efc288bf3d9a9a3c8bbd2691348a589a2677ea43880a8c987db91c8de4806b'},
+    {'PyTorch-2.1.2_fix-fsdp-tp-integration-test.patch':
+     'f583532c59f35f36998851957d501b3ac8c883884efd61bbaa308db55cb6bdcd'},
+    {'PyTorch-2.1.2_fix-kineto-cuda-11.7-compat.patch':
+     '742f8e9b911b6cde19cdc05804d60a043e7cb936e994e66a6fe6a9490c53e34f'},
+    {'PyTorch-2.1.2_fix-test_cuda-non-x86.patch': '1ed76fcc87e6c50606ac286487292a3d534707068c94af74c3a5de8153fa2c2c'},
+    {'PyTorch-2.1.2_fix-test_extension_backend-without-vectorization.patch':
+     'cd1455495886a7d6b2d30d48736eb0103fded21e2e36de6baac719b9c52a1c92'},
+    {'PyTorch-2.1.2_fix-test_memory_profiler.patch':
+     '30b0c9355636c0ab3dedae02399789053825dc3835b4d7dac6e696767772b1ce'},
+    {'PyTorch-2.1.2_fix-test_parallelize_api.patch':
+     'f8387a1693af344099c806981ca38df1306d7f4847d7d44713306338384b1cfd'},
+    {'PyTorch-2.1.2_fix-test-tp_random_state.patch':
+     'da352e32471ae9a68920c91a122c7194f3c53d947536116e33216d3ae6b15e61'},
+    {'PyTorch-2.1.2_fix-test_torchinductor-rounding.patch':
+     'a0ef99192ee2ad1509c78a8377023d5be2b5fddb16f84063b7c9a0b53d979090'},
+    {'PyTorch-2.1.2_fix-vsx-vector-abs.patch': 'd67d32407faed7dc1dbab4bba0e2f7de36c3db04560ced35c94caf8d84ade886'},
+    {'PyTorch-2.1.2_fix-vsx-vector-div.patch': '11f497a6892eb49b249a15320e4218e0d7ac8ae4ce67de39e4a018a064ca1acc'},
+    {'PyTorch-2.1.2_fix-with_temp_dir-decorator.patch':
+     '90bd001e034095329277d70c6facc4026b4ce6d7f8b8d6aa81c0176eeb462eb1'},
+    {'PyTorch-2.1.2_fix-wrong-device-mesh-size-in-tests.patch':
+     '07a5e4233d02fb6348872838f4d69573c777899c6f0ea4e39ae23c08660d41e5'},
+    {'PyTorch-2.1.2_relax-cuda-tolerances.patch': '554ad09787f61080fafdb84216e711e32327aa357e2a9c40bb428eb6503dee6e'},
+    {'PyTorch-2.1.2_remove-nccl-backend-default-without-gpus.patch':
+     'e6a1efe3d127fcbf4723476a7a1c01cfcf2ccb16d1fb250f478192623e8b6a15'},
     {'PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch':
      '7ace835af60c58d9e0754a34c19d4b9a0c3a531f19e5d0eba8e2e49206eaa7eb'},
+    {'PyTorch-2.1.2_skip-failing-test_dtensor_ops-subtests.patch':
+     '6cf711bf26518550903b09ed4431de9319791e79d61aab065785d6608fd5cc88'},
+    {'PyTorch-2.1.2_skip-test_fsdp_tp_checkpoint_integration.patch':
+     '943ee92f5fd518f608a59e43fe426b9bb45d7e7ad0ba04639e516db2d61fa57d'},
     {'PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch':
      'fb96eefabf394617bbb3fbd3a7a7c1aa5991b3836edc2e5d2a30e708bfe49ba1'},
-    {'PyTorch-2.3.0_disable_test_linear_package_if_no_half_types_are_available.patch':
-     '23416f2d9d5226695ec3fbea0671e3650c655c19deefd3f0f8ddab5afa50f485'},
-    {'PyTorch-2.3.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch':
-     '0dcbdfde6752c3ff54c5376f521b4a742167669feb7f0f1d4e1d4d55f72b664f'},
-    {'PyTorch-2.3.0_skip-test_init_from_local_shards.patch':
-     '90ed9c2870f57ee6dc032d00873a37e2217a2b92a13035ded1c25ad5306455f2'},
-    {'PyTorch-2.3.0_no-cuda-stubs-rpath.patch':
-     '7ba26824b5def7379cff02ae821a080698e6affea0da45bc846e9ecb89939cb1'},
-    {'PyTorch-2.3.0_disable-gcc12-warning.patch':
-     'a8a624e1a2a5f4c82610173e50bd0f853e49bd5621b432f5aac689f9f6eb1514'},
-    {'PyTorch-2.3.0_fix-test_extension_backend-without-vectorization.patch':
-     '36aa2d5ba175be17f4e996f4fb2d544fe477d4a0bd0644cd59a85063779afc8e'},
-    {'PyTorch-2.3.0_disable_tests_which_need_network_download.patch':
-     'b7fd1a5135dfd4098cdc054182f7bf84a23ac98462a00477712182b5442da855'},
-    {'PyTorch-2.3.0_avoid_caffe2_test_cpp_jit.patch':
-     '041adcd91d994b8c2ab57d227f081cd57e572c157117b37171e1eb8eb576f8fc'},
-    {'PyTorch-2.3.0_fix_missing_masked_load_for_int_type.patch':
-     'aa6ff764f3f7bf84372a8a257fe1b4ae6dc4b9744ad35f0f9015f2696c62a41e'},
-    {'PyTorch-2.3.0_skip_test_var_mean_differentiable.patch':
-     '9703fd0f1fca8916f6d79d83e9a7efe8e3f717362a5fdaa8f5d9da90d0c75018'},
-    {'PyTorch-2.3.0_skip_test_sdpa_nn_functional_scaled_dot_product_attention_cpu.patch':
-     '7955f2655db3da18606574fdcbc5990be24098f49ad1db5e86ea756ea1cc506f'},
-    {'PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch':
-     'ee07d21c3ac7aeb0bd0e39507b18a417b9125284a529102929c4b5c6727c2976'},
 ]
 
 osdependencies = [OS_PKG_IBVERBS_DEV]
 
 builddependencies = [
-    ('CMake', '3.27.6'),
-    ('hypothesis', '6.90.0'),
+    ('CMake', '3.23.1'),
+    ('hypothesis', '6.46.7'),
     # For tests
     ('pytest-flakefinder', '1.1.0'),
-    ('pytest-rerunfailures', '14.0'),
+    ('pytest-rerunfailures', '11.1'),
     ('pytest-shard', '0.1.2'),
-    ('tlparse', '0.3.5'),
-    ('optree', '0.13.0'),
+    ('pytest-xdist', '2.5.0'),
     ('unittest-xml-reporting', '3.1.0'),
 ]
 
 dependencies = [
-    ('Ninja', '1.11.1'),  # Required for JIT compilation of C++ extensions
-    ('Python', '3.11.5'),
-    ('Python-bundle-PyPI', '2023.10'),
-    ('protobuf', '25.3'),
-    ('protobuf-python', '4.25.3'),
-    ('pybind11', '2.11.1'),
-    ('SciPy-bundle', '2023.11'),
-    ('PyYAML', '6.0.1'),
-    ('MPFR', '4.2.1'),
-    ('GMP', '6.3.0'),
-    ('numactl', '2.0.16'),
-    ('FFmpeg', '6.0'),
-    ('Pillow', '10.2.0'),
-    ('expecttest', '0.2.1'),
-    ('networkx', '3.2.1'),
-    ('sympy', '1.12'),
-    ('Z3', '4.13.0',),
+    ('CUDA', '11.7.0', '', SYSTEM),
+    ('cuDNN', '8.4.1.50', '-CUDA-%(cudaver)s', SYSTEM),
+    ('magma', '2.6.2', '-CUDA-%(cudaver)s'),
+    ('NCCL', '2.12.12', '-CUDA-%(cudaver)s'),
+    ('Ninja', '1.10.2'),  # Required for JIT compilation of C++ extensions
+    ('Python', '3.10.4'),
+    ('protobuf', '3.19.4'),
+    ('protobuf-python', '3.19.4'),
+    ('pybind11', '2.9.2'),
+    ('SciPy-bundle', '2022.05'),
+    ('typing-extensions', '4.3.0'),
+    ('PyYAML', '6.0'),
+    ('MPFR', '4.1.0'),
+    ('GMP', '6.2.1'),
+    ('numactl', '2.0.14'),
+    ('FFmpeg', '4.4.2'),
+    ('Pillow', '9.1.1'),
+    ('expecttest', '0.1.3'),
+    ('networkx', '2.8.4'),
+    ('sympy', '1.10.1'),
+    ('Z3', '4.10.2', '-Python-%(pyver)s'),
 ]
 
 buildcmd = '%(python)s setup.py build'  # Run the (long) build in the build step
@@ -166,24 +210,22 @@ excluded_tests = {
         # intermittent failures on various systems
         # See https://github.com/easybuilders/easybuild-easyconfigs/issues/17712
         'distributed/rpc/test_tensorpipe_agent',
-        # This test is expected to fail when run in their CI, but won't in our case.
-        # It just checks for a "CI" env variable
-        'test_ci_sanity_check_fail',
-        # This fails consistently and is disabled upstream
-        # See https://github.com/pytorch/pytorch/issues/100152 and
-        # https://github.com/pytorch/pytorch/pull/124712
-        'test_cpp_extensions_open_device_registration',
-
     ]
 }
 
-local_test_opts = '--continue-through-error --pipe-logs --verbose %(excluded_tests)s'
-runtest = 'cd test && PYTHONUNBUFFERED=1 %(python)s run_test.py ' + local_test_opts
+runtest = 'cd test && PYTHONUNBUFFERED=1 %(python)s run_test.py --continue-through-error  --verbose %(excluded_tests)s'
 
 # Especially test_quantization has a few corner cases that are triggered by the random input values,
 # those cannot be easily avoided, see https://github.com/pytorch/pytorch/issues/107030
 # So allow a low number of tests to fail as the tests "usually" succeed
-max_failed_tests = 6
+max_failed_tests = 2
+
+# The readelf sanity check command can be taken out once the TestRPATH test from
+# https://github.com/pytorch/pytorch/pull/109493 is accepted, since it is then checked as part of the PyTorch test suite
+local_libcaffe2 = "$EBROOTPYTORCH/lib/python%%(pyshortver)s/site-packages/torch/lib/libcaffe2_nvrtc.%s" % SHLIB_EXT
+sanity_check_commands = [
+    "readelf -d %s | egrep 'RPATH|RUNPATH' | grep -v stubs" % local_libcaffe2,
+]
 
 tests = ['PyTorch-check-cpp-extension.py']
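
A note on the `%%(pyshortver)s` in `local_libcaffe2` in the diff above: the string first goes through plain Python %-formatting inside the easyconfig (to fill in `SHLIB_EXT`), and only later has its `%(...)s` templates resolved by EasyBuild, so the template has to be escaped to survive the first pass. A minimal sketch of the two passes, assuming `SHLIB_EXT` is `'so'` (Linux) and using a plain dict as a stand-in for EasyBuild's template resolution (an illustration, not EasyBuild's actual implementation):

```python
# Assumption: SHLIB_EXT is an EasyBuild constant; 'so' is the Linux value used here.
SHLIB_EXT = 'so'

# Pass 1: ordinary %-formatting, as written in the easyconfig.
# '%%' collapses to a literal '%', so '%(pyshortver)s' is left in place for later.
local_libcaffe2 = "$EBROOTPYTORCH/lib/python%%(pyshortver)s/site-packages/torch/lib/libcaffe2_nvrtc.%s" % SHLIB_EXT
command = "readelf -d %s | egrep 'RPATH|RUNPATH' | grep -v stubs" % local_libcaffe2

# Pass 2: stand-in for template resolution (hypothetical dict; Python 3.10.4 gives '3.10').
resolved = command % {'pyshortver': '3.10'}
print(resolved)
```

Without the doubled `%`, pass 1 would try to look up `pyshortver` itself and fail, because `SHLIB_EXT` is a string and not a mapping.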

Diff against PyTorch-2.1.2-foss-2023b.eb

easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2023b.eb

diff --git a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2023b.eb b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2022a-CUDA-11.7.0.eb
index b4b25bd33e..515a5b6c8b 100644
--- a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2023b.eb
+++ b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2022a-CUDA-11.7.0.eb
@@ -1,11 +1,12 @@
 name = 'PyTorch'
 version = '2.1.2'
+versionsuffix = '-CUDA-%(cudaver)s'
 
 homepage = 'https://pytorch.org/'
 description = """Tensors and Dynamic neural networks in Python with strong GPU acceleration.
 PyTorch is a deep learning framework that puts Python first."""
 
-toolchain = {'name': 'foss', 'version': '2023b'}
+toolchain = {'name': 'foss', 'version': '2022a'}
 
 source_urls = [GITHUB_RELEASE]
 sources = ['%(namelower)s-v%(version)s.tar.gz']
@@ -29,9 +30,8 @@ patches = [
     'PyTorch-2.0.1_skip-failing-gradtest.patch',
     'PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch',
     'PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch',
-    'PyTorch-2.1.0_disable-gcc12-warning.patch',
+    'PyTorch-2.1.0_disable-cudnn-tf32-for-too-strict-tests.patch',
     'PyTorch-2.1.0_fix-bufferoverflow-in-oneDNN.patch',
-    'PyTorch-2.1.0_fix-test_numpy_torch_operators.patch',
     'PyTorch-2.1.0_fix-validationError-output-test.patch',
     'PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch',
     'PyTorch-2.1.0_increase-tolerance-functorch-test_vmapvjpvjp.patch',
@@ -42,13 +42,26 @@ patches = [
     'PyTorch-2.1.0_skip-test_jvp_linalg_det_singular.patch',
     'PyTorch-2.1.0_skip-test_linear_fp32-without-MKL.patch',
     'PyTorch-2.1.0_skip-test_wrap_bad.patch',
+    'PyTorch-2.1.2_add-cuda-skip-markers.patch',
+    'PyTorch-2.1.2_fix-conj-mismatch-test-failures.patch',
+    'PyTorch-2.1.2_fix-device-mesh-check.patch',
+    'PyTorch-2.1.2_fix-fsdp-tp-integration-test.patch',
+    ('PyTorch-2.1.2_fix-kineto-cuda-11.7-compat.patch', 'third_party/kineto'),
+    'PyTorch-2.1.2_fix-test_cuda-non-x86.patch',
     'PyTorch-2.1.2_fix-test_extension_backend-without-vectorization.patch',
     'PyTorch-2.1.2_fix-test_memory_profiler.patch',
+    'PyTorch-2.1.2_fix-test_parallelize_api.patch',
+    'PyTorch-2.1.2_fix-test-tp_random_state.patch',
     'PyTorch-2.1.2_fix-test_torchinductor-rounding.patch',
     'PyTorch-2.1.2_fix-vsx-vector-abs.patch',
     'PyTorch-2.1.2_fix-vsx-vector-div.patch',
+    'PyTorch-2.1.2_fix-with_temp_dir-decorator.patch',
+    'PyTorch-2.1.2_fix-wrong-device-mesh-size-in-tests.patch',
+    'PyTorch-2.1.2_relax-cuda-tolerances.patch',
+    'PyTorch-2.1.2_remove-nccl-backend-default-without-gpus.patch',
     'PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch',
-    'PyTorch-2.1.2_skip-memory-leak-test.patch',
+    'PyTorch-2.1.2_skip-failing-test_dtensor_ops-subtests.patch',
+    'PyTorch-2.1.2_skip-test_fsdp_tp_checkpoint_integration.patch',
     'PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch',
 ]
 checksums = [
@@ -84,11 +97,10 @@ checksums = [
      '7047862abc1abaff62954da59700f36d4f39fcf83167a638183b1b7f8fec78ae'},
     {'PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch':
      '166c134573a95230e39b9ea09ece3ad8072f39d370c9a88fb2a1e24f6aaac2b5'},
-    {'PyTorch-2.1.0_disable-gcc12-warning.patch': 'c858b8db0010f41005dc06f9a50768d0d3dc2d2d499ccbdd5faf8a518869a421'},
+    {'PyTorch-2.1.0_disable-cudnn-tf32-for-too-strict-tests.patch':
+     'd895018ebdfd46e65d9f7645444a3b4c5bbfe3d533a08db559a04be34e01e478'},
     {'PyTorch-2.1.0_fix-bufferoverflow-in-oneDNN.patch':
      'b15b1291a3c37bf6a4982cfbb3483f693acb46a67bc0912b383fd98baf540ccf'},
-    {'PyTorch-2.1.0_fix-test_numpy_torch_operators.patch':
-     '84bb51a719abc677031a7a3dfe4382ff098b0cbd8b39b8bed2a7fa03f80ac1e9'},
     {'PyTorch-2.1.0_fix-validationError-output-test.patch':
      '7eba0942afb121ed92fac30d1529447d892a89eb3d53c565f8e9d480e95f692b'},
     {'PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch':
@@ -107,17 +119,40 @@ checksums = [
     {'PyTorch-2.1.0_skip-test_linear_fp32-without-MKL.patch':
      '5dcc79883b6e3ec0a281a8e110db5e0a5880de843bb05653589891f16473ead5'},
     {'PyTorch-2.1.0_skip-test_wrap_bad.patch': 'b8583125ee94e553b6f77c4ab4bfa812b89416175dc7e9b7390919f3b485cb63'},
+    {'PyTorch-2.1.2_add-cuda-skip-markers.patch': 'd007d6d0cdb533e7d01f503e9055218760123a67c1841c57585385144be18c9a'},
+    {'PyTorch-2.1.2_fix-conj-mismatch-test-failures.patch':
+     'c164357efa4ce88095376e590ba508fc1daa87161e1e59544eda56daac7f2847'},
+    {'PyTorch-2.1.2_fix-device-mesh-check.patch': 'c0efc288bf3d9a9a3c8bbd2691348a589a2677ea43880a8c987db91c8de4806b'},
+    {'PyTorch-2.1.2_fix-fsdp-tp-integration-test.patch':
+     'f583532c59f35f36998851957d501b3ac8c883884efd61bbaa308db55cb6bdcd'},
+    {'PyTorch-2.1.2_fix-kineto-cuda-11.7-compat.patch':
+     '742f8e9b911b6cde19cdc05804d60a043e7cb936e994e66a6fe6a9490c53e34f'},
+    {'PyTorch-2.1.2_fix-test_cuda-non-x86.patch': '1ed76fcc87e6c50606ac286487292a3d534707068c94af74c3a5de8153fa2c2c'},
     {'PyTorch-2.1.2_fix-test_extension_backend-without-vectorization.patch':
      'cd1455495886a7d6b2d30d48736eb0103fded21e2e36de6baac719b9c52a1c92'},
     {'PyTorch-2.1.2_fix-test_memory_profiler.patch':
      '30b0c9355636c0ab3dedae02399789053825dc3835b4d7dac6e696767772b1ce'},
+    {'PyTorch-2.1.2_fix-test_parallelize_api.patch':
+     'f8387a1693af344099c806981ca38df1306d7f4847d7d44713306338384b1cfd'},
+    {'PyTorch-2.1.2_fix-test-tp_random_state.patch':
+     'da352e32471ae9a68920c91a122c7194f3c53d947536116e33216d3ae6b15e61'},
     {'PyTorch-2.1.2_fix-test_torchinductor-rounding.patch':
      'a0ef99192ee2ad1509c78a8377023d5be2b5fddb16f84063b7c9a0b53d979090'},
     {'PyTorch-2.1.2_fix-vsx-vector-abs.patch': 'd67d32407faed7dc1dbab4bba0e2f7de36c3db04560ced35c94caf8d84ade886'},
     {'PyTorch-2.1.2_fix-vsx-vector-div.patch': '11f497a6892eb49b249a15320e4218e0d7ac8ae4ce67de39e4a018a064ca1acc'},
+    {'PyTorch-2.1.2_fix-with_temp_dir-decorator.patch':
+     '90bd001e034095329277d70c6facc4026b4ce6d7f8b8d6aa81c0176eeb462eb1'},
+    {'PyTorch-2.1.2_fix-wrong-device-mesh-size-in-tests.patch':
+     '07a5e4233d02fb6348872838f4d69573c777899c6f0ea4e39ae23c08660d41e5'},
+    {'PyTorch-2.1.2_relax-cuda-tolerances.patch': '554ad09787f61080fafdb84216e711e32327aa357e2a9c40bb428eb6503dee6e'},
+    {'PyTorch-2.1.2_remove-nccl-backend-default-without-gpus.patch':
+     'e6a1efe3d127fcbf4723476a7a1c01cfcf2ccb16d1fb250f478192623e8b6a15'},
     {'PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch':
      '7ace835af60c58d9e0754a34c19d4b9a0c3a531f19e5d0eba8e2e49206eaa7eb'},
-    {'PyTorch-2.1.2_skip-memory-leak-test.patch': '8d9841208e8a00a498295018aead380c360cf56e500ef23ca740adb5b36de142'},
+    {'PyTorch-2.1.2_skip-failing-test_dtensor_ops-subtests.patch':
+     '6cf711bf26518550903b09ed4431de9319791e79d61aab065785d6608fd5cc88'},
+    {'PyTorch-2.1.2_skip-test_fsdp_tp_checkpoint_integration.patch':
+     '943ee92f5fd518f608a59e43fe426b9bb45d7e7ad0ba04639e516db2d61fa57d'},
     {'PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch':
      'fb96eefabf394617bbb3fbd3a7a7c1aa5991b3836edc2e5d2a30e708bfe49ba1'},
 ]
@@ -125,32 +160,38 @@ checksums = [
 osdependencies = [OS_PKG_IBVERBS_DEV]
 
 builddependencies = [
-    ('CMake', '3.27.6'),
-    ('hypothesis', '6.90.0'),
+    ('CMake', '3.23.1'),
+    ('hypothesis', '6.46.7'),
     # For tests
     ('pytest-flakefinder', '1.1.0'),
-    ('pytest-rerunfailures', '14.0'),
+    ('pytest-rerunfailures', '11.1'),
     ('pytest-shard', '0.1.2'),
+    ('pytest-xdist', '2.5.0'),
+    ('unittest-xml-reporting', '3.1.0'),
 ]
 
 dependencies = [
-    ('Ninja', '1.11.1'),  # Required for JIT compilation of C++ extensions
-    ('Python', '3.11.5'),
-    ('Python-bundle-PyPI', '2023.10'),
-    ('protobuf', '25.3'),
-    ('protobuf-python', '4.25.3'),
-    ('pybind11', '2.11.1'),
-    ('SciPy-bundle', '2023.11'),
-    ('PyYAML', '6.0.1'),
-    ('MPFR', '4.2.1'),
-    ('GMP', '6.3.0'),
-    ('numactl', '2.0.16'),
-    ('FFmpeg', '6.0'),
-    ('Pillow', '10.2.0'),
-    ('expecttest', '0.2.1'),
-    ('networkx', '3.2.1'),
-    ('sympy', '1.12'),
-    ('Z3', '4.13.0',),
+    ('CUDA', '11.7.0', '', SYSTEM),
+    ('cuDNN', '8.4.1.50', '-CUDA-%(cudaver)s', SYSTEM),
+    ('magma', '2.6.2', '-CUDA-%(cudaver)s'),
+    ('NCCL', '2.12.12', '-CUDA-%(cudaver)s'),
+    ('Ninja', '1.10.2'),  # Required for JIT compilation of C++ extensions
+    ('Python', '3.10.4'),
+    ('protobuf', '3.19.4'),
+    ('protobuf-python', '3.19.4'),
+    ('pybind11', '2.9.2'),
+    ('SciPy-bundle', '2022.05'),
+    ('typing-extensions', '4.3.0'),
+    ('PyYAML', '6.0'),
+    ('MPFR', '4.1.0'),
+    ('GMP', '6.2.1'),
+    ('numactl', '2.0.14'),
+    ('FFmpeg', '4.4.2'),
+    ('Pillow', '9.1.1'),
+    ('expecttest', '0.1.3'),
+    ('networkx', '2.8.4'),
+    ('sympy', '1.10.1'),
+    ('Z3', '4.10.2', '-Python-%(pyver)s'),
 ]
 
 buildcmd = '%(python)s setup.py build'  # Run the (long) build in the build step
@@ -179,6 +220,13 @@ runtest = 'cd test && PYTHONUNBUFFERED=1 %(python)s run_test.py --continue-throu
 # So allow a low number of tests to fail as the tests "usually" succeed
 max_failed_tests = 2
 
+# The readelf sanity check command can be taken out once the TestRPATH test from
+# https://github.com/pytorch/pytorch/pull/109493 is accepted, since it is then checked as part of the PyTorch test suite
+local_libcaffe2 = "$EBROOTPYTORCH/lib/python%%(pyshortver)s/site-packages/torch/lib/libcaffe2_nvrtc.%s" % SHLIB_EXT
+sanity_check_commands = [
+    "readelf -d %s | egrep 'RPATH|RUNPATH' | grep -v stubs" % local_libcaffe2,
+]
+
 tests = ['PyTorch-check-cpp-extension.py']
 
 moduleclass = 'ai'
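
For readers wondering what the `readelf` sanity check in the diff above accepts: the exit status of the pipeline is that of the final `grep -v stubs`, which succeeds only if at least one RPATH/RUNPATH line remains after dropping lines that mention `stubs`. A rough Python sketch of that filter, where the function name and the sample `readelf -d` lines are hypothetical and only meant to illustrate the logic:

```python
import re


def rpath_check_passes(readelf_output: str) -> bool:
    """Mimic `egrep 'RPATH|RUNPATH' | grep -v stubs` (status of the last grep)."""
    rpath_lines = [line for line in readelf_output.splitlines()
                   if re.search(r'RPATH|RUNPATH', line)]
    # grep -v drops whole lines, so one 'stubs' entry rejects the entire RUNPATH line
    kept = [line for line in rpath_lines if 'stubs' not in line]
    return bool(kept)


# Hypothetical sample lines, not taken from a real build
clean = " 0x000000000000001d (RUNPATH)  Library runpath: [$ORIGIN:/apps/CUDA/11.7.0/lib64]"
with_stubs = " 0x000000000000001d (RUNPATH)  Library runpath: [/apps/CUDA/11.7.0/lib64/stubs:$ORIGIN]"
no_rpath = " 0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]"

print(rpath_check_passes(clean))       # True
print(rpath_check_passes(with_stubs))  # False
print(rpath_check_passes(no_rpath))    # False
```

Note that under this reading a library with no RPATH/RUNPATH entry at all also fails the check, since `grep -v` exits non-zero when it outputs nothing.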

Diff against PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb

easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb

diff --git a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2022a-CUDA-11.7.0.eb
index 6432bd1932..515a5b6c8b 100644
--- a/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb
+++ b/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2022a-CUDA-11.7.0.eb
@@ -6,7 +6,7 @@ homepage = 'https://pytorch.org/'
 description = """Tensors and Dynamic neural networks in Python with strong GPU acceleration.
 PyTorch is a deep learning framework that puts Python first."""
 
-toolchain = {'name': 'foss', 'version': '2023a'}
+toolchain = {'name': 'foss', 'version': '2022a'}
 
 source_urls = [GITHUB_RELEASE]
 sources = ['%(namelower)s-v%(version)s.tar.gz']
@@ -30,10 +30,8 @@ patches = [
     'PyTorch-2.0.1_skip-failing-gradtest.patch',
     'PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch',
     'PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch',
-    'PyTorch-2.1.0_disable-gcc12-warning.patch',
     'PyTorch-2.1.0_disable-cudnn-tf32-for-too-strict-tests.patch',
     'PyTorch-2.1.0_fix-bufferoverflow-in-oneDNN.patch',
-    'PyTorch-2.1.0_fix-test_numpy_torch_operators.patch',
     'PyTorch-2.1.0_fix-validationError-output-test.patch',
     'PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch',
     'PyTorch-2.1.0_increase-tolerance-functorch-test_vmapvjpvjp.patch',
@@ -47,9 +45,13 @@ patches = [
     'PyTorch-2.1.2_add-cuda-skip-markers.patch',
     'PyTorch-2.1.2_fix-conj-mismatch-test-failures.patch',
     'PyTorch-2.1.2_fix-device-mesh-check.patch',
-    'PyTorch-2.1.2_fix-locale-issue-in-nvrtcCompileProgram.patch',
+    'PyTorch-2.1.2_fix-fsdp-tp-integration-test.patch',
+    ('PyTorch-2.1.2_fix-kineto-cuda-11.7-compat.patch', 'third_party/kineto'),
+    'PyTorch-2.1.2_fix-test_cuda-non-x86.patch',
     'PyTorch-2.1.2_fix-test_extension_backend-without-vectorization.patch',
     'PyTorch-2.1.2_fix-test_memory_profiler.patch',
+    'PyTorch-2.1.2_fix-test_parallelize_api.patch',
+    'PyTorch-2.1.2_fix-test-tp_random_state.patch',
     'PyTorch-2.1.2_fix-test_torchinductor-rounding.patch',
     'PyTorch-2.1.2_fix-vsx-vector-abs.patch',
     'PyTorch-2.1.2_fix-vsx-vector-div.patch',
@@ -59,7 +61,6 @@ patches = [
     'PyTorch-2.1.2_remove-nccl-backend-default-without-gpus.patch',
     'PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch',
     'PyTorch-2.1.2_skip-failing-test_dtensor_ops-subtests.patch',
-    'PyTorch-2.1.2_skip-memory-leak-test.patch',
     'PyTorch-2.1.2_skip-test_fsdp_tp_checkpoint_integration.patch',
     'PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch',
 ]
@@ -96,13 +97,10 @@ checksums = [
      '7047862abc1abaff62954da59700f36d4f39fcf83167a638183b1b7f8fec78ae'},
     {'PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch':
      '166c134573a95230e39b9ea09ece3ad8072f39d370c9a88fb2a1e24f6aaac2b5'},
-    {'PyTorch-2.1.0_disable-gcc12-warning.patch': 'c858b8db0010f41005dc06f9a50768d0d3dc2d2d499ccbdd5faf8a518869a421'},
     {'PyTorch-2.1.0_disable-cudnn-tf32-for-too-strict-tests.patch':
      'd895018ebdfd46e65d9f7645444a3b4c5bbfe3d533a08db559a04be34e01e478'},
     {'PyTorch-2.1.0_fix-bufferoverflow-in-oneDNN.patch':
      'b15b1291a3c37bf6a4982cfbb3483f693acb46a67bc0912b383fd98baf540ccf'},
-    {'PyTorch-2.1.0_fix-test_numpy_torch_operators.patch':
-     '84bb51a719abc677031a7a3dfe4382ff098b0cbd8b39b8bed2a7fa03f80ac1e9'},
     {'PyTorch-2.1.0_fix-validationError-output-test.patch':
      '7eba0942afb121ed92fac30d1529447d892a89eb3d53c565f8e9d480e95f692b'},
     {'PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch':
@@ -125,12 +123,19 @@ checksums = [
     {'PyTorch-2.1.2_fix-conj-mismatch-test-failures.patch':
      'c164357efa4ce88095376e590ba508fc1daa87161e1e59544eda56daac7f2847'},
     {'PyTorch-2.1.2_fix-device-mesh-check.patch': 'c0efc288bf3d9a9a3c8bbd2691348a589a2677ea43880a8c987db91c8de4806b'},
-    {'PyTorch-2.1.2_fix-locale-issue-in-nvrtcCompileProgram.patch':
-     'f7adafb4e4d3b724b93237a259797b6ed6f535f83be0e34a7b759c71c6a8ddf2'},
+    {'PyTorch-2.1.2_fix-fsdp-tp-integration-test.patch':
+     'f583532c59f35f36998851957d501b3ac8c883884efd61bbaa308db55cb6bdcd'},
+    {'PyTorch-2.1.2_fix-kineto-cuda-11.7-compat.patch':
+     '742f8e9b911b6cde19cdc05804d60a043e7cb936e994e66a6fe6a9490c53e34f'},
+    {'PyTorch-2.1.2_fix-test_cuda-non-x86.patch': '1ed76fcc87e6c50606ac286487292a3d534707068c94af74c3a5de8153fa2c2c'},
     {'PyTorch-2.1.2_fix-test_extension_backend-without-vectorization.patch':
      'cd1455495886a7d6b2d30d48736eb0103fded21e2e36de6baac719b9c52a1c92'},
     {'PyTorch-2.1.2_fix-test_memory_profiler.patch':
      '30b0c9355636c0ab3dedae02399789053825dc3835b4d7dac6e696767772b1ce'},
+    {'PyTorch-2.1.2_fix-test_parallelize_api.patch':
+     'f8387a1693af344099c806981ca38df1306d7f4847d7d44713306338384b1cfd'},
+    {'PyTorch-2.1.2_fix-test-tp_random_state.patch':
+     'da352e32471ae9a68920c91a122c7194f3c53d947536116e33216d3ae6b15e61'},
     {'PyTorch-2.1.2_fix-test_torchinductor-rounding.patch':
      'a0ef99192ee2ad1509c78a8377023d5be2b5fddb16f84063b7c9a0b53d979090'},
     {'PyTorch-2.1.2_fix-vsx-vector-abs.patch': 'd67d32407faed7dc1dbab4bba0e2f7de36c3db04560ced35c94caf8d84ade886'},
@@ -146,7 +151,6 @@ checksums = [
      '7ace835af60c58d9e0754a34c19d4b9a0c3a531f19e5d0eba8e2e49206eaa7eb'},
     {'PyTorch-2.1.2_skip-failing-test_dtensor_ops-subtests.patch':
      '6cf711bf26518550903b09ed4431de9319791e79d61aab065785d6608fd5cc88'},
-    {'PyTorch-2.1.2_skip-memory-leak-test.patch': '8d9841208e8a00a498295018aead380c360cf56e500ef23ca740adb5b36de142'},
     {'PyTorch-2.1.2_skip-test_fsdp_tp_checkpoint_integration.patch':
      '943ee92f5fd518f608a59e43fe426b9bb45d7e7ad0ba04639e516db2d61fa57d'},
     {'PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch':
@@ -156,36 +160,38 @@ checksums = [
 osdependencies = [OS_PKG_IBVERBS_DEV]
 
 builddependencies = [
-    ('CMake', '3.26.3'),
-    ('hypothesis', '6.82.0'),
+    ('CMake', '3.23.1'),
+    ('hypothesis', '6.46.7'),
     # For tests
     ('pytest-flakefinder', '1.1.0'),
-    ('pytest-rerunfailures', '12.0'),
+    ('pytest-rerunfailures', '11.1'),
     ('pytest-shard', '0.1.2'),
+    ('pytest-xdist', '2.5.0'),
+    ('unittest-xml-reporting', '3.1.0'),
 ]
 
 dependencies = [
-    ('CUDA', '12.1.1', '', SYSTEM),
-    ('cuDNN', '8.9.2.26', '-CUDA-%(cudaver)s', SYSTEM),
-    ('magma', '2.7.2', '-CUDA-%(cudaver)s'),
-    ('NCCL', '2.18.3', '-CUDA-%(cudaver)s'),
-    ('Ninja', '1.11.1'),  # Required for JIT compilation of C++ extensions
-    ('Python', '3.11.3'),
-    ('Python-bundle-PyPI', '2023.06'),
-    ('protobuf', '24.0'),
-    ('protobuf-python', '4.24.0'),
-    ('pybind11', '2.11.1'),
-    ('SciPy-bundle', '2023.07'),
+    ('CUDA', '11.7.0', '', SYSTEM),
+    ('cuDNN', '8.4.1.50', '-CUDA-%(cudaver)s', SYSTEM),
+    ('magma', '2.6.2', '-CUDA-%(cudaver)s'),
+    ('NCCL', '2.12.12', '-CUDA-%(cudaver)s'),
+    ('Ninja', '1.10.2'),  # Required for JIT compilation of C++ extensions
+    ('Python', '3.10.4'),
+    ('protobuf', '3.19.4'),
+    ('protobuf-python', '3.19.4'),
+    ('pybind11', '2.9.2'),
+    ('SciPy-bundle', '2022.05'),
+    ('typing-extensions', '4.3.0'),
     ('PyYAML', '6.0'),
-    ('MPFR', '4.2.0'),
+    ('MPFR', '4.1.0'),
     ('GMP', '6.2.1'),
-    ('numactl', '2.0.16'),
-    ('FFmpeg', '6.0'),
-    ('Pillow', '10.0.0'),
-    ('expecttest', '0.1.5'),
-    ('networkx', '3.1'),
-    ('sympy', '1.12'),
-    ('Z3', '4.12.2'),
+    ('numactl', '2.0.14'),
+    ('FFmpeg', '4.4.2'),
+    ('Pillow', '9.1.1'),
+    ('expecttest', '0.1.3'),
+    ('networkx', '2.8.4'),
+    ('sympy', '1.10.1'),
+    ('Z3', '4.10.2', '-Python-%(pyver)s'),
 ]
 
 buildcmd = '%(python)s setup.py build'  # Run the (long) build in the build step
@@ -204,16 +210,6 @@ excluded_tests = {
         # intermittent failures on various systems
         # See https://github.com/easybuilders/easybuild-easyconfigs/issues/17712
         'distributed/rpc/test_tensorpipe_agent',
-        # Broken test, can't ever succeed, see https://github.com/pytorch/pytorch/issues/122184
-        'distributed/tensor/parallel/test_tp_random_state',
-        # failures on OmniPath systems, which don't support some optional InfiniBand features
-        # See https://github.com/pytorch/tensorpipe/issues/413
-        'distributed/pipeline/sync/skip/test_gpipe',
-        'distributed/pipeline/sync/skip/test_leak',
-        'distributed/pipeline/sync/test_bugs',
-        'distributed/pipeline/sync/test_inplace',
-        'distributed/pipeline/sync/test_pipe',
-        'distributed/pipeline/sync/test_transparency',
     ]
 }
 
@@ -221,7 +217,6 @@ runtest = 'cd test && PYTHONUNBUFFERED=1 %(python)s run_test.py --continue-throu
 
 # Especially test_quantization has a few corner cases that are triggered by the random input values,
 # those cannot be easily avoided, see https://github.com/pytorch/pytorch/issues/107030
-# test_nn is also prone to spurious failures: https://github.com/pytorch/pytorch/issues/118294
 # So allow a low number of tests to fail as the tests "usually" succeed
 max_failed_tests = 2
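
The comment above `max_failed_tests` describes a failure budget: a small number of spurious test failures is tolerated. A minimal sketch of how such a threshold could be applied follows. The function name `within_failure_budget` and the test names are hypothetical, and the `<=` comparison is an assumption about the intended semantics, not the actual logic of the PyTorch easyblock.

```python
from typing import Iterable


def within_failure_budget(failed_tests: Iterable[str], max_failed_tests: int) -> bool:
    """Return True if the number of failed tests does not exceed the budget.

    Assumption: a run with exactly `max_failed_tests` failures still passes.
    """
    return len(list(failed_tests)) <= max_failed_tests


# Hypothetical run: two flaky failures are tolerated, a third is not.
flaky = ["test_quantization::case_a", "test_quantization::case_b"]
print(within_failure_budget(flaky, max_failed_tests=2))
print(within_failure_budget(flaky + ["test_nn::case_c"], max_failed_tests=2))
```

Keeping the budget low (2 here) means genuinely broken builds still fail, while occasional failures caused by random inputs do not block the installation.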

Flamefire · 2024-11-20T05:08:06Z

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
i8016 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 555.42.06, Python 3.8.17
See https://gist.github.com/Flamefire/50987943cc3b3ba7dacbf1f2c394bba5 for a full test report.

Flamefire · 2024-11-29T20:24:18Z

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
i8034 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 555.42.06, Python 3.8.17
See https://gist.github.com/Flamefire/f8ed04653024e3b120d45e700864afb2 for a full test report.

Flamefire · 2025-02-27T23:54:54Z

Test report by @Flamefire
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3633
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
i7133 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.8.17
See https://gist.github.com/Flamefire/dc0ce733d08110a763f10156997d8500 for a full test report.

Flamefire · 2025-03-01T20:28:42Z

Test report by @Flamefire
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3633
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
i8004 - Linux Rocky Linux 8.9 (Green Obsidian), x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 555.42.06, Python 3.8.17
See https://gist.github.com/Flamefire/088d0cd8c841bf84b669e795da911935 for a full test report.

boegel · 2025-10-13T16:40:32Z

No longer relevant: the 2022a toolchains have been deprecated since foss/2025b was defined (see also https://docs.easybuild.io/policies/toolchains/), so closing...

github-actions bot added the update label Nov 19, 2024

Flamefire force-pushed the 20241017162730_new_pr_PyTorch212 branch from b7661d1 to fd81dea on November 19, 2024 15:46

Flamefire mentioned this pull request Dec 2, 2024

{ai}[foss/2023b] PyTorch v2.3.0 #20489

Merged

1 task

Flamefire force-pushed the 20241017162730_new_pr_PyTorch212 branch from e93a71f to 286d453 on February 21, 2025 12:45

Flamefire mentioned this pull request Feb 24, 2025

Use unittest XML files to parse PyTorch test results easybuilders/easybuild-easyblocks#3633

Merged

3 tasks

Flamefire added 5 commits on March 26, 2025 13:25:

175d798 adding easyconfigs: PyTorch-2.1.2-foss-2022a-CUDA-11.7.0.eb
b2bb4fc Remove use_pip
daabd8e Fix test failures on Power9 and systems with 6 GPUs
90293f7 Fix checksums
fc4ea6a Add unittest-xml-reporting

Flamefire force-pushed the 20241017162730_new_pr_PyTorch212 branch from 044a4ed to fc4ea6a on March 26, 2025 12:25

boegel closed this Oct 13, 2025

Flamefire deleted the 20241017162730_new_pr_PyTorch212 branch October 14, 2025 07:26
