{ai}[foss/2022b] PyTorch v2.1.2 w/ CUDA 12.0.0 #20155
easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2022b-CUDA-12.0.0.eb
|
Test report by @casparvl |
|
Here and in #20156 |
|
Hm, the log contains a lot, it's a bit hard to read, but I think this is the relevant part: Error log |
|
Error in Error log: |
|
Yep, that is a known issue: reinstall your pybind11 with the latest EC. |
|
Great, will do! Sorry, there are so many fixes that I often can't keep up and don't always rebuild stuff XD I'll send a new test report after the pybind11 rebuild. |
Yeah, I know that is annoying, but we can't do much better than updating the existing EC(s) for such major bugs. It came up recently with someone else too, so I remembered it. Side note: This is actually a good reason to run the PyTorch test suite and investigate errors: Our pybind11 version isn't (wasn't) compatible with this PyTorch version, which would make it less usable, as this error is likely to pop up in user code using this module. |
|
Ok, I rebuild |
|
Test report sassy-crick: |
|
Test report by @casparvl |
|
Failures are the same for
The only new one was a failure in |
|
@boegelbot please test @ generoso |
|
@casparvl: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 2016793002 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegelbot |
|
@boegelbot please test @ jsc-zen3 |
|
@casparvl: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 2020037044 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegelbot |
easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2-foss-2022b-CUDA-12.0.0.eb
|
Test report by @casparvl |
You are missing the patches from #19666 which are in develop |
|
Ah, let me sync your branch with develop - I'm assuming you won't mind... :) |
|
Ok, rebuild started successfully now. The test report should be there sometime tonight. I'll trigger one more rebuild on one of the test clusters for good measure. Should be good to go afterwards... |
|
@boegelbot please test @ jsc-zen3 |
|
@casparvl: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 2032212011 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegelbot |
|
Test report by @casparvl |
5564421 to 8bb0d57
|
Test report by @akesandgren |
|
Test report by @Flamefire |
|
Test report by @Flamefire |
|
Test report by @Flamefire |
Updated software
|
|
Test report by @Flamefire |
Updated software
|
|
Test report by @Flamefire |
Updated software
|
ead558a to 7f8e2b5
…es: PyTorch-2.1.2_add-cuda-skip-markers.patch, PyTorch-2.1.2_fix-conj-mismatch-test-failures.patch, PyTorch-2.1.2_fix-device-mesh-check.patch, PyTorch-2.1.2_fix-locale-issue-in-nvrtcCompileProgram.patch, PyTorch-2.1.2_fix-test_extension_backend-without-vectorization.patch, PyTorch-2.1.2_fix-test_memory_profiler.patch, PyTorch-2.1.2_fix-test_torchinductor-rounding.patch, PyTorch-2.1.2_fix-vsx-vector-abs.patch, PyTorch-2.1.2_fix-vsx-vector-div.patch, PyTorch-2.1.2_fix-with_temp_dir-decorator.patch, PyTorch-2.1.2_fix-wrong-device-mesh-size-in-tests.patch, PyTorch-2.1.2_relax-cuda-tolerances.patch, PyTorch-2.1.2_remove-nccl-backend-default-without-gpus.patch, PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch, PyTorch-2.1.2_skip-failing-test_dtensor_ops-subtests.patch, PyTorch-2.1.2_skip-test_fsdp_tp_checkpoint_integration.patch, PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch
7f8e2b5 to 8ba9179
|
Test report by @akesandgren |
|
@akesandgren Hm, multiple segfaults. Maybe try #20520, which uses another NCCL version, on the same machine. |
|
Superseded by #20520 which uses the correct NCCL for PyTorch |
(created using eb --new-pr)
Requires: