@smoors
Contributor

@smoors smoors commented Feb 14, 2022

(created using eb --new-pr)

…tches: TensorFlow-2.7.1_fix_protobuf_error_message.patch, TensorFlow-2.7.1_remove-duplicate-gpu-tests.patch
@smoors smoors added the update label Feb 14, 2022
@SebastianAchilles SebastianAchilles added this to the 4.x milestone Feb 14, 2022
@smoors smoors changed the title {lib}[foss/2021b] TensorFlow v2.7.1 w/ Python 3.9.6 WIP {lib}[foss/2021b] TensorFlow v2.7.1 w/ Python 3.9.6 Feb 14, 2022
@branfosj
Member

Test report by @branfosj
FAILED
Build succeeded for 19 out of 20 (3 easyconfigs in total)
bear-pg0103u01a.bear.cluster - Linux RHEL 8.5, x86_64, Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz (icelake), 2 x NVIDIA NVIDIA A100-PCIE-40GB, 470.57.02, Python 3.6.8
See https://gist.github.com/b10727cc5df60b9d131f8f4afdf85fd8 for a full test report.

@smoors
Contributor Author

smoors commented Feb 18, 2022

This builds and installs fine, but a number of tests that are new in 2.7 fail.
The failing tests are all in the roundtrip directory, for example:

FAILED: //tensorflow/core/ir/importexport/tests/roundtrip:parse_example.pbtxt.test (Summary)
FAILED: //tensorflow/core/ir/importexport/tests/roundtrip:const-values.pbtxt.test (Summary)
FAILED: //tensorflow/core/ir/importexport/tests/roundtrip:shape-attrs.pbtxt.test (Summary)
FAILED: //tensorflow/core/ir/importexport/tests/roundtrip:test12.pbtxt.test (Summary)

They all fail in the same way, with a segfault. If anybody has a clue what could be wrong, that would be greatly appreciated.

<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
  <testsuite name="tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test" tests="1" failures="0" errors="1">
    <testcase name="tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test" status="run" duration="0" time="0"><error message="exited with error code 139"></error></testcase>
      <system-out>
Generated test.log (if the file is not UTF-8, then this may be unreadable):
<![CDATA[exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //tensorflow/core/ir/importexport/tests/roundtrip:parse_example.pbtxt.test
-----------------------------------------------------------------------------
TensorFlow crashed, please file a bug on https://github.com/tensorflow/tensorflow/issues with the trace below.
Stack dump:
0.	Program arguments: /tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt
 #0 0x00002b001ee83460 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/../../../../../../_solib_local/_U_S_Stensorflow_Score_Sir_Simportexport_Stests_Sroundtrip_Cverify-roundtrip___Utensorflow/libtensorflow_framework.so.2+0xe0e460)
 #1 0x00002b001ee810b5 llvm::sys::RunSignalHandlers() (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/../../../../../../_solib_local/_U_S_Stensorflow_Score_Sir_Simportexport_Stests_Sroundtrip_Cverify-roundtrip___Utensorflow/libtensorflow_framework.so.2+0xe0c0b5)
 #2 0x00002b001ee8124c SignalHandler(int) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/../../../../../../_solib_local/_U_S_Stensorflow_Score_Sir_Simportexport_Stests_Sroundtrip_Cverify-roundtrip___Utensorflow/libtensorflow_framework.so.2+0xe0c24c)
 #3 0x00002b001fb82630 __restore_rt (/lib64/libpthread.so.0+0xf630)
 #4 0x00002b00206f8d40 __strncmp_sse42 (/lib64/libc.so.6+0x140d40)
 #5 0x000055d925d55c80 mlir::operator<(std::pair<mlir::Identifier, mlir::Attribute> const&, llvm::StringRef) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x2a3c80)
 #6 0x000055d925d5c6ea mlir::DictionaryAttr::getNamed(llvm::StringRef) const (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x2aa6ea)
 #7 0x000055d925d5c78d mlir::DictionaryAttr::get(llvm::StringRef) const (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x2aa78d)
 #8 0x000055d925cfa250 mlir::tfg::TFGraphDialect::getOperationPrinter(mlir::Operation*) const::'lambda'(mlir::Operation*, mlir::OpAsmPrinter&)::operator()(mlir::Operation*, mlir::OpAsmPrinter&) const (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x248250)
 #9 0x000055d925d515be (anonymous namespace)::OperationPrinter::print(mlir::Operation*) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x29f5be)
#10 0x000055d925d51bc8 (anonymous namespace)::OperationPrinter::print(mlir::Block*, bool, bool) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x29fbc8)
#11 0x000055d925d521be (anonymous namespace)::OperationPrinter::printRegion(mlir::Region&, bool, bool, bool) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x2a01be)
#12 0x000055d925cf99a2 mlir::tfg::GraphOp::print(mlir::OpAsmPrinter&) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x2479a2)
#13 0x000055d925cf99f6 mlir::Op<mlir::tfg::GraphOp, mlir::OpTrait::OneRegion, mlir::OpTrait::ZeroResult, mlir::OpTrait::ZeroSuccessor, mlir::OpTrait::ZeroOperands, mlir::OpTrait::HasOnlyGraphRegion, mlir::OpTrait::SingleBlock, mlir::OpTrait::IsIsolatedFromAbove, mlir::OpAsmOpInterface::Trait, mlir::OpTrait::NoTerminator, mlir::RegionKindInterface::Trait>::printAssembly(mlir::Operation*, mlir::OpAsmPrinter&, llvm::StringRef) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x2479f6)
#14 0x000055d925d516e0 (anonymous namespace)::OperationPrinter::print(mlir::Operation*) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x29f6e0)
#15 0x000055d925d51bc8 (anonymous namespace)::OperationPrinter::print(mlir::Block*, bool, bool) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x29fbc8)
#16 0x000055d925d521be (anonymous namespace)::OperationPrinter::printRegion(mlir::Region&, bool, bool, bool) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x2a01be)
#17 0x000055d925d7a1b6 mlir::Op<mlir::ModuleOp, mlir::OpTrait::OneRegion, mlir::OpTrait::ZeroResult, mlir::OpTrait::ZeroSuccessor, mlir::OpTrait::ZeroOperands, mlir::OpTrait::AffineScope, mlir::OpTrait::IsIsolatedFromAbove, mlir::OpTrait::NoRegionArguments, mlir::OpTrait::SymbolTable, mlir::SymbolOpInterface::Trait, mlir::OpAsmOpInterface::Trait, mlir::OpTrait::NoTerminator, mlir::OpTrait::SingleBlock, mlir::RegionKindInterface::Trait, mlir::OpTrait::HasOnlyGraphRegion>::printAssembly(mlir::Operation*, mlir::OpAsmPrinter&, llvm::StringRef) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x2c81b6)
#18 0x000055d925d516e0 (anonymous namespace)::OperationPrinter::print(mlir::Operation*) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x29f6e0)
#19 0x000055d925d52461 mlir::Operation::print(llvm::raw_ostream&, mlir::AsmState&, mlir::OpPrintingFlags const&) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x2a0461)
#20 0x000055d925d55bcc mlir::Operation::print(llvm::raw_ostream&, mlir::OpPrintingFlags const&) (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x2a3bcc)
#21 0x000055d925b119c7 main (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x5f9c7)
#22 0x00002b00205da555 __libc_start_main (/lib64/libc.so.6+0x22555)
#23 0x000055d925bca59f _start (/tmp/TensorFlow/2.7.1/foss-2021b-CUDA-11.4.1/tmpYhARYl-bazel-tf/dabfb3c0444812f9cbf64fde8263030c/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test.runfiles/org_tensorflow/tensorflow/core/ir/importexport/tests/roundtrip/parse_example.pbtxt.test+0x11859f)]]>
      </system-out>
    </testsuite>
</testsuites>
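The error message "exited with error code 139" in the log above follows the usual POSIX shell convention that a process terminated by signal N is reported with exit status 128 + N. Under that convention, 139 corresponds to signal 11, SIGSEGV, which matches the segfault in the stack trace. A minimal Python sketch that decodes such a status; the helper name describe_exit_code is hypothetical, and signal numbers are assumed to follow Linux:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a shell-style exit status (hypothetical helper).

    Assumes the POSIX shell convention: a process killed by signal N
    is reported with exit status 128 + N. Signal numbers follow the
    host platform (SIGSEGV is 11 on Linux).
    """
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # 128 + N where N is not a known signal on this platform
            return f"exited with status {code}"
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {code}"


print(describe_exit_code(139))  # on Linux: killed by signal 11 (SIGSEGV)
```

The same reasoning applies to other large statuses, e.g. 137 (128 + 9, SIGKILL), which often points at the out-of-memory killer rather than a crash in the test itself.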

@boegel
Member

boegel commented Feb 21, 2022

Test report by @boegel
FAILED
Build succeeded for 3 out of 4 (3 easyconfigs in total)
node3518.doduo.os - Linux RHEL 8.4, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/7e63d0d07f3f93e4d05b55dd5844a1ab for a full test report.

@boegel
Member

boegel commented May 3, 2022

@smoors Any updates on this?

@smoors
Contributor Author

smoors commented May 3, 2022

I'll try to update it this week.

@smoors
Contributor Author

smoors commented May 9, 2022

There is still one test that fails intermittently:

//tensorflow/python/data/kernel_tests:interleave_test                    FAILED in 6 out of 28 in 6.7s

@smoors smoors changed the title WIP {lib}[foss/2021b] TensorFlow v2.7.1 w/ Python 3.9.6 {lib}[foss/2021b] TensorFlow v2.7.1 w/ Python 3.9.6 May 13, 2022
@easybuilders easybuilders deleted a comment from boegelbot May 13, 2022
@branfosj
Member

Test report by @branfosj
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
bear-pg0103u01a.bear.cluster - Linux RHEL 8.5, x86_64, Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz (icelake), 2 x NVIDIA NVIDIA A100-PCIE-40GB, 470.57.02, Python 3.6.8
See https://gist.github.com/ad59686ecb284ebfee987f9d8d3c90b6 for a full test report.

@smoors
Contributor Author

smoors commented May 13, 2022

Test report by @smoors
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
node400.hydra.os - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7282 16-Core Processor (zen2), 1 x NVIDIA NVIDIA A100-PCIE-40GB, 470.82.01, Python 2.7.5
See https://gist.github.com/dfc8559dab99da958278b1093ecebe54 for a full test report.

@SebastianAchilles
Member

SebastianAchilles commented May 14, 2022

Test report by @SebastianAchilles
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
jsczen2g1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), 1 x NVIDIA GRID V100-4C, 460.73.01, Python 3.6.8
See https://gist.github.com/388c9cd6c0d7ed79b00fc3a8dae8183b for a full test report.

Failed because of "No space left on device". Will try again with --buildpath=/tmp instead of --buildpath=/dev/shm.

@SebastianAchilles
Member

SebastianAchilles commented May 14, 2022

Test report by @SebastianAchilles
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
jsczen2g1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), 1 x NVIDIA GRID V100-4C, 460.73.01, Python 3.6.8
See https://gist.github.com/b8c12e7aeb779eec4b7268692e997872 for a full test report.

Failed with CUDA_ERROR_OUT_OF_MEMORY (out of memory), since the V100-4C has only 4 GB of device memory.

@boegel boegel changed the title {lib}[foss/2021b] TensorFlow v2.7.1 w/ Python 3.9.6 {lib}[foss/2021b] TensorFlow v2.7.1 w/ Python 3.9.6 + CUDA 11.4.1 Jun 8, 2022
@surak
Contributor

surak commented Jun 8, 2022

Test report by @surak
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
haicluster1.fz-juelich.de - Linux Ubuntu 20.04, x86_64, AMD EPYC 7F72 24-Core Processor, 3 x NVIDIA NVIDIA GeForce RTX 3090, 470.103.01, Python 3.8.10
See https://gist.github.com/6008639e3eb475c3c8eb814d577664a9 for a full test report.

@surak
Contributor

surak commented Jun 8, 2022

Test report by @surak
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
haicluster1.fz-juelich.de - Linux Ubuntu 20.04, x86_64, AMD EPYC 7F72 24-Core Processor, 3 x NVIDIA NVIDIA GeForce RTX 3090, 470.103.01, Python 3.8.10
See https://gist.github.com/c5f09fc5a86f13f6c737b17b6ab80673 for a full test report.

@boegel
Member

boegel commented Jun 9, 2022

Test report by @boegel
SUCCESS
Build succeeded for 12 out of 12 (3 easyconfigs in total)
node3905.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 510.47.03, Python 3.6.8
See https://gist.github.com/7bc1730dbc485b5f2527580396d89278 for a full test report.

@boegel
Member

boegel commented Jun 9, 2022

@surak Looks like your failed test reports are due to external issues (lock file, Permission denied)...

@boegel
Member

boegel commented Jun 9, 2022

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
node3502.doduo.os - Linux RHEL 8.4, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/049805e61d2e6ad06ee1a94bec187ac6 for a full test report.

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 15 out of 15 (3 easyconfigs in total)
bear-pg0103u11a.bear.cluster - Linux RHEL 8.5, x86_64, Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz (icelake), 2 x NVIDIA NVIDIA A100-PCIE-40GB, 470.57.02, Python 3.6.8
See https://gist.github.com/78bd914e5dba1eb53ed443fca3d6be7f for a full test report.

@surak
Contributor

surak commented Jun 13, 2022

Test report by @surak
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
haicluster1.fz-juelich.de - Linux Ubuntu 20.04, x86_64, AMD EPYC 7F72 24-Core Processor, 3 x NVIDIA NVIDIA GeForce RTX 3090, 470.129.06, Python 3.8.10
See https://gist.github.com/e000856b9e9966c4d6ad431f3f49019a for a full test report.

@surak
Contributor

surak commented Jun 13, 2022

Test report by @surak
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
haicluster1.fz-juelich.de - Linux Ubuntu 20.04, x86_64, AMD EPYC 7F72 24-Core Processor, 3 x NVIDIA NVIDIA GeForce RTX 3090, 470.129.06, Python 3.8.10
See https://gist.github.com/99409c84eea89c91828d693dd5047ad0 for a full test report.

@bedroge
Contributor

bedroge commented Jun 13, 2022

Test report by @bedroge
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
pg-gpu07 - Linux CentOS Linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz, 1 x NVIDIA GRID V100D-32Q, 470.103.01, Python 3.6.8
See https://gist.github.com/6a2fb42ecd6bcd675ed5b6e64d9cbff1 for a full test report.

@surak
Contributor

surak commented Jun 14, 2022

Test report by @surak
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
haicluster2 - Linux Ubuntu 20.04, x86_64, AMD EPYC 7F72 24-Core Processor, 2 x NVIDIA NVIDIA GeForce RTX 3090, 470.129.06, Python 3.8.10
See https://gist.github.com/a8ca4efc60838a7f3b0ca1344ab44110 for a full test report.

@surak
Contributor

surak commented Jun 14, 2022

Test report by @surak
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
haicluster2 - Linux Ubuntu 20.04, x86_64, AMD EPYC 7F72 24-Core Processor, 2 x NVIDIA NVIDIA GeForce RTX 3090, 470.129.06, Python 3.8.10
See https://gist.github.com/1c6c3a38acd4db77b6001a3b07a9b9ed for a full test report.

@surak
Contributor

surak commented Jun 14, 2022

Test report by @surak
FAILED
Build succeeded for 2 out of 3 (3 easyconfigs in total)
haicluster2 - Linux Ubuntu 20.04, x86_64, AMD EPYC 7F72 24-Core Processor, 2 x NVIDIA NVIDIA GeForce RTX 3090, 470.129.06, Python 3.8.10
See https://gist.github.com/94f821155cd9658b94c45d99d6e9f564 for a full test report.

@surak
Contributor

surak commented Jun 14, 2022

Test report by @surak
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
haicluster2 - Linux Ubuntu 20.04, x86_64, AMD EPYC 7F72 24-Core Processor, 2 x NVIDIA NVIDIA GeForce RTX 3090, 470.129.06, Python 3.8.10
See https://gist.github.com/1718e481e633fd149b0e16bf622cf5e6 for a full test report.

@surak
Contributor

surak commented Jun 14, 2022

> @surak Looks like your failed test reports are due to external issues (lock file, Permission denied)...

These machines have power issues. With 4 GPUs, one of them gets knocked out and the kernel panics. With 3 GPUs, one still gets knocked out (nvidia-smi shows 2), but then it works. Go figure.

@boegel
Member

boegel commented Jun 22, 2022

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
node3301.joltik.os - Linux RHEL 8.4, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 510.73.08, Python 3.6.8
See https://gist.github.com/a51cb24bdb9d58728283029b3416605b for a full test report.

@boegel boegel modified the milestones: 4.x, next release (4.5.6?) Jun 22, 2022
Member

@boegel boegel left a comment

lgtm

@boegel
Member

boegel commented Jun 22, 2022

Going in, thanks @smoors!

@boegel boegel merged commit 630170a into easybuilders:develop Jun 22, 2022

7 participants