@terjekv
Collaborator

@terjekv terjekv commented Oct 20, 2020

(created using eb --new-pr)

Requires #11543

…-GCCcore-9.3.0.eb, hwloc-1.11.12-GCCcore-9.3.0.eb, JsonCpp-1.9.3-GCCcore-9.3.0.eb, nsync-1.24.0-GCCcore-9.3.0.eb, protobuf-python-3.10.0-foss-2020a-Python-3.8.2.eb, TensorFlow-2.3.1-foss-2020a-Python-3.8.2.eb
@terjekv terjekv added the update label Oct 20, 2020
@boegel boegel changed the title from "{devel,lib}[GCCcore/9.3.0] flatbuffers v1.12.0, giflib v5.2.1, hwloc v1.11.12, ... w/ Python 3.8.2" to "{lib}[foss/2020a] TensorFlow v2.3.1 w/ Python 3.8.2" Oct 20, 2020
# Dependencies created and updated using findPythonDeps.sh:
# https://gist.github.com/Flamefire/49426e502cd8983757bd01a08a10ae0d
exts_list = [
('Markdown', '3.2.2', {
Member

@terjekv These should be revisited: some may already be included with Python, and some should probably be updated.

Contributor

What I did: run eb --dump-env on this EC, source the resulting environment script, and then run the linked script. That should give you a nice list of packages and versions to use.
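For anyone revisiting the list by hand, here is a minimal sketch of the comparison being asked for: which extensions are already provided, which differ in version, and which are not provided at all. It is not what the linked findPythonDeps.sh does. The helper name `classify_extensions`, the `provided` mapping, and every package except Markdown 3.2.2 are hypothetical placeholders.

```python
def classify_extensions(exts_list, provided):
    """Sort extensions by whether an already-available package covers them.

    exts_list: easyconfig-style entries, (name, version[, options]).
    provided:  mapping of lower-cased package name -> available version
               (hypothetical input, e.g. collected from the Python module).
    """
    report = {"already_provided": [], "version_differs": [], "not_provided": []}
    for ext in exts_list:
        name, version = ext[0], ext[1]
        available = provided.get(name.lower())
        if available is None:
            report["not_provided"].append(name)
        elif available == version:
            report["already_provided"].append(name)
        else:
            # Keep both versions so the reviewer can decide which one to use.
            report["version_differs"].append((name, version, available))
    return report


# Hypothetical data: only the Markdown entry comes from the easyconfig above.
exts_list = [
    ('Markdown', '3.2.2', {}),
    ('example-ext', '1.0.0', {}),
    ('other-ext', '2.0.0', {}),
]
provided = {'example-ext': '1.0.0', 'other-ext': '2.1.0'}
report = classify_extensions(exts_list, provided)
```

Entries under `already_provided` are candidates for removal from exts_list; entries under `version_differs` are the ones worth a second look.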

@boegel boegel added this to the next release (4.3.1) milestone Oct 20, 2020
@smoors
Contributor

smoors commented Oct 21, 2020

@terjekv you've added hwloc-1.11.12, but you're not actually using it in the TF easyconfig, is that on purpose?

@terjekv
Collaborator Author

terjekv commented Oct 21, 2020

> @terjekv you've added hwloc-1.11.12, but you're not actually using it in the TF easyconfig, is that on purpose?

Bah, thanks. I moved the PR to use hwloc-2.2.0 and forgot to remove the old one. Fixed now. Thanks again!

@terjekv terjekv closed this Oct 21, 2020
@terjekv terjekv reopened this Oct 21, 2020
@smoors
Contributor

smoors commented Oct 21, 2020

Test report by @smoors
FAILED
Build succeeded for 8 out of 9 (7 easyconfigs in total)
node316.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (skylake_avx512), Python 2.7.5
See https://gist.github.com/5c62e0788eaef18cd669e164123f4a41 for a full test report.

@terjekv
Collaborator Author

terjekv commented Oct 21, 2020

> Test report by @smoors
> FAILED
> Build succeeded for 8 out of 9 (7 easyconfigs in total)
> node316.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (skylake_avx512), Python 2.7.5
> See https://gist.github.com/5c62e0788eaef18cd669e164123f4a41 for a full test report.

Nice catch.

== 2020-10-21 22:02:54,945 modules.py:809 DEBUG Output of module command '/usr/share/lmod/lmod/libexec/lmod python load protobuf-python/3.13.0-foss-2020a-Python-3.8.2': stdout: _mlstatus = False
; stderr: Lmod has detected the following error: A different version of the 'protobuf'
module is already loaded (see output of 'ml').
You should load another 'protobuf-python' module that is compatible with the
currently loaded version of 'protobuf'.
Use 'ml spider protobuf-python' to get an overview of the available versions.


If you don't understand the warning or error, contact the helpdesk at
hpc@[...] 
While processing the following module(s):
    Module fullname                                 Module Filename
    ---------------                                 ---------------
    protobuf-python/3.13.0-foss-2020a-Python-3.8.2  /tmp/vsc10009/ebinstall/11546/modules/all/protobuf-python/3.13.0-foss-2020a-Python-3.8.2.lua

Should be resolved in 8d8dbd6.
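The Lmod error above comes from two modules in the same dependency tree disagreeing on the protobuf version. A minimal sketch of a pre-flight check for that kind of mismatch follows. The helper name `find_version_mismatches` and the dependency list are hypothetical; in particular the protobuf version shown is an assumed value for illustration, not taken from the log.

```python
def find_version_mismatches(dependencies, pairs):
    """Return (name_a, version_a, name_b, version_b) for pairs whose versions differ.

    dependencies: easyconfig-style tuples, (name, version[, versionsuffix]).
    pairs:        names that are expected to share a version.
    """
    versions = {dep[0]: dep[1] for dep in dependencies}
    mismatches = []
    for name_a, name_b in pairs:
        # Only compare when both packages are actually listed.
        if name_a in versions and name_b in versions:
            if versions[name_a] != versions[name_b]:
                mismatches.append(
                    (name_a, versions[name_a], name_b, versions[name_b])
                )
    return mismatches


# Hypothetical dependency list; the protobuf version is an assumed value.
deps = [
    ('protobuf', '3.10.0'),
    ('protobuf-python', '3.13.0', '-Python-3.8.2'),
]
mismatches = find_version_mismatches(deps, [('protobuf', 'protobuf-python')])
```

An empty result means the paired packages agree; anything else points at the easyconfig entries to align before building.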

@smoors
Contributor

smoors commented Oct 22, 2020

Test report by @smoors
SUCCESS
Build succeeded for 9 out of 9 (7 easyconfigs in total)
node379.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (skylake_avx512), Python 2.7.5
See https://gist.github.com/29d7b1b37bda1b46722b1d00242cb385 for a full test report.

smoors previously approved these changes Oct 22, 2020
Contributor

@smoors smoors left a comment

lgtm

@boegel
Member

boegel commented Oct 22, 2020

@boegelbot please test @ generoso

boegel previously approved these changes Oct 22, 2020
@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11546 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11546 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8193

Test results coming soon (I hope)...

Details

- notification for comment with ID 714332997 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 6 out of 7 (7 easyconfigs in total)
generoso-x-2 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/d89ee0ab1145f1a84c7037dd39f4f685 for a full test report.

@boegel
Member

boegel commented Oct 22, 2020

Test report by @boegel
FAILED
Build succeeded for 6 out of 7 (7 easyconfigs in total)
node3401.kirlia.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (cascadelake), Python 2.7.5
See https://gist.github.com/497b3d5cecd3f1c8d47bf8838108353e for a full test report.

@boegel
Member

boegel commented Oct 22, 2020

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11546 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11546 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8194

Test results coming soon (I hope)...

Details

- notification for comment with ID 714379585 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@terjekv
Collaborator Author

terjekv commented Oct 22, 2020

Test report by @terjekv
SUCCESS
Build succeeded for 7 out of 7 (7 easyconfigs in total)
arm2 - Linux ubuntu 18.04, AArch64, UNKNOWN, Python 3.6.9
See https://gist.github.com/9bc903a2fd3009f7a6e122dcaf0d364b for a full test report.

@branfosj
Member

Test report by @branfosj
FAILED
Build succeeded for 0 out of 1 (7 easyconfigs in total)
bear-pg0305u03a.bear.cluster - Linux RHEL 7.6, POWER, 8335-GTX (power9le), Python 3.6.8
See https://gist.github.com/0cd2496e94ab194e7178f8d02283d0bc for a full test report.

@boegel
Member

boegel commented Oct 22, 2020

@branfosj Hmm...

ImportError: Traceback (most recent call last):
  File "/dev/shm/build-branfosj-admin/branfosj-admin-up/TensorFlow/2.3.1/foss-2020a-Python-3.8.2/tmpja4zujxf-bazel-tf/output_base/execroot/org_tensorflow/bazel-out/ppc-opt/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: /dev/shm/build-branfosj-admin/branfosj-admin-up/TensorFlow/2.3.1/foss-2020a-Python-3.8.2/tmpja4zujxf-bazel-tf/output_base/execroot/org_tensorflow/bazel-out/ppc-opt/bin/tensorflow/python/keras/api/create_tensorflow.python_api_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/_pywrap_tensorflow_internal.so: undefined symbol: _ZTVN6icu_669ErrorCodeE


Failed to load the native TensorFlow runtime.

=> _pywrap_tensorflow_internal.so: undefined symbol: _ZTVN6icu_669ErrorCodeE

@Flamefire Any ideas?

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 7 out of 7 (7 easyconfigs in total)
generoso-x-2 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/6dc163411807d3b1bcb2ffa2079247b2 for a full test report.

@boegel
Member

boegel commented Oct 22, 2020

Test report by @boegel
SUCCESS
Build succeeded for 7 out of 7 (7 easyconfigs in total)
node3401.kirlia.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (cascadelake), Python 2.7.5
See https://gist.github.com/7487cc7d0525cd386ffead6249728030 for a full test report.

@boegel
Member

boegel commented Oct 22, 2020

Test report by @boegel
SUCCESS
Build succeeded for 7 out of 7 (7 easyconfigs in total)
node3107.skitty.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/cce41122794e01e6155fefd78097083d for a full test report.

@boegel
Member

boegel commented Oct 22, 2020

Test report by @boegel
SUCCESS
Build succeeded for 7 out of 7 (7 easyconfigs in total)
node2426.golett.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (haswell), Python 2.7.5
See https://gist.github.com/921c3d311ed4b215b18a2cb624389b66 for a full test report.

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 7 out of 7 (7 easyconfigs in total)
bear-pg0306u19a.bear.cluster - Linux RHEL 8.2, POWER, 8335-GTX (power9le), Python 3.6.8
See https://gist.github.com/046f4c8bc5eeaed54e9f7cfa6c534e6c for a full test report.

@ocaisa
Member

ocaisa commented Oct 22, 2020

@branfosj That last report looks like it's coming from the same system as the previous one that showed the error. Do you know what fixed it? Just --rebuild?

@branfosj
Member

@ocaisa The reports are from different systems. The failed build (bear-pg0305u03a) is on RHEL 7, with dependencies built over the last few months. The successful build (bear-pg0306u19a) is on RHEL 8, with all dependencies built today. I'm currently trying a selective rebuild of the dependencies to see if I can get it to build.

@ocaisa
Member

ocaisa commented Oct 22, 2020

OK, I think @surak is seeing a similar error on our CentOS 7 systems.

@branfosj
Member

Ok. In order I've:

  1. rebuilt pybind11-2.4.3-GCCcore-9.3.0-Python-3.8.2.eb
  2. switched to a direct login (i.e. escaped the cgroup)
  3. rebuilt ICU-66.1-GCCcore-9.3.0.eb

With those changes, TensorFlow now builds for me on the system where it was failing before. I've set off a test report.

I cannot go back and check, but it may just need ICU rebuilding. My guess for rebuilding ICU was hoping that the icu in the missing symbol _ZTVN6icu_669ErrorCodeE was useful information.
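The icu hint can be made explicit by demangling the symbol. In practice `c++filt` does this; the sketch below is a deliberately tiny Python demangler that handles only plain nested names of this shape (no templates, substitutions, or function signatures). The function name `demangle_nested_name` is made up for illustration.

```python
import re

# Itanium C++ ABI prefixes for compiler-generated "special" symbols.
SPECIAL_PREFIXES = {
    "TV": "vtable for ",
    "TI": "typeinfo for ",
    "TS": "typeinfo name for ",
    "TT": "VTT for ",
}


def demangle_nested_name(symbol):
    """Demangle a simple symbol of the form _Z[T?]N<len><name>...E.

    Anything beyond plain nested names raises ValueError.
    """
    if not symbol.startswith("_Z"):
        raise ValueError("not an Itanium-mangled symbol: %r" % symbol)
    rest = symbol[2:]

    prefix = ""
    if rest[:2] in SPECIAL_PREFIXES:
        prefix = SPECIAL_PREFIXES[rest[:2]]
        rest = rest[2:]

    if not rest.startswith("N"):
        raise ValueError("only nested names are supported: %r" % symbol)
    rest = rest[1:]

    parts = []
    while rest and rest[0] != "E":
        # Each component is a decimal length followed by that many characters.
        match = re.match(r"\d+", rest)
        if match is None:
            raise ValueError("unsupported component in %r" % symbol)
        length = int(match.group(0))
        start = match.end()
        name = rest[start:start + length]
        if len(name) != length:
            raise ValueError("truncated component in %r" % symbol)
        parts.append(name)
        rest = rest[start + length:]

    if rest != "E" or not parts:
        raise ValueError("malformed nested name in %r" % symbol)
    return prefix + "::".join(parts)


demangled = demangle_nested_name("_ZTVN6icu_669ErrorCodeE")
```

Here `demangled` is `vtable for icu_66::ErrorCode`. The `icu_66` namespace is ICU's versioned namespace for release 66, which matches the ICU-66.1 easyconfig in the rebuild list above.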

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 7 out of 7 (7 easyconfigs in total)
bear-pg0305u03a.bear.cluster - Linux RHEL 7.6, POWER, 8335-GTX (power9le), Python 3.6.8
See https://gist.github.com/ccb832d1244ee87ecb7cef51269e8983 for a full test report.

Contributor

@smoors smoors left a comment

lgtm

@smoors
Contributor

smoors commented Oct 23, 2020

Going in, thanks @terjekv!

@smoors smoors merged commit d6da762 into easybuilders:develop Oct 23, 2020
@branfosj
Member

> I cannot go back and check, but it may just need ICU rebuilding. My guess for rebuilding ICU was hoping that the icu in the missing symbol _ZTVN6icu_669ErrorCodeE was useful information.

This was fixed by easybuilders/easybuild-framework#3401

@casparvl
Contributor

casparvl commented Dec 4, 2020

Just to add for future reference: I hit exactly the same missing symbol _ZTVN6icu_669ErrorCodeE that @branfosj hit. That ICU was built with EasyBuild 4.2.1. Rebuilding it with EasyBuild 4.3.1 was sufficient to resolve this error, as @branfosj also anticipated; steps 1 and 2 listed in his post above were not needed to resolve it.
