-
Notifications
You must be signed in to change notification settings - Fork 773
{lib} [foss/2023a] Bazel v6.1.0, grpcio v1.57.0, TensorFlow v2.15.1 #20191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
{lib} [foss/2023a] Bazel v6.1.0, grpcio v1.57.0, TensorFlow v2.15.1 #20191
Conversation
|
I attempted to get TF-2.15.1 building in EB. For imcompatibilities I was trying to go back to TF-pinned versions when it happens; not sure whether I'm doing it the right way, so I attached a bit detailed rationale/background info. :> Bazel 6.1.0 dependence (with patch from 6.3.1)There seems to be issue with bazel linking Notebly I also switched compiler to Protobuf 23.4 & Protobuf-python 4.23.4I was missing a bunch of bazel rules using systemlib protobuf, on top of that I am not sure if the issue with EB+TF-2.13 would persist (tf#61593); and protobuf-24.0 is explicitly excluded since TF-2.15 due to known bug (protobuf#13485). So I went back to 23.4. But since there's a patch for the bug (easyconfig#19671), maybe we can do without this version? TensorFlow 2.15.1 patchesThose give me an initial working build:
Havn't probed everything with the Misc notesSome are perhaps better suited for easyblock changes, but for now I am monkey patching in the config:
|
|
Test report by @yqshao |
easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.15.1-foss-2023a.eb
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, good to mention that these are deps of grpcio
|
@Flamefire I'm really missing some expertise here... Any chance you can have a look at this PR and give your opinion? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about using Clang here, even if that is what upstream TensorFlow is now doing. In principle the toolchain defines the compiler. I.e. using foss means we are using GCC as compiler. We do have Clang toolchains (https://docs.easybuild.io/version-specific/toolchains/) in EasyBuild, but they are not widely used and you'd miss a lot of the dependencies needed for TensorFlow...
Was the only reason to switch to Clang that it was a tested build configuration, as per https://www.tensorflow.org/install/source#tested_build_configurations ? I'd say for us that's not per se a reason to use that compiler. But maybe you had other reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that was the only reason; thanks for the explanation, I'll work on a GCC build.
easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.15.1-foss-2023a.eb
Outdated
Show resolved
Hide resolved
|
Wow! Hats up for the courage to battle the new TF version. Haven't dared to look at it yet. ;-)
Possibly. Worth a try at least as another protobuf in the toolchain might cause headaches later on.
Awesome -.- Not only that, they also seem to have killed more of the "System" python packages we were able to specify, but not bothered to remove them from the "allow list". Might be related to the hermetic Python thing below. Not sure what it really does.
That sounds like more
The last message to this was "I'd want to install ml_dtypes from source, instead of wheel, before I'd make a PR." from Simon.
Why? At least for reference it would be good to know. They also seem to support(?) a few more: https://github.com/grpc/grpc/blob/c2758e87fcc8146ea9b7ed5999f863f2f2106df6/setup.py#L142-L168 |
See #20119 for |
|
Thanks for comments! I got the following to work on then:
Also:
|
Check reusing the tensorboard EC which IIRC includes grpc, although it doesn't build with system libs (yet, we should improve there)
The former doesn't do much/anything IIRC. It is just a list of possible "system" dependencies in TF which is read by the easyblock and compared against the result of its |
Co-authored-by: Alexander Grund <[email protected]>
Co-authored-by: Alexander Grund <[email protected]>
…gfbf-2023a-CUDA-12.1.1.eb typo, zlib dependency order, foss->gfbf
…bf-2023a.eb foss->gfbf
|
Some small progress...
|
Might make sense especially given how picky/difficult it is... Generally I think if we use something non-trivial in 2 ECs it is worth adding it as an extra EC
IIRC I also never managed to get TF to work with our Abseil. Always issues over issues... I might have some actual issues open in the TF repo about that.
Maybe. But check the install folder of TF for any references to e.g
I'm not sure what you exactly want to patch here. What is the intended change? I.e. what is "one can have both from EasyBuild"? |
The main problem is that including our Abseil easyconfig (even if just intended for the grpcio extension) as a dependency for TensorFlow easyconfig, it now conflicts with the (different?) absl that TF's bazel pulls in. By breaking out grpcio into it's own easyconfig which only uses Abseil as a builddep, one can avoid having our Abseil visible in CPATH/LIBRARY_PATH when building TF. |
|
Sorry, was distracted by holidy etc.... back now
No hit with a naive grep, hopefully it's OK...
Yes, that is what I was thinking
I was looking at things like this... I reckon that this has to do with change of bazel rules updated in the third party dep., which do not get a corresponding
Maybe that's how it is? But it would be a bit tedius and ad-hoc. I wonder if there is some good magic to generate the "would be needed" systemlib rules from a bazel query or something... |
This comment was marked as outdated.
This comment was marked as outdated.
|
Not much progress with bazel rules, maybe I can proceed without |
|
Ugh, disk quota exceeded. I'll clean up and try again :) |
|
Test report by @boegelbot |
|
Test report by @casparvl |
|
@boegelbot please test @ generoso |
|
@casparvl: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 2146216179 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegelbot Edit by @casparvl : this build failed because of an existing lock file. I restarted a build since I figured the first never came back - but it just seems to have taken very long (see next test report). |
|
Test report by @boegelbot |
Thanks, I took the merged version and removed the jax files.
Right, we are relying on Abseil being statically linked and its ABI being versioned, but since Abseil is still a dependency this is not optimal. To be more specific about the difficulty, I was stopped by how the Bazel requries access to build rules of the dependencies (that changes between versions) without providing updated "system" counterparts. I didn't find a good strategy to track this problem.
|
|
Test report by @yqshao |
|
@boegelbot please test @ generoso |
|
I'm not sure why your |
|
@casparvl: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 2147136409 processed Message to humans: this is just bookkeeping information for me, |
|
|
|
Test report by @casparvl |
|
Test report by @yqshao |
|
@boegelbot please test @ jsc-zen3 |
|
@casparvl: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... Details- notification for comment with ID 2147526608 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @casparvl |
|
Test report by @boegelbot |
|
Test report by @boegelbot |
casparvl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me, time to merge :)
|
Going in, thanks @yqshao! |
(created using
eb --new-pr)depends on:
{tools}[gfbf/2023a] jax v0.4.25 w/ CUDA 12.1.1 #20119