{bio}[foss/2021a] AlphaFold v2.2.2 w/ Python 3.9.5 + CUDA 11.3.1 #15129
Test report by @boegel

Test report by @branfosj

    ('HH-suite', '3.3.0'),
    ('HMMER', '3.3.2'),
    ('Kalign', '3.3.1'),
    ('jax', '0.2.24', versionsuffix),  # also provides absl-py
axis_size was added to jax.vmap in jax-ml/jax@50e7e95, and that commit is part of the 0.2.26 tag. We are using jax 0.2.24 as a dependency here, so a newer version is necessary.
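To make the version constraint concrete, here is a minimal sketch that checks whether a jax version string is at least 0.2.26, the tag that (per the comment above) first contains `axis_size` for `jax.vmap`. The helper names are hypothetical, and the sketch only compares the leading `X.Y.Z` part of a version string.

```python
import re
from typing import Tuple

# Threshold taken from the review comment above: axis_size first appears in 0.2.26.
MIN_JAX_FOR_AXIS_SIZE: Tuple[int, int, int] = (0, 2, 26)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'X.Y.Z' part of a version string (suffixes like 'rc1' are ignored)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def has_vmap_axis_size(jax_version: str) -> bool:
    """Return True if jax_version is at least MIN_JAX_FOR_AXIS_SIZE (hypothetical helper)."""
    return parse_version(jax_version) >= MIN_JAX_FOR_AXIS_SIZE


# The versions mentioned in this thread.
for version in ("0.2.14", "0.2.24", "0.2.26", "0.3.9"):
    print(version, has_vmap_axis_size(version))
```

With jax installed, the same check could be run against `jax.__version__`; tuple comparison keeps `0.3.9` correctly ahead of `0.2.26`, which a plain string comparison would not guarantee.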
Fixed, now using jax 0.3.9.

Branch updated from bde5047 to f243ac0.

Test report by @branfosj

Test report by @boegel

@branfosj I'm seeing the same failing test on our A100 system; any ideas there?

Test report by @jfgrimm

Test report by @boegel

I'm not sure when I'll have a chance to look at this. It is a fairly impressive crash, though, to stop the tests from running altogether!

I've also tried jax 0.2.26: same hard crash when running the tests (which doesn't occur with jax 0.2.24).

On my side, AlphaFold v2.2.0 passes all tests with both jax v0.2.14 and v0.2.24. Since its requirements.txt still lists jax v0.2.14, I suspect nobody is testing AlphaFold with these newer versions of jax, so we might be getting too far ahead here.

It also passed the tests for me. It failed when I tried to actually run it.

@branfosj This issue might be limited to the reduced DB. I ran the same T1050 test with the full DB and it worked fine using AlphaFold 2.2.0 + jax v0.2.14.

Hi! When installing from this PR, I get the error "Couldn't find script jaxlib_local-tensorflow-repo.sed anywhere". It comes from jax-0.2.28-foss-2021a-CUDA-11.3.1.eb, which is submitted in this PR. I use EasyBuild 4.5.4, and this file is definitely present in the local repo. What am I doing wrong?

The reduced DB works fine on V100 for that test.

When trying to build jax-0.2.28-foss-2021a-CUDA-11.3.1.eb from this PR on an A100 system, I get:
So, am I missing something from develop that this one needs (using EB 4.5.4 for building)? Trying with develop now...

@akesandgren Several of us have seen the same issue with

Ah, I didn't notice that they were during the build of jax...

@arkdavy I see the same problem with jaxlib_local-tensorflow-repo.sed missing, although it is available in the robot-paths/j/jax dir. The problem is that the file is in jax, but the component being built is jaxlib, so it can't be found. jaxlib-0.1.70_add-bazel-args-to-shutdown.patch and TensorFlow-2.7.0_cuda-noncanonical-include-paths.patch will have the same problem.
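A simplified, hypothetical model of the lookup problem described above, assuming files referenced by a component are searched for under `<robot path>/<first letter>/<name>/`, where `<name>` is the component being built. This is not EasyBuild's actual code: the real lookup also checks source paths and other locations, and details may differ between EasyBuild versions. The function names are made up for illustration.

```python
import os
import tempfile
from typing import Iterable, List, Optional


def candidate_dirs(robot_paths: Iterable[str], name: str) -> List[str]:
    """Hypothetical: directories searched for files belonging to software `name`."""
    return [os.path.join(path, name[0].lower(), name) for path in robot_paths]


def find_file(filename: str, robot_paths: Iterable[str], name: str) -> Optional[str]:
    """Return the first existing path for `filename`, or None if it is not found."""
    for directory in candidate_dirs(robot_paths, name):
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            return path
    return None


# Demo: put the .sed script next to the jax easyconfigs (j/jax/) ...
with tempfile.TemporaryDirectory() as robot:
    os.makedirs(os.path.join(robot, "j", "jax"))
    sed = "jaxlib_local-tensorflow-repo.sed"
    open(os.path.join(robot, "j", "jax", sed), "w").close()

    # ... it is found when looking up files for "jax" ...
    print(find_file(sed, [robot], "jax") is not None)
    # ... but not when the component being built is "jaxlib".
    print(find_file(sed, [robot], "jaxlib") is not None)
```

In this model, the fix is to make the files available where the jaxlib component's lookup expects them (here `j/jaxlib/`), which matches the point that the files must reside in the correct place.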

Thanks @akesandgren, I see. I managed to work around it by downloading the file into the workdir. This may be a problem for new users (as I am, relatively), who would appreciate a hint. Maybe it is worth adding a corresponding comment inside the easyconfig?

The PR is wrong; the files must reside in the correct place.
I think our options here are:
Maybe jax-ml/jax#5713 gives some clues, not sure.

AlphaFold 2.2.0 actually builds with jax 0.3.9 (and chex 0.1.3); now on to actually running tests (this is on A100, btw).
The above --db_preset=reduced_dbs test works OK on an A100 system when built with jax-0.3.9 and chex-0.1.3.

@akesandgren Which easyconfig file did you use for jax 0.3.9 with …? I'm bumping into an installation failure with the one in #15660...

I have these changes from #15660:

    ('jaxlib', '0.3.7', {
        'sources': [
            '%(name)s-v%(version)s.tar.gz',
            {
                'download_filename': '%s.tar.gz' % local_tf_commit,
                'filename': 'tensorflow-%s.tar.gz' % local_tf_commit,
            }
        ],
        'source_urls': [
            'https://github.com/google/jax/archive/',
            'https://github.com/tensorflow/tensorflow/archive/'
        ],
        'patches': [
            ('jaxlib_local-tensorflow-repo.sed', '.'),
            'jaxlib-0.1.70_add-bazel-args-to-shutdown.patch',
            ('TensorFlow-2.7.0_cuda-noncanonical-include-paths.patch', '../' + local_tf_dir),
        ],
        # (rest of the component spec not shown)
    }),

Also see my comment above regarding placement of the jaxlib patches.
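The three `patches` entries above use different forms. A simplified sketch of how they are interpreted, assuming EasyBuild's convention that a plain string is a patch to apply, a `(file, int)` tuple sets the patch level, a `(file, 'dir')` tuple applies a `.patch` file inside that directory, and a non-`.patch` file in a `(file, 'dir')` tuple is copied there instead. The function `describe_patch` is hypothetical, the real handling covers more cases, and `local_tf_dir` below is a placeholder for the value defined elsewhere in the easyconfig.

```python
from typing import Tuple, Union

# A patch entry is either a file name or a (file name, level-or-directory) tuple.
PatchSpec = Union[str, Tuple[str, Union[int, str]]]


def describe_patch(spec: PatchSpec) -> str:
    """Hypothetical helper: describe what a 'patches' entry would do."""
    if isinstance(spec, str):
        return f"apply {spec}"
    name, extra = spec
    if isinstance(extra, int):
        return f"apply {name} with patch level {extra}"
    if not name.endswith(".patch"):
        # Non-patch files (such as the .sed script) are copied, not applied.
        return f"copy {name} to {extra}"
    return f"apply {name} in {extra}"


local_tf_dir = "tensorflow-<commit>"  # placeholder, not the real directory name
patches = [
    ("jaxlib_local-tensorflow-repo.sed", "."),
    "jaxlib-0.1.70_add-bazel-args-to-shutdown.patch",
    ("TensorFlow-2.7.0_cuda-noncanonical-include-paths.patch", "../" + local_tf_dir),
]
for spec in patches:
    print(describe_patch(spec))
```

This is why the `.sed` script is listed under `patches` at all: it is shipped alongside the easyconfig and copied into the jaxlib source directory, so it has to be findable in the same way as the real patch files.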

I totally overlooked #15420... 🤦‍♂️

easybuild/easyconfigs/a/AlphaFold/AlphaFold-2.2.0-foss-2021a-CUDA-11.3.1.eb (outdated, resolved)

Test report by @boegel

verdurin left a comment:
Looks fine.

Going in, thanks @boegel!

(created using eb --new-pr)