boegel commented Mar 15, 2022

(created using eb --new-pr)

boegel added the update label Mar 15, 2022
boegel changed the title from "{bio}[foss/2021a] AlphaFold v2.2.0 w/ Python 3.9.5" to "{bio}[foss/2021a] AlphaFold v2.2.0 w/ Python 3.9.5 + CUDA 11.3.1" Mar 16, 2022
boegel added this to the next release (4.5.4?) milestone Mar 16, 2022

boegel commented Mar 16, 2022

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3902.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 510.39.01, Python 3.6.8
See https://gist.github.com/f357fc07560304df98628d7efb82b6b8 for a full test report.

@branfosj

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0103u11a.bear.cluster - Linux RHEL 8.5, x86_64, Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz (icelake), 2 x NVIDIA NVIDIA A100-PCIE-40GB, 470.57.02, Python 3.6.8
See https://gist.github.com/ce0008eb9c241742cb7dc6d0f12c3e6b for a full test report.

@branfosj

$ alphafold --db_preset=reduced_dbs --fasta_paths=T1050.fasta --max_template_date=2020-05-14 --output_dir=t220
I0319 11:01:44.880872 139725906442048 templates.py:857] Using precomputed obsolete pdbs /rds/bear-apps/apps-data//AlphaFold/20211118/pdb_mmcif/obsolete.dat.                                                                                
I0319 11:01:45.188533 139725906442048 xla_bridge.py:231] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:                                                                             
I0319 11:01:45.424303 139725906442048 xla_bridge.py:231] Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.                                                                                                
I0319 11:01:54.301806 139725906442048 alphafold:419] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']                                                                              
I0319 11:01:54.302030 139725906442048 alphafold:436] Using random seed 840525158312012985 for the data pipeline                                                                                                                            
I0319 11:01:54.302284 139725906442048 alphafold:201] Predicting T1050           
I0319 11:01:55.814904 139725906442048 jackhmmer.py:136] Launching subprocess "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/HMMER/3.3.2-gompi-2021a/bin/jackhmmer -o /dev/null -A /tmp/tmpae357_1s/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 T1050.fasta /rds/bear-apps/apps-data//AlphaFold/20211118/uniref90/uniref90.fasta"
I0319 11:01:55.902633 139725906442048 utils.py:36] Started Jackhmmer (uniref90.fasta) query                                                                                                                                                
I0319 11:07:30.798996 139725906442048 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 334.896 seconds
I0319 11:07:35.970967 139725906442048 jackhmmer.py:136] Launching subprocess "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/HMMER/3.3.2-gompi-2021a/bin/jackhmmer -o /dev/null -A /tmp/tmpraagz6a_/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 T1050.fasta /rds/bear-apps/apps-data//AlphaFold/20211118/mgnify/mgy_clusters_2018_12.fa"
I0319 11:07:36.010791 139725906442048 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query                                                                                                                                       
I0319 11:13:15.454982 139725906442048 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 339.444 seconds
I0319 11:13:34.335129 139725906442048 hhsearch.py:85] Launching subprocess "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/HH-suite/3.3.0-gompi-2021a/bin/hhsearch -i /tmp/tmp3okixd9i/query.a3m -o /tmp/tmp3okixd9i/output.hhr -maxseq 1000000 -d /rds/bear-apps/apps-data//AlphaFold/20211118/pdb70/pdb70"
I0319 11:13:34.411740 139725906442048 utils.py:36] Started HHsearch query                                                                                                                                                                  
I0319 11:15:03.778825 139725906442048 utils.py:40] Finished HHsearch query in 89.366 seconds
I0319 11:15:09.521845 139725906442048 jackhmmer.py:136] Launching subprocess "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/HMMER/3.3.2-gompi-2021a/bin/jackhmmer -o /dev/null -A /tmp/tmp5oar21bp/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 T1050.fasta /rds/bear-apps/apps-data//AlphaFold/20211118/small_bfd/bfd-first_non_consensus_sequences.fasta"
I0319 11:15:09.577493 139725906442048 utils.py:36] Started Jackhmmer (bfd-first_non_consensus_sequences.fasta) query                                                                                                                       
I0319 11:16:38.344569 139725906442048 utils.py:40] Finished Jackhmmer (bfd-first_non_consensus_sequences.fasta) query in 88.767 seconds
I0319 11:16:51.397182 139725906442048 templates.py:878] Searching for template for: MASQSYLFKHLEVSDGLSNNSVNTIYKDRDGFMWFGTTTGLNRYDGYTFKIYQHAENEPGSLPDNYITDIVEMPDGRFWINTARGYVLFDKERDYFITDVTGFMKNLESWGVPEQVFVDREGNTWLSVAGEGCYRYKEGGKRLFFSYTEHSLPEYGVTQMAECSDGILLIYNTGLLVCLDRATLAIKWQSDEIKKYIPGGKTIELSLFVDRDNCIWAYSLMGIWAYDCGTKSWRTDLTGIWSSRPDVIIHAVAQDIEGRIWVGKDYDGIDVLEKETGKVTSLVAHDDNGRSLPHNTIYDLYADRDGVMWVGTYKKGVSYYSESIFKFNMYEWGDITCIEQADEDRLWLGTNDHGILLWNRSTGKAEPFWRDAEGQLPNPVVSMLKSKDGKLWVGTFNGGLYCMNGSQVRSYKEGTGNALASNNVWALVEDDKGRIWIASLGGGLQCLEPLSGTFETYTSNNSALLENNVTSLCWVDDNTLFFGTASQGVGTMDMRTREIKKIQGQSDSMKLSNDAVNHVYKDSRGLVWIATREGLNVYDTRRHMFLDLFPVVEAKGNFIAAITEDQERNMWVSTSRKVIRVTVASDGKGSYLFDSRAYNSEDGLQNCDFNQRSIKTLHNGIIAIGGLYGVNIFAPDHIRYNKMLPNVMFTGLSLFDEAVKVGQSYGGRVLIEKELNDVENVEFDYKQNIFSVSFASDNYNLPEKTQYMYKLEGFNNDWLTLPVGVHNVTFTNLAPGKYVLRVKAINSDGYVGIKEATLGIVVNPPFKLAAALQHHHHHH
I0319 11:16:53.528060 139725906442048 templates.py:267] Found an exact template match 4a2m_B.                                                                                                                                              
I0319 11:16:56.459409 139725906442048 templates.py:267] Found an exact template match 4a2l_F.
I0319 11:16:58.601003 139725906442048 templates.py:267] Found an exact template match 3v9f_B.                                                                                                                                              
I0319 11:16:59.535832 139725906442048 templates.py:267] Found an exact template match 3va6_A.
I0319 11:17:00.889435 139725906442048 templates.py:267] Found an exact template match 3ott_B.                                                                                                                                              
I0319 11:17:01.421702 139725906442048 templates.py:267] Found an exact template match 5m11_A.
I0319 11:17:01.449406 139725906442048 templates.py:267] Found an exact template match 4a2m_B.                                                                                                                                              
I0319 11:17:01.478878 139725906442048 templates.py:267] Found an exact template match 4a2l_F.
I0319 11:17:01.507219 139725906442048 templates.py:267] Found an exact template match 4a2m_B.                                                                                                                                              
I0319 11:17:01.535158 139725906442048 templates.py:267] Found an exact template match 4a2l_F.
I0319 11:17:01.563768 139725906442048 templates.py:267] Found an exact template match 5m11_A.                                                                                                                                              
I0319 11:17:01.590504 139725906442048 templates.py:267] Found an exact template match 3v9f_B.
I0319 11:17:01.619200 139725906442048 templates.py:267] Found an exact template match 3ott_B.                                                                                                                                              
I0319 11:17:01.647203 139725906442048 templates.py:267] Found an exact template match 3va6_A.
I0319 11:17:01.675859 139725906442048 templates.py:267] Found an exact template match 3ott_B.                                                                                                                                              
I0319 11:17:01.703635 139725906442048 templates.py:267] Found an exact template match 3va6_A.
I0319 11:17:01.732017 139725906442048 templates.py:267] Found an exact template match 5m11_A.                                                                                  
I0319 11:17:01.758488 139725906442048 templates.py:267] Found an exact template match 4a2m_B.
I0319 11:17:01.786634 139725906442048 templates.py:267] Found an exact template match 4a2l_F.                                                                                           
I0319 11:17:01.814535 139725906442048 templates.py:267] Found an exact template match 3v9f_B.
I0319 11:17:06.567965 139725906442048 pipeline.py:234] Uniref90 MSA size: 10000 sequences.                                                                                                                                                 
I0319 11:17:06.568319 139725906442048 pipeline.py:235] BFD MSA size: 29562 sequences.
I0319 11:17:06.568364 139725906442048 pipeline.py:236] MGnify MSA size: 501 sequences.                                                                                                                                                     
I0319 11:17:06.568406 139725906442048 pipeline.py:237] Final (deduplicated) MSA size: 39072 sequences.
I0319 11:17:06.568780 139725906442048 pipeline.py:239] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.                                                                                  
I0319 11:17:06.930505 139725906442048 alphafold:230] Running model model_1_pred_0 on T1050                                              
I0319 11:17:20.366679 139725906442048 model.py:165] Running predict with shape(feat) = {'aatype': (4, 779), 'residue_index': (4, 779), 'seq_length': (4,), 'template_aatype': (4, 4, 779), 'template_all_atom_masks': (4, 4, 779, 37), 'template_all_atom_positions': (4, 4, 779, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 779), 'msa_mask': (4, 508, 779), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 779, 3), 'template_pseudo_beta_mask': (4, 4, 779), 'atom14_atom_exists': (4, 779, 14), 'residx_atom14_to_atom37': (4, 779, 14), 'residx_atom37_to_atom14': (4, 779, 37), 'atom37_atom_exists': (4, 779, 37), 'extra_msa': (4, 5120, 779), 'extra_msa_mask': (4, 5120, 779), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 779), 'true_msa': (4, 508, 779), 'extra_has_deletion': (4, 5120, 779), 'extra_deletion_value': (4, 5120, 779), 'msa_feat': (4, 508, 779, 49), 'target_feat': (4, 779, 22)}
Traceback (most recent call last):                                              
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/bin/alphafold", line 459, in <module>                                                                                               
    app.run(main)                                                                                                                            
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/absl/app.py", line 312, in run                                                                              
    _run_main(main, args)                                                                                      
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main                                                                         
    sys.exit(main(argv))                                                                                                                            
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/bin/alphafold", line 441, in main                                                                                                  
    predict_structure(                                                                                                  
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/bin/alphafold", line 238, in predict_structure                                                                                      
    prediction_result = model_runner.predict(processed_feature_dict,      
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/model.py", line 167, in predict                                                        
    result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)                 
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback                                    
    return fun(*args, **kwargs)                                                                                                                                        
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/api.py", line 416, in cache_miss                                                                   
    out_flat = xla.xla_call(                                                                                                           
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/core.py", line 1632, in bind                                                                             
    return call_bind(self, fun, *args, **params)                                                                                                                                                                                            
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/core.py", line 1623, in call_bind                                                                        
    outs = primitive.process(top_trace, fun, tracers, params)                                                                                              
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/core.py", line 1635, in process                                                                         
    return trace.process_call(self, fun, tracers, params)                                    
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/core.py", line 627, in process_call                                                                     
    return primitive.impl(f, *tracers, **params)                                             
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/interpreters/xla.py", line 581, in _xla_call_impl                                                       
    compiled_fun = _xla_callable(fun, device, backend, name, donated_invars,                 
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/linear_util.py", line 263, in memoized_fun                                                              
    ans = call(fun, *args)                                                                   
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/interpreters/xla.py", line 653, in _xla_callable_uncached                                               
    return lower_xla_callable(fun, device, backend, name, donated_invars,                    
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/interpreters/xla.py", line 665, in lower_xla_callable                                                   
    jaxpr, out_avals, consts = pe.trace_to_jaxpr_final(                                      
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/interpreters/partial_eval.py", line 1542, in trace_to_jaxpr_final                                       
    jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(fun, main, in_avals)                
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/interpreters/partial_eval.py", line 1520, in trace_to_subjaxpr_dynamic                                  
    ans = fun.call_wrapped(*in_tracers)                                                      
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/linear_util.py", line 166, in call_wrapped  
    ans = self.f(*args, **dict(self.params, **kwargs))                                       
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/transform.py", line 127, in apply_fn     
    out, state = f.apply(params, {}, *args, **kwargs)
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/transform.py", line 400, in apply_fn                                                        
    out = f(*args, **kwargs)                                                         
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/model.py", line 83, in _forward_fn                                                     
    return model(                                                                                     
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 433, in wrapped                                                            
    out = f(*args, **kwargs)                                                                                                            
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 284, in run_interceptors                                                    
    return bound_method(*args, **kwargs)                                                                                                                                                                                                   
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 377, in __call__                                                      
    _, prev = hk.while_loop(                                                                                                                                                                                                                
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/stateful.py", line 743, in while_loop                                                       
    val, state = jax.lax.while_loop(pure_cond_fun, pure_body_fun, init_val)     
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback                                    
    return fun(*args, **kwargs)                                                                                                              
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/lax/control_flow.py", line 305, in while_loop                                                      
    init_vals, init_avals, body_jaxpr, in_tree, *rest = _create_jaxpr(init_val)                                
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/lax/control_flow.py", line 288, in _create_jaxpr                                                    
    body_jaxpr, body_consts, body_tree = _initial_style_jaxpr(                                                                                      
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/util.py", line 187, in wrapper                                                                     
    return cached(config._trace_context(), *args, **kwargs)                                                             
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/util.py", line 180, in cached                                                                       
    return f(*args, **kwargs)                                             
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/lax/control_flow.py", line 77, in _initial_style_jaxpr                                             
    jaxpr, consts, out_tree = _initial_style_open_jaxpr(                                    
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/util.py", line 187, in wrapper                                                                      
    return cached(config._trace_context(), *args, **kwargs)                                                                                                            
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/util.py", line 180, in cached                                                                      
    return f(*args, **kwargs)                                                                                                          
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/lax/control_flow.py", line 71, in _initial_style_open_jaxpr                                         
    jaxpr, _, consts = pe.trace_to_jaxpr_dynamic(wrapped_fun, in_avals, debug)                                                                                                                                                              
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/interpreters/partial_eval.py", line 1510, in trace_to_jaxpr_dynamic                                      
    jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(fun, main, in_avals)                                                                              
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/interpreters/partial_eval.py", line 1520, in trace_to_subjaxpr_dynamic                                  
    ans = fun.call_wrapped(*in_tracers)                                                      
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/linear_util.py", line 166, in call_wrapped                                                              
    ans = self.f(*args, **dict(self.params, **kwargs))                                       
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/stateful.py", line 738, in pure_body_fun                                                    
    val = body_fun(val)                                                                      
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 370, in <lambda>                                                     
    get_prev(do_call(x[1], recycle_idx=x[0],                                                 
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 337, in do_call                                                      
    return impl(                                                                             
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 433, in wrapped                                                            
    out = f(*args, **kwargs)                                                                 
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 284, in run_interceptors                                                   
    return bound_method(*args, **kwargs)                                                     
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 161, in __call__                                                     
    representations = evoformer_module(batch0, is_training)                                  
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 433, in wrapped
    out = f(*args, **kwargs)                                                                 
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 284, in run_interceptors
    return bound_method(*args, **kwargs) 
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 1777, in __call__                                                    
    template_pair_representation = TemplateEmbedding(c.template, gc)(                
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 433, in wrapped                                                            
    out = f(*args, **kwargs)                                                                          
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 284, in run_interceptors                                                   
    return bound_method(*args, **kwargs)                                                                                                
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 2072, in __call__                                                     
    template_pair_representation = mapping.sharded_map(map_fn, in_axes=0)(                                                                                                                                                                 
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/mapping.py", line 145, in mapped_fn                                                     
    remainder_shape_dtype = hk.eval_shape(                                                                                                                                                                                                  
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/stateful.py", line 811, in eval_shape                                                       
    out_shape = jax.eval_shape(stateless_fun, internal_state(), *args, **kwargs)
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/_src/api.py", line 2797, in eval_shape                                                                   
    out = pe.abstract_eval_fun(wrapped_fun.call_wrapped,                                                                                     
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/interpreters/partial_eval.py", line 418, in abstract_eval_fun                                           
    _, avals_out, _ = trace_to_jaxpr_dynamic(                                                                  
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/interpreters/partial_eval.py", line 1510, in trace_to_jaxpr_dynamic                                      
    jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(fun, main, in_avals)                                                                       
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/interpreters/partial_eval.py", line 1520, in trace_to_subjaxpr_dynamic                                  
    ans = fun.call_wrapped(*in_tracers)                                                                                 
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/linear_util.py", line 166, in call_wrapped                                                               
    ans = self.f(*args, **dict(self.params, **kwargs))                    
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/jax/linear_util.py", line 166, in call_wrapped                                                              
    ans = self.f(*args, **dict(self.params, **kwargs))                                      
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/stateful.py", line 807, in stateless_fun                                                     
    out = fun(*args, **kwargs)                                                                                                                                         
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/mapping.py", line 143, in apply_fun_to_slice                                           
    return fun(*input_slice)                                                                                                           
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/stateful.py", line 680, in mapped_fun                                                        
    mapped_pure_fun = jax.vmap(pure_fun, in_axes=in_axes, out_axes=out_axes,                                                                                                                                                                
jax._src.traceback_util.UnfilteredStackTrace: TypeError: vmap() got an unexpected keyword argument 'axis_size'                                                                                                                              
                                                                                                                                                           
The stack trace below excludes JAX-internal frames.                                                                                                                                                                                        
The preceding is the original exception that occurred, unmodified.                           
                                                                                                                                                                                                                                           
--------------------                                                                         
                                                                                                                                                                                                                                           
The above exception was the direct cause of the following exception:                         
                                                                                                                                                                                                                                           
Traceback (most recent call last):                                                           
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/bin/alphafold", line 459, in <module>                                                                                              
    app.run(main)                                                                            
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/absl/app.py", line 312, in run                                                                              
    _run_main(main, args)                                                                    
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/jax/0.2.24-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main                                                                        
    sys.exit(main(argv))                                                                     
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/bin/alphafold", line 441, in main                                                                                                  
    predict_structure(                                                                       
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/bin/alphafold", line 238, in predict_structure                         
    prediction_result = model_runner.predict(processed_feature_dict,                         
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/model.py", line 167, in predict     
    result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/transform.py", line 127, in apply_fn                                                        
    out, state = f.apply(params, {}, *args, **kwargs)                                
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/transform.py", line 400, in apply_fn                                                        
    out = f(*args, **kwargs)                                                                          
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/model.py", line 83, in _forward_fn                                                     
    return model(                                                                                                                       
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 433, in wrapped                                                             
    out = f(*args, **kwargs)                                                                                                                                                                                                               
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 284, in run_interceptors                                                    
    return bound_method(*args, **kwargs)                                                                                                                                                                                                    
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 377, in __call__                                                     
    _, prev = hk.while_loop(                                                    
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/stateful.py", line 743, in while_loop                                                        
    val, state = jax.lax.while_loop(pure_cond_fun, pure_body_fun, init_val)                                                                  
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/stateful.py", line 738, in pure_body_fun                                                    
    val = body_fun(val)                                                                                        
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 370, in <lambda>                                                      
    get_prev(do_call(x[1], recycle_idx=x[0],                                                                                                        
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 337, in do_call                                                      
    return impl(                                                                                                        
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 433, in wrapped                                                             
    out = f(*args, **kwargs)                                              
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 284, in run_interceptors                                                   
    return bound_method(*args, **kwargs)                                                    
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 161, in __call__                                                      
    representations = evoformer_module(batch0, is_training)                                                                                                            
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 433, in wrapped                                                            
    out = f(*args, **kwargs)                                                                                                           
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 284, in run_interceptors                                                    
    return bound_method(*args, **kwargs)                                                                                                                                                                                                    
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 1777, in __call__                                                     
    template_pair_representation = TemplateEmbedding(c.template, gc)(                                                                                      
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 433, in wrapped                                                            
    out = f(*args, **kwargs)                                                                 
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/module.py", line 284, in run_interceptors                                                   
    return bound_method(*args, **kwargs)                                                     
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/modules.py", line 2072, in __call__                                                    
    template_pair_representation = mapping.sharded_map(map_fn, in_axes=0)(                   
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/mapping.py", line 145, in mapped_fn                                                    
    remainder_shape_dtype = hk.eval_shape(                                                   
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/stateful.py", line 811, in eval_shape                                                       
    out_shape = jax.eval_shape(stateless_fun, internal_state(), *args, **kwargs)             
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/stateful.py", line 807, in stateless_fun                                                    
    out = fun(*args, **kwargs)                                                               
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/alphafold/model/mapping.py", line 143, in apply_fun_to_slice                                           
    return fun(*input_slice)                                                                 
  File "/rds/bear-apps/devel/eb-sjb-up/EL8/EL8-ice-a100/software/AlphaFold/2.2.0-foss-2021a-CUDA-11.3.1/lib/python3.9/site-packages/haiku/_src/stateful.py", line 680, in mapped_fun                                                       
    mapped_pure_fun = jax.vmap(pure_fun, in_axes=in_axes, out_axes=out_axes,                 
TypeError: vmap() got an unexpected keyword argument 'axis_size'

('HH-suite', '3.3.0'),
('HMMER', '3.3.2'),
('Kalign', '3.3.1'),
('jax', '0.2.24', versionsuffix), # also provides absl-py
Member

axis_size was added to jax.vmap in jax-ml/jax@50e7e95, which is tagged 0.2.26. We are using jax 0.2.24 as a dependency here, so a newer version is necessary.
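
One way to catch this kind of mismatch before a long run is to check whether the installed function accepts the keyword at all. Below is a minimal sketch using only the standard library. `vmap_old` and `vmap_new` are hypothetical stand-ins that mimic the signature without and with `axis_size` (they are not the real `jax.vmap`); in practice one would pass `jax.vmap` itself to the helper.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        # Positional-only parameters cannot be passed by keyword.
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all also accepts the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins: signature without and with the axis_size keyword.
def vmap_old(fun, in_axes=0, out_axes=0, axis_name=None):
    return fun


def vmap_new(fun, in_axes=0, out_axes=0, axis_name=None, axis_size=None):
    return fun


print(accepts_keyword(vmap_old, "axis_size"))  # False
print(accepts_keyword(vmap_new, "axis_size"))  # True
```

A check like this only confirms that the keyword exists; it says nothing about whether the rest of the stack works with that jax version.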

Member Author

fixed, now using jax 0.3.9

@boegel force-pushed the 20220315190641_new_pr_AlphaFold220 branch from bde5047 to f243ac0 on March 23, 2022 16:28
@branfosj
Member

Test report by @branfosj
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
bear-pg0103u01a.bear.cluster - Linux RHEL 8.5, x86_64, Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz (icelake), 1 x NVIDIA NVIDIA A100-PCIE-40GB, 470.57.02, Python 3.6.8
See https://gist.github.com/56909026069076629c049fbc5d153a19 for a full test report.

@boegel
Member Author

boegel commented Mar 23, 2022

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3303.joltik.os - Linux RHEL 8.4, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 510.47.03, Python 3.6.8
See https://gist.github.com/2d3b4dfacb763f33d0d324a43932cf05 for a full test report.

@boegel
Member Author

boegel commented Mar 23, 2022

@branfosj I'm seeing the same failing test on our A100 system; any ideas there?

@jfgrimm
Member

jfgrimm commented Mar 24, 2022

Test report by @jfgrimm
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
gpu02.pri.viking.alces.network - Linux CentOS Linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz (skylake_avx512), 1 x NVIDIA Tesla V100-SXM2-32GB, 510.47.03, Python 3.6.8
See https://gist.github.com/bbd7c84f4a547d1d93ca9e43a03fd372 for a full test report.

@boegel
Member Author

boegel commented Mar 25, 2022

Test report by @boegel
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
node3902.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 510.47.03, Python 3.6.8
See https://gist.github.com/9053737ad3244fcb82e2a0966e5bddc0 for a full test report.

@branfosj
Member

> @branfosj I'm seeing the same failing test on our A100 system; any ideas there?

I'm not sure when I'll have a chance to look at this. It is a fairly impressive crash though to stop the tests running altogether!

@boegel
Member Author

boegel commented Mar 30, 2022

I've also tried with jax 0.2.26, same hard crash when running the tests (which doesn't occur with jax 0.2.24)

@lexming
Contributor

lexming commented Apr 7, 2022

On my side, AlphaFold v2.2.0 passes all tests with jax v0.2.14 and v0.2.24. Since their requirements.txt still lists jax v0.2.14, I think that nobody might be testing AlphaFold with these newer versions of jax. So we might be going too much ahead here.

@branfosj
Member

branfosj commented Apr 7, 2022

> On my side, AlphaFold v2.2.0 passes all tests with jax v0.2.14 and v0.2.24. Since their requirements.txt still lists jax v0.2.14, I think that nobody might be testing AlphaFold with these newer versions of jax. So we might be going too much ahead here.

It also passed the tests for me. It failed when I tried to run it.

@lexming
Contributor

lexming commented Apr 9, 2022

@branfosj this issue might be limited to the reduced DB. I ran the same test with T1050 but with a full DB and it worked fine using AlphaFold 2.2.0 + jax v0.2.14.

@arkdavy

arkdavy commented Apr 13, 2022

Hi! When installing from this PR, I got the error "Couldn't find script jaxlib_local-tensorflow-repo.sed anywhere". The error comes from the 'jax-0.2.28-foss-2021a-CUDA-11.3.1.eb' easyconfig submitted in this PR. I use EasyBuild-4.5.4, where this file is definitely present in the local repo. What am I doing wrong?

@akesandgren
Contributor

The reduced DB works fine on V100 for that test.

@akesandgren
Contributor

When trying to build jax-0.2.28-foss-2021a-CUDA-11.3.1.eb from this PR on an A100 system I get:

================================================== test session starts ==================================================
platform linux -- Python 3.9.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /mimer/NOBACKUP/groups/c3-staff/akesa/eb-builds/jax/0.2.28/foss-2021a-CUDA-11.3.1/jax/jax-jax-v0.2.28, configfile: pytest.ini
plugins: forked-1.3.0, xdist-2.3.0
collected 16035 items

tests/ann_test.py ........................................                                                        [  0%]
tests/api_test.py ...................................s.............................s.s........................... [  0%]
........................................................................ss..........s............................ [  1%]
................................................................................................................. [  2%]
..........s.........s....................................................................................Fatal Python error: Aborted

Thread 0x0000149cb1242740 (most recent call first):
  File "/mimer/NOBACKUP/groups/c3-staff/akesa/eb-builds/jax/0.2.28/foss-2021a-CUDA-11.3.1/jax/jax-jax-v0.2.28/jax/_src/dispatch.py", line 537 in backend_compile
  File "/mimer/NOBACKUP/groups/c3-staff/akesa/eb-builds/jax/0.2.28/foss-2021a-CUDA-11.3.1/jax/jax-jax-v0.2.28/jax/_src/profiler.py", line 206 in wrapper
  File "/mimer/NOBACKUP/groups/c3-staff/akesa/eb-builds/jax/0.2.28/foss-2021a-CUDA-11.3.1/jax/jax-jax-v0.2.28/jax/_src/dispatch.py", line 582 in compile_or_get_cached
  File "/mimer/NOBACKUP/groups/c3-staff/akesa/eb-builds/jax/0.2.28/foss-2021a-CUDA-11.3.1/jax/jax-jax-v0.2.28/jax/_src/dispatch.py", line 613 in from_xla_computation
  File "/mimer/NOBACKUP/groups/c3-staff/akesa/eb-builds/jax/0.2.28/foss-2021a-CUDA-11.3.1/jax/jax-jax-v0.2.28/jax/_src/dispatch.py", line 528 in compile
  File "/mimer/NOBACKUP/groups/c3-staff/akesa/eb-builds/jax/0.2.28/foss-2021a-CUDA-11.3.1/jax/jax-jax-v0.2.28/jax/_src/dispatch.py", line 169 in _xla_callable_uncached
  File "/mimer/NOBACKUP/groups/c3-staff/akesa/eb-builds/jax/0.2.28/foss-2021a-CUDA-11.3.1/jax/jax-jax-v0.2.28/jax/linear_util.py", line 272 in memoized_fun
  File "/mimer/NOBACKUP/groups/c3-staff/akesa/eb-builds/jax/0.2.28/foss-2021a-CUDA-11.3.1/jax/jax-jax-v0.2.28/jax/_src/dispatch.py", line 142 in _xla_call_impl
...

So, am I missing something from develop that this one needs? (I'm using EB 4.5.4 for building.)

Trying with develop now...

@branfosj
Member

branfosj commented May 3, 2022

@akesandgren Several of us have seen the same issue with jax-0.2.28-foss-2021a-CUDA-11.3.1.eb on A100 - see #15129 (comment) and #15129 (comment).

@akesandgren
Contributor

Ah, didn't notice that they were during building of jax...
So, do we have a plan forward here?
We have users that want this version installed. We don't have A100s here so I could technically do it, but I'd prefer if we have a working solution that is identical for all GPUs.

@akesandgren
Contributor

akesandgren commented May 3, 2022

@arkdavy I see the same problem with jaxlib_local-tensorflow-repo.sed missing although it is available in the robot-paths/j/jax dir

The problem is that the file is in jax but the component being built is jaxlib so it can't find it.

And jaxlib-0.1.70_add-bazel-args-to-shutdown.patch and TensorFlow-2.7.0_cuda-noncanonical-include-paths.patch will have the same problem.
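
A simplified model of why the lookup fails, assuming files are searched for in a `<first letter>/<name>` subdirectory of a robot path, keyed on the name of the thing being built. The helper and the example path are hypothetical and only illustrate the directory mismatch; this is not EasyBuild's actual implementation.

```python
import posixpath


def expected_patch_dir(robot_path, software_name):
    """Directory where files for `software_name` are expected (simplified)."""
    letter = software_name[0].lower()
    return posixpath.join(robot_path, letter, software_name)


robot_path = "/path/to/easyconfigs"  # hypothetical robot path

# The files sit next to the jax easyconfig...
where_files_are = expected_patch_dir(robot_path, "jax")
# ...but they belong to the jaxlib component, so that name drives the lookup.
where_lookup_goes = expected_patch_dir(robot_path, "jaxlib")

print(where_files_are)    # /path/to/easyconfigs/j/jax
print(where_lookup_goes)  # /path/to/easyconfigs/j/jaxlib
print(where_files_are == where_lookup_goes)  # False
```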

@arkdavy

arkdavy commented May 3, 2022

Thanks @akesandgren! I see. I managed to solve it by downloading the file into the workdir. This may be a problem for new users (as I am, relatively) who would appreciate a hint. Maybe it is worth adding a corresponding comment inside the easyconfig?

@akesandgren
Contributor

The PR is wrong, the files must reside in the correct place.

@boegel
Member Author

boegel commented May 3, 2022

> Ah, didn't notice that they were during building of jax...
> So, do we have a plan forward here?

I think our options here are:

  1. Ignore the hard crash while running the jax test suite, and get this PR merged.
    We should probably open an issue upstream to see what the hell is going on here.

  2. We figure out what is going on with the crash, and try to come up with a fix for it.

Maybe jax-ml/jax#5713 gives some clues, not sure.

@akesandgren
Contributor

akesandgren commented May 6, 2022

AlphaFold 2.2.0 actually builds with jax 0.3.9 (and chex 0.1.3); now on to actually running tests (this is on A100 btw).

The above --db_preset=reduced_dbs test works ok on an A100 system when built with jax-0.3.9 and chex-0.1.3
There is a minor runtime complaint about "FutureWarning: jax.tree_util.tree_multimap() is deprecated. Please use jax.tree_util.tree_map() instead as a drop-in replacement." but considering the amount of output from AlphaFold that's really minor.

@boegel
Member Author

boegel commented Jun 10, 2022

@akesandgren Which easyconfig file did you use for jax 0.3.9 with foss/2021a + CUDA 11.3.1?

I'm bumping into an installation failure with the one in #15660...

@akesandgren
Contributor

akesandgren commented Jun 11, 2022

I have these changes from #15660:

('jaxlib', '0.3.7', {
        'sources': [
            '%(name)s-v%(version)s.tar.gz',
            {
                'download_filename': '%s.tar.gz' % local_tf_commit,
                'filename': 'tensorflow-%s.tar.gz' % local_tf_commit,
            }
        ],
        'source_urls': [
            'https://github.com/google/jax/archive/',
            'https://github.com/tensorflow/tensorflow/archive/'
        ],
        'patches': [
            ('jaxlib_local-tensorflow-repo.sed', '.'),
            'jaxlib-0.1.70_add-bazel-args-to-shutdown.patch',
            ('TensorFlow-2.7.0_cuda-noncanonical-include-paths.patch', '../' + local_tf_dir),
        ],

also see my comment above regarding placement of the jaxlib patches,
and for the jax runtest:

 'runtest': "NVIDIA_TF32_OVERRIDE=0 CUDA_VISIBLE_DEVICES=0 XLA_PYTHON_CLIENT_ALLOCATOR=platform "
                   "JAX_ENABLE_X64=true pytest tests",
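
As a rough sketch of what that `runtest` line amounts to, the same environment can be prepared from Python before launching pytest. The helper name `jax_test_env` is made up for illustration, and the comments describe each variable as commonly documented rather than anything specific to this PR.

```python
import os


def jax_test_env(base_env=None):
    """Return a copy of the environment with the runtest settings applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        # Disable TF32 math on Ampere GPUs such as the A100.
        "NVIDIA_TF32_OVERRIDE": "0",
        # Expose only the first GPU to the test run.
        "CUDA_VISIBLE_DEVICES": "0",
        # Allocate GPU memory on demand instead of preallocating a pool.
        "XLA_PYTHON_CLIENT_ALLOCATOR": "platform",
        # Enable 64-bit types in jax.
        "JAX_ENABLE_X64": "true",
    })
    return env


command = ["pytest", "tests"]
# To actually run the suite from the jax source directory:
#   import subprocess; subprocess.run(command, env=jax_test_env())
print(sorted(set(jax_test_env({})) ))
```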

And as per Simon's comment in #15660, we used #15420
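
To illustrate how the source entries in the jaxlib snippet earlier in this comment resolve to file names, the sketch below mimics the `%(name)s` and `%(version)s` templates with plain Python string formatting. The commit value is a placeholder, not the one actually used in #15660.

```python
# Placeholder value; the real local_tf_commit is defined in the easyconfig.
local_tf_commit = "0123abc"

# EasyBuild-style templates, resolved here with ordinary %-formatting.
template_values = {"name": "jaxlib", "version": "0.3.7"}
jaxlib_tarball = "%(name)s-v%(version)s.tar.gz" % template_values

tensorflow_source = {
    # File name as requested from the upstream archive URL.
    "download_filename": "%s.tar.gz" % local_tf_commit,
    # File name used for the stored copy, so it is recognisable.
    "filename": "tensorflow-%s.tar.gz" % local_tf_commit,
}

print(jaxlib_tarball)                          # jaxlib-v0.3.7.tar.gz
print(tensorflow_source["download_filename"])  # 0123abc.tar.gz
print(tensorflow_source["filename"])           # tensorflow-0123abc.tar.gz
```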

@boegel
Member Author

boegel commented Jun 11, 2022

I totally overlooked #15420... 🤦‍♂️

@boegel changed the title from "{bio}[foss/2021a] AlphaFold v2.2.0 w/ Python 3.9.5 + CUDA 11.3.1" to "{bio}[foss/2021a] AlphaFold v2.2.2 w/ Python 3.9.5 + CUDA 11.3.1" on Jun 22, 2022
@boegel
Member Author

boegel commented Jun 22, 2022

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3907.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 510.73.08, Python 3.6.8
See https://gist.github.com/96a9a14d4fe9117927cc593947f58d33 for a full test report.

@boegel dismissed branfosj’s stale review on June 22, 2022 14:24

fixed, now using jax 0.3.9

Member

@verdurin left a comment

Looks fine.

@verdurin
Member

Going in, thanks @boegel!

@verdurin merged commit 541a828 into easybuilders:develop on Jun 22, 2022
@boegel deleted the 20220315190641_new_pr_AlphaFold220 branch on June 22, 2022 14:34