Skip to content

RFdiffusion-1.1.0-foss-2022a-CUDA-11.7.0.eb with models and missing dependencies #18359

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ThomasHoffmann77
Copy link
Contributor

@ThomasHoffmann77 ThomasHoffmann77 commented Jul 20, 2023

@lexming
Copy link
Contributor

lexming commented Jul 23, 2023

Test report by @lexming
FAILED
Build succeeded for 6 out of 10 (9 easyconfigs in total)
node200.hydra.os - Linux CentOS Linux 7.9.2009, x86_64, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz (broadwell), Python 3.6.8
See https://gist.github.com/lexming/92ef10c7ef1ab4a78e49ae586dd4cada for a full test report.

@ThomasHoffmann77
Copy link
Contributor Author

Test report by @lexming FAILED Build succeeded for 6 out of 10 (9 easyconfigs in total) node200.hydra.os - Linux CentOS Linux 7.9.2009, x86_64, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz (broadwell), Python 3.6.8 See https://gist.github.com/lexming/92ef10c7ef1ab4a78e49ae586dd4cada for a full test report.
@lexming I updated the checksums for the DGL easyconfigs.

{'dgl-1.1.1.tar.gz': '076026a7818f2396056252269d7960c561840bb0fe89494e11f8d6c524891c49'},
{'metis-5.1.1-DistDGL-v0.5.tar.gz': 'cedf0b32d32a8496bac7eb078b2b8260fb00ddb8d50c27e4082968a01bc33331'},
{'GKlib-METIS-v5.1.1-DistDGL-0.5.tar.gz': '52aa0d383d42360f4faa0ae9537ba2ca348eeab4db5f2dfd6343192d0ff4b833'},
{'tensorpipe-20230206.tar.gz': '97dcc6ab648d38f62ddc9d2a33de2a789c967e43e819481f817341881912a524'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checksums for git checkouts are not stable, so should be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smoors I was not aware of the unstable checksums. Is this the case for any commit or only for recursive checkouts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the case whenever you are not downloading a tarball directly, but creating the tarball yourself, so any git clone

remove unstable checksums for tensorpipe, pcg, and libxsmm
remove unstable checksums for tensorpipe, pcg, and libxsmm
Comment on lines 76 to 77
# uncomment for installing models and schedules as components.
# Then comment out _get_schedules and _get_models in postinstallcmds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference between the components and _get_schedules/_get_models?
i would prefer to use the best method and remove the other one to keep things clean

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smoors Using components would archive the models and would allow to reinstall RFdiffusion even if the models were not available for download any more. I'd prefer this option since RFdiffusion is not usable without the models.
However, I understand @lexming's concerns about to store the models data (3.9GB) at least twice, in the module directory and the sources directory.
Therefore I kept the components approach commented out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which solution do you prefer and which should I delete?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, i agree with you that having to re-download on reinstall is suboptimal, but i also agree with @lexming that storing the models twice is suboptimal.

a possible solution is to create a custom easyblock that checks the installdir and does not delete/re-download the models if they are already present and if their checksums are correct.
to make this configurable, we could define an extra parameter in the easyblock, e.g. keep_models_if_installed.
this is similar to easyblock Tarball, which allows merging an installation into an existing installation instead of wiping it first.

what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For long term it is probably better to place the data into a separate module RFdiffusion_data, which can be reused for minor updates of RFdiffusion, if the data does not change.
It might be good to have a generic easyblock for such data modules, because we can expect more of them with all the upcoming AI packages. (see also #19157 Model-Angelo_data, #19240 relion-classranker_data + relion-blush_data).
There was also a brief discussion at https://github.com/easybuilders/easybuild/wiki/Conference-call-notes-20230719

Copy link
Contributor

@smoors smoors Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's indeed a better solution long-term.
i'll try to give a stab at a generic data easyblock, but feel free to do this yourself if you're up for it.

in the meantime, i suggest to already go ahead with creating a separate easyconfig for RFdiffusion_data. you should be able to use the Bundle easyblock (which does not install anything), and download the models in a postinstallcmds, as you have done here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a look on RFdiffusion_data in the next few days. RFdiffusion-1.1.0_data_paths.patch requires an update allowing to read the models and schedules from elsewhere defined in a environment variable.

@easybuilders easybuilders deleted a comment from boegelbot Nov 20, 2023
@easybuilders easybuilders deleted a comment from boegelbot Nov 20, 2023
@easybuilders easybuilders deleted a comment from boegelbot Nov 20, 2023
@easybuilders easybuilders deleted a comment from boegelbot Nov 20, 2023
@easybuilders easybuilders deleted a comment from boegelbot Nov 20, 2023
@pavelToman
Copy link
Collaborator

Hi! I juts start to work on vscentrum/vsc-software-stack#283 pipeline of RFdiffusion - ProteinMPNN -AlphaFold. I'd like to use RFd with data models either. Do you working on that or you stop?

@ThomasHoffmann77
Copy link
Contributor Author

Hi! I juts start to work on vscentrum/vsc-software-stack#283 pipeline of RFdiffusion - ProteinMPNN -AlphaFold. I'd like to use RFd with data models either. Do you working on that or you stop?

@pavelToman I am currently working on AlphaFold/2.3.2-foss-2023a-CUDA-12.1.1. see also #19841

@pavelToman
Copy link
Collaborator


sanity_check_commands = [
"run_inference.py --help",
"run_inference.py 'contigmap.contigs=[150-150]' " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command will create two output directories in %(installdir)s, is there a way to avoid this?

],
}),
]

sanity_pip_check = True
postinstallcmds = [
'cp -rpP config examples %s/%%(namelower)s' % _pysp,
Copy link
Contributor

@WilleBell WilleBell Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'cp -rpP config examples %s/%%(namelower)s' % _pysp,
'cp -rpP config examples helper_scripts %s/%%(namelower)s' % _pysp,
'cp -pP %%(namelower)s/inference/sym_rots.npz %s/%%(namelower)s/inference/' % _pysp,

RFdiffusion needs sym_rots.npz to perform symmetry operations which is needed to use different symmetry settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in #23110

@garadar
Copy link
Contributor

garadar commented Sep 19, 2024

From https://github.com/easybuilders/easybuild-easyconfigs/pull/18359/files#diff-03d4d74a13067c23b1773097b070efb41c27ac715ad10f07b0fd98775ba1a144

    {
        'source_urls': ['https://github.com/KarypisLab/GKlib/archive'],
        'download_filename': 'METIS-v6.1.1-DistDGL-0.5.tar.gz',
        'filename': 'GKlib-METIS-v6.1.1-DistDGL-0.5.tar.gz',
        'extract_cmd': "tar -C %(namelower)s-%(version)s/third_party/METIS/GKlib --strip-components=2 -xf %s",
    },

It seems the source v6.1.1-DistDGL-0.5 is not available : https://github.com/KarypisLab/GKlib/tags

@sassy-crick
Copy link
Collaborator

@ThomasHoffmann77 You might want to sync your PR with development, something like this:

eb --sync-pr-with-develop NUMBER_OF_PR

Also, there are some changes regarding GitHub and stable checksums in EasyBuild-5. Finally, RFdiffusion-1.1.0-foss-2022a-CUDA-11.7.0.eb is merged in EasyBuild-5.1.0 so do we still need this PR?

@jfgrimm
Copy link
Member

jfgrimm commented Jun 16, 2025

I'll close this, since we have already merged RFdiffusion easyconfigs for this version and toolchain
There are some differences -- if you feel that changes to the existing easyconfigs are appropriate, please open a new PR with those changes.

@jfgrimm jfgrimm closed this Jun 16, 2025
@ThomasHoffmann77
Copy link
Contributor Author

I'll close this, since we have already merged RFdiffusion easyconfigs for this version and toolchain There are some differences -- if you feel that changes to the existing easyconfigs are appropriate, please open a new PR with those changes.

I thinks the one already merged still does not take care of the data downloads. That's why we have RFdiffusion-models and -schedules.
See also https://github.com/easybuilders/easybuild/wiki/Sync%E2%80%90meeting%E2%80%90EMBL%E2%80%9020240522#rfdiffusion

It's probably still also missing some required dependencies, e.g., as DGL and Hydra.

@jfgrimm
Copy link
Member

jfgrimm commented Jun 16, 2025

@ThomasHoffmann77 the data stuff can probably use the Dataset easyblock, which is now available (easybuilders/easybuild-easyblocks#3246)
I'd suggest still tackling this in a new PR, and maybe targetting a newer toolchain

@ThomasHoffmann77
Copy link
Contributor Author

@ThomasHoffmann77 the data stuff can probably use the Dataset easyblock, which is now available (easybuilders/easybuild-easyblocks#3246) I'd suggest still tackling this in a new PR, and maybe targetting a newer toolchain

this was already implemented with commit b7ec8ea

I'll open a new pr with foss/2022a again and then one with an update to 2023a

ThomasHoffmann77 added a commit to ThomasHoffmann77/easybuild-easyconfigs that referenced this pull request Jun 16, 2025
@ThomasHoffmann77
Copy link
Contributor Author

reopened as #23110

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants