google-deepmind/alphafold

Proper release and build instructions ?

mboisson opened this issue · 11 comments

Hi,
This code is generating a lot of interest in the advanced research computing community. Unfortunately, it currently cannot be supported on most clusters because of its use of Docker and its lack of release versions.

Any plans to address the following would be appreciated:

  1. Make proper release versions (I recommend Semantic Versioning: https://semver.org/lang)
  2. Provide instructions to install the code without the use of containers or Anaconda (I suggest using Autoconf or CMake as a build tool).
  3. If 2) is impossible or too hard, providing a Singularity container rather than a Docker one would already be a good start (see the minimal definition-file sketch after this list).
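For illustration only, here is a minimal sketch of what such a Singularity definition file could look like when bootstrapped from a Docker image. The image name is a placeholder, and the /app/alphafold path assumes the layout used by the official Dockerfile:

# placeholder image name: an image built from docker/Dockerfile and pushed to a registry
Bootstrap: docker
From: yourusername/alphafold:latest

%runscript
    # the official Dockerfile places the code under /app/alphafold
    exec python /app/alphafold/run_alphafold.py "$@"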

I second that. The current lack of a release version and of instructions on how to install it without containers is holding us back too.

You can follow the Dockerfile as if it were a regular script, can't you? I am more concerned with the following:

RUN git clone --branch v3.3.0 https://github.com/soedinglab/hh-suite.git /tmp/hh-suite \
    && mkdir /tmp/hh-suite/build

This pulls in whatever is at the v3.3.0 branch. I would suggest copying that code into the current repository instead.

Actually, what should be done, instead of a git clone, is to download the release:
https://github.com/soedinglab/hh-suite/archive/refs/tags/v3.3.0.tar.gz
and compare it against a known checksum.

The same applies to this repository. We never want to clone whatever is currently in git; we want to download specific releases that we can validate haven't changed.
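A minimal sketch of that approach; the sha256 value below is a placeholder that whoever packages the software would record once and pin, not a real digest:

# download the fixed release tarball instead of cloning
wget -O hh-suite-3.3.0.tar.gz \
    https://github.com/soedinglab/hh-suite/archive/refs/tags/v3.3.0.tar.gz
# verify against the digest recorded when the release was first vetted
echo "<pinned-sha256-digest>  hh-suite-3.3.0.tar.gz" | sha256sum -c -
tar -xzf hh-suite-3.3.0.tar.gz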

Hi,

We have created a small how-to for a non-Docker setup of AlphaFold; we hope it helps.

It can be found here: https://github.com/kalininalab/alphafold_non_docker

Furthermore, kalign is used at version 2.04 (from https://msa.sbc.su.se/cgi-bin/msa.cgi?mode=downloads). However, there is an updated and improved version maintained on GitHub (https://github.com/TimoLassmann/kalign), and the maintainer of kalign2 recommends using it instead.
Is there a reason for still using version 2.04?

@Dragas Although the git option is --branch, v3.3.0 is actually a tag, so the clone is reproducible. Using git clone has the advantage of automatic checksumming, so @mboisson's suggestion seems redundant.
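If you want to double-check that such a clone really is the tagged state, a small sketch (using the same hh-suite URL as the Dockerfile):

git clone --branch v3.3.0 https://github.com/soedinglab/hh-suite.git /tmp/hh-suite
# record or compare the exact commit the tag resolved to
git -C /tmp/hh-suite rev-parse HEAD
# fails unless HEAD sits exactly on a tag; should print v3.3.0
git -C /tmp/hh-suite describe --tags --exact-match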

If you're concerned about alphafold's version (point 1 above), there is a v2.0.0 tag in place. However, there have been a lot of changes since then, so I suspect most people will want to be using master.

FYI for anyone else looking to run alphafold on an HPC with only singularity, this is how I'm doing it.

Build the image from the Dockerfile on a computer that has Docker, or use one already on Docker Hub such as https://hub.docker.com/r/catgumag/alphafold:2.1.1

 git clone https://github.com/deepmind/alphafold.git
 docker build -f alphafold/docker/Dockerfile -t alphafold .
 docker tag alphafold:latest yourusername/alphafold:latest
 docker push yourusername/alphafold:latest

Then, on an HPC with Singularity:

export SINGULARITY_CACHEDIR=$SCRATCH/singularity_cache
singularity pull docker://yourusername/alphafold:latest

A minimal singularity run command follows https://www.rc.virginia.edu/userinfo/rivanna/software/alphafold/ for the mounts, and follows run_docker.py for the --env settings that prevent memory errors on long proteins (OPENMM_CPU_THREADS=8 was also added per @sittr's comment below).

singularity run --env TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0,OPENMM_CPU_THREADS=8 -B /scratch/gpfs/cmcwhite/alphafold_data:/data -B .:/etc --pwd /app/alphafold --nv /path/to/alphafold_latest.sif \
--fasta_paths /full/path/to/fasta \
--output_dir  /full/path/to/output_alphafold/ \
--data_dir /data/ \
--uniref90_database_path /data/uniref90/uniref90.fasta \
--mgnify_database_path /data/mgnify/mgy_clusters_2018_12.fa \
--small_bfd_database_path /data/small_bfd/bfd-first_non_consensus_sequences.fasta \
--pdb70_database_path /data/pdb70/pdb70 \
--template_mmcif_dir /data/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path /data/pdb_mmcif/obsolete.dat \
--max_template_date=2021-07-28 \
--model_names model_1,model_2,model_3,model_4,model_5 \
--preset reduced_dbs

If there is a tag, then downloading the archive of the tag (GitHub produces an archive for every tag) would make more sense. Many HPC centers store the archived sources locally anyway. Locally-produced archives don't have stable checksums (because they depend on the creation time of the files in the archive), whereas archives produced by GitHub do have stable checksums.

Basically, we only ever do a git clone when the releases on GitHub aren't actually proper releases, usually because the project took the ill-advised route of using git submodules to vendor its dependencies and a recursive clone is necessary to get the complete source.

Our setup procedure here at UVA - from Dockerfile to Singularity to Slurm - is fully documented in the link that @clairemcwhite provided. (Thank you for citing us!) Our users have been able to run jobs on our GPU nodes successfully. The most common error encountered here is insufficient CPU memory, in which case increasing the memory request via the --mem sbatch directive usually works.
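For anyone adapting this, here is a rough sketch of such a Slurm submission script built around @clairemcwhite's singularity run command above; the partition name, resource sizes, module name, and paths are site-specific placeholders rather than values from our documentation:

#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --partition=gpu          # placeholder: use your site's GPU partition
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G                # raise this if you hit CPU out-of-memory errors
#SBATCH --time=24:00:00

module load singularity          # if your site provides Singularity as a module

singularity run --nv \
    --env TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0,OPENMM_CPU_THREADS=8 \
    -B /path/to/alphafold_data:/data -B .:/etc --pwd /app/alphafold \
    /path/to/alphafold_latest.sif \
    --fasta_paths /full/path/to/fasta \
    --output_dir /full/path/to/output_alphafold/ \
    --data_dir /data/ \
    --uniref90_database_path /data/uniref90/uniref90.fasta \
    --mgnify_database_path /data/mgnify/mgy_clusters_2018_12.fa \
    --small_bfd_database_path /data/small_bfd/bfd-first_non_consensus_sequences.fasta \
    --pdb70_database_path /data/pdb70/pdb70 \
    --template_mmcif_dir /data/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path /data/pdb_mmcif/obsolete.dat \
    --max_template_date=2021-07-28 \
    --model_names model_1,model_2,model_3,model_4,model_5 \
    --preset reduced_dbs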

sittr commented

Hello,
we're using a procedure similar to the one outlined above to run AlphaFold on an HPC cluster through Singularity.
One addition I'd suggest, especially if the cluster is running in non-exclusive mode (i.e. multiple jobs running concurrently on a single compute node), is adding
OPENMM_CPU_THREADS=8
to Singularity's --env (see openmm/openmm#242 ).
We chose to set the number of threads to 8 since the jackhmmer step is also set to --cpu=8 inside the container, so reserving a minimum of 8 cores is necessary anyway.

The reason is that, if this parameter is not set, amber_minimize.py will try to start as many threads as the total number of CPU cores detected on the node, which can lead to CPU oversubscription (our compute nodes have 64 cores, so without setting OPENMM_CPU_THREADS=8 we would have 64 threads competing for the 8 cores allocated to the job).
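A small sketch of how the value could be derived from the job's actual allocation instead of being hard-coded; SLURM_CPUS_PER_TASK is only set when the job requests --cpus-per-task, hence the fallback:

# size the OpenMM thread pool from the Slurm allocation, defaulting to 8
export OPENMM_CPU_THREADS="${SLURM_CPUS_PER_TASK:-8}"
# then include OPENMM_CPU_THREADS=${OPENMM_CPU_THREADS} in singularity's --env list
# in place of the hard-coded OPENMM_CPU_THREADS=8 shown earlier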

Unfortunately, we don't have the capacity to maintain and support Singularity. We have added links in the README to this issue and to issue #24, which point to third-party repositories with Singularity definition files. I am going to close this issue since it is directly accessible from the README.