Indrek Kalvet, PhD (Institute for Protein Design, University of Washington), ikalvet@uw.edu
As implemented in the publication "Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom"
Link:
The included notebook pipeline.ipynb illustrates the design of heme-binding proteins, starting from minimal information (heme + substrate + CYS motif). With minor modifications it should also work for any other ligand.
The pipeline consists of 7 steps:
- The protein backbones are generated with RFdiffusionAA
- Sequence is designed with proteinMPNN (without the ligand)
- Structures are predicted with AlphaFold2
- Ligand binding site is designed with LigandMPNN/FastRelax, or Rosetta FastDesign
- Sequences surrounding the ligand pocket are diversified with LigandMPNN
- Final designed sequences are predicted with AlphaFold2
- AlphaFold2-predicted models are relaxed with the ligand and analyzed
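The seven steps above can be written down as an ordered table, which is handy for labelling output directories or tracking progress. This is only a sketch: the stage labels and the `step_directory` helper are hypothetical and do not come from pipeline.ipynb; only the tool names are taken from the list above.

```python
# Hypothetical stage labels; the tool names are taken from the step list above.
PIPELINE_STEPS = (
    ("backbone_generation", "RFdiffusionAA"),
    ("sequence_design", "proteinMPNN"),
    ("structure_prediction", "AlphaFold2"),
    ("binding_site_design", "LigandMPNN/FastRelax or Rosetta FastDesign"),
    ("pocket_diversification", "LigandMPNN"),
    ("final_prediction", "AlphaFold2"),
    ("relax_and_analysis", "relaxation with the ligand"),
)


def step_directory(index: int) -> str:
    """Return a numbered directory name for the step at `index` (0-based)."""
    label, _tool = PIPELINE_STEPS[index]
    return f"{index}_{label}"


if __name__ == "__main__":
    # Print one line per step: directory name and the tool used in that step.
    for i, (_label, tool) in enumerate(PIPELINE_STEPS):
        print(f"{step_directory(i)}: {tool}")
```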
To download the LigandMPNN and AlphaFold2 (v2.3.2) repositories referenced in this pipeline run:
git submodule init
git submodule update
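A submodule that has not been initialized is left as an empty directory, so a quick way to confirm that the two commands above worked is to check that the directory has content. The sketch below assumes it is run from the repository root; `lib/LigandMPNN` is the location implied by the mpnn_api.py path given in this README, and the helper name `is_populated` is made up for illustration.

```python
from pathlib import Path


def is_populated(directory: str) -> bool:
    """Return True if `directory` exists and contains at least one entry."""
    path = Path(directory)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    # Assumed submodule location, relative to the repository root.
    submodule = "lib/LigandMPNN"
    if is_populated(submodule):
        print(f"{submodule}: populated")
    else:
        print(f"{submodule}: empty or missing, rerun the git submodule commands")
```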
To download the model weight files for AlphaFold2 and proteinMPNN run this command:
bash get_af2_and_mpnn_model_params.sh
If you have already downloaded the weights elsewhere on your system, edit the paths in these scripts:
proteinMPNN: lib/LigandMPNN/mpnn_api.py [lines 45-49]
AlphaFold2: scripts/af2/AlphaFold2.py [line 40]
Download RFdiffusionAA from https://github.com/baker-laboratory/rf_diffusion_all_atom and follow its instructions.
Make sure to provide a full path to the checkpoint file in this configuration file:
rf_diffusion_all_atom/config/inference/aa.yaml
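Before launching a diffusion run, it can help to check that the checkpoint path you entered is a full path and points at an existing file. The sketch below takes the path as a plain string, because the name of the relevant key in aa.yaml is not given here. The function name `checkpoint_problems` and the example path are hypothetical.

```python
from pathlib import Path


def checkpoint_problems(ckpt_path: str) -> list:
    """Return a list of human-readable problems with `ckpt_path` (empty if none)."""
    problems = []
    path = Path(ckpt_path)
    if not path.is_absolute():
        problems.append("path is not absolute")
    if not path.is_file():
        problems.append("file does not exist")
    return problems


if __name__ == "__main__":
    # Placeholder path; replace it with the value you put into aa.yaml.
    example = "/path/to/checkpoint.pt"
    issues = checkpoint_problems(example)
    print(f"{example}: {'OK' if not issues else '; '.join(issues)}")
```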
(Optional) Download RFjoint Inpainting here: https://github.com/RosettaCommons/RFDesign
Inpainting is used to further resample/diversify diffusion outputs, and it may also increase AF2 success rates.
This pipeline consists of multiple Python scripts that depend on many different Python modules, most notably PyTorch, PyRosetta, JAX, jaxlib, TensorFlow, ProDy, and OpenBabel. While it may be possible to set up a single Python installation or conda environment that includes all of these modules, doing so can be quite finicky.
Separate conda environments for AlphaFold2 and RFdiffusionAA/LigandMPNN were used to test this pipeline.
To create a conda environment capable of running RFdiffusionAA and LigandMPNN, set it up as follows:
conda create -n "diffusion" python=3.9
conda activate diffusion
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c conda-forge omegaconf hydra-core=1.3.2 scipy icecream openbabel assertpy opt_einsum pandas pydantic deepdiff e3nn prody pyparsing=3.1.1
conda install dglteam/label/cu118::dgl
conda install pytorch::torchdata
Update as of 08.03.2024: PyRosetta is now freely available to download. You can add it to the above conda environment by running this command:
pip install pyrosetta_installer && python -c 'import pyrosetta_installer; pyrosetta_installer.install_pyrosetta()'
Packages for a minimal conda environment for AlphaFold2:
conda create -n "mlfold" python=3.10
conda activate mlfold
conda install -c conda-forge numpy jax dm-tree dm-haiku tensorflow gcc scipy "jaxlib[build=*cuda*]"
conda install -c conda-forge mock biopython=1.79 ml-collections
For iterative LigandMPNN and FastRelax, an environment with both pytorch and pyrosetta is required.
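To confirm that the active environment satisfies this requirement, you can check whether both modules are importable without actually loading them. This is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for illustration, while `torch` and `pyrosetta` are the import names of PyTorch and PyRosetta.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Look up both modules without importing them.
    missing = missing_modules(["torch", "pyrosetta"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("Both PyTorch and PyRosetta are importable.")
```

Run it with the Python interpreter of the environment you intend to use for the LigandMPNN/FastRelax step.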