De novo heme binding protein design pipeline using RFdiffusionAA

And other ligand binders too, I guess

Indrek Kalvet, PhD (Institute for Protein Design, University of Washington), ikalvet@uw.edu

As implemented in the publication "Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom"

Link:

The included notebook pipeline.ipynb illustrates the design of heme-binding proteins, starting from minimal information (heme + substrate + CYS motif). With minor modifications it should also work for other ligands.

The pipeline consists of 7 steps:

  1. The protein backbones are generated with RFdiffusionAA
  2. Sequence is designed with ProteinMPNN (without the ligand)
  3. Structures are predicted with AlphaFold2
  4. Ligand binding site is designed with LigandMPNN/FastRelax, or Rosetta FastDesign
  5. Sequences surrounding the ligand pocket are diversified with LigandMPNN
  6. Final designed sequences are predicted with AlphaFold2
  7. AlphaFold2-predicted models are relaxed with the ligand and analyzed
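
The seven steps alternate between tools that were tested in separate conda environments, so it can help to see where the active environment has to change. The sketch below is only an illustration and is not part of pipeline.ipynb. The environment names "diffusion" and "mlfold" are taken from the installation commands in this README, while the assignment of steps 4 and 7 to the "diffusion" environment (with PyRosetta added) is an assumption.

```python
from typing import List, NamedTuple, Tuple


class Stage(NamedTuple):
    number: int
    task: str
    tool: str
    env: str  # conda environment assumed to run this step


# Tasks and tools follow the step list above; the env column is an assumption.
STAGES = [
    Stage(1, "generate protein backbones", "RFdiffusionAA", "diffusion"),
    Stage(2, "design sequence without the ligand", "ProteinMPNN", "diffusion"),
    Stage(3, "predict structures", "AlphaFold2", "mlfold"),
    Stage(4, "design ligand binding site", "LigandMPNN/FastRelax or Rosetta FastDesign", "diffusion"),
    Stage(5, "diversify sequences around the ligand pocket", "LigandMPNN", "diffusion"),
    Stage(6, "predict final designed sequences", "AlphaFold2", "mlfold"),
    Stage(7, "relax predicted models with the ligand and analyze", "PyRosetta (assumed)", "diffusion"),
]


def environment_switches(stages: List[Stage]) -> List[Tuple[int, str]]:
    """Return (step number, environment) for each step where the environment changes."""
    switches = []
    previous = None
    for stage in stages:
        if stage.env != previous:
            switches.append((stage.number, stage.env))
            previous = stage.env
    return switches


if __name__ == "__main__":
    for number, env in environment_switches(STAGES):
        print(f"before step {number}: conda activate {env}")
```

Under these assumptions the environment changes before steps 1, 3, 4, 6, and 7.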

Installation

Dependencies

LigandMPNN and AlphaFold2

To download the LigandMPNN and AlphaFold2 (v2.3.2) repositories referenced in this pipeline run:

git submodule init
git submodule update

To download the model weight files for AlphaFold2 and ProteinMPNN, run this command:
bash get_af2_and_mpnn_model_params.sh

If you have already downloaded the weights elsewhere on your system, edit the paths in these scripts:
ProteinMPNN: lib/LigandMPNN/mpnn_api.py [lines 45-49]
AlphaFold2: scripts/af2/AlphaFold2.py [line 40]

RFdiffusionAA:

Download RFdiffusionAA from here: https://github.com/baker-laboratory/rf_diffusion_all_atom
and follow its instructions.
Make sure to provide a full path to the checkpoint file in this configuration file:
rf_diffusion_all_atom/config/inference/aa.yaml
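
Before launching a diffusion run, it may be worth checking the checkpoint path you are about to put into the configuration file. The helper below is a minimal sketch using only the standard library. The function name and the example path are hypothetical, and it does not read aa.yaml itself, because the name of the configuration key is not stated here.

```python
from pathlib import Path
from typing import List


def checkpoint_path_problems(ckpt_path: str) -> List[str]:
    """Return a list of problems with a checkpoint path (empty if none found)."""
    problems = []
    path = Path(ckpt_path)
    # A path starting with "~" is not absolute and is reported here as well.
    if not path.is_absolute():
        problems.append(f"not an absolute path: {ckpt_path}")
    if not path.is_file():
        problems.append(f"no file found at: {ckpt_path}")
    return problems


if __name__ == "__main__":
    # Hypothetical location; replace with wherever you saved the checkpoint.
    for problem in checkpoint_path_problems("/path/to/rfdiffusionaa_checkpoint.pt"):
        print(problem)
```

An empty list means the path is absolute and points to an existing file. It does not verify that the file is a valid checkpoint.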

RFjoint inpainting (proteininpainting)

(Optional) Download RFjoint Inpainting here: https://github.com/RosettaCommons/RFDesign

Inpainting is used to further resample/diversify diffusion outputs, and it may also increase AF2 success rates.

Python or Apptainer image

This pipeline consists of multiple Python scripts that depend on many Python modules, most notably PyTorch, PyRosetta, JAX, jaxlib, TensorFlow, ProDy, and Open Babel. It may be possible to set up a single Python installation or conda environment that includes all of these modules, but doing so can be finicky.
This pipeline was tested with separate conda environments for AlphaFold2 and for RFdiffusionAA/LigandMPNN.

To create a conda environment capable of running RFdiffusionAA and LigandMPNN, set it up as follows:

conda create -n "diffusion" python=3.9
conda activate diffusion
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c conda-forge omegaconf hydra-core=1.3.2 scipy icecream openbabel assertpy opt_einsum pandas pydantic deepdiff e3nn prody pyparsing=3.1.1
conda install dglteam/label/cu118::dgl
conda install pytorch::torchdata

Update as of 08.03.2024: PyRosetta is now freely available to download. You can add it to the above conda environment by running this command:
pip install pyrosetta_installer && python -c 'import pyrosetta_installer; pyrosetta_installer.install_pyrosetta()'

Packages for a minimal conda environment for AlphaFold2:

conda create -n "mlfold" python=3.10
conda activate mlfold
conda install -c conda-forge numpy jax dm-tree dm-haiku tensorflow gcc scipy "jaxlib[build=*cuda*]"
conda install -c conda-forge mock biopython=1.79 ml-collections

For iterative LigandMPNN and FastRelax, an environment with both PyTorch and PyRosetta is required.
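
A quick way to confirm that an activated environment has what a given step needs is to ask Python which modules it can locate. The sketch below is not part of the pipeline. The function and list names are hypothetical, and the lists map the conda package names used above to their import names (for example hydra-core to hydra, dm-tree to tree, dm-haiku to haiku, biopython to Bio). It only checks that a module can be found, not that it imports cleanly or sees a GPU.

```python
import importlib.util
from typing import Iterable, List

# Import names corresponding to the conda packages listed in this README.
DIFFUSION_MODULES = [
    "torch", "torchvision", "torchaudio", "torchdata", "dgl", "omegaconf",
    "hydra", "scipy", "icecream", "openbabel", "assertpy", "opt_einsum",
    "pandas", "pydantic", "deepdiff", "e3nn", "prody", "pyparsing",
]
ALPHAFOLD2_MODULES = [
    "numpy", "jax", "jaxlib", "tree", "haiku", "tensorflow", "scipy",
    "mock", "Bio", "ml_collections",
]
# Iterative LigandMPNN and FastRelax need both of these in one environment.
LIGANDMPNN_FASTRELAX_MODULES = ["torch", "pyrosetta"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located in this interpreter."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(LIGANDMPNN_FASTRELAX_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the environment of interest activated and swap in the list that matches the step you are about to run.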