Proteus

PyTorch implementation for the ICML 2024 paper Proteus: Exploring Protein Structure Generation for Enhanced Designability and Efficiency.

Overview

Proteus is a novel deep diffusion network designed to generate protein backbones with enhanced designability and efficiency. Unlike RFdiffusion, which relies on the large pre-trained RoseTTAFold network for structure prediction, Proteus uses graph-based triangle methods and a multi-track interaction network, achieving state-of-the-art performance without pre-training. Inference is 4x to 10x faster than FrameDiff and RFdiffusion. The model's capabilities have been validated through comprehensive in silico evaluations and experimental characterization, demonstrating its potential to significantly advance the field of protein design.

Install

We recommend miniconda (or anaconda). Run the following to create a conda environment with the necessary dependencies. Use mamba if possible for faster installation.

# install
conda env create -f se3.yml
# optional: use mamba instead of the command above for faster environment installation
conda install mamba
mamba env create -f se3.yml

# activate environment
conda activate Proteus

# install this repo as a local package
pip install -e .

Inference

The checkpoint is available at ./weights/paper_weights.pt

Monomer inference (command used in the paper)

The first run may be a little slow because the ESMFold checkpoint has to be downloaded.

weight_path=./weights/paper_weights.pt
python ./experiments/inference_se3_diffusion.py \
inference.output_dir=inference_outputs/monomer/ \
inference.weights_path=$weight_path \
inference.diffusion.samples.samples_lengths=[100,200,300,400,600,800] \
inference.diffusion.samples.samples_per_length=100 \
inference.diffusion.num_t=100

# The overrides below are optional; append them to the command above.

# Disable both ESMFold prediction and MPNN design
inference.mpnn.enable=False inference.esmfold.enable=False

# Disable ESMFold prediction only
inference.esmfold.enable=False
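
When sweeping over several settings, it can help to assemble the override list programmatically. The following is a minimal sketch, not part of the repository: `build_overrides` is a hypothetical helper, the sample lengths and counts are illustrative values smaller than those used in the paper, and the config keys are copied from the command above.

```python
import shlex

def build_overrides(options):
    """Turn a flat dict of dotted keys into `key=value` command-line overrides."""
    overrides = []
    for key, value in options.items():
        if isinstance(value, (list, tuple)):
            # Lists are rendered as [a,b,c] without spaces, as in the command above.
            rendered = "[" + ",".join(str(v) for v in value) + "]"
        else:
            rendered = str(value)
        overrides.append(f"{key}={rendered}")
    return overrides

# Illustrative settings only; keys are taken from the README command.
options = {
    "inference.output_dir": "inference_outputs/monomer/",
    "inference.weights_path": "./weights/paper_weights.pt",
    "inference.diffusion.samples.samples_lengths": [100, 200],
    "inference.diffusion.samples.samples_per_length": 10,
    "inference.diffusion.num_t": 100,
    "inference.esmfold.enable": False,
}
command = ["python", "./experiments/inference_se3_diffusion.py", *build_overrides(options)]

# Print a shell-safe version of the command instead of running it.
print(shlex.join(command))
```

The printed command can be pasted into a shell, or the `command` list can be passed to `subprocess.run` directly.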

A self_consistency.csv file is written to inference_outputs/monomer/${timestamp}/self_consistency.csv. It reports the evaluation metrics, such as DSSP and sc-RMSD.
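
The CSV can be summarized with a few lines of standard-library Python. This is a rough sketch under stated assumptions: the column name `rmsd` and the 2.0 Å cutoff are placeholders chosen for illustration (a cutoff of this size is common in the backbone-generation literature), and the real header names in self_consistency.csv should be checked before use.

```python
import csv
import io

def load_rows(handle):
    """Read a CSV file object into a list of dicts keyed by header name."""
    return list(csv.DictReader(handle))

def fraction_below(rows, column, threshold):
    """Fraction of rows whose numeric `column` value is below `threshold`."""
    values = [float(row[column]) for row in rows if row.get(column) not in (None, "")]
    if not values:
        raise ValueError(f"no numeric values found in column {column!r}")
    return sum(v < threshold for v in values) / len(values)

# Stand-in data; the header names here are assumptions, not the real schema.
example = io.StringIO(
    "sample_path,rmsd\n"
    "a.pdb,1.2\n"
    "b.pdb,3.4\n"
    "c.pdb,0.8\n"
    "d.pdb,2.5\n"
)
rows = load_rows(example)
print(fraction_below(rows, column="rmsd", threshold=2.0))  # 0.5
```

For a real run, replace the `io.StringIO` stand-in with `open(path, newline="")` on the generated file.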

Oligomer inference

baseline_weight_path=./weights/paper_weights.pt
python ./experiments/inference_se3_diffusion.py \
inference.output_dir=inference_outputs/oligomer/ \
inference.weights_path=$baseline_weight_path \
inference.diffusion.samples.contigs='60-80//60-80' \
inference.diffusion.samples.samples_per_length=100 \
inference.diffusion.num_t=100
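
The README does not spell out the contigs syntax. One plausible reading of '60-80//60-80' is that `//` separates chains and `a-b` is an inclusive length range for each chain. The sketch below implements only that reading; `parse_contigs` and `sample_chain_lengths` are hypothetical helpers, uniform sampling is an assumption, and the repository's actual parser may accept a richer grammar.

```python
import random

def parse_contigs(contigs):
    """Split 'a-b//c-d' into [(a, b), (c, d)]; a bare 'n' becomes (n, n)."""
    ranges = []
    for chain in contigs.split("//"):
        low_text, _, high_text = chain.strip().partition("-")
        low = int(low_text)
        high = int(high_text) if high_text else low
        if low > high:
            raise ValueError(f"invalid length range: {chain!r}")
        ranges.append((low, high))
    return ranges

def sample_chain_lengths(contigs, rng):
    """Draw one length per chain, uniformly within its range (an assumption)."""
    return [rng.randint(low, high) for low, high in parse_contigs(contigs)]

rng = random.Random(0)  # fixed seed so repeated runs give the same draw
print(parse_contigs("60-80//60-80"))
print(sample_chain_lengths("60-80//60-80", rng))
```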

The inference output is organized as follows:

inference_outputs
└── 12D_02M_2023Y_20h_46m_13s           # date and time of inference
    ├── mpnn.fasta                      # MPNN-designed sequences
    ├── self_consistency.csv            # self-consistency analysis: RMSD and TM-score between scaffold and ESMFold prediction, MPNN score of each sequence, scaffold path, esmf path, etc.
    ├── diffusion                       # scaffolds generated by Proteus
    │    ├── 100_1_sample.pdb
    │    ├── 100_2_sample.pdb           # {length}_{sample_id}_sample.pdb
    │    └── ...
    ├── trajctory                       # trajectory PDBs; exists when inference.diffusion.option.save_trajactory=True
    │    ├── 100_1_bb_traj.pdb
    │    ├── 100_2_bb_traj.pdb          # {length}_{sample_id}_bb_traj.pdb
    │    └── ...
    ├── movie                           # trajectory movies; exists when inference.diffusion.option.plot.switch_on=True
    │    ├── 100_1_rigid_movie.gif      # movie of the protein rigids at time t
    │    ├── 100_1_rigid_0_movie.gif    # movie of the protein rigids at time 0 as predicted from time t
    │    └── ...
    ├── mpnn                            # full-atom proteins designed by MPNN; exists when pyrosetta is installed and inference.mpnn.dump=True
    │    ├── 100_0_sample_mpnn_0.pdb
    │    ├── 100_0_sample_mpnn_1.pdb    # {length}_{sample_id}_sample_mpnn_{sequence_id}.pdb
    │    └── ...
    └── esmf                            # structures predicted by ESMFold
         ├── 100_0_sample_esmf_0.pdb
         ├── 100_0_sample_esmf_1.pdb    # {length}_{sample_id}_sample_esmf_{sequence_id}.pdb
         └── ...
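
The file-name patterns in the tree above make it easy to recover the length, sample id, and sequence id from a path. The following is a small sketch based only on the patterns shown for the diffusion, mpnn, and esmf directories; `parse_sample_name` is a hypothetical helper, and names produced for oligomer runs may follow a different scheme.

```python
import re
from pathlib import PurePosixPath

# Matches {length}_{sample_id}_sample.pdb and the _mpnn_/_esmf_ variants.
SAMPLE_RE = re.compile(
    r"^(?P<length>\d+)_(?P<sample_id>\d+)_sample"
    r"(?:_(?P<tool>mpnn|esmf)_(?P<sequence_id>\d+))?\.pdb$"
)

def parse_sample_name(path):
    """Return the fields encoded in a sample file name, or None if it does not match."""
    match = SAMPLE_RE.match(PurePosixPath(path).name)
    if match is None:
        return None
    info = {
        "length": int(match["length"]),
        "sample_id": int(match["sample_id"]),
        # Files without a tool suffix are the raw scaffolds in diffusion/.
        "tool": match["tool"] or "diffusion",
    }
    if match["sequence_id"] is not None:
        info["sequence_id"] = int(match["sequence_id"])
    return info

print(parse_sample_name("diffusion/100_2_sample.pdb"))
print(parse_sample_name("esmf/100_0_sample_esmf_1.pdb"))
```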

Code Structure

The local triangle attention is implemented below:

class LocalTriangleAttentionNew(nn.Module):

License

LICENSE: MIT

Citation

If you use our work, please cite:

@article{wang2024proteus,
  title={Proteus: exploring protein structure generation for enhanced designability and efficiency},
  author={Wang, Chentong and Qu, Yannan and Peng, Zhangzhi and Wang, Yukai and Zhu, Hongli and Chen, Dachuan and Cao, Longxing},
  journal={bioRxiv},
  pages={2024--02},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

Appreciation

Proteus is built upon the following codebases. Please give them a star if you enjoy Proteus :)