This package contains deep learning models and related scripts to run RoseTTAFold. This repository is the official implementation of RoseTTAFold: Accurate prediction of protein structures and interactions using a 3-track network.
- clone the package
git clone https://github.com/RosettaCommons/RoseTTAFold
cd RoseTTAFold
- create conda environments using the RoseTTAFold-linux.yml and folding-linux.yml files. The latter is required only to run the pyrosetta version (run_pyrosetta_ver.sh).
conda env create -f RoseTTAFold-linux.yml
conda env create -f folding-linux.yml
- download network weights (under the Rosetta-DL Software license). While the code is licensed under the MIT License, the trained weights and data for RoseTTAFold are made available for non-commercial use only under the terms of the Rosetta-DL Software license. Details are at https://files.ipd.uw.edu/pub/RoseTTAFold/Rosetta-DL_LICENSE.txt
wget https://files.ipd.uw.edu/pub/RoseTTAFold/weights.tar.gz
tar xfz weights.tar.gz
- download and install third-party software if you want to run the entire modeling script (run_pyrosetta_ver.sh)
./install_dependencies.sh
- download sequence and structure databases
# uniref30 [46G]
wget http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz
mkdir -p UniRef30_2020_06
tar xfz UniRef30_2020_06_hhsuite.tar.gz -C ./UniRef30_2020_06
# BFD [272G]
wget https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz
mkdir -p bfd
tar xfz bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz -C ./bfd
# structure templates [10G]
wget https://files.ipd.uw.edu/pub/RoseTTAFold/pdb100_2021Mar03.tar.gz
tar xfz pdb100_2021Mar03.tar.gz
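Because the three downloads are large, it may be worth checking free disk space before starting. The following is a minimal sketch, not part of the repository: the variable names and the `free_gb` helper are hypothetical, and the sizes are simply the bracketed figures from the comments above taken at face value. The unpacked databases may need more room, and each tarball coexists with its extracted contents until it is deleted.

```shell
#!/usr/bin/env bash
# Sizes in GB, copied from the bracketed figures in the download comments.
UNIREF30_GB=46
BFD_GB=272
PDB100_GB=10

# Combined size of the three downloads.
required_gb=$((UNIREF30_GB + BFD_GB + PDB100_GB))

# Free space, in whole GB, on the filesystem that holds the given directory.
# "df -Pk" prints one POSIX-format line per filesystem in 1024-byte blocks;
# the fourth column of that line is the available space.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

available_gb=$(free_gb .)
echo "need at least ${required_gb}G, have ${available_gb}G free"
if [ "$available_gb" -lt "$required_gb" ]; then
    echo "warning: not enough free space for all three databases" >&2
fi
```

Run it from the directory where the databases will be unpacked, since `free_gb .` inspects the filesystem of the current directory.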
- obtain a PyRosetta licence and install the package in the newly created folding conda environment (link).
cd example
../run_[pyrosetta, e2e]_ver.sh input.fa .
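The bracket notation in the command above is shorthand for two separate scripts. As a small sketch of how it expands, the hypothetical helper `script_for` below maps a version name to a script path. The name run_pyrosetta_ver.sh appears earlier in this README; run_e2e_ver.sh is inferred from the shorthand and is an assumption here.

```shell
#!/usr/bin/env bash
# Expand the "[pyrosetta, e2e]" shorthand into a concrete script path.
# Prints the path on success; prints an error and returns 1 otherwise.
script_for() {
    case "$1" in
        pyrosetta|e2e) echo "../run_${1}_ver.sh" ;;
        *) echo "unknown version: $1" >&2; return 1 ;;
    esac
}

# Intended use from inside the example directory (not executed here):
#   "$(script_for pyrosetta)" input.fa .
#   "$(script_for e2e)" input.fa .
```

In both cases the arguments are the same as in the command above: the input FASTA file followed by `.`.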
For the pyrosetta version, the user gets five final models with the estimated CA RMS error in the B-factor column (model/model_[1-5].crderr.pdb). For the end-to-end version, there is a single PDB output with the estimated residue-wise CA-lDDT in the B-factor column (t000_.e2e.pdb).
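To inspect the per-residue estimates, the B-factor column can be read directly from the output files. The sketch below assumes the outputs follow the standard fixed-width PDB ATOM record layout (atom name in columns 13-16, residue number in 23-26, B-factor in 61-66); `ca_bfactors` is a hypothetical helper, not a script shipped with the repository.

```shell
#!/usr/bin/env bash
# Print "residue_number b_factor" for every CA atom in a PDB file.
# Fields are cut by column position because PDB records are fixed-width,
# so whitespace splitting is not reliable.
ca_bfactors() {
    awk 'substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
        printf "%d %.2f\n", substr($0, 23, 4) + 0, substr($0, 61, 6) + 0
    }' "$1"
}

# Intended use (not executed here):
#   ca_bfactors t000_.e2e.pdb               # estimated CA-lDDT per residue
#   ca_bfactors model/model_1.crderr.pdb    # estimated CA RMS error per residue
```

The meaning of the second column depends on which version produced the file, as described above.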
- Robetta server (RoseTTAFold option)
The code in network/performer_pytorch.py is strongly based on this repo, which is a PyTorch implementation of the Performer architecture. The code in network/equivariant_attention is from the original SE(3)-Transformer repo, which accompanies the paper 'SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks' by Fuchs et al.
M Baek, et al., Accurate prediction of protein structures and interactions using a 3-track network, bioRxiv (2021). link