Anti-CRISPR (Acr) proteins are the defence mechanism that bacteriophages use to counter CRISPR-Cas systems, the adaptive immune systems of bacteria and archaea (Jansen et al. 2002; Mojica et al. 2005; Bondy-Denomy et al. 2013). They are natural protein therapeutics that could be used for future drug design.
Structural and functional analysis of these Acr proteins is essential before they can be used for drug design or in any other capacity. Currently, advanced techniques such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy are used to determine the structure of proteins. These methods are highly accurate, but they can be very time-consuming and expensive. To overcome these disadvantages, prediction of protein structures via machine learning has recently gained a lot of attention. The Critical Assessment of protein Structure Prediction (CASP), for example, is a competition for predicting protein 3-D structures from their sequences. In the 14th CASP contest, AlphaFold (Jumper et al. 2021; Callaway 2020) from Google DeepMind achieved highly accurate structure predictions for a variety of proteins and won the contest. AlphaFold can predict a protein structure in a matter of hours, whereas determining it in a wet lab would normally take months.
In our study, we present strategies for drug design using Acr proteins, as well as a new family of Acr proteins that are structurally similar to those showing enzymatic activity. To this end, we use AlphaFold to predict the 3-D structures of Acr proteins and perform structural and sequence analyses on the results.
Through this GitHub repository, we share the predicted 3-D structures and sequences of the Acr proteins used in our study. We share them to help researchers working on related topics cross-verify the structures predicted by AlphaFold and to reduce the time and effort needed for similar studies.
All Acr protein sequences were provided by anti-CRISPRdb and have been divided into three sets: Set A, Set B, and Set C. Set A contains proteins that have been verified as Acr proteins by both the aforementioned database and other literature. Set B contains proteins that have been verified as Acr proteins by the database but not by the literature. Lastly, Set C contains proteins that have not yet been verified as Acr proteins but whose 3-D structures have been determined.
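The grouping rule above can be written as a small decision function. This is only an illustrative sketch: the function name and its three boolean parameters are hypothetical and do not correspond to any field in anti-CRISPRdb or to any file in this repository.

```python
def assign_set(database_verified, literature_verified, has_known_structure):
    """Return "A", "B", or "C" following the set definitions in this README.

    All three arguments are hypothetical boolean flags for one protein.
    Combinations the README does not describe return None.
    """
    if database_verified and literature_verified:
        return "A"  # verified by both the database and other literature
    if database_verified:
        return "B"  # verified by the database only
    if not literature_verified and has_known_structure:
        return "C"  # not yet verified, but a 3-D structure is known
    return None
```

Returning `None` for unlisted combinations keeps the sketch from guessing at cases the set definitions do not cover.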
- The input protein sequences required for prediction via AlphaFold are stored here.
2. pdb_files
- Alphafold_pdbs_from_protein_sequences: The predicted 3-D structures from the input protein sequences are saved here.
- Ground_truth_pdb_files: The ground truth 3-D structures corresponding to the predicted 3-D structures are stored here.
- The information about which Ground_truth_pdb matches Alphafold_pdbs_from_protein_sequences is saved here.
- Screenshots are saved for every 3-D structure.
- Super_imposed: Screenshots of the (overlapped) predictions and ground truth are saved here.
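For readers who want a numerical check alongside the screenshots, the sketch below compares a predicted structure with its ground truth using only the Python standard library. It is not the method used in our study. It uses a distance-matrix RMSD (dRMSD), a superposition-free measure chosen here because it needs no alignment step or external packages. It assumes that both files follow the standard fixed-column PDB format, contain a single model, and list the same residues in the same order. The function names are hypothetical.

```python
import math


def read_ca_coords(pdb_text):
    """Return a list of (x, y, z) C-alpha coordinates from PDB-format text."""
    coords = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB layout: atom name in columns 13-16,
        # x, y, z coordinates in columns 31-38, 39-46, and 47-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def distance_rmsd(p, q):
    """Distance-matrix RMSD between two equal-length coordinate lists.

    Compares every intramolecular pairwise distance in p with the
    corresponding distance in q, so no superposition is required.
    """
    if len(p) != len(q):
        raise ValueError("both structures must have the same number of atoms")
    n = len(p)
    if n < 2:
        raise ValueError("need at least two atoms")
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = math.dist(p[i], p[j]) - math.dist(q[i], q[j])
            total += diff * diff
    n_pairs = n * (n - 1) // 2
    return math.sqrt(total / n_pairs)
```

To use it, read the text of one file from Alphafold_pdbs_from_protein_sequences and its matching file from Ground_truth_pdb_files, pass each through `read_ca_coords`, and give the two lists to `distance_rmsd`. A value near zero indicates closely matching backbone geometry. Note that dRMSD cannot distinguish a structure from its mirror image, and that structures with different residue counts must be trimmed to a common set of residues first.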
Readers may use the following information to cite our research and dataset.
Park, H. M. et al. (2021). Rethinking protein drug design with highly accurate structure prediction of anti-CRISPR proteins. bioRxiv. https://doi.org/10.1101/2021.11.28.470242
@article {Park2021.11.28.470242,
author = {Park, Ho-min and Park, Yunseol and Vankerschaver, Joris and Van Messem, Arnout and De Neve, Wesley and Shim, Hyunjin},
title = {Rethinking protein drug design with highly accurate structure prediction of anti-CRISPR proteins},
elocation-id = {2021.11.28.470242},
year = {2021},
doi = {10.1101/2021.11.28.470242},
publisher = {Cold Spring Harbor Laboratory},
journal = {bioRxiv}
}
Please feel free to contact us by opening an issue in this repository or at the following email address: homin.park@ghent.ac.kr
The research and development activities described in this paper were funded by Ghent University Global Campus (GUGC).