Improved protein model quality assessment by integrating sequential and pairwise features using deep learning
This package contains the source code of ResNetQA, a single-model-based quality assessment (QA) method for both local and global quality assessment of protein models.
- python >=3.6
- pytorch >=1.1.0
- Biopython (with DSSP installed) >=1.76
- scipy >=1.4.1
- numpy >=1.18.1
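The minimum versions above can be checked before running ResNetQA. The following is a minimal sketch, not part of the package: the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and `torch` and `Bio` are the import names of PyTorch and Biopython. It does not check whether the DSSP executable is installed.

```python
import importlib
import platform
import re

# Minimum versions copied from the requirements list above.
# Keys are import names (torch = PyTorch, Bio = Biopython).
MINIMUM_VERSIONS = {
    "python": "3.6",
    "torch": "1.1.0",
    "Bio": "1.76",
    "scipy": "1.4.1",
    "numpy": "1.18.1",
}


def version_tuple(version):
    """Turn '1.18.1' or '1.7.0+cu101' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


def check_environment():
    """Report 'ok', 'missing', or 'too old (...)' for each requirement."""
    report = {}
    for name, required in MINIMUM_VERSIONS.items():
        if name == "python":
            installed = platform.python_version()
        else:
            try:
                installed = importlib.import_module(name).__version__
            except ImportError:
                report[name] = "missing"
                continue
        if meets_minimum(installed, required):
            report[name] = "ok"
        else:
            report[name] = "too old (%s)" % installed
    return report
```

Calling `check_environment()` returns a dictionary with one entry per requirement, which can be printed before starting a long run.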
git clone git@github.com:AndersJing/ResNetQA.git
The following features must be prepared in advance. The sequence features are the same as those used by RaptorX-Contact; more detailed information can be found at https://github.com/j3xugit/RaptorX-Contact. The distance potential is predicted by RaptorX-Contact.
- sequence: primary sequence of the target.
- PSSM: Position-Specific Scoring Matrix represented as a L*20 matrix.
- SS3: predicted secondary structure confidence score represented as a L*3 matrix.
- ACC: predicted solvent accessibility score represented as a L*3 matrix.
- ccmpredZ: normalized CCMpred matrix (L*L).
- alnstats: pairwise relationship generated by alnstats in MetaPSICOV (L*L*3).
- DistPot: pairwise distance potential predicted by RaptorX-Contact.
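The dimensions listed above can be checked against the sequence length L before running the predictor. The sketch below is illustrative only: it takes a mapping from feature name to array shape, and the key names (`PSSM`, `SS3`, `ACC`, `ccmpredZ`, `alnstats`) are an assumption that may not match the actual layout of the `*.inputFeatures.pkl` files. The shape of DistPot is not stated above, so it is not checked.

```python
def expected_shapes(seq_len):
    """Shapes from the feature list, for a target of length seq_len."""
    return {
        "PSSM": (seq_len, 20),
        "SS3": (seq_len, 3),
        "ACC": (seq_len, 3),
        "ccmpredZ": (seq_len, seq_len),
        "alnstats": (seq_len, seq_len, 3),
    }


def find_shape_mismatches(sequence, shapes):
    """Return a list of problems; an empty list means all shapes agree.

    `shapes` maps a feature name to its shape tuple. The key names are
    hypothetical and may differ from those in the real feature file.
    """
    problems = []
    for name, want in expected_shapes(len(sequence)).items():
        got = shapes.get(name)
        if got is None:
            problems.append("%s: missing" % name)
        elif tuple(got) != want:
            problems.append("%s: expected %s, got %s" % (name, want, tuple(got)))
    return problems
```

With NumPy arrays held in a dictionary `features`, the shapes can be collected with `{name: array.shape for name, array in features.items()}` and passed as the second argument.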
Users can obtain the distance potential (and other sequence features) from the RaptorX web server (http://raptorx.uchicago.edu) or the new version of the RaptorX software package (https://github.com/j3xugit/RaptorX-3DModeling).
Examples of the sequence features (*.inputFeatures.pkl) and distance potentials (*.distPotential.DFIRE16.pkl) can be found in the examples/ folder.
All decoy models for a protein target must be saved under a separate folder in PDB format, and their residue indices must correspond to the sequence in the feature file.
For example, you can download the QA models of T0955 stage1 from the CASP Data Archive and extract them to a folder.
wget https://predictioncenter.org/download_area/CASP13/server_predictions/T0955.stage1.3D.srv.tar.gz
tar -zxvf T0955.stage1.3D.srv.tar.gz
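The requirement that residue indices correspond to the sequence can be verified for each decoy before scoring. The sketch below is a hypothetical helper, not part of ResNetQA. It assumes residue numbering starts at 1 (so residue i maps to `sequence[i - 1]`), a single chain, and no insertion codes; it reads the fixed columns of standard PDB `ATOM` records and looks only at C-alpha atoms.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residue_mismatches(pdb_text, sequence):
    """Return (index, observed, expected) for residues that disagree.

    Assumption: residue numbers are 1-based indices into `sequence`.
    `expected` is None when the index falls outside the sequence.
    """
    mismatches = []
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16 of an ATOM record.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        res_name = line[17:20]         # columns 18-20
        res_index = int(line[22:26])   # columns 23-26
        observed = THREE_TO_ONE.get(res_name, "X")
        if 1 <= res_index <= len(sequence):
            expected = sequence[res_index - 1]
        else:
            expected = None
        if observed != expected:
            mismatches.append((res_index, observed, expected))
    return mismatches
```

An empty list means every C-alpha residue in the decoy agrees with the target sequence under the stated numbering assumption.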
$ python ResNetQA.py -h
usage: ResNetQA.py [-h] [-device_id DEVICE_ID] [-n_worker N_WORKER] [-n_batch N_BATCH]
                   seq_feature dist_potential models_folder output_file
                   [{GDTTS,GDTTS_Ranking,lDDT,lDDT_Ranking}]

positional arguments:
  seq_feature           sequence feature file.
  dist_potential        distance potential file.
  models_folder         folder of decoy models.
  output_file           output file.
  {GDTTS,GDTTS_Ranking,lDDT,lDDT_Ranking}
                        quality type. GDTTS_Ranking and lDDT_Ranking are
                        trained by an extra ranking loss to yield better
                        ranking performance for global quality assessment.
                        (default: GDTTS)

optional arguments:
  -h, --help            show this help message and exit
  -device_id DEVICE_ID  the device index to use (CPU: -1, GPU: 0,1,2,...). (default: -1)
  -n_worker N_WORKER    worker num to load data. (default: 0)
  -n_batch N_BATCH      minibatch size. (default: 1)
ResNetQA can output quality estimates for four quality types: GDTTS, GDTTS_Ranking, lDDT, and lDDT_Ranking. For GDTTS and GDTTS_Ranking, the local quality estimate is based on the distance deviation of each residue's C-alpha atom, and the global quality estimate is based on GDTTS. For lDDT and lDDT_Ranking, both the local and global quality estimates are based on lDDT. GDTTS_Ranking and lDDT_Ranking are trained with an extra ranking loss to yield better ranking performance for global quality assessment. If model ranking matters more to you, use GDTTS_Ranking or lDDT_Ranking; if the absolute quality value matters more, use GDTTS or lDDT.
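The choice between the four quality types comes down to two questions: which metric you want (GDTTS or lDDT) and whether ranking matters more than absolute values. A small sketch of that decision follows; `choose_quality_type` is a hypothetical helper for scripting, not a function provided by ResNetQA.

```python
# The four values accepted as the last positional argument of ResNetQA.py.
QUALITY_TYPES = ("GDTTS", "GDTTS_Ranking", "lDDT", "lDDT_Ranking")


def choose_quality_type(metric, prioritize_ranking):
    """Pick a quality type from the metric and the user's priority.

    metric: "GDTTS" or "lDDT".
    prioritize_ranking: True if ordering the decoys matters more than
    the absolute predicted quality value.
    """
    if metric not in ("GDTTS", "lDDT"):
        raise ValueError("metric must be 'GDTTS' or 'lDDT', got %r" % (metric,))
    quality_type = metric + "_Ranking" if prioritize_ranking else metric
    assert quality_type in QUALITY_TYPES
    return quality_type
```

The returned string can be passed directly as the final argument on the ResNetQA.py command line.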
If you save the model files of T0955 under examples/T0955/, you can run the following command to get the QA results.
cd ResNetQA/main/
python ResNetQA.py ../examples/T0955.inputFeatures.pkl ../examples/T0955.distPotential.DFIRE16.pkl ../examples/T0955/ ../examples/T0955.QA.pkl GDTTS
The result will be saved at examples/T0955.QA.pkl.
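To score several targets, the same command line can be assembled programmatically. The sketch below builds the argument list for one target using only the arguments shown in the help message. It assumes every target follows the T0955 file naming pattern (`<target>.inputFeatures.pkl`, `<target>.distPotential.DFIRE16.pkl`, a `<target>/` folder of decoys); `build_command` itself is a hypothetical helper.

```python
def build_command(target, examples_dir="../examples", quality_type="GDTTS",
                  device_id=None):
    """Return the ResNetQA.py argument list for one target.

    Assumption: feature, potential, and decoy paths follow the naming
    pattern used for T0955 in the example command.
    """
    command = ["python", "ResNetQA.py"]
    if device_id is not None:
        # -device_id is listed in the help message (CPU: -1, GPU: 0,1,2,...).
        command += ["-device_id", str(device_id)]
    command += [
        "%s/%s.inputFeatures.pkl" % (examples_dir, target),
        "%s/%s.distPotential.DFIRE16.pkl" % (examples_dir, target),
        "%s/%s/" % (examples_dir, target),
        "%s/%s.QA.pkl" % (examples_dir, target),
        quality_type,
    ]
    return command
```

The list can be run with `subprocess.run(build_command("T0955"), cwd="ResNetQA/main", check=True)`, looping over target names as needed.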
Examples of the feature and result files for T0967 and T0980s2 can also be found in the folder examples/.
The quality estimations of CASP12 and CASP13 stage2 decoys by different ResNetQA variants can be found in the folder data/, in which you may also find the script to evaluate those estimations.
Xiaoyang Jing, Jinbo Xu, Improved Protein Model Quality Assessment By Integrating Sequential And Pairwise Features Using Deep Learning, Bioinformatics, btaa1037, https://doi.org/10.1093/bioinformatics/btaa1037
Xiaoyang Jing, xyjing@ttic.edu
Jinbo Xu, jinboxu@gmail.com