ResNetQA: Improved protein model quality assessment by integrating sequential and pairwise features using deep learning

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

This package contains the source code of ResNetQA, a single-model-based quality assessment (QA) method for both local and global quality assessment of protein models.

Requirements

  • python >=3.6
  • pytorch >=1.1.0
  • Biopython (with DSSP installed) >=1.76
  • scipy >=1.4.1
  • numpy >=1.18.1
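A quick way to confirm the environment is to import each package and print its version. The sketch below is only an illustration: `installed_versions` and `find_dssp` are hypothetical helpers, the import names (`torch`, `Bio`, `scipy`, `numpy`) are the usual ones for these packages, and the DSSP executable names `mkdssp` and `dssp` are assumptions about how DSSP is installed on your system.

```python
import importlib
import shutil


def installed_versions(module_names):
    """Return {module name: version string, or None if the import fails}."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


def find_dssp(candidates=("mkdssp", "dssp")):
    """Return the path of the first DSSP executable found on PATH, else None.

    The candidate names are assumptions; adjust them to your installation.
    """
    for exe in candidates:
        path = shutil.which(exe)
        if path:
            return path
    return None


if __name__ == "__main__":
    for name, version in installed_versions(["torch", "Bio", "scipy", "numpy"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
    print("DSSP executable:", find_dssp() or "not found on PATH")
```

The script only reports what it finds; comparing the printed versions against the minimums listed above is left to the reader.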

Usage

Download package

git clone git@github.com:AndersJing/ResNetQA.git

Prepare sequence features

The following features must be prepared in advance. The sequence features are the same as those used by RaptorX-Contact; more detailed information can be found at https://github.com/j3xugit/RaptorX-Contact. The distance potential is predicted by RaptorX-Contact.

  • sequence: primary sequence of the target.
  • PSSM: Position-Specific Scoring Matrix represented as an L*20 matrix.
  • SS3: predicted secondary structure confidence score represented as an L*3 matrix.
  • ACC: predicted solvent accessibility score represented as an L*3 matrix.
  • ccmpredZ: normalized CCMpred matrix (L*L).
  • alnstats: pairwise relationship generated by alnstats in MetaPSICOV (L*L*3).
  • DistPot: pairwise distance potential predicted by RaptorX-Contact.
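The shapes in the list above can be checked mechanically before running the predictor. The following is a minimal sketch, not part of ResNetQA: `expected_shapes` and `shape_problems` are hypothetical helpers, and using the feature names above as dictionary keys is an assumption about how you hold the features in memory. DistPot is left out because its shape is not stated here.

```python
def expected_shapes(seq_len):
    """Shapes implied by the feature list for a target of length seq_len."""
    L = seq_len
    return {
        "PSSM": (L, 20),
        "SS3": (L, 3),
        "ACC": (L, 3),
        "ccmpredZ": (L, L),
        "alnstats": (L, L, 3),
    }


def shape_problems(sequence, shapes):
    """Compare observed shapes with the expected ones.

    `shapes` maps a feature name to its observed shape tuple, e.g.
    {name: array.shape for name, array in features.items()}.
    Returns a list of human-readable problems (empty if all match).
    """
    problems = []
    for name, want in expected_shapes(len(sequence)).items():
        got = shapes.get(name)
        if got is None:
            problems.append(f"{name}: missing")
        elif tuple(got) != want:
            problems.append(f"{name}: expected {want}, got {tuple(got)}")
    return problems
```

An empty list means every listed feature has the shape implied by the sequence length.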

Users can obtain the distance potential (and the other sequence features) from the RaptorX web server (http://raptorx.uchicago.edu) or from the new version of the RaptorX software package (https://github.com/j3xugit/RaptorX-3DModeling).

Examples of the sequence features (*.inputFeatures.pkl) and distance potentials (*.distPotential.DFIRE16.pkl) can be found in the examples/ folder.
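To see what an example file contains, it can be loaded with the standard `pickle` module and summarized. This is a hedged sketch: `load_pickle` and `describe` are hypothetical helpers, the internal layout of the `.pkl` files is not documented here, and the `latin1` fallback is only a precaution in case a file was written by Python 2. Only unpickle files you trust, and note that arrays stored in a pickle need numpy installed to load.

```python
import pickle


def load_pickle(path):
    """Load a pickle file; fall back to latin1 for files written by Python 2."""
    with open(path, "rb") as handle:
        try:
            return pickle.load(handle)
        except UnicodeDecodeError:
            # Rewind and retry with the encoding commonly needed for Python 2 pickles.
            handle.seek(0)
            return pickle.load(handle, encoding="latin1")


def describe(obj, max_items=20):
    """Summarize an object: shape if it has one, else its type and length."""
    if isinstance(obj, dict):
        return {key: describe(value) for key, value in list(obj.items())[:max_items]}
    shape = getattr(obj, "shape", None)
    if shape is not None:
        return f"{type(obj).__name__} shape={tuple(shape)}"
    if isinstance(obj, (str, bytes, list, tuple)):
        return f"{type(obj).__name__} len={len(obj)}"
    return type(obj).__name__
```

For instance, `print(describe(load_pickle("examples/T0955.inputFeatures.pkl")))` would list the top-level entries without dumping whole matrices to the terminal.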

Prepare decoy models

All decoy models for a protein target shall be saved in PDB format under a separate folder, and their residue indices must correspond to the sequence in the feature file.
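The correspondence between residue numbers and the sequence can be spot-checked by reading the C-alpha records of a decoy. The sketch below parses the fixed-column PDB ATOM format directly; `ca_residues` and `index_mismatches` are hypothetical helpers, and the assumption that residue number 1 maps to the first sequence position is ours, not a documented requirement of ResNetQA.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def ca_residues(pdb_lines):
    """Yield (residue number, one-letter code) for each C-alpha ATOM record."""
    for line in pdb_lines:
        # PDB columns: atom name 13-16, residue name 18-20, residue number 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20].strip()
            res_num = int(line[22:26])
            yield res_num, THREE_TO_ONE.get(res_name, "X")


def index_mismatches(pdb_lines, sequence, first_index=1):
    """List residues whose number or type disagrees with `sequence`.

    Assumes residue number `first_index` maps to sequence[0]
    (1-based numbering by default); this is an assumption.
    Each entry is (residue number, residue in model, residue in sequence),
    with None as the last item when the number falls outside the sequence.
    """
    mismatches = []
    for res_num, aa in ca_residues(pdb_lines):
        pos = res_num - first_index
        if not 0 <= pos < len(sequence):
            mismatches.append((res_num, aa, None))
        elif sequence[pos] != aa:
            mismatches.append((res_num, aa, sequence[pos]))
    return mismatches
```

Calling `index_mismatches(open(model_path), sequence)` on each decoy and getting an empty list suggests the numbering is consistent under that assumption.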

For example, you can download the stage1 QA models of T0955 from the CASP Data Archive and extract them to a folder.

wget https://predictioncenter.org/download_area/CASP13/server_predictions/T0955.stage1.3D.srv.tar.gz

tar -zxvf T0955.stage1.3D.srv.tar.gz
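If you prefer to unpack the archive from Python (for example inside a batch script), the standard `tarfile` module can do the same job as the `tar` command above. This is a sketch under the assumption that the archive has already been downloaded; `extract_decoys` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path


def extract_decoys(archive_path, dest_dir):
    """Extract a .tar.gz of decoy models and return the extracted file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe members (e.g. absolute paths).
            archive.extractall(str(dest), filter="data")
        else:
            archive.extractall(str(dest))
    return sorted(p for p in dest.rglob("*") if p.is_file())
```

The returned list shows where the models ended up, which helps when choosing the folder to pass as `models_folder`.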

Run

$ python ResNetQA.py -h
usage: ResNetQA.py [-h] [-device_id DEVICE_ID] [-n_worker N_WORKER] [-n_batch N_BATCH]
                   seq_feature dist_potential models_folder output_file
                   [{GDTTS,GDTTS_Ranking,lDDT,lDDT_Ranking}]

positional arguments:
  seq_feature           sequence feature file.
  dist_potential        distance potential file.
  models_folder         folder of decoy models.
  output_file           output file.
  {GDTTS,GDTTS_Ranking,lDDT,lDDT_Ranking}
                        quality type. GDTTS_Ranking and lDDT_Ranking are
                        trained by an extra ranking loss to yield better
                        ranking performance for global quality assessment.
                        (default: GDTTS)

optional arguments:
  -h, --help            show this help message and exit
  -device_id DEVICE_ID  the device index to use (CPU: -1, GPU: 0,1,2,...). (default: -1)
  -n_worker N_WORKER    worker num to load data. (default: 0)
  -n_batch N_BATCH      minibatch size. (default: 1)

ResNetQA can output quality estimations for four quality types: GDTTS, GDTTS_Ranking, lDDT and lDDT_Ranking. For GDTTS and GDTTS_Ranking, the local quality estimation is based on the distance deviation of each residue's C-alpha atom, and the global quality estimation is based on GDTTS. For lDDT and lDDT_Ranking, both the local and global quality estimations are based on lDDT. GDTTS_Ranking and lDDT_Ranking are trained with an extra ranking loss to yield better ranking performance for global quality assessment. If model ranking matters most, use GDTTS_Ranking or lDDT_Ranking; if the absolute quality value matters most, use GDTTS or lDDT.
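The difference between ranking quality and absolute-value quality can be made concrete with two simple measures. The sketch below is an illustration only: the two functions are generic measures, not necessarily those used to evaluate ResNetQA, and the scores are toy numbers invented for this example.

```python
def mean_abs_error(predicted, true):
    """Average absolute difference between predicted and true scores."""
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(true)


def top1_loss(predicted, true):
    """True score of the best decoy minus true score of the top-predicted decoy."""
    picked = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[picked]


# Toy numbers for three decoys, invented purely for illustration.
true_scores = [0.80, 0.70, 0.40]
calibrated = [0.74, 0.76, 0.42]   # close in value, but picks the second decoy
well_ranked = [0.60, 0.50, 0.20]  # systematically too low, but correct order

for name, scores in [("calibrated", calibrated), ("well_ranked", well_ranked)]:
    print(name, mean_abs_error(scores, true_scores), top1_loss(scores, true_scores))
```

In this toy case the first predictor has the lower mean absolute error, while the second one selects the best decoy. That is the kind of trade-off behind choosing between the plain and the `_Ranking` quality types.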

If the model files of T0955 are saved under examples/T0955/, run the following commands to get the QA results.

cd ResNetQA/main/

python ResNetQA.py ../examples/T0955.inputFeatures.pkl ../examples/T0955.distPotential.DFIRE16.pkl ../examples/T0955/ ../examples/T0955.QA.pkl GDTTS

The result will be saved to examples/T0955.QA.pkl.

Examples of the feature and result files for T0967 and T0980s2 can also be found in the folder examples/.
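When several targets need to be processed, the command line can be assembled programmatically. The following is a minimal sketch that follows the file naming of the T0955 example; `build_command` is a hypothetical helper, and the assumption that every target has its decoys under `examples/<target>/` and writes to `<target>.QA.pkl` is ours.

```python
from pathlib import PurePosixPath

QUALITY_TYPES = ("GDTTS", "GDTTS_Ranking", "lDDT", "lDDT_Ranking")


def build_command(target, examples_dir="../examples", quality_type="GDTTS", device_id=-1):
    """Build the ResNetQA.py argument list for one target.

    Paths follow the T0955 example; the per-target decoy folder and
    output name are assumptions for other targets.
    """
    if quality_type not in QUALITY_TYPES:
        raise ValueError(f"unknown quality type: {quality_type}")
    base = PurePosixPath(examples_dir)
    return [
        "python", "ResNetQA.py",
        "-device_id", str(device_id),
        str(base / f"{target}.inputFeatures.pkl"),
        str(base / f"{target}.distPotential.DFIRE16.pkl"),
        str(base / target) + "/",
        str(base / f"{target}.QA.pkl"),
        quality_type,
    ]


if __name__ == "__main__":
    for target in ["T0955", "T0967", "T0980s2"]:
        print(" ".join(build_command(target)))
```

Each list can be passed to `subprocess.run(cmd, cwd="ResNetQA/main", check=True)` once the decoy folders are in place; printing the commands first makes it easy to verify the paths.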

Quality estimations of CASP12 and CASP13 decoys

The quality estimations of CASP12 and CASP13 stage2 decoys by different ResNetQA variants can be found in the folder data/, which also contains the script for evaluating those estimations.

Reference

Xiaoyang Jing, Jinbo Xu. Improved protein model quality assessment by integrating sequential and pairwise features using deep learning. Bioinformatics, btaa1037. https://doi.org/10.1093/bioinformatics/btaa1037

Contact

Xiaoyang Jing, xyjing@ttic.edu

Jinbo Xu, jinboxu@gmail.com