/DAQ

DAQ: Residue-Wise Local Quality Estimation for Protein Models from Cryo-EM Maps

Primary LanguageCGNU General Public License v3.0GPL-3.0

DAQ


DAQ is a computational tool using deep learning that can estimate the residue-wise local quality for protein models from cryo-Electron Microscopy (EM) maps.

Copyright (C) 2021 Genki Terashi* , Xiao Wang*, Sai Raghavendra Maddhuri Venkata Subramaniya, John J. G. Tesmer, and Daisuke Kihara, and Purdue University.

License: GPL v3. (If you are interested in a different license, for example, for commercial use, please contact us.)

Contact: Daisuke Kihara (dkihara@purdue.edu)

For technical problems or questions, please reach to Xiao Wang (wang3702@purdue.edu).

Citation:

Terashi, G., Wang, X., Maddhuri Venkata Subramaniya, S. R., Tesmer, J. J. G. & Kihara, D. Residue-wise local quality estimation for protein models from cryo-EM maps. Nature Methods 19, 1116-1125, doi:10.1038/s41592-022-01574-4 (2022).

@article{terashi2022residue,
  title={Residue-wise local quality estimation for protein models from cryo-EM maps},
  author={Terashi, Genki and Wang, Xiao and Maddhuri Venkata Subramaniya, Sai Raghavendra and Tesmer, John JG and Kihara, Daisuke},
  journal={Nature Methods},
  volume={19},
  number={9},
  pages={1116--1125},
  year={2022},
  publisher={Nature Publishing Group US New York}
}

All the functions in this github are available here. Related instructions are included in the Colab website.

Introduction

An increasing number of protein structures are determined by cryogenic electron microscopy (cryo-EM). Although the resolution of determined cryo-EM density maps is improving in general, there are still many cases where amino acids of a protein are assigned with different levels of confidence, including those assigned with relatively high ambiguity. Here, we developed a method that identifies potential misassignment of residues in the map, including residue shifts along an otherwise correct main-chain trace. The score, named DAQ, computes the likelihood that the local density corresponds to different amino acids, atoms, and secondary structures from the map density distribution and assesses how well amino acids in the reconstructed model structure agree with the likelihood. DAQ is complementary to existing model validation scores for cryo-EM that examine local density gradient in the map or stereochemical geometry of the structure model. When DAQ was applied to different versions of model structure entries in PDB that were derived from the same density maps, a clear improvement of DAQ-score was observed in the newer versions of the models. The DAQ-score also found potential misassignment errors in a substantial number of over 4400 deposited protein structure models built into cryo-EM maps.

Overall Protocol

protocol

Overview of DAQ Score

$DAQ(AA)(i) = \log{(\frac{P_{aa(i)}(i)}{\sum_{j}(P_{aa(i)}(j)/N)})}$

where $aa(i)$ is the amino acid type of residue i, $P_{aa(i)}(i)$ is the computed probability for amino acid type $aa(i)$ for the nearest grid point to the Cα atom of residue $i$. As shown in the equation, the probability is normalized by the average probability of amino acid type $aa(i)$ across over all atom positions in the protein model.
*If the assignment is correct, DAQ will be positive, and negative if the assignment may be incorrect.
*If a position in the map does not have distinct density pattern for the assigned amino acid (or secondary structure, Calpha atom), DAQ will be close to 0.

Pre-required software

Python 3 : https://www.python.org/downloads/
Pymol(for visualization): https://pymol.org/2/

Installation

2. Clone the repository in your computer

git clone https://github.com/kiharalab/DAQ && cd DAQ

3. Build dependencies.

You have two options to install dependency on your computer:

3.1 Install with pip and python(Ver 3.6.9).

3.1.2 Install dependency in command line.
pip3 install -r requirements.txt --user

If you encounter any errors, you can install each library one by one:

!pip install mrcfile==1.2.0 
!pip install numpy>=1.19.4
!pip install numba>=0.52.0
!pip install torch>=1.6.0
!pip install scipy>=1.6.0

3.2 Install with anaconda

3.2.2 Install dependency in command line
conda create -n daq python=3.8.5
conda activate daq
pip install -r requirements.txt 

Each time when you want to run my code, simply activate the environment by

conda activate daq
conda deactivate(If you want to exit) 

Usage

python3 main.py -h:
  -h, --help            show this help message and exit
  -F F                  Map file path
  -M M                  QA deep learning model path, default:"best_model/qa_model/Multimodel.pth"
  -P P                  PDB file path
  --mode MODE           Running Mode
  --stride STRIDE       Stride size for scanning maps (default:1)
  --voxel_size          input voxel size (default:11)
  --gpu GPU             specify the gpu to use
  --batch_size          batch size for inference (default:256)
  --cardinality         ResNeXt cardinality
  --window WINDOW       half window size to smooth the score for output (default:9)

1. Run DAQ

Since DAQ(AA) yields the best score

python main.py --mode=0 -F [Map_path]  -P [Structure_path] --window [half_window_size] --stride [stride_size] 

Please Run the script under "DAQ" directory, otherwise it may raise errors because of the complilation failure.

Here [Map_path] is the cryo-EM map file path in your computer, which can be *.mrc and *.mrc.gz format, [Structure_path] is the protein structure in pdb format; [half_window_size] is half of the window size that used for smoothing the residue-wise score based on a sliding window scanning the entire sequence, here half_window_size=(window_size-1)/2; [stride_size] is the stride step to scan the maps.
Output will be saved in "Predict_Result_WithPDB/[Input_Map_Name]".

Running Example

python main.py --mode=0 -F example/2566_3J6B_9.mrc -P example/3J6B_9.pdb --window 9 --stride 2

Results of this example is saved in 2566_Result

Preparing the input map

If the cryo-EM map grid spacing is not 1, it typically takes longer time to resample the map to have grid spacing 1 by our script. Hence, you can also use ChimeraX to accelerate the speed by providing the script a resampled map:

1 open your map via chimeraX. 
2 In the bottom command line to type command: vol resample #1 spacing 1.0
3 In the bottom command line to type command: save newmap.mrc model #2
4 Then you can use the resampled map to upload

2. Visualization Result

In Pymol, open "daq_score_w9.pdb" file, please type the following command line:

spectrum b, red_white_blue,  all, -1,1

Here blue region means the quality is acceptable while red region means the quality is not so good. Here we put DAQ(AA) score in the b-factor columns. Detailed explanation is here.

Output file

The detailed instructions are here.

DAQ Container

To build the Docker image, change the current directory to DAQ_container. Use the following command to create the image: sudo docker build -t daq .

To use the image, please follow the rest of the user manual available at https://kiharalab.org/emsuites/daq.php.

Q&A

For possible errors and solutions, please check QA.md