=======================
DeepAlign_Open (v1.2)
date: 2016.05.20
=======================

Author: Sheng Wang
Contact email: realbigws@gmail.com, wangsheng@uchicago.edu, wangsheng@ttic.edu
Please refer to my homepage: http://ttic.uchicago.edu/~wangsheng/

--------------------------------------------------------
1. Protein structure analysis
[Reference]
--------------------------------------------------------
Jianzhu Ma, Sheng Wang.
ALGORITHMS, APPLICATIONS AND CHALLENGES OF PROTEIN STRUCTURE ALIGNMENT.
Advances in Protein Chemistry and Structural Biology, 2014 (invited).
http://www.ncbi.nlm.nih.gov/pubmed/24629187

--------------------------------------------------------
2. Pairwise protein structure alignment or structure homolog search
[Reference]
--------------------------------------------------------
Sheng Wang, Jianzhu Ma, Jian Peng and Jinbo Xu.
PROTEIN STRUCTURE ALIGNMENT BEYOND SPATIAL PROXIMITY.
Scientific Reports, 3, 1448 (2013).
http://www.ncbi.nlm.nih.gov/pubmed/23486213

--------------------------------------------------------
3. Multiple protein structure alignment
[Reference]
--------------------------------------------------------
Sheng Wang, Jian Peng and Jinbo Xu.
ALIGNMENT OF DISTANTLY-RELATED PROTEIN STRUCTURES: ALGORITHM, BOUND AND IMPLICATIONS TO HOMOLOGY MODELING.
Bioinformatics. 2011 Sep 15;27(18):2537-45.
http://www.ncbi.nlm.nih.gov/pubmed/21791532
--------------------------------------------------------


========
Upgrade:
========

For DeepAlign_Open (v1.2)
1. allow user to specify the function to be optimized. ( -s score_func option)
2. allow user to specify the distance cutoff. ( -C distance_cut option)
3. multiple solution output is now functional. ( -p out_option option)


========
Install:
========

1. download the package

    git clone --recursive https://github.com/realbigws/DeepAlign/
    cd DeepAlign/

--------------

2. compile

    cd source_code/
    make
    cd ../

--------------

3. update the package

    git pull
    git submodule foreach --recursive git pull origin master


=====================
DeepAlign Web Server:
=====================

The Web Server is located at http://raptorx.uchicago.edu/DeepAlign/submit/


==========================
Linux system and CPU type:
==========================

DeepAlign_Open has been compiled and tested under the following Linux system,

    Linux version 3.2.0-72-generic (buildd@toyol)
    gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

with the following CPU type,

    Intel(R) Xeon(R) CPU E5-4607 0 @ 2.20GHz


===================================
Setup for structure homolog search:
===================================

This package includes an example PDB database in databases/CAL_PDB.tar.gz.
reference_pdb_list contains all the proteins in this folder.
Please do not delete or change CAL_PDB and reference_pdb_list.

Users can download some protein structure databases from
http://raptorx.uchicago.edu/download/

------

1. protein lists

copy bc*_list to databases/, where '*' stands for 40, 70, 90, 100

------

2. PDB database

If databases/pdb_BC100/ does not exist, create it.
Copy or link all the .pdb files to databases/pdb_BC100/

Note: If you only want to run pairwise or multiple structure alignment,
you do not have to download these databases. They are needed only for
structure homolog search. You can also build your own database for
structure homolog search.

------

3. reference PDB database

Note: The original compressed tar-ball of CAL_PDB/ can be found at databases/CAL_PDB.tar.gz

Type the following commands to uncompress the reference PDB database:

    cd databases/
    tar xzf CAL_PDB.tar.gz
    cd ../


==============
Main programs:
==============

1. Pairwise structure alignment

To run pairwise alignment, type the following command,

    ./DeepAlign protein_T protein_S

2. Multiple structure alignment

To run multiple alignment, type the following command,

    ./3DCOMB -i input_list

3. Database search

To run database search, type the following command,

    ./DeepSearch -a NP -q query_file -l pdb_list -d path_to_folder_containing_PDB_files

Usage information for each program is available by running it without any arguments.


==============
Misc programs:
==============

1. PDB processing tool.
(converts an official PDB file to compact form and reconstructs missing backbone and CB atoms)

    ./PDB_Tool <-i input> <-r range> <-o output>

For more details, please refer to https://github.com/realbigws/PDB_Tool

2. Evaluate quality scores of a given pairwise protein alignment.
(i.e., GDT, TMscore, MaxSub, and DeepScore)

    ./DeepScore protein_1 protein_2 <-a alignment>


=========
Examples:
=========

Note: the first three examples are located in the examples/ directory

1. Get a certain range in an official PDB file, and save it in canonical form

    ./PDB_Tool -i 1col.pdb -r A -o 1colA.pdb

2. Extract the chains and the ligands from an official PDB file

    ./LIG_Tool -i 1paz.pdb

By default, LIG_Tool is not built. To build LIG_Tool, type the following command:

    cd source_code; make LIG_Tool; cd ../

3. Run pairwise structure alignment for 1col.pdb and 1cpc.pdb

    ./DeepAlign 1col.pdb 1cpc.pdb

4. Run multiple structure alignment for proteins in input_list

    ./3DCOMB -i input_list

5. Run database search for 1pazA.pdb against the example database in databases/CAL_PDB/

    ./DeepSearch -q 1pazA.pdb -d databases/CAL_PDB/ -l databases/reference_pdb_list

The two result files are 1pazA.rank and 1pazA.rank_simp,
which contain detailed and summary results, respectively.

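
To run example 3 over more than two structures, the pairwise command can be generated for every pair of files. The following is a minimal sketch in Python, not part of the package: the helper name pairwise_commands is hypothetical, and it only assumes the "./DeepAlign protein_T protein_S" invocation shown above.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_commands(pdb_files: Sequence[str],
                      exe: str = "./DeepAlign") -> List[List[str]]:
    """Build one DeepAlign argument list per unordered pair of PDB files.

    Hypothetical helper: it only constructs the command lines and does
    not run them, so it can be inspected before anything is executed.
    """
    return [[exe, first, second] for first, second in combinations(pdb_files, 2)]
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains the DeepAlign binary. Building the commands separately from running them makes it easy to print or log them first.
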

======================
Screenout description:
======================

name1 name2 len1 len2 -> BLOSUM CLESUM DeepScore -> LALI RMSD TMscore -> MAXSUB GDT_TS GDT_HA -> SeqID nLen

name1:     Name of the first protein
name2:     Name of the second protein
len1:      Length of the first protein
len2:      Length of the second protein
BLOSUM:    BLOSUM62 score of the alignment (in 0.5 bit)
CLESUM:    CLESUM score of the alignment (in 0.5 bit)
DeepScore: DeepScore of the alignment, calculated by {max(0,BLOSUM(i,j))+CLESUM(i,j)}*d(i,j)*v(i,j)
LALI:      Length of alignment
RMSD:      RMSD (Root Mean Square Deviation) of the alignment
TMscore:   TMscore of the alignment
MAXSUB:    MaxSub score of the alignment
GDT_TS:    GDT_TS score, using d<1, d<2, d<4 and d<8
GDT_HA:    GDT_HA score, using d<0.5, d<1, d<2 and d<4
SeqID:     Number of identical amino acid matches in the alignment
nLen:      Normalization length for TMscore, MAXSUB, GDT_TS, and GDT_HA


=========================================
Specify residue range by -x or -y option:
=========================================

symbol explanation:
-------------------
':' for PDB numbering: the range is selected according to the residue number in the PDB file.
'.' for SEQUENCE numbering: the range is selected according to sequential order, starting from 1.
'-' to indicate a certain beginning till the end.
'+' to indicate a certain beginning followed by the length range.
'@' to indicate the beginning or the ending of a certain chain.
',' to separate between two PDB chains.
'!' to select all chains in the structure.
'_' to use the first chain in the structure.

example:
--------
A:21-179,B.45-@

description:
------------
Two segments from the input structure are selected.
For chain A we select from residue 21 to residue 179 according to PDB numbering,
whereas for chain B we select from position 45 to the end in sequential order.

__________________________________________________________
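
The range syntax above can be illustrated with a small parser. This is a minimal sketch in Python based on one reading of the symbol table, not the parser used by DeepAlign. The names Segment and parse_range are hypothetical. It assumes a single-character chain ID and non-negative integer residue numbers, and it does not handle the '!' and '_' selectors.

```python
import re
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    chain: str
    numbering: str          # "pdb" for ':' or "sequence" for '.'
    start: Optional[int]    # None means the beginning of the chain ('@')
    end: Optional[int]      # None means the end of the chain ('@'), or unused with '+'
    length: Optional[int]   # set only for the '+' form


# chain, numbering symbol, start, operator ('-' or '+'), end or length
_SEGMENT = re.compile(r"^([A-Za-z0-9])([:.])(@|\d+)([-+])(@|\d+)$")


def parse_range(spec: str) -> List[Segment]:
    """Parse a range string such as 'A:21-179,B.45-@' into segments."""
    segments = []
    for part in spec.split(","):
        match = _SEGMENT.match(part.strip())
        if match is None:
            raise ValueError(f"unsupported segment: {part!r}")
        chain, mode, first, op, second = match.groups()
        numbering = "pdb" if mode == ":" else "sequence"
        start = None if first == "@" else int(first)
        if op == "+":
            if second == "@":
                raise ValueError(f"'+' needs a numeric length: {part!r}")
            segments.append(Segment(chain, numbering, start, None, int(second)))
        else:
            end = None if second == "@" else int(second)
            segments.append(Segment(chain, numbering, start, end, None))
    return segments
```

For the example string, parse_range("A:21-179,B.45-@") yields a PDB-numbered segment 21 to 179 on chain A and a sequence-numbered segment on chain B from position 45 to the chain end.
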