DeepAlign: An Objective-C repository from Mingchenchen

=======================
DeepAlign_Open (v1.2)
date: 2016.05.20
=======================

Author: Sheng Wang 
Contact email: realbigws@gmail.com , wangsheng@uchicago.edu, wangsheng@ttic.edu

For more information, please refer to my homepage:
http://ttic.uchicago.edu/~wangsheng/
--------------------------------------------------------


1. Protein structure analysis 
[Reference]
--------------------------------------------------------
Jianzhu Ma, Sheng Wang.
ALGORITHMS, APPLICATIONS AND CHALLENGES OF PROTEIN STRUCTURE ALIGNMENT.
Advances in Protein Chemistry and Structural Biology, 2014 (invited).
http://www.ncbi.nlm.nih.gov/pubmed/24629187
--------------------------------------------------------


2. Pairwise protein structure alignment or structure homolog search
[Reference]
--------------------------------------------------------
Sheng Wang, Jianzhu Ma, Jian Peng and Jinbo Xu.
    PROTEIN STRUCTURE ALIGNMENT BEYOND SPATIAL PROXIMITY
                     Scientific Reports, 3, 1448, (2013)
http://www.ncbi.nlm.nih.gov/pubmed/23486213
--------------------------------------------------------


3. Multiple protein structure alignment
[Reference]
------------------------------------------------------------
Sheng Wang, Jian Peng and Jinbo Xu.
ALIGNMENT OF DISTANTLY-RELATED PROTEIN STRUCTURES: ALGORITHM,
                 BOUND AND IMPLICATIONS TO HOMOLOGY MODELING.
                  Bioinformatics. 2011 Sep 15;27(18):2537-45
http://www.ncbi.nlm.nih.gov/pubmed/21791532
------------------------------------------------------------


========
Upgrade:
========

For DeepAlign_Open (v1.2)

1. Allow the user to specify the function to be optimized. ( -s score_func option)

2. Allow the user to specify the distance cutoff. ( -C distance_cut option)

3. Multiple solution output is now functional. ( -p out_option option)



========
Install:
========

1. download the package

git clone --recursive https://github.com/realbigws/DeepAlign/
cd DeepAlign/

--------------

2. compile

cd source_code/
        make
cd ../

--------------

3. update the package

git pull
git submodule foreach --recursive git pull origin master



=====================
DeepAlign Web Server:
=====================

The Web Server is located at http://raptorx.uchicago.edu/DeepAlign/submit/



==========================
Linux system and CPU type:
==========================

DeepAlign_Open has been compiled and tested under the following Linux system,
	Linux version 3.2.0-72-generic (buildd@toyol) 
	gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) 

with the following CPU type,
	Intel(R) Xeon(R) CPU E5-4607 0 @ 2.20GHz



===================================
Setup for structure homolog search:
===================================

This package includes an example PDB database in databases/CAL_PDB.tar.gz
The file databases/reference_pdb_list lists all the proteins in this database.
Please do not delete or change CAL_PDB and reference_pdb_list.

Additional protein structure databases can be downloaded from http://raptorx.uchicago.edu/download/

------

1. protein lists
Copy bc*_list to databases/, where '*' stands for 40, 70, 90, or 100.

------

2. PDB database
If databases/pdb_BC100/ does not exist, create it.
Copy or link all the .pdb files to databases/pdb_BC100/

Note: If you only want to run pairwise or multiple structure alignment,
you do not need to download these databases; they are used only for structure homolog search.
You can also build your own database for structure homolog search.
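
The directory setup in step 2 can also be scripted. The following is a minimal Python sketch, not part of the DeepAlign package: the function name link_pdb_files and its source_dir argument are hypothetical, and it assumes the downloaded structures are plain files ending in .pdb in a single folder.

```python
import os
from pathlib import Path


def link_pdb_files(source_dir, target_dir="databases/pdb_BC100"):
    """Create target_dir if needed and symlink every .pdb file from source_dir into it.

    Returns the sorted names of the .pdb files found in source_dir.
    """
    source = Path(source_dir).resolve()
    target = Path(target_dir)
    # "If databases/pdb_BC100/ does not exist, create it."
    target.mkdir(parents=True, exist_ok=True)
    names = []
    for pdb_file in sorted(source.glob("*.pdb")):
        link = target / pdb_file.name
        # Leave existing entries (files or links) untouched.
        if not link.exists() and not link.is_symlink():
            os.symlink(pdb_file, link)
        names.append(pdb_file.name)
    return names
```

For example, link_pdb_files("/path/to/downloaded_pdbs") would populate databases/pdb_BC100/ relative to the current directory. Links are used here instead of copies to save disk space; copying works equally well.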

------

3. reference PDB database
Note: The original compressed tar-ball file of CAL_PDB/ can be found at databases/CAL_PDB.tar.gz
Type the following commands to uncompress the reference PDB database:

cd databases/
	tar xzf CAL_PDB.tar.gz
cd ../




==============
Main programs:
==============

1. Pairwise structure alignment
To run pairwise alignment, type the following command,
	./DeepAlign protein_T protein_S

2. Multiple structure alignment
To run multiple alignment, type the following command,
	./3DCOMB -i input_list

3. Database search
To run database search, type the following command,
	./DeepSearch -a NP -q query_file -l pdb_list -d path_to_folder_containing_PDB_files

Usage information for each program is available by simply running it without any arguments.
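
For batch runs, the three command lines above can be assembled from a script. The following Python sketch is only an illustration: the helper names (pairwise_cmd, multiple_cmd, search_cmd, run) are hypothetical, only the arguments shown in this README are used (the -a option of DeepSearch is left out, as in the database search example below), and it assumes that the programs are in the current directory and print their results to standard output.

```python
import subprocess


def pairwise_cmd(protein_t, protein_s, exe="./DeepAlign"):
    """Command line for pairwise structure alignment."""
    return [exe, protein_t, protein_s]


def multiple_cmd(input_list, exe="./3DCOMB"):
    """Command line for multiple structure alignment."""
    return [exe, "-i", input_list]


def search_cmd(query_file, pdb_list, pdb_dir, exe="./DeepSearch"):
    """Command line for database search."""
    return [exe, "-q", query_file, "-l", pdb_list, "-d", pdb_dir]


def run(cmd):
    """Run a command and return its standard output; raises if the program fails."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run(pairwise_cmd("1col.pdb", "1cpc.pdb")) would start the same job as typing ./DeepAlign 1col.pdb 1cpc.pdb in a shell.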



==============
Misc programs:
==============

1. PDB processing tool.
(Converts an official PDB file to a compact form and reconstructs missing backbone and CB atoms.)
	./PDB_Tool <-i input> <-r range> <-o output>
For more details, please refer to https://github.com/realbigws/PDB_Tool

2. Evaluate quality scores of a given pairwise protein alignment. 
(i.e., GDT, TMscore, MaxSub, and DeepScore)
	./DeepScore protein_1 protein_2 <-a alignment> 



=========
Examples:
=========

Note: the first three examples are located in the examples/ directory.

1. Extract a certain range from an official PDB file and save it in canonical form
	./PDB_Tool -i 1col.pdb -r A -o 1colA.pdb 

2. Extract the chains and the ligands from an official PDB file
	./LIG_Tool -i 1paz.pdb
LIG_Tool is not built by default. To build it, please type the following command:
	 cd source_code; make LIG_Tool; cd ../

3. Run pairwise structure alignment for 1col.pdb and 1cpc.pdb
	./DeepAlign 1col.pdb 1cpc.pdb

4. Run multiple structure alignment for proteins in input_list
	./3DCOMB -i input_list

5. Run database search for 1pazA.pdb against the example reference database (databases/CAL_PDB/)
	./DeepSearch -q 1pazA.pdb -d databases/CAL_PDB/ -l databases/reference_pdb_list
Two result files will be generated: 1pazA.rank and 1pazA.rank_simp, which contain detailed and summary results, respectively.



==========================
Screen output description:
==========================

name1 name2 len1 len2 -> BLOSUM CLESUM DeepScore ->  LALI RMSD TMscore -> MAXSUB GDT_TS GDT_HA -> SeqID nLen

name1:     Name of the first protein
name2:     Name of the second protein
len1:      Length of the first protein
len2:      Length of the second protein
BLOSUM:    BLOSUM62 score of the alignment (in 0.5 bit)
CLESUM:    CLESUM score of the alignment (in 0.5 bit)
DeepScore: DeepScore of the alignment, calculated by
             {max(0,BLOSUM(i,j))+CLESUM(i,j)}*d(i,j)*v(i,j)
LALI:      Length of alignment
RMSD:      RMSD (Root Mean Square Deviation) of the alignment
TMscore:   TMscore of the alignment
MAXSUB:    MaxSub score of the alignment
GDT_TS:    GDT_TS score, using d<1, d<2, d<4 and d<8
GDT_HA:    GDT_HA score, using d<0.5, d<1, d<2 and d<4
SeqID:     Number of identical amino acid matches in the alignment
nLen:      Normalization length for TMscore, MAXSUB, GDT_TS, and GDT_HA
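
The summary line described above can be read into named fields for further processing. The following Python sketch is an illustration under stated assumptions: it assumes the fields are separated by whitespace in the order listed, treats any '->' tokens as separators only, and the function name parse_summary_line is hypothetical.

```python
FIELDS = [
    "name1", "name2", "len1", "len2",
    "BLOSUM", "CLESUM", "DeepScore",
    "LALI", "RMSD", "TMscore",
    "MAXSUB", "GDT_TS", "GDT_HA",
    "SeqID", "nLen",
]


def _number(token):
    """Convert a token to int when possible, otherwise to float."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_summary_line(line):
    """Map one summary line onto the field names listed above."""
    # Drop the '->' separators if they are present in the output.
    tokens = [tok for tok in line.split() if tok != "->"]
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    record = dict(zip(FIELDS, tokens))
    # The first two fields are protein names; the rest are numeric.
    for name in FIELDS[2:]:
        record[name] = _number(record[name])
    return record
```

A line with made-up values, such as "1col 1cpc 197 162 -> 50 120 300.5 -> 98 3.21 0.52 -> 0.31 0.40 0.25 -> 12 162", would yield a dictionary in which record["TMscore"] is 0.52 and record["nLen"] is 162. These numbers are placeholders, not real DeepAlign results.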




=========================================
Specify residue range by -x or -y option:
=========================================

symbol explanation:
-------------------

':' for PDB numbering: the range is selected according to the residue numbers in the PDB file.
'.' for SEQUENCE numbering: the range is selected according to sequential order, starting from 1.
'-' to indicate a range from a given beginning to a given end.
'+' to indicate a given beginning followed by the length of the range.
'@' to indicate the beginning or the end of a chain.
',' to separate two PDB chains.
'!' to select all chains in the structure.
'_' to use the first chain in the structure.


example:
--------
A:21-179,B.45-@


description:
------------
Two segments from the input structure are selected.
For chain A we select from residue 21 to residue 179 according to PDB numbering,
whereas for chain B we select from position 45 to the end in sequential order.
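
To make the notation concrete, the following Python sketch splits a range string such as the example above into its parts. It is an illustration only and not the parser used by DeepAlign: the function name parse_range is hypothetical, and it assumes one-character chain identifiers and non-negative integer positions. It covers only the forms described in this section, so the actual programs may accept more.

```python
import re

# One segment: a chain symbol, optionally followed by
# ':' or '.', a start, '-' or '+', and an end or length.
SEGMENT = re.compile(
    r"^(?P<chain>[A-Za-z0-9_!])"
    r"(?:(?P<mode>[:.])(?P<start>\d+|@)(?P<op>[-+])(?P<stop>\d+|@))?$"
)


def _value(token):
    """Keep '@' as a marker for a chain terminus; convert numbers to int."""
    return token if token == "@" else int(token)


def parse_range(spec):
    """Split a range string into one dictionary per comma-separated segment."""
    segments = []
    for part in spec.split(","):
        match = SEGMENT.match(part.strip())
        if match is None:
            raise ValueError(f"unsupported range segment: {part!r}")
        segment = {"chain": match["chain"]}
        if match["mode"]:
            segment["numbering"] = "pdb" if match["mode"] == ":" else "sequence"
            segment["start"] = _value(match["start"])
            # '-' gives an end position, '+' gives a length.
            key = "end" if match["op"] == "-" else "length"
            segment[key] = _value(match["stop"])
        segments.append(segment)
    return segments
```

Calling parse_range("A:21-179,B.45-@") returns two segments: chain A with PDB numbering from 21 to 179, and chain B with sequence numbering from 45 to '@', the end of the chain.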
