This is a protein-protein docking package. Its main operation is to accept 3D structures of two proteins that are known to interact and generate several models of their 3D complex.
This package incorporates codes originally developed by the author while working on a docking software GRAMM-X.
The architecture has been redesigned for better portability and parallel execution on a wide range of distributed backends including high-throughput batch clusters, multicore workstations, MPI clusters as well as collections of virtual machine instances in the clouds with or without shared file systems.
Andrey Tovchigrechko
See original GRAMM-X publications regarding the force field and search procedure:
- Tovchigrechko A, Vakser IA. GRAMM-X public web server for protein-protein docking. Nucleic Acids Res. 2006; 34:W310-4.
- Tovchigrechko A, Vakser IA. Development and testing of an automated approach to protein docking. Proteins 2005;60(2):296-301.
Regarding portable distributed execution techniques used in this version:
- Tovchigrechko A, Venepally P, Payne SH. PGP: Parallel Prokaryotic Proteogenomics Pipeline for MPI clusters, high-throughput batch clusters and multicore workstations. Bioinformatics 2014.
National Science Foundation award 1048199 and allocation on Azure cloud from Microsoft. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or Microsoft Co.
GPLv3. See also COPYING file that accompanies the source code.
PRODDL2 uses CMake (>2.8.10) for configuration and building.
In config/<linux|win>
, edit toolchain.cmake and shell script files to help CMake find and use
PRODDL dependencies.
-
Build-time dependencies:
Python (>=2.7), FFTW3, Boost (>1.50), HDF5 (>1.8.3), Blitz++ (>0.10). PRODDL build procedure will also try to build several Python packages. One of them, Numpy (>1.8) might fail to build automatically depending on your Linux flavor. In that case you will have to install Numpy before building PRODDL.
-
Run-time dependencies:
IMP (>2.2.0). There is a sample shell script
config/linux/deps_build/
for building IMP on Linux. On Windows, we recommend installing a pre-compiled IMP package.Makeflow (>4.2.0). On Windows, use Cygwin pre-compiled binary. We recommend placing
makeflow
executable in the PATH defined in the shell scripts underconfig
. Alternatively, you can specify the full path for the executable inconfig/noarch/proddl.json.in
. -
If you will be using batch queuing system for distributed execution on Linux (SGE, PBS, Slurm), you might have to edit
config/linux/proddl-deps.rc.in
tosource
your~/.bashrc
. This will make sure that any environment settings from your~/.bashrc
, including LD_LIBRARY_PATH, are propagated to the cluster jobs submitted by PRODDL workflow.
On Linux, we used GCC (>4.7.0). On Windows, we used Visual Studio 2010 in 32 bit mode.
After you have configured the dependencies, use sample commands for configuring, building and installing
PRODDL from config/linux/build.sh
and config/win/build.bat
After successfully building and installing PRODDL, run ctest
in the build directory
to execute the unit tests.
Assuming that you have installed PRODDL under PRODDL_HOME, you can run:
PRODDL_HOME/bin/proddl-wrapper proddl-dock dock \
PRODDL_HOME/test_data/pdb/2ptc_E.pdb \
PRODDL_HOME/test_data/pdb/2ptc_I.pdb \
res.pdb
The above will dock proteins from two input files and generate an output file res.pdb with 10 top ranked models in a format that is used for multiple NMR structures in PDB (structures are separated by MODEL/ENDMDL records).
In that simple form, the program will run locally, using all available CPU cores for the computationally intensive parts of the workflow.
To run on distributed backends, add a parameter --makeflow-args
to the command line. Its value should be a quoted string
with all necessary Makeflow parameters.
For example, to use SGE, you can add to the command above something like:
--makeflow-args "-T sge -B '-b n -P 1111'"
Notice the use of single quotes inside double quotes above,
done in order to properly escape the native SGE arguments
supplied with the -B
argument of Makeflow.
Run PRODDL_HOME/bin/proddl-wrapper makeflow --help
for the list
of possible Makeflow arguments, including all flavors of the batch
systems that it can support.
For HPC MPI clusters, you can reuse jobs submission scripts and Makeflow parameters generated by our pipeline PGP.
For Microsoft Azure cloud, you can run PRODDL under our client-cloud execution framework PRODDL-C.
You can also expose PRODDL as a Galaxy tool by deploying
into Galaxy the tool files from api/galaxy/
.
Run PRODDL_HOME/bin/proddl-wrapper proddl-dock dock --help
to get the list of all available options for the docking
protocol.