Automatically exported from code.google.com/p/soesa

SOESA - Structure Optimization and Validation using Separations of Atoms

	Source code, binaries and scripts: Copyright 1999 by 
	Michael Wall and Rice University

	Files relating to PDF interatomic distance data: Copyright 1999
	by Shankar Subramaniam and the University of Illinois at
	Urbana-Champaign.

	Distribution and modification of the source code, binaries
	and scripts in this package are governed
	under the "Artistic License" (perl license) described in the
	file LICENSE in the SOESA distribution.
	
	Distribution and modification of the data files (files relating
	to PDF interatomic distance data) are governed under terms described in
	the file LICENSE.PDFDATA in the SOESA distribution.  The license 
	restricts redistribution of the data files and all derivatives to 
	the copyright holders and official SOESA websites.

	
Author: Michael Wall
Version 0.21

This document was prepared by Michael Wall <mewall@lanl.gov> and was
last updated January 2000.

Overview

Soesa is a program for use in evaluating and refining atomic models of
protein structures.  The program calculates an estimated prior
probability of the interatomic distances in the structure by reference
to an interatomic distance probability density function (PDF) database
compiled from known structures.  It also calculates derivatives with
respect to atomic position vectors for use in molecular dynamics
simulations and structure refinement.

See the following article for a description of the application of
soesa to crystallographic refinement:

Michael E. Wall, Shankar Subramaniam, and George N. Phillips,
Jr. Protein structure determination using a database of interatomic
distance probabilities. Protein Science 8(12):2720-2727, 1999.

Disclaimer

This version has proven to be useful in the hands of the developer.
Good results are not guaranteed.  Even though tests have shown that
soesa can be used to improve protein structures, keep in mind that
structure validation and optimization are always performed in reference
to a database of known structures, not the correct structure.  Soesa
merely tells you how much a protein structure resembles those in the
database, and suggests a perturbation that makes the structure more like
those in the database.

System requirements

The program was initially developed under IRIX 6.2, and the initial
distribution will support this platform.  It has also been
successfully run under IRIX 6.5 and Digital Unix.  The system should have
at least 200 MB of RAM to avoid slowdowns from excessive disk access.  The
server mode does not yet work under Linux, although the other modes
should work fine with data files of opposite byte order (the data are
4-byte floating-point values).
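
Because the data files hold raw 4-byte floating-point values, it can help
to know the byte order of the machine before picking a platform-specific
data file.  The sketch below is one portable way to check from the shell.
The function name host_byte_order is illustrative and not part of SOESA,
and the sketch is written in POSIX sh rather than csh.

```shell
#!/bin/sh
# Report the byte order of the machine running this script.
# The bytes 0x01 0x00 are read back as one 16-bit word: a little-endian
# host prints 0001, a big-endian host prints 0100.
host_byte_order() {
    word=$(printf '\001\000' | od -An -tx2 | tr -d ' \n')
    case $word in
        0001) echo little ;;
        0100) echo big ;;
        *)    echo unknown ;;
    esac
}
```

Calling host_byte_order prints "little" or "big" for the current machine.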

Distribution 

http://www.bioc.rice.edu/soesa

Installation

Source and binaries are distributed as soesa-<version>-<platform>.tar.gz. 
Database files are distributed as soesa-<version>-dat-<platform>.tar.gz.

1) Download the above files into the directory where you want the 
root directory soesa-0.21/ to be (e.g. /usr/local).  

2) Execute the following commands from the UNIX shell:

gzip -dc soesa-<version>-<platform>.tar.gz | tar xf -
gzip -dc soesa-<version>-dat-<platform>.tar.gz | tar xf -

3) In order to build new binaries, execute the following commands 
from the UNIX shell:

cd soesa-0.21/
make

Using SOESA

SOESA can be used in either a command-line mode or a server mode.  In
the command-line mode, instructions are given by "-[option (arg1)
(arg2) ...]" directives on the UNIX shell command line.  In the
server mode, SOESA is started with the appropriate command-line
arguments, and listens for requests at a first-in, first-out (FIFO)
named pipe file.  The latter mode is recommended for refinement
implementations, as the program otherwise has to read in more than 150
MB of data each time an energy calculation is performed.  A simple
client program called "tellsoesa" is supplied to initiate energy
calculations in server mode.

Starting the server

Soesa is run from the UNIX command line as:

soesa [-option (arg1) (arg2) ...]

Some currently supported options are:

-datafile [filename] 	- Specify PDF data file
-hashfile [filename] 	- Specify index into PDF data file
-aacfg [filename]	- Specify amino acid configuration file for 
			  interpreting data files
-pdb [filename]	        - Specify name of input PDB file
-out [filename]		- Specify name of output file
-verbose		- Turn on verbose log information
-mexc [{m list}] 	- List space-delimited residue separation numbers 
			  which will not be included in calculating scores 
-minc [{m list}] 	- List space-delimited residue separation numbers 
			  to be explicitly included in calculating scores
-escale [weight]	- Weight of the output values.  Default is scaling of 
			  0.58 kcal/mol by analogy with Boltzmann 
			  statistics.
-servermode		- Start in the background in server mode
-fifo [filename]	- Filename for the FIFO used to request actions
-eval			- Output the PDF score by residue of the PDB file
-TNTlong		- Output the PDF score in TNT long format
-TNTshort		- Output the PDF score in TNT short format
-xplor			- Output the PDF score in xplor format.  The segid
			  of the PDB file indicates selection info.
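
The default -escale weight of 0.58 kcal/mol is close to k_B*T near room
temperature, which is presumably the analogy intended.  The sketch below
computes k_B*T in kcal/mol with awk.  It also converts a probability p
into an energy as -w*ln(p), which is an assumption about how a weight of
this kind is commonly applied; SOESA's own formula is not given in this
document.  Both function names are illustrative.

```shell
#!/bin/sh
# k_B*T in kcal/mol for a temperature in kelvin.
# 0.0019872 kcal/(mol K) is the molar gas constant.
kt_kcal() {
    awk -v t="$1" 'BEGIN { printf "%.2f\n", 0.0019872 * t }'
}

# Assumed Boltzmann-style conversion: energy = -weight * ln(p).
# Arguments: probability p (0 < p <= 1), optional weight (default 0.58).
pdf_energy() {
    awk -v p="$1" -v w="${2:-0.58}" 'BEGIN { printf "%.3f\n", -w * log(p) }'
}
```

For example, "kt_kcal 293" prints 0.58, and "pdf_energy 0.5" prints 0.402.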

Some scripts are provided for common options:

seval.csh [pdb file] [out file] 

	  Evaluate the PDB file.  
	  The out file receives a list of residue number, score pairs.

sserve.csh [pdb file] [out file] [fifo file]

	   Used for refinement.
	   Start the server.  
	   Always read pdb from the named file (with path received via
                FIFO).  
	   Always output in the named file (with path received via FIFO).
	   Use the named fifo file for message passing.

Using Server Mode

The program tellsoesa is a client that tells SOESA to perform a
calculation, or possibly shut down.
   
It runs from the UNIX command line as:

tellsoesa [options] [-cmd <request>]

The following options are available:

-fifo [filename]      Specify the filename of the FIFO that soesa is
		      monitoring for requests.  If the PDF_HOME
		      environment variable is defined, use it as a 
		      prefix.  The default is "soesa.fifo"
-home [path]	      Specify the path name where the PDB file is read
		      (the filename must be that used for starting
		      soesa), and where the output file is to be written.

Five -cmd requests are possible in version 0.2:

short:	    	Calculate the pdf score in TNT short format.
long:		Calculate the pdf score in TNT long format.
xplor:		Calculate the pdf score in a format that can be
		recognized by a modified version of xplor 3.851.
eval:		Calculate the pdf score by residue (structure validation).
shutdown:	Shut down the server nicely.  This is preferred, as
                a stale FIFO can cause problems with, e.g. recursive
         	copying under IRIX.

In order to run tellsoesa, either the full path name plus filename of
the FIFO should be given via -fifo or the environment variable
PDF_HOME should be set to the path name where the FIFO can be found.

When tellsoesa is run, it passes a message to SOESA through the FIFO
(e.g. soesa.fifo), telling it what kind of calculation to do.  Soesa
will then look in the directory path specified by -home for a PDB file
with name specified by -pdb on the soesa command line.  It will
perform the requested calculation and place the output in the same
directory path, using the file name specified by -out on the soesa
command line.
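
The request path described above can be imitated with standard tools.
The sketch below is a toy server loop and client written in POSIX sh.  It
only logs the request words it receives; it is not SOESA, and the real
message format passed by tellsoesa is not documented here.  The names
toy_server and toy_client are hypothetical.  Opening a FIFO for both
reading and writing works on Linux but is not specified by POSIX.

```shell
#!/bin/sh
# Toy request loop in the style of a FIFO-driven server.
# Arguments: path of an existing FIFO, path of a log file.
toy_server() {
    fifo=$1; log=$2
    [ -p "$fifo" ] || return 1
    # Hold the FIFO open for reading and writing so the loop does not
    # see end-of-file each time a client disconnects.
    exec 3<>"$fifo"
    while read -r request <&3; do
        echo "request: $request" >>"$log"
        if [ "$request" = shutdown ]; then
            break
        fi
    done
    exec 3<&-
    rm -f "$fifo"      # remove the FIFO so no stale pipe is left behind
}

# Toy client: write one request line into the FIFO.
# Arguments: path of the FIFO, request word.
toy_client() {
    echo "$2" >"$1"
}
```

To try it, create the FIFO with mkfifo, start toy_server in the
background, and call toy_client twice, first with eval and then with
shutdown.  The log then holds one line per request.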

Structure validation

Please see Recommendations for advice about preparing the PDB file.

The shell script seval.csh can be used for evaluation:

#!/bin/csh -f
setenv b /soesa_root/bin
setenv d /soesa_root/dat
$b/soesa -aacfg $d/aa.cfg -hashfile $d/pdf.hash -datafile $d/pdf.data \
         -eval -pdb $1 -out $2

Run it by 'csh seval.csh <input pdb> <output score by residue>'

The output file will contain a list of residue number, PDF score pairs.
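
A quick way to inspect the score file is to summarize it with awk.  The
sketch below assumes one whitespace-separated "residue_number score" pair
per line, which is a guess at the exact layout.  It reports the mean and
the extreme scores, but it does not say whether a high or a low score is
better, since this document does not state that.  The function name
score_summary is illustrative.

```shell
#!/bin/sh
# Summarize a per-residue score file: count, mean, and extreme scores.
# Assumed layout: "residue_number score" on each line.
score_summary() {
    awk 'NF >= 2 {
             n++; sum += $2
             if (n == 1 || $2 < min) { min = $2; minres = $1 }
             if (n == 1 || $2 > max) { max = $2; maxres = $1 }
         }
         END {
             if (n == 0) { print "no scores found"; exit 1 }
             printf "residues %d mean %.3f min %s (residue %s) max %s (residue %s)\n",
                    n, sum / n, min, minres, max, maxres
         }' "$1"
}
```

Run it as "score_summary <output score by residue>" after seval.csh has
finished.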

X-PLOR refinement

Please see Recommendations for advice about preparing the PDB file.

The shell script sserve.csh can be used to start the server for refinement:

#!/bin/csh -f
setenv b /soesa_root/bin
setenv d /soesa_root/dat
nohup $b/soesa -aacfg $d/aa.cfg -hashfile $d/pdf.hash -datafile $d/pdf.data \
         -servermode -verbose -fifo soesa.fifo -pdb user.pdb -out user.dat \
         >& soesa.log &


A modified version of X-PLOR 3.851 (see www.bioc.rice.edu/soesa/xplor) 
has been created to handle energy inputs by external programs, 
including soesa.  The USER energy flag must be turned on to use this 
feature.  There is full support of the CONStraints INTERactions statement 
with this energy.

During an energy calculation, a file "user.pdb" will be written in the
directory where X-PLOR was executed.  Then, X-PLOR will execute a
script named "user.csh" in the same directory.  It will then wait
until the script is finished executing.  Finally it will read the USER
energy and position derivatives from the file user.dat.

When the user.csh script has finished executing, there should be a file
user.dat available.  The file should have the total energy on the first
line, followed by the space-separated X, Y, and Z partial derivatives of
the energy for each atom in the PDB file on subsequent lines.
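
When writing user.dat from another program, a small format check can
catch mistakes early.  The sketch below follows the layout described
above: one energy value, then three numbers per atom.  It ignores blank
lines, which is an assumption, and the function name check_user_dat is
illustrative rather than part of SOESA or X-PLOR.

```shell
#!/bin/sh
# Check a user.dat-style file: line 1 holds the total energy, and every
# following non-empty line holds three numbers (dE/dx dE/dy dE/dz).
# Prints the energy and the number of gradient lines; fails on bad lines.
check_user_dat() {
    awk 'NR == 1 { energy = $1; next }
         NF == 0 { next }
         NF != 3 { printf "line %d: expected 3 values, found %d\n", NR, NF
                   bad = 1; exit }
         { atoms++ }
         END { if (bad) exit 1
               printf "energy %s atoms %d\n", energy, atoms }' "$1"
}
```

The atom count it prints should match the number of atoms in user.pdb.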

Here is a sample user.csh script for use in X-PLOR refinement with a
PDF energy:

#!/bin/csh -f
tellsoesa -fifo /soesa_startup_directory/soesa.fifo \
	  -home /xplor_startup_directory/           \
	  -cmd xplor

A sample script user.csh is included with the soesa-xplor
distribution, along with a sample startup script for soesa.


***IMPORTANT***

Please read the Recommendations section for instructions on how to
prepare a PDB file for use in soesa.

The following rules are used to determine the number of peptide bonds
separating an atom pair in XPLOR calculations:

1) If the chain ID's are different, the connection is deemed to be
   tertiary.  The segment ID's are *not* considered, as they are used
   to pass information about interactions selection.
2) Otherwise, the difference in residue numbers is used to determine
   the number 'm'.  If the difference is negative, there is no score
   contribution from that pair (but there will be when they are
   considered in reverse).
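
The two rules above can be written as a small shell function.  This is a
sketch, not the code SOESA runs: the function name xplor_separation is
hypothetical, and taking the difference as the second residue number
minus the first is an assumption about the pair ordering.

```shell
#!/bin/sh
# Classify an ordered atom pair following the X-PLOR rules above.
# Arguments: CHAIN_I RES_I CHAIN_J RES_J (segment IDs are ignored).
# Prints "tertiary", the separation m, or "none" for a negative difference.
xplor_separation() {
    if [ "$1" != "$3" ]; then
        echo tertiary
        return
    fi
    m=$(( $4 - $2 ))
    if [ "$m" -lt 0 ]; then
        echo none
    else
        echo "$m"
    fi
}
```

For example, "xplor_separation A 10 A 13" prints 3, while the reversed
pair "xplor_separation A 13 A 10" prints none.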

See the web page www.bioc.rice.edu/soesa for updates about refinement and 
validation strategies.  Support will also initially be provided for
new users; contact soesa@bioc.rice.edu for questions.

TNT refinement

Please see Recommendations for advice about preparing the PDB file.

The shell script sserve.csh can be used for TNT refinement (see XPLOR
refinement above).

This part is still experimental.  Even so, sample scripts
are provided for using soesa in TNT refinement.  Due to the multimodal
nature of the PDF distributions, only the steepest descent method will
be successful; at no point should TNT attempt to use any curvature
information.  Significant minimization is possible, but a
simulated-annealing algorithm for TNT will be required to escape local
traps and achieve really good refinement.  Soesa should be run in
-servermode.

The required modifications to TNT scripts are (see sample scripts):

1) Run the TNT program to_pdb to convert the .cor file to a PDB file.  
Change its output to the filename specified by -pdb on the 
soesa command line.
2) Run the program tellsoesa, issuing -cmd long.
3) Include the output file specified by -out on the soesa command line 
in the call to the shift program.

***IMPORTANT***

Please read the Recommendations section for instructions on how to
prepare a PDB file for use in soesa.

The following rules are used to determine the number of peptide bonds
separating an atom pair in TNT calculations:

1) If the chain ID's are different or the segment ID's are different,
   the connection is deemed to be tertiary.
2) Otherwise, the difference in residue numbers is used to determine
   the number 'm'.  If the difference is negative, there is no score
   contribution from that pair (but there will be when they are
   considered in reverse).
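
For TNT the segment ID also takes part in the decision.  The sketch below
expresses the two TNT rules as a shell function.  The name tnt_separation
is hypothetical, and the sign convention (second residue number minus the
first) is an assumption about the pair ordering.

```shell
#!/bin/sh
# Classify an ordered atom pair following the TNT rules above.
# Arguments: CHAIN_I SEGID_I RES_I CHAIN_J SEGID_J RES_J
# Prints "tertiary", the separation m, or "none" for a negative difference.
tnt_separation() {
    if [ "$1" != "$4" ] || [ "$2" != "$5" ]; then
        echo tertiary
        return
    fi
    m=$(( $6 - $3 ))
    if [ "$m" -lt 0 ]; then
        echo none
    else
        echo "$m"
    fi
}
```

For example, "tnt_separation A S1 10 A S2 12" prints tertiary because the
segment IDs differ, even though the chain IDs match.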


Recommendations

PDB files:

The best way to prepare a PDB file is to number all atoms
sequentially, and to ensure that there are at least five residues
separating chains that are not connected.  This will avoid all
confusion with the interpretation of segid's and chains.
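
A PDB file can be checked against this advice before it is handed to
soesa.  The sketch below reads ATOM records in the standard fixed PDB
columns (chain ID in column 22, residue number in columns 23-26) and
reports consecutive chains whose numbering leaves fewer than five unused
residue numbers between them.  Reading "at least five residues separating
chains" as a gap in residue numbers is an assumption, and the function
name check_chain_gaps is illustrative.

```shell
#!/bin/sh
# Report consecutive chains whose residue numbers are fewer than five
# apart.  Exits non-zero if any such pair is found.
check_chain_gaps() {
    awk '/^ATOM/ {
             chain = substr($0, 22, 1)
             res = substr($0, 23, 4) + 0
             if (seen && chain != prev_chain && res - prev_res - 1 < 5) {
                 printf "chains %s and %s: gap of %d residue numbers\n",
                        prev_chain, chain, res - prev_res - 1
                 bad = 1
             }
             prev_chain = chain; prev_res = res; seen = 1
         }
         END { exit bad }' "$1"
}
```

Run it as "check_chain_gaps model.pdb"; no output means that every chain
boundary passed the check.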

Bug reports/advice/feedback

Soesa Development Team <soesa@bioc.rice.edu>.