/subopt-kaks

Suboptimal alignments and Ka, Ks calculations

Primary LanguageC

VERSION (unreleased, 2004-02-18)
Suboptimal alignments and Ka/Ks calculations

This src is freely available, but should be considered a personal
communication as I have not properly packaged it or given adequete
credit to the code I am reusing here.  Please contact me if you have
questions or intend to use this for large scale analyses so I can be
sure it is performing properly for you. 

To build this:

% cd squid
% ./configure 
% make
% cd ../src
% make
% make install
% cd ..

# set the env variable SUBOPTDIR 
% export SUBOPTDIR=`pwd`

# do a test alignment
% ./bin/yn00_cds_optimal --showtable tst/F35G8.1_CBG16160.cds.fas > table

#Run on a pre-aligned file alignment format 
% yn00_cds_prealigned tst/F35G8.1_CBG16160.cds.aln

Should build on OSX and Linux fine.  

SQUID is Sean Eddy's library.  Much of the code for alignment is based
on design ideas from Michael Smoot.  The dN,dS estimation is reusing
Z.Yang's yn00 implementation from his PAML package with some
modification to excise it from the PAML data structures.

To use it, I suggest using yn00_cds_optimal which will calculate an
optimal alignment (local alignment by default) and generate the table
of alignments using the --showtable option.  

Helpful hints:

  You need to pass in a file which contains two CDS sequences, introns
  removed, and with the stop codon removed.  I realize the removing of
  the stop codon is annoying, but this is the requirement of PAML and
  I have not added code in these applications yet to detect and strip
  the last stop codon from a CDS.

  The --showtable will print out a tab delimited summary of Ka and Ks
  and Omega as calculated by Yang's YN00 implementation. 

  You can specify the substitution matrix with --matrix option - pass
  in the whole path to the matrix file.

  -h will print help options for an application

  This is not a particularly robust implementation for some things so
  if you get segfaults let me know, I am finding slight differences in
  memory management between OSX and linux and have not been testing
  this on solaris at all at this point.


Usage: yn00_cds_optimal [-options] <sequence file>

   Generate the optimal alignments for a pair of CDS sequences using the protein sequence as a template.
   Available options:
   -h      : help; print version and usage info
   -v      : verbose output -- show alignment path states.
 

   --global      : generate global rather than the default local alignments
   --gapopen num : Gap open penalty.  Default is -8.
   --gapext      : Gap extension penalty.  Default is -2.
   --informat    : Format of the input sequence file database.
   --outformat   : Format of the output sequence alignments.
   --output      : specify an output file instead or writing to STDOUT
   --matrix      : Path to substitution matrix.
                   Default is $SUBOPTDIR/matricies/aa/BLOSUM50 or
                    ../matricies/aa/BLOSUM50 in the absence of
                    SUBOPTDIR env variable
   --gapchar     : Provide a different gap char from '-'
   --showtable   : Show the summary Ka/Ks table instead of the alignment.
   --noheader    : Don't display a header for the table
 

Jason Stajich
Duke University
jason.stajich AT duke.edu or jason AT bioperl.org