____ ____ _ __ __ ____ _ _____ ____ | _ \ _ __ ___ / ___|_ __ __ _ _ __ | |__ | \/ / ___| / \ _|_ _| _ \ | |_) | '__/ _ \| | _| '__/ _` | '_ \| '_ \| |\/| \___ \ / _ \ _| |_| | | |_) | | __/| | | (_) | |_| | | | (_| | |_) | | | | | | |___) / ___ \_ _| | | _ < |_| |_| \___/ \____|_| \__,_| .__/|_| |_|_| |_|____/_/ \_\|_| |_| |_| \_\ |_| Fast and Robust Phylogeny-Aware Multiple Sequence Alignment + tandem repeat unit insertions and deletions ___ _ | _ \_ _ _ _ _ _ (_)_ _ __ _ | / || | ' \| ' \| | ' \/ _` | |_|_\\_,_|_||_|_||_|_|_||_\__, | |___/ ___ ___ _ __ __ ___ _ | _ \_ _ ___ / __|_ _ __ _ _ __| |_ | \/ / __| /_\ | _/ '_/ _ \ (_ | '_/ _` | '_ \ ' \| |\/| \__ \/ _ \ |_| |_| \___/\___|_| \__,_| .__/_||_|_| |_|___/_/ \_\ |_| The easiest way to run ProGraphMSA with the recommended command line parameters is to use the wrapper script "ProGraphMSA+TR.sh" included in this package. Just run ./ProGraphMSA+TR.sh <file> or ./ProGraphMSA+TR.sh -o <output_file> <file> ProGraphMSA+TR will run using ML distances, the WAG model, and will output an alignment in FASTA format. Further it will use T-REKS from the file T-Reks.jar to detect tandem repeats. To use TRUST adjust the installation path in trust2treks.py and run ./ProGraphMSA+TR.sh --custom_tr_cmd trust2treks.py <file> ___ _ _ _ / __|___ _ __ _ __ __ _ _ _ __| | | (_)_ _ ___ | (__/ _ \ ' \| ' \/ _` | ' \/ _` | | | | ' \/ -_) \___\___/_|_|_|_|_|_\__,_|_||_\__,_| |_|_|_||_\___| _ _ __ __ _ _ _ __ _ _ __ ___| |_ ___ _ _ ___ | '_ \/ _` | '_/ _` | ' \/ -_) _/ -_) '_(_-< | .__/\__,_|_| \__,_|_|_|_\___|\__\___|_| /__/ |_| USAGE: ./ProGraphMSA [--ancestral_seqs] [--all_trees] [-i <iterations>] [-T] [-I] [-M] [-m] [-a] [-C <count>] [-F] [--custom_model <file>] [-w] [-c <file>] [-r] [--custom_tr_cmd <command>] [--trd_output <filename>] [--read_repeats <T-Reks format output>] [-R] ... [--repalign] [--repeat_indel_ext <probability>] [--repeat_indel_rate <rate>] [-A] [-P <distance>] [-p <distance>] [-D <distance>] [-d <distance>] [-x <distance>] [-s <probability>] [-l <distance>] [-E <probability>] [-e <probability>] [-g <rate>] [-f] [--dna] [--codon] [--topology <newick file>] [-t <newick file>] [-o <filename>] [--] [--version] [-h] <fasta file> Tandem-repeat related parameters: ================================= -R, --repeats use T-REKS to identify tandem repeats --custom_tr_cmd <command> custom command for detecting tandem-repeats --trd_output <filename> write TR detector output to file --read_repeats <T-REKS format output> read TR detector output from file --repalign re-align detected tandem repeat units --repeat_indel_ext <probability> repeat indel extension probability --repeat_indel_rate <rate> insertion/deletion rate for repeat units (per site) Guide tree, distances, and substitution model: ============================================== -i <iterations>, --iterations <iterations> number of iterations re-estimating guide tree [default: 2] -m, --mldist use distances estimated by a Maximum-Likelihood method -a, --nwdist estimate initial distance tree from Needleman-Wunsch alignments -D <distance>, --max_dist <distance> maximum distance for alignment -F, --estimate_aafreqs estimate equilibrium amino acid frequencies from input data -w, --darwin use model of evolution from Darwin (GONNET matrix and different indel model parameters, otherwise WAG will be used) --custom_model <file> custom substitution model in qmat format -c <file>, --cs_profile <file> path to library of context-sensitive profiles (we distribute a copy in the 3rd_party folder) -A, --no_force_align_m do not force alignment of initial Methionine Parameters for adjusting the indel model: ========================================= -l <distance>, --edge_halflife <distance> edge half-life (evolutionary distance at which the probability of re-using an unsused graph is halved) -E <probability>, --end_indel_prob <probability> probability of mismatching sequence ends (set to -1 to disable this feature) -e <probability>, --gap_ext <probability> gap extension probability -g <rate>, --indel_rate <rate> insertion/deletion rate Input/Output: ============= -f, --fasta output fasta format (instead of stockholm) -t <newick file>, --tree <newick file> initial guide tree -o <filename>, --output <filename> Output file name -I, --input_order output sequences in input order (default: tree order) --dna align DNA sequence --codon align DNA sequence based on a codon model --ancestral_seqs output all ancestral sequences <fasta file> (required) input sequences ___ _ _ _ _ __ | _ )_ _(_) |__| (_)_ _ __ _ / _|_ _ ___ _ __ ___ ___ _ _ _ _ __ ___ | _ \ || | | / _` | | ' \/ _` | | _| '_/ _ \ ' \ (_-</ _ \ || | '_/ _/ -_) |___/\_,_|_|_\__,_|_|_||_\__, | |_| |_| \___/_|_|_| /__/\___/\_,_|_| \__\___| |___/ For building ProGraphMSA from source you need: CMake >=2.8 (http://www.cmake.org) tclap >=1.1.0 (http://tclap.sourceforge.net) Eigen 2.0.x or 3.0.x (http://eigen.tuxfamily.org) on Debian/Ubuntu you can install these programs/libraries with: sudo apt-get install build-essential cmake libtclap-dev libeigen3-dev On a Mac, libraries can be installed with Homebrew: brew install gcc cmake eigen tclap Note that homebrew installs these to non-default locations, so cmake should be called with the following additional options: -D EIGEN_INCLUDE_DIR=/usr/local/include/eigen3 \ -D CMAKE_C_COMPILER=/usr/local/bin/gcc-7 \ -D CMAKE_CXX_COMPILER=/usr/local/bin/c++-7 \ ProGraphML is written with C++11. Using the default Apple compiler is not recommended due to libc++ incompatibilities. Then perform the following command to configure/build/install ProGraphMSA: cd BUILD cmake .. (press "c" to configure and "g" to generate the Makefile, see below for additional configuration options) make ProGraphMSA make install An example build script is included (`build_ProGraphMSA.sh`) to illustrate this process. It can be modified for your particular system. Additional CMake configuration options (in "ccmake .."): EIGEN2_INCLUDE_DIR: set this to the path, where Eigen is installed, if you use Eigen 2.0.x or if Eigen has been installed at a non-default location (default location: /usr/include/eigen3) WITH_EIGEN3: set this to ON, if you want to compile ProGraphMSA with Eigen 3.0.x CMAKE_CXX_FLAGS: add options for the C++ compiler, like optimization flags, or additional search paths for include files (-I <path>) WITH_SSE: disable this option, if you build ProGraphMSA for a machine that does not support SSE2 _ _ _ _ | || (_)__| |_ ___ _ _ _ _ | __ | (_-< _/ _ \ '_| || | |_||_|_/__/\__\___/_| \_, | |__/ ProGraphMSA was written by Adam Szalkowski: Adam M. Szalkowski and Maria Anisimova. Graph-based modeling of tandem repeats improves global multiple sequence alignment. Nucleic Acids Research, Volume 41, Issue 17, 1 September 2013, Page e162, https://doi.org/10.1093/nar/gkt628 The original source was hosted at Source Forge: https://sourceforge.net/projects/prographmsa/ A copy of this is also hosted on Adam's github: https://github.com/butzist/ProGraphMSA ProGraphMSA is now maintained by the Anisimova group at https://github.com/acg-team/ProGraphMSA ___ _ _ _ _ / __|___ _ _| |_ _ _(_) |__ _ _| |_ ___ _ _ ___ | (__/ _ \ ' \ _| '_| | '_ \ || | _/ _ \ '_(_-< \___\___/_||_\__|_| |_|_.__/\_,_|\__\___/_| /__/ Adam Szalkowski Spencer Bliven
sbliven/ProGraphMSA
ProGraphMSA is a state-of-the-art multiple sequence alignment tool which produces phylogenetically sensible gap patterns while maintaining robustness by allowing alternative splicings and errors in the branching pattern of the guide tree. Fork of ProGraphMSA+TR from https://sourceforge.net/projects/prographmsa
C++GPL-3.0