`gs2` is software for conducting a new phylogenetic analysis method, Graph Splitting (GS). It can effectively resolve the early evolution of protein families, and its accuracy and speed were demonstrated by extensive evolutionary simulations.
`gs2` is open-source software (GPL v3.0) implemented in C++ for Linux, Mac (macOS) and Windows (Cygwin).
Reference: Motomu Matsui and Wataru Iwasaki, Systematic Biology, 2019
Online tool: GS analysis server
Our Laboratory: Iwasaki Lab
version 2.4 (2019/02/12)
- Modified distance function
version 2.3 (2018/11/16)
- Added Transfer Bootstrap Expectation algorithm (F. Lemoine, et al., Nature, 2018)
version 2.2 (2018/11/07)
- Updated to display warnings in case redundant sequences are input
version 2.1 (2018/10/15)
- Added transitivity function
- Modified addEP function
version 2.0 (2018/06/01)
- Re-implemented in C++
- MMseqs2 is used for all-to-all pairwise sequence alignment
version 1.0 (2017/02/07)
- Implemented in R and Perl
- BLAST+ is used for all-to-all pairwise sequence alignment
- GNU GCC compiler (5.0+) is required to compile `gs2`
- CMake (3.0+) is required to compile `mmseqs`
❗ Mac users are recommended to install `gcc` and `cmake` using Homebrew
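The version requirements above can be checked before building. The following is a minimal sketch, not part of `gs2`: the helper names `version_ge` and `check_tool` are hypothetical, and it assumes a `sort` that supports `-V` (as GNU coreutils does). On macOS, `g++` may be an alias for Apple clang, in which case the reported number is the clang version rather than a GCC version.

```shell
#!/usr/bin/env bash
# Illustrative pre-build check; these helpers are assumptions, not part of gs2.

# version_ge A B: succeeds when version A >= version B.
# Relies on `sort -V` (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MIN: report whether NAME is on PATH and at least version MIN.
check_tool() {
  local name="$1" min="$2" ver
  if ! command -v "$name" >/dev/null 2>&1; then
    echo "$name: not found on PATH"
    return 1
  fi
  # Use the first dotted number printed by `NAME --version`.
  ver="$("$name" --version 2>/dev/null | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1)"
  if version_ge "$ver" "$min"; then
    echo "$name $ver: OK (>= $min)"
  else
    echo "$name ${ver:-unknown}: older than $min, or version not detected"
    return 1
  fi
}

check_tool g++ 5.0 || true
check_tool cmake 3.0 || true
```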
$ git clone https://github.com/MotomuMatsui/gs
$ cd gs
$ make
- You can adjust the Makefile to suit your environment (e.g. `CXX := g++-8`, `CXXFLAGS += -std=c++1z`)
$ export PATH=$(pwd)/MMseqs2/build/bin:$PATH
- You can move `mmseqs` to any other location (e.g. `~/bin`) and add that directory to your PATH environment variable (e.g. `export PATH=~/bin:$PATH`)
... Compiling LAPACK/BLAS sometimes fails
- Rewrite `OPTS = -O2 -frecursive` to `OPTS = -O3 -frecursive -pipe` in `lapack-3.7.1/make.inc`, then re-execute `make`
... You might get the following error message
ld: library not found for -lgfortran
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [gs2] Error 1
- First, run `locate gfortran` to find the path to `gfortran`. If you already have `gfortran` (e.g. /usr/local/bin/gfortran-8), run the following commands, adjusting the paths to your environment.
$ ln -sf /usr/local/bin/gcc-8 /usr/local/bin/gcc
$ ln -sf /usr/local/bin/g++-8 /usr/local/bin/g++
$ ln -sf /usr/local/bin/gfortran-8 /usr/local/bin/gfortran
$ hash -r
$ make clean
$ make
- If you do not have `gfortran` yet, install the latest version of `gcc` using Homebrew, then run the commands above
... You might get the following error message
ar cr ../../liblapacke.a
ar: no archive members specified
...
...
...
make: *** [lapack] Error 2
- Please re-execute `make`
... If you had previously installed an old version of `gcc`, installing `mmseqs` sometimes fails
- Please replace the old version of `gcc` with the latest version using Homebrew. The SonicParanoid project page may give useful hints for solving this issue
... LAPACK/BLAS version 3.8.0 has installation problems
- Choose LAPACK/BLAS version 3.7.1 for installation (default)
To get on-line help:
$ ./gs2 -h
The following command calculates a GS tree (a phylogenetic tree reconstructed by the Graph Splitting method):
$ ./gs2 [arguments] input > output
❗ A multiple-sequence file in FASTA format (e.g. example/200.faa) is required as input
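Since version 2.2, `gs2` warns when redundant sequences are given as input, so it can be useful to inspect a FASTA file beforehand. The sketch below is a hypothetical helper (`fasta_summary` is not part of `gs2`) that counts records and lists those whose sequence is identical to an earlier one. Treating "redundant" as "exactly identical, ignoring case and whitespace" is an assumption; the criterion `gs2` itself applies may differ.

```shell
#!/usr/bin/env bash
# fasta_summary FILE: count FASTA records and list records whose sequence
# is identical to an earlier one. Illustrative helper; not part of gs2.
fasta_summary() {
  awk '
    # Register the record collected so far and flag exact duplicates.
    function record() {
      n++
      if (seq in seen) { dup++; printf "duplicate: %s == %s\n", name, seen[seq] }
      else seen[seq] = name
    }
    # Header line: close the previous record and start a new one.
    /^>/ { if (have) record(); have = 1; name = substr($0, 2); seq = ""; next }
    # Sequence line: strip whitespace and append (case-insensitive compare).
         { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }
    END  { if (have) record()
           printf "sequences: %d\nredundant: %d\n", n, dup }
  ' "$1"
}
```

For example, `fasta_summary example/200.faa` prints any duplicate pairs followed by the two summary lines.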
Arguments:
| Option | Description |
|---|---|
| -e | [integer(>=0)] The number of replicates for EP method. Default: 0 |
| -r | [integer(>=1)] The random seed number for EP method. Default: random number |
| -t | [integer(>=1)] The number of threads for MMseqs. Default: 1 |
| -m | [real(1–7.5)] Sensitivity for MMseqs. Default: 7.5 |
| -b | [string(tbe/fbs)] The bootstrap method. Default: tbe |
| -s | Silent mode: do not report progress. Default: Off |
| -l | Newick format with actual names. Default: Off |
| -h | Show help messages. Default: Off |
| -v | Show the version. Default: Off |
The GS tree (in Newick format) is written to STDOUT
(correspondence table between IDs and sequence names: example/200_annotation.txt):
$ ./gs2 example/200.faa
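A quick sanity check on a saved Newick file is to count its leaves. The sketch below uses a hypothetical helper, `newick_leaf_count`, which relies on the fact that a single tree with N leaves contains N - 1 commas, provided no label contains a comma (which may not hold when actual names are printed with `-l`). Whether the result should equal the number of input sequences is an assumption about how `gs2` handles its input.

```shell
#!/usr/bin/env bash
# newick_leaf_count FILE: estimate the number of leaves in a Newick tree.
# Assumes FILE holds one tree and that no label contains a comma.
# Illustrative helper; not part of gs2.
newick_leaf_count() {
  local commas
  # Keep only the commas, then count them.
  commas="$(tr -cd ',' < "$1" | wc -c)"
  echo $(( commas + 1 ))
}
```

After saving a tree, for instance with `./gs2 example/200.faa > example/200.nwk`, running `newick_leaf_count example/200.nwk` prints the leaf count.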
The GS tree with branch reliability (edge perturbation; EP) scores is saved in example/200.nwk:
$ ./gs2 -e 100 example/200.faa > example/200.nwk
GS tree with EP scores; a seed number is specified for EP method:
$ ./gs2 -e 100 -r 12345 example/200.faa > example/200.nwk
GS tree WITHOUT EP scores + silent mode:
$ ./gs2 -e 0 -s example/200.faa > example/200.nwk
MMseqs2 runs multithreaded jobs (4 CPUs are used in parallel):
$ ./gs2 -e 100 -t 4 example/200.faa > example/200.nwk
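When several FASTA files need to be processed with the same settings, the commands above can be wrapped in a loop. This is a minimal sketch: `batch_gs2` and the `GS2` variable are hypothetical names, and only the `-e`, `-t` and `-s` options from the table above are used.

```shell
#!/usr/bin/env bash
# Path to the gs2 binary; override by setting GS2 in the environment.
GS2="${GS2:-./gs2}"

# batch_gs2 OUTDIR FILE...: run gs2 on each FASTA file and write
# OUTDIR/<basename>.nwk. Illustrative wrapper; not part of gs2.
batch_gs2() {
  local outdir="$1" f base status=0
  shift
  mkdir -p "$outdir"
  for f in "$@"; do
    # Strip the directory and the last extension (e.g. .faa).
    base="$(basename "${f%.*}")"
    # -e 100: 100 EP replicates; -t 4: four threads; -s: silent mode.
    if "$GS2" -e 100 -t 4 -s "$f" > "$outdir/$base.nwk"; then
      echo "done: $f -> $outdir/$base.nwk"
    else
      echo "failed: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `batch_gs2 results example/*.faa` would write one `.nwk` file per input into `results/` and return a non-zero status if any run failed.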
Visualization of 200.nwk by iTOL:
This software is distributed under the GNU GPL, see LICENSE
Copyright © 2019, Motomu Matsui
Frederic Lemoine, Jean-Baka Domelevo Entfellner, Eduan Wilkinson, Damien Correia, Miraine Davila Felipe, Tulio De Oliveira, and Olivier Gascuel, Renewing Felsenstein's phylogenetic bootstrap in the era of big data, Nature, 2018
Motomu Matsui and Wataru Iwasaki, Graph Splitting: A Graph-Based Approach for Superfamily-Scale Phylogenetic Tree Reconstruction, Systematic Biology, 2019
This package includes the LAPACKE/CBLAS (Univ. of Tennessee; Univ. of California, Berkeley; Univ. of Colorado Denver; and NAG Ltd.) and MMseqs2 (Söding Laboratory) packages. The authors give special thanks to both teams. Detailed information is available at http://www.netlib.org/lapack/ and https://github.com/soedinglab/MMseqs2.