SegAlign

A Scalable GPU-Based Whole Genome Aligner, published in SC20: https://doi.ieeecomputersociety.org/10.1109/SC41405.2020.00043

A Scalable GPU System for Pairwise Whole Genome Alignments based on LASTZ's seed-filter-extend paradigm.

Overview

SegAlign has been tested on all AWS G3 and P3 GPU instances using the AMI Ubuntu Server 18.04 LTS (HVM), SSD Volume Type (ami-0fc20dd1da406780b, 64-bit x86).

git clone https://github.com/gsneha26/SegAlign.git
export PROJECT_DIR=$PWD/SegAlign

Dependencies

The following dependencies are required by SegAlign:

  • NVIDIA CUDA 10.2 toolkit
  • CMake 3.8
  • Intel TBB library
  • libboost-all-dev
  • parallel
  • zlib
  • LASTZ 1.04.15
  • faToTwoBit, twoBitToFa (from kentUtils)

The dependencies can be installed with the provided script, as shown below. It installs only the dependencies that are not already present, and it may take a while. The script requires sudo because most packages are installed at the system level. The -c option skips the CUDA installation. In either case, the CUDA toolkit binaries must be in $PATH for SegAlign.

cd $PROJECT_DIR
./scripts/installUbuntu.sh
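
After installation, it can help to confirm that the required command-line tools are reachable from $PATH. The following is a minimal sketch and not part of SegAlign: check_tools is a hypothetical helper, and the command names in the usage note (nvcc for the CUDA toolkit, cmake, parallel, lastz) are assumptions inferred from the dependency list above.

```shell
# Hypothetical helper (not part of SegAlign): report which of the given
# commands are reachable from $PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
    _ct_status=0
    for _ct_tool in "$@"; do
        if command -v "$_ct_tool" >/dev/null 2>&1; then
            echo "found:   $_ct_tool"
        else
            echo "missing: $_ct_tool"
            _ct_status=1
        fi
    done
    return "$_ct_status"
}
```

For example, `check_tools nvcc cmake parallel lastz faToTwoBit twoBitToFa` prints one line per tool and returns a non-zero status if any of them is missing.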

How to run SegAlign

  • Run SegAlign
run_segalign target query [options]
  • For a list of options
run_segalign --help

Running a test

cd $PROJECT_DIR
mkdir test
cd test
wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
wget https://hgdownload-test.gi.ucsc.edu/goldenPath/cb4/bigZips/cb4.2bit 
twoBitToFa ce11.2bit ce11.fa
twoBitToFa cb4.2bit cb4.fa
run_segalign ce11.fa cb4.fa --output=ce11.cb4.maf
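
A quick sanity check on the result is to count the alignment blocks in the output. The sketch below assumes the output file is in MAF format, as its name suggests, where each alignment block begins with a line whose first field is "a". count_maf_blocks is a hypothetical helper, not a SegAlign command.

```shell
# Hypothetical helper (not part of SegAlign): count alignment blocks in a
# MAF file. In MAF, every alignment block starts with a line whose first
# whitespace-separated field is exactly "a".
count_maf_blocks() {
    awk '$1 == "a" { n++ } END { print n + 0 }' "$1"
}
```

Running `count_maf_blocks ce11.cb4.maf` prints the number of blocks. A count of 0 suggests that no alignments were written, or that the file is not in MAF format.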

How to run SegAlign repeat masker

  • Run SegAlign repeat masker
run_segalign_repeat_masker sequence [options]
  • For a list of options
run_segalign_repeat_masker --help

Running a test

cd $PROJECT_DIR
mkdir test_rm
cd test_rm
wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
twoBitToFa ce11.2bit ce11.fa
run_segalign_repeat_masker ce11.fa --output=ce11.seg

Running Docker Image

Running segalign

wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
wget https://hgdownload-test.gi.ucsc.edu/goldenPath/cb4/bigZips/cb4.2bit 
sudo docker run -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           twoBitToFa \
                           /data/ce11.2bit \
                           /data/ce11.fa
sudo docker run -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           twoBitToFa \
                           /data/cb4.2bit \
                           /data/cb4.fa
sudo docker run --ipc=host --gpus all -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           run_segalign \
                           /data/ce11.fa \
                           /data/cb4.fa \
                           --output=/data/ce11.cb4.maf

Running segalign_repeat_masker

wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
sudo docker run -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           twoBitToFa \
                           /data/ce11.2bit \
                           /data/ce11.fa
sudo docker run --ipc=host --gpus all -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           run_segalign_repeat_masker \
                           /data/ce11.fa \
                           --output=/data/ce11.seg
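
The two GPU invocations above share the same Docker flags, so they can be wrapped in a small shell function. This is only a convenience sketch: segalign_gpu is a hypothetical name, and the flags, mount point, and image tag are copied from the commands above.

```shell
# Hypothetical wrapper (not part of SegAlign): run a command inside the
# SegAlign image with GPU access and the current directory mounted at /data.
# All flags and the image tag are taken from the commands shown above.
segalign_gpu() {
    sudo docker run --ipc=host --gpus all -v "$(pwd)":/data -it \
        gsneha/segalign:v0.1.2 "$@"
}
```

With this defined, the repeat masker run becomes `segalign_gpu run_segalign_repeat_masker /data/ce11.fa --output=/data/ce11.seg`.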

Citing SegAlign

S. Goenka, Y. Turakhia, B. Paten and M. Horowitz, "SegAlign: A Scalable GPU-Based Whole Genome Aligner," in 2020 SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Atlanta, GA, US, 2020, pp. 540-552. doi: 10.1109/SC41405.2020.00043