
Accelerated kernel library for genomics

Genomics Kernel Library (GKL)

This repository contains optimized versions of compute kernels used in genomics applications such as GATK and HTSJDK. The kernels are optimized for Intel Architecture (AVX, AVX2, AVX-512, and multicore) on 64-bit Linux and macOS.

Kernels included:

  • PairHMM
    • AVX and AVX-512 optimized versions of PairHMM used in GATK HaplotypeCaller and MuTect2.
    • OpenMP support for multicore processors.
  • Smith-Waterman
    • AVX2 and AVX-512 optimized versions of Smith-Waterman used in GATK HaplotypeCaller and MuTect2.
  • DEFLATE Compression/Decompression
    • Performance-optimized Level 1 and 2 compression and decompression from Intel's ISA-L library.
    • Performance-optimized Level 3 through 9 compression from Intel's Open Source Technology Center zlib library.
  • Partially Determined HMM (PDHMM)
    • AVX2 and AVX-512 optimized versions of PDHMM used in GATK.
    • Serial implementation for CPUs without AVX.
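
To get a rough idea of which of the optimized code paths above a Linux machine could use, the CPU flags can be inspected. This is a minimal sketch, not how GKL itself selects kernels: the helper name simd_level is hypothetical, it assumes the Linux /proc/cpuinfo format, and it only looks at avx512f, the AVX-512 foundation flag, whereas a given kernel may need further AVX-512 extensions.

```shell
#!/usr/bin/env bash
# simd_level: hypothetical helper (not part of GKL). Prints the widest SIMD
# family found in a space-separated CPU flag string, such as the "flags"
# line of /proc/cpuinfo on Linux.
simd_level() {
    local flags=" $1 "   # pad with spaces so every flag is space-delimited
    case "$flags" in
        *" avx512f "*) echo "avx512" ;;
        *" avx2 "*)    echo "avx2" ;;
        *" avx "*)     echo "avx" ;;
        *)             echo "none" ;;
    esac
}

# On Linux, read the flags of the first CPU; on other systems the file is absent.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    echo "Widest SIMD level: $(simd_level "$cpu_flags")"
fi
```

A result of none would correspond to the serial PDHMM implementation listed above.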

Building GKL

GKL release binaries are built on CentOS 7, to enable running on most Linux distributions (see holy-build-box for a good description of portability issues).

Requirements

  • Java JDK 8
  • Git >= 2.5
  • CMake >= 2.8.12.2
  • GCC g++ >= 5.3.1
  • GNU patch >= 2.6
  • GNU libtool >= 2.2.6
  • GNU automake >= 1.11.1
  • Yasm >= 1.2.0
  • zlib-devel >= 1.2.7

Setup

Run these commands to set up the build environment on CentOS:

sudo yum install -y java-1.8.0-openjdk-devel git cmake patch libtool automake yasm zlib-devel centos-release-scl help2man
sudo yum install -y devtoolset-7-gcc-c++
source scl_source enable devtoolset-7

Build and Test

After the build requirements are met, clone the repository and build:

git clone https://github.com/Intel-HLS/GKL.git
cd GKL
./gradlew build

For more details, see build.sh.

Known issues

  • (Version 0.8.11 only) Some GKL dependencies are incorrectly declared as implementation, which makes them inaccessible to projects that depend on GKL unless those projects declare the same dependencies themselves. As a workaround, add the following dependencies manually in affected projects:
    implementation 'org.broadinstitute:gatk-native-bindings:1.0.0'
    implementation 'com.github.samtools:htsjdk:3.0.5'
    
    A fix for this issue is present in the master branch.
  • When compressing with the ISA-L library (compression levels 1 and 2), the size of the compressed output can differ by a small number of bytes (up to 100) for the same input. The decompressed contents are not affected. Investigation of this issue is ongoing.
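
Because of the compressed-size variation described above, tests that compare compressed files byte for byte can fail even when the data is intact. One option is to compare the decompressed contents instead. The sketch below uses the gzip command-line tool as a stand-in DEFLATE implementation (it does not call GKL or ISA-L), and the helper name same_payload is hypothetical.

```shell
#!/usr/bin/env bash
# same_payload: hypothetical helper. Succeeds when two gzip files decompress
# to identical bytes, regardless of the size of the compressed files.
same_payload() {
    [ "$(gzip -dc "$1" | cksum)" = "$(gzip -dc "$2" | cksum)" ]
}

# Stand-in demonstration: compress the same input at two different levels.
workdir=$(mktemp -d)
seq 1 20000 > "$workdir/input.txt"
gzip -1 -c "$workdir/input.txt" > "$workdir/a.gz"
gzip -9 -c "$workdir/input.txt" > "$workdir/b.gz"

# The compressed files may differ, but the decompressed payloads should not.
if same_payload "$workdir/a.gz" "$workdir/b.gz"; then
    echo "payloads match"
else
    echo "payloads differ"
fi
rm -rf "$workdir"
```

The same idea applies to any round-trip test of compressed output: assert on the uncompressed bytes or their checksum, not on the compressed length.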

License

All code is licensed under the MIT License, except: