From DNAnexus R&D: scalable gVCF merging and joint variant calling for population sequencing projects. (GL, genotype likelihood)
In our manuscript with collaborators at Regeneron Genetics Center and Baylor College of Medicine, we detail the design of GLnexus and scientific validation using up to 240,000 human exomes and 22,600 genomes. Compared to the DNAnexus cloud-native deployment used for such large projects, this open-source version produces identical scientific results but lacks some of the scalability and production-oriented features.
The Getting Started wiki page has a tutorial for first-time users.
For each tagged revision, the Releases page has a static executable suitable for most Linux x86-64 hosts; just download it and run chmod +x glnexus_cli.
The GLnexus build process has a number of dependencies, but produces a standalone, statically-linked executable, glnexus_cli. The easiest way to build it is to use our Dockerfile to control all the compile-time dependencies, then simply copy the static executable out of the resulting Docker container and put it anywhere you like.
# Build GLnexus using its Dockerfile.
# You can set a specific git revision by adding --build-arg=git_revision=xxxx
curl -s https://raw.githubusercontent.com/dnanexus-rnd/GLnexus/master/Dockerfile \
| docker build --no-cache -t glnexus_tests -
# Run GLnexus unit tests.
docker run --rm glnexus_tests
# Copy the static GLnexus executable to the current working directory.
docker run --rm -v "$(pwd)":/io glnexus_tests cp glnexus_cli /io
# Run it to see its usage message.
./glnexus_cli
To build GLnexus without Docker, make sure you have gcc 5+, CMake 3.2+, and all the dependencies indicated in the Dockerfile.
Then,
git clone --recursive https://github.com/dnanexus-rnd/GLnexus.git
cd GLnexus
cmake -Dtest=ON . && make -j$(nproc) && ctest -V
After the build, you will find ./glnexus_cli in the GLnexus directory.
- C++14 - take advantage of the goodies
- Use smart pointers to avoid passing resources needing manual deallocation across function/class boundaries
- Prefer references over pointers for values that should never be null and never be reseated
- Avoid exceptions; prefer returning a Status, defined early in types.h (note the frequently used convenience macro S(), defined just below Status)
- Avoid public constructors with nontrivial bodies; prefer a static initializer function returning Status
- Avoid elaborate templated class hierarchies
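The conventions above can be illustrated with a minimal sketch. Everything in it is hypothetical: the Status class is a simplified stand-in rather than the one in types.h, RETURN_IF_BAD is only loosely in the spirit of S(), and SampleList and count_samples are invented for the example. The real definitions in types.h may differ.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a Status type (not the one in types.h).
enum class StatusCode { OK, Invalid };

class Status {
    StatusCode code_;
    std::string msg_;
public:
    Status(StatusCode code = StatusCode::OK, std::string msg = "")
        : code_(code), msg_(std::move(msg)) {}
    bool ok() const { return code_ == StatusCode::OK; }
    StatusCode code() const { return code_; }
    const std::string& str() const { return msg_; }
    static Status OK() { return Status(); }
    static Status Invalid(std::string msg) {
        return Status(StatusCode::Invalid, std::move(msg));
    }
};

// Hypothetical early-return helper: propagate a failed Status to the caller.
#define RETURN_IF_BAD(expr) \
    do { Status s_ = (expr); if (!s_.ok()) return s_; } while (0)

// Hypothetical class: trivial private constructor, fallible work in a static
// initializer that returns Status and hands back ownership via unique_ptr.
class SampleList {
    std::vector<std::string> names_;
    SampleList() = default;
public:
    static Status Open(const std::vector<std::string>& names,
                       std::unique_ptr<SampleList>& ans) {
        for (const auto& name : names) {
            if (name.empty()) return Status::Invalid("empty sample name");
        }
        // make_unique cannot reach the private constructor, so use new here.
        std::unique_ptr<SampleList> p(new SampleList());
        p->names_ = names;
        ans = std::move(p);
        return Status::OK();
    }
    std::size_t size() const { return names_.size(); }
};

// Inputs and outputs are references (never null); errors travel as Status.
// On failure, ans is left untouched.
Status count_samples(const std::vector<std::string>& names, std::size_t& ans) {
    std::unique_ptr<SampleList> list;
    RETURN_IF_BAD(SampleList::Open(names, list));
    ans = list->size();
    return Status::OK();
}
```

A caller would check the returned Status instead of catching an exception, for example by testing count_samples(names, n).ok() before using n.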
The Performance wiki page has practical advice for deploying GLnexus on a powerful server.
The code has some hooks for performance profiling using perf and FlameGraph.
To profile performance within the DNAnexus applet, run the applet as usual plus -i perf=true. This produces an output file, genotype.stacks, containing sampling observation counts for common call stacks. To generate an SVG visualization with FlameGraph:
git clone https://github.com/brendangregg/FlameGraph
FlameGraph/flamegraph.pl < genotype.stacks > genotype.svg