
Scalable datastore for population genome sequencing, with on-demand joint genotyping

Primary language: C++. License: GNU Affero General Public License v3.0 (AGPL-3.0).

GLnexus

From DNAnexus R&D: a scalable datastore for population genome sequencing, with on-demand joint genotyping. (GL stands for genotype likelihood.)

This is an early-stage R&D project we're developing openly. The code doesn't yet do anything useful! There's a project roadmap on the wiki, which should be read in the spirit of "plans are worthless, but planning is indispensable."

Build & run tests


First, install gcc 4.9 or higher and the following packages: cmake, libjemalloc-dev, libboost-dev, libzip-dev, libsnappy-dev, liblz4-dev, libbz2-dev, python-pyvcf. Then:

cmake -Dtest=ON . && make && ./unit_tests

Other dependencies (should be set up automatically by CMake):

Developer documentation

Evolving developer documentation can be found on the project's GitHub page.

Coding conventions

  • C++14 - take advantage of the goodies
  • Use smart pointers to avoid passing resources needing manual deallocation across function/class boundaries
  • Prefer references over pointers when the referent should never be null and never change.
  • Avoid exceptions; prefer returning a Status, defined early in types.h
  • Note the frequently used convenience macro S(), defined just below Status
  • Avoid public constructors with nontrivial bodies; prefer a static initializer function that returns a Status
  • Avoid elaborate templated class hierarchies
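The conventions above can be illustrated with a small C++14 sketch. This is not GLnexus code: Status here is a hypothetical, simplified stand-in for the real type in types.h, RETURN_IF_BAD is a hypothetical early-return macro written in the spirit of S() (check types.h for what S() actually does), and Reader, Open and open_both are invented names used only for illustration.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the Status type in types.h.
class Status {
public:
    enum class Code { OK, INVALID };
    static Status OK() { return Status(Code::OK, ""); }
    static Status Invalid(std::string msg) { return Status(Code::INVALID, std::move(msg)); }
    bool ok() const { return code_ == Code::OK; }
    bool bad() const { return !ok(); }
    const std::string& message() const { return msg_; }
private:
    Status(Code c, std::string m) : code_(c), msg_(std::move(m)) {}
    Code code_;
    std::string msg_;
};

// Hypothetical early-return helper: evaluate an expression yielding a Status
// and return it from the enclosing function if it is bad.
#define RETURN_IF_BAD(expr) do { Status s_ = (expr); if (s_.bad()) return s_; } while (0)

// Hypothetical class following the conventions: the constructor is private and
// trivial; fallible setup lives in a static initializer that returns a Status
// and hands ownership back through a smart pointer.
class Reader {
public:
    static Status Open(const std::string& path, std::unique_ptr<Reader>& ans) {
        if (path.empty()) return Status::Invalid("empty path");
        // The constructor is private, so std::make_unique cannot be used here.
        ans.reset(new Reader(path));
        return Status::OK();
    }
    const std::string& path() const { return path_; }
private:
    explicit Reader(std::string path) : path_(std::move(path)) {}
    std::string path_;
};

// References (never null, never reseated) for the outputs; errors propagate
// as Status values rather than exceptions.
Status open_both(const std::string& a, const std::string& b,
                 std::unique_ptr<Reader>& ra, std::unique_ptr<Reader>& rb) {
    RETURN_IF_BAD(Reader::Open(a, ra));
    RETURN_IF_BAD(Reader::Open(b, rb));
    return Status::OK();
}
```

A caller declares std::unique_ptr<Reader> r, calls Reader::Open(path, r), and checks ok() on the returned Status before touching r. Keeping the constructor trivial means every failure path is reported as a Status instead of a half-constructed object or a thrown exception.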

Performance profiling

The code has some hooks for performance profiling using perf and FlameGraph.

To profile performance within the DNAnexus applet, run the applet as usual with the additional input -i enable_perf=true. This produces an output file, genotype.stacks, containing sample counts for common call stacks. To generate an SVG visualization with FlameGraph:

git clone https://github.com/brendangregg/FlameGraph
FlameGraph/flamegraph.pl < genotype.stacks > genotype.svg
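Because genotype.stacks is piped straight into flamegraph.pl, it is presumably in FlameGraph's "folded" format: one line per distinct call stack, frames separated by semicolons, then a space and the sample count. Assuming that format, the sketch below sums the samples in which each function was the innermost (leaf) frame, giving a quick text-mode complement to the SVG. The helper leaf_totals is hypothetical and not part of GLnexus or FlameGraph.

```cpp
#include <algorithm>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Sum "self" samples per leaf frame from folded-stack input, where each line
// looks like "frame1;frame2;...;leaf COUNT". Returns (frame, samples) pairs
// sorted by descending sample count; ties keep alphabetical order.
std::vector<std::pair<std::string, std::uint64_t>> leaf_totals(std::istream& in) {
    std::map<std::string, std::uint64_t> totals;
    std::string line;
    while (std::getline(in, line)) {
        // The count follows the LAST space, so frame names containing
        // spaces (e.g. demangled C++ signatures) stay intact.
        auto sp = line.rfind(' ');
        if (sp == std::string::npos) continue;  // no count: skip the line
        std::istringstream count_field(line.substr(sp + 1));
        std::uint64_t count = 0;
        if (!(count_field >> count)) continue;  // non-numeric count: skip

        std::string stack = line.substr(0, sp);
        auto semi = stack.rfind(';');
        std::string leaf = (semi == std::string::npos) ? stack : stack.substr(semi + 1);
        totals[leaf] += count;
    }
    std::vector<std::pair<std::string, std::uint64_t>> out(totals.begin(), totals.end());
    std::stable_sort(out.begin(), out.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    return out;
}
```

Calling leaf_totals(std::cin) from a small main and printing the first few entries yields a ranked list of hot functions. Parsing with a stream extraction instead of std::stoull keeps the helper exception-free, in line with the coding conventions above.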