GLnexus
From DNAnexus R&D: a scalable datastore for population genome sequencing, with on-demand joint genotyping. (GL, genotype likelihood)
This is an early-stage R&D project we're developing openly. The code doesn't yet do anything useful! There's a wiki project roadmap, which should be read in the spirit of "plans are worthless, but planning is indispensable."
Build & run tests
First install gcc 4.9 or higher, cmake, libjemalloc-dev, libboost-dev, libzip-dev, libsnappy-dev, liblz4-dev, libbz2-dev, and python-pyvcf. Then:
cmake -Dtest=ON . && make && ./unit_tests
Other dependencies (should be set up automatically by CMake):
Developer documentation
Developer documentation, still evolving, can be found on the project's GitHub page.
Coding conventions
- C++14 - take advantage of the goodies
- Use smart pointers to avoid passing resources needing manual deallocation across function/class boundaries
- Prefer references over pointers when the referent should never be null and never be reseated
- Avoid exceptions; prefer returning a Status, defined early in types.h (n.b. the frequently-used convenience macro S(), defined just below Status)
- Avoid public constructors with nontrivial bodies; prefer a static initializer function returning Status
- Avoid elaborate templated class hierarchies
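
The conventions above can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: this Status class is a simplified illustration and not the definition in types.h, RETURN_IF_BAD is an assumed helper in the spirit of a propagate-on-failure macro rather than the project's S(), and SampleIndex is an invented example class.

```cpp
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for a status type (hypothetical; NOT the one in types.h).
class Status {
public:
    enum class Code { OK, Invalid };

    Status(Code code = Code::OK, std::string msg = "")
        : code_(code), msg_(std::move(msg)) {}

    static Status OK() { return Status(); }
    bool ok() const { return code_ == Code::OK; }
    bool bad() const { return !ok(); }
    Code code() const { return code_; }
    const std::string& str() const { return msg_; }

private:
    Code code_;
    std::string msg_;
};

// Hypothetical helper: store the result in a local `s` and return early if bad.
#define RETURN_IF_BAD(expr) do { s = (expr); if (s.bad()) return s; } while (0)

// Invented example class following the conventions listed above.
class SampleIndex {
public:
    // Static initializer: all fallible work happens here instead of in a
    // constructor. Inputs are taken by reference, and ownership of the result
    // is handed back through a smart pointer.
    static Status Open(const std::string& name, int capacity,
                       std::unique_ptr<SampleIndex>& ans) {
        Status s;
        RETURN_IF_BAD(ValidateName(name));
        RETURN_IF_BAD(ValidateCapacity(capacity));
        ans.reset(new SampleIndex(name, capacity));
        return Status::OK();
    }

    const std::string& name() const { return name_; }
    int capacity() const { return capacity_; }

private:
    // Private constructor with a trivial body: it cannot fail.
    SampleIndex(const std::string& name, int capacity)
        : name_(name), capacity_(capacity) {}

    static Status ValidateName(const std::string& name) {
        if (name.empty()) return Status(Status::Code::Invalid, "empty name");
        return Status::OK();
    }

    static Status ValidateCapacity(int capacity) {
        if (capacity <= 0) return Status(Status::Code::Invalid, "capacity must be positive");
        return Status::OK();
    }

    std::string name_;
    int capacity_;
};
```

A caller declares a std::unique_ptr<SampleIndex>, passes it to SampleIndex::Open, and checks the returned Status before using the pointer. On failure the pointer is left untouched and no exception is thrown.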
Performance profiling
The code has some hooks for performance profiling using perf and FlameGraph.
To profile performance within the DNAnexus applet, run the applet as usual plus -i enable_perf=true. This produces an output file genotype.stacks containing sampling observation counts for common call stacks. To generate an SVG visualization with FlameGraph:
git clone https://github.com/brendangregg/FlameGraph
FlameGraph/flamegraph.pl < genotype.stacks > genotype.svg