Hail

Hail is a framework for scalable genetic data analysis. Hail is pre-alpha software and is under active development. It is written mostly in Scala and uses Apache Spark and other projects from the Apache Hadoop ecosystem. If you are interested in getting involved in Hail development, email hail@broadinstitute.org.

Documentation

Building
Representation
Hail's expression language
Importing
Splitting Multiallelic Variants
Renaming Samples
Annotating Variants
Annotating Samples
Annotating Global
Quality Control
PCA
Annotating with the Variant Effect Predictor
Filtering
Querying using SQL
Linear regression
Mendel errors
Exporting to TSV
Exporting to VCF
Exporting to Plink
Persist

Roadmap

Here is a rough list of features currently planned or under development:

generalized query language
better interoperability with other Hadoop projects
kinship estimation from the genetic relationship matrix (GRM)
linear mixed models (LMM)
burden tests, SKAT
logistic regression
dosage
genotype posterior probabilities (PP)
LD pruning
sex check
transmission disequilibrium test (TDT)
BGEN
Kaitlin Samocha's de novo caller