/hail

scalable genetic data analysis https://hail.is

Primary LanguageScalaMIT LicenseMIT

Hail

Join the chat at https://gitter.im/broadinstitute/hail

Hail is a framework for scalable genetic data analysis. Hail is pre-alpha software and under active development. Hail is written in Scala (mostly) and uses Apache Spark and other Apache Hadoop projects. If you are interested in getting involved in Hail development, email hail@broadinstitute.org.

Documentation

Read the docs.

Citing Hail

If you use Hail for published work, please cite both the software:

and the forthcoming manuscript describing Hail (if possible):

  • Cotton Seed, Alex Bloemendal, Jonathan M Bloom, Jacqueline I Goldstein, Daniel King, Timothy Poterba. Hail: An Open-Source Framework for Scalable Genetic Data Analysis. In preparation.

or the following paper which includes a brief introduction to Hail in the online methods:

  • Andrea Ganna, Giulio Genovese, Daniel P Howrigan, Andrea Byrnes, Mitja Kurki, Seyedeh M Zekavat, Christopher W Whelan, Robert E Handsaker, Mart Kals, Alex Bloemendal, Jonathan M Bloom, Jacqueline I Goldstein, Timothy Poterba, Cotton Seed, Michel G Nivard, Pradeep Natarajan, Reedik Magi, Diane Gage, Elise B Robinson, Andres Metspalu, Veikko Salomaa, Jaana Suvisaari, Shaun M Purcell, Pamela Sklar, Sekar Kathiresan, Mark J Daly, Steven A McCarroll, Patrick F Sullivan, Aarno Palotie, Tonu Esko, Christina Hultman, Benjamin M Neale. Ultra-rare disruptive and damaging mutations influence educational attainment in the general population. doi: http://dx.doi.org/10.1101/050195.

Roadmap

Here is a rough list of features currently planned or under development:

  • generalized query language
  • better interoperability with other Hadoop projects
  • kinship estimation from GRM
  • LMM
  • burden tests, SKAT
  • logistic regression
  • posterior (PP)
  • LD pruning
  • TDT
  • Kaitlin Samocha's de novo caller