regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.

It is developed and supported by a team of scientists at the Regeneron Genetics Center.

The method has the following properties:

  • It works on quantitative and binary traits, including binary traits with unbalanced case-control ratios
  • It can process multiple phenotypes at once
  • It is fast and memory efficient 🔥
  • For binary traits it supports Firth logistic regression and an SPA test
  • It can perform gene/region-based burden tests
  • It supports the BGEN, PLINK bed/bim/fam and PLINK2 pgen/pvar/psam genetic data formats
  • It is ideally suited for implementation in Apache Spark (see GLOW)
  • It can be installed with Conda

Full documentation for regenie can be found here.

Citation

Mbatchou, J., Barnard, L., Backman, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet (2021).

You can access the paper here.

License

regenie is distributed under an MIT license.

Contact

If you have any questions about regenie please contact

If you want to submit an issue concerning the software, please do so using the regenie GitHub repository.

Version history

Version 2.0.2 (Bug fix for burden testing with BGEN files not in v1.2 with 8-bit encoding; enabled faster step 2 implementation with Zstd compressed BGEN files in v1.2 with 8-bit encoding)

Version 2.0.1 (New option --catCovList to specify categorical covariates; enabled parameter expansion when specifying select phenotypes/covariates to analyze [e.g. 'PC{1:10}'])

Version 2.0 (Added burden testing functionality for region or gene-based tests [see website for details]; added sample size column in summary stats output)

For past releases, see here.
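
The parameter expansion mentioned under version 2.0.1 (e.g. 'PC{1:10}') can be pictured with a small Python sketch. This is not regenie's implementation: `expand_names` is a hypothetical helper written for illustration only. It assumes that the range is inclusive at both ends and that names are given as a comma-separated list.

```python
import re

# Hypothetical helper, not part of regenie: it only illustrates the
# 'PREFIX{start:end}' shorthand described in the 2.0.1 release note.
_RANGE = re.compile(r"^(.*?)\{(\d+):(\d+)\}$")


def expand_names(spec):
    """Expand a comma-separated list such as 'age,PC{1:3}' into names."""
    names = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty entries such as a trailing comma
        match = _RANGE.match(item)
        if match is None:
            names.append(item)  # plain name, nothing to expand
            continue
        prefix, start, end = match.groups()
        # Assumption: the range is inclusive on both ends and ascending.
        for i in range(int(start), int(end) + 1):
            names.append(f"{prefix}{i}")
    return names


print(expand_names("age,PC{1:3}"))  # prints ['age', 'PC1', 'PC2', 'PC3']
```

Under these assumptions, 'PC{1:10}' stands for the ten names PC1 through PC10, so the shorthand avoids writing each principal component out by hand.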