Binomial LSE performs fast latent subspace estimation (LSE) for binomial data (genotypes) using fast principal component analysis (PCA) of single nucleotide polymorphism (SNP) data. This implementation is based on the source code of FlashPCA and uses the Spectra and Eigen C++ libraries.
Latent subspace estimation, described in Chen and Storey 2015, is a modification of PCA that accounts for heteroskedasticity. We provide a scalable, low-memory implementation of LSE for binomial data. This implementation uses the implicitly restarted Arnoldi method, as implemented by FlashPCA using Spectra, to perform SVD/PCA.
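To make the heteroskedasticity adjustment concrete, here is a minimal pure-Python sketch of the adjusted matrix X X^T / p - D on a toy genotype matrix. It is an illustration under stated assumptions, not the code this program runs: the function name `adjusted_covariance` is hypothetical, X is assumed to hold n individuals in rows and p SNPs in columns with genotypes coded 0/1/2, no centering or standardization is applied, and D is assumed to be diagonal with each entry equal to the per-individual mean of x * (2 - x), which is an unbiased estimate of the Binomial(2, pi) variance 2 * pi * (1 - pi).

```python
def adjusted_covariance(X):
    """Return X X^T / p - D for an n x p genotype matrix X (a list of rows).

    Assumption: D is diagonal, with D[i][i] the mean over SNPs of x * (2 - x),
    an unbiased estimate of the Binomial(2, pi) variance 2 * pi * (1 - pi).
    """
    n, p = len(X), len(X[0])
    # Gram matrix between individuals, scaled by the number of SNPs.
    M = [[sum(a * b for a, b in zip(X[i], X[j])) / p for j in range(n)]
         for i in range(n)]
    # Subtract the per-individual average variance estimate from the diagonal.
    for i in range(n):
        M[i][i] -= sum(x * (2 - x) for x in X[i]) / p
    return M


# Toy data: 3 individuals (rows) by 4 SNPs (columns), genotypes coded 0/1/2.
X = [[0, 1, 2, 1],
     [1, 1, 0, 2],
     [2, 0, 1, 1]]
M = adjusted_covariance(X)
for row in M:
    print(row)
```

The top k eigenvectors of a matrix like M span the estimated latent subspace. For real data the matrix is far too large to form densely like this, which is why the program relies on an iterative eigensolver.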
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
This code is based on contributions from the following sources:
- SparSNP, Copyright (C) 2011-2012 Gad Abraham.
- FlashPCA, Copyright (C) 2014-2020 Gad Abraham.
- National ICT Australia.
To get the latest version:
git clone git://github.com/alecmchiu/binomial_lse
On Linux:
- 64-bit OS
- g++ compiler
- Eigen, v3.2 or higher (if you get a compile error such as
error: no match for 'operator/' in '1 / ((Eigen::MatrixBase...
you'll need a more recent Eigen)
- Spectra
- Boost, specifically boost_program_options/boost_program_options-mt.
- libgomp for OpenMP support
On Mac:
- Homebrew to install boost
- Eigen, as above
- Spectra, as above
- clang C++ compiler
The Makefile contains four variables that need to be set according to where you have installed the Eigen, Boost, and Spectra headers and the Boost libraries on your system. The default values for these are:
EIGEN_INC=/usr/local/include/eigen
BOOST_INC=/usr/local/include/boost
BOOST_LIB=/usr/local/lib
SPECTRA_INC=spectra
If your system has these libraries and header files in those locations, you can simply run make:
cd binomial_lse
make all
If not, you can override their values on the make command line. For example, if you have the Eigen source in /opt/eigen-3.2.5, Spectra headers in /opt/spectra, and Boost 1.59.0 installed into /opt/boost-1.59.0, you could run:
cd binomial_lse
make all EIGEN_INC=/opt/eigen-3.2.5 \
BOOST_INC=/opt/boost-1.59.0/include \
BOOST_LIB=/opt/boost-1.59.0/lib \
SPECTRA_INC=/opt/spectra
By default, binomial LSE produces the following files:
- eigenvectors.txt: the top k eigenvectors of the covariance matrix after adjustment for heteroskedasticity, X X^T / p - D. This is the file containing the subspace of interest.
- pcs.txt: the top k principal components (the projection of the data on the eigenvectors, scaled by the eigenvalues).
- eigenvalues.txt: the top k eigenvalues of X X^T / p - D.
- pve.txt: the proportion of total variance explained by each of the top k eigenvectors (the total variance is given by the trace of the covariance matrix X X^T / p - D, which is the same as the sum of all eigenvalues).
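The proportion of variance explained can be reproduced by hand from the eigenvalues and the trace. The sketch below shows only the arithmetic: the function name `proportion_variance_explained` and the numbers are hypothetical, and it does not read the program's output files, whose exact layout is not described here.

```python
def proportion_variance_explained(top_eigenvalues, trace):
    """Divide each of the top k eigenvalues by the total variance (the trace)."""
    if trace <= 0:
        raise ValueError("trace must be positive")
    return [ev / trace for ev in top_eigenvalues]


# Hypothetical values, for illustration only: the top 2 eigenvalues of a
# matrix whose trace (the sum of all its eigenvalues) is 10.
top_eigenvalues = [5.0, 3.0]
trace = 10.0
pve = proportion_variance_explained(top_eigenvalues, trace)
print(pve)
```

Because the denominator is the trace of the full matrix rather than the sum of the top k eigenvalues, the reported proportions need not sum to 1.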