GrandPrix

Primary language: Jupyter Notebook. License: Apache License 2.0.

GrandPrix is a Python package for non-linear probabilistic dimensionality reduction, built on TensorFlow and GPflow. It uses a sparse variational approximation to project data onto lower-dimensional spaces. The model is described in the paper:

"GrandPrix: Scaling up the Bayesian GPLVM for single-cell data.", Sumon Ahmed, Magnus Rattray and Alexis Boukouvalas, Bioinformatics, Volume 35, Issue 1, 01 January 2019, Pages 47–54.

To replicate the results in the paper, please use the betaVersion branch. The master branch works with the latest version of GPflow.

N.B. The package contains several large data files which are needed to run the example notebooks. Please be sure that your system has Git Large File Storage (Git LFS) installed to download these large data files.
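
If the repository was cloned without Git LFS, the large data files are replaced by small text pointer files, and loading them as data will not work. A minimal sketch for spotting such pointers follows. It relies on the fact that a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`. The helper names `is_lfs_pointer` and `find_lfs_pointers` are made up for this example and are not part of GrandPrix.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file (per the Git LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer, not real data."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(root):
    """List files under `root` that are still un-downloaded LFS pointers."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        # Skip Git's own metadata directory.
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )
```

Calling `find_lfs_pointers("GrandPrix")`, where `"GrandPrix"` is an assumed path to your local clone, returns the files that still need fetching. Running `git lfs pull` inside the clone should then download them.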

Installation

If you have any problems with installation, see the section at the bottom of the page for a detailed setup guide that starts from a new Python environment.

  • Install tensorflow
pip install tensorflow
  • Install GPflow
git clone https://github.com/GPflow/GPflow.git
cd GPflow
pip install .
cd ..

See the GPflow page for more detailed instructions.

  • Install GrandPrix package
git clone https://github.com/ManchesterBioinference/GrandPrix
cd GrandPrix
python setup.py install
cd ..
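
After the three steps above, a quick way to confirm that everything landed in the active environment is to ask Python whether each package can be located. This is a sketch only: the helper `missing_modules` is made up for this example, and the import names `tensorflow`, `gpflow` and `GrandPrix` are assumptions about what each installation exposes.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed top-level import names for the three installed packages.
REQUIRED = ["tensorflow", "gpflow", "GrandPrix"]

missing = missing_modules(REQUIRED)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required packages were found.")
```

The check only locates the packages without importing them, so it is fast but does not prove that the installed versions are compatible with each other.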

List of notebooks

To run the notebooks

cd GrandPrix/notebooks
jupyter notebook
File name                     Description
Windram                       Application of GrandPrix to microarray data, models with and without an informative prior.
McDavid                       Application of GrandPrix to cell cycle data.
Shalek                        Application of GrandPrix to single-cell RNA-seq data from mouse dendritic cells.
Droplet_DPT                   Application of GrandPrix to droplet-based single-cell RNA-seq data.
Droplet_68K                   Application of GrandPrix to ~68k PBMCs, models optimising and fixing inducing variables.
Guo                           Application of the extended 2-D GrandPrix model to embryonic stem cells.
Analysing_posterior_variance  Comparison of posterior distributions from GrandPrix with those from other models.

Running on a cluster

When running GrandPrix on a cluster, it may be useful to limit the number of cores used. To do so, insert the following code at the beginning of your script, setting NUMCORES to the number of cores to use.

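from gpflow import settings

NUMCORES = 4  # example value; set to the number of cores available to your job
# Cap the thread pools TensorFlow uses within and across operations.
settings.session.intra_op_parallelism_threads = NUMCORES
settings.session.inter_op_parallelism_threads = NUMCORES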
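
On a scheduler-managed cluster, the core count is often available from the job environment rather than hard-coded. The sketch below shows one possible way to derive a value for NUMCORES. The helper `resolve_num_cores` is hypothetical, and the environment variables it consults (`SLURM_CPUS_PER_TASK`, `NSLOTS`, `OMP_NUM_THREADS`) are assumptions about your scheduler that should be adapted to your site.

```python
import os


def resolve_num_cores(env=None, default=1):
    """Pick a core count from common scheduler variables, else `default`."""
    env = os.environ if env is None else env
    # Assumed variable names: Slurm, Grid Engine, then the generic OpenMP one.
    for key in ("SLURM_CPUS_PER_TASK", "NSLOTS", "OMP_NUM_THREADS"):
        value = str(env.get(key, "")).strip()
        if value.isdigit() and int(value) > 0:
            return int(value)
    return default


NUMCORES = resolve_num_cores()
print("Using", NUMCORES, "core(s)")
# NUMCORES can then be assigned to the two settings.session fields shown above.
```

Falling back to a single core when no variable is set is a deliberately conservative choice for shared nodes; pass a different `default` if that is too restrictive.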

Installing with a new environment

  • Create a new environment
conda create -n newEnv python=3.5
  • Activate the new environment
source activate newEnv
  • Create a new directory
mkdir newInstall
cd newInstall
  • Follow the regular installation process described above