Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

DISCOVER supplementary notebooks

The Jupyter notebooks below contain all the code required to reproduce the figures and results of the paper 'A novel independence test for somatic alterations in cancer shows that biology drives mutual exclusivity but chance explains most co-occurrence'.

To work with these notebooks, Jupyter, IPython, and several Python packages must be installed. The easiest way to install these dependencies is with Miniconda or Anaconda. The following command creates a conda environment containing all packages required to run the notebooks.

conda create -n discover-notebooks -c http://ccb.nki.nl/software/discover/repos/conda \
    corclust==0.1 \
    discover==0.9 \
    matplotlib==1.5.1 \
    networkx==1.11 \
    numpy==1.10.4 \
    pandas==0.17.1 \
    pytables==3.2.2 \
    scipy==0.17.0 \
    statsmodels==0.6.1 \
    notebook \
    ipykernel

The notebook named Group test requires a few additional packages, which can be installed into the same environment with the following command.

conda install -n discover-notebooks -c http://ccb.nki.nl/software/discover/repos/conda -c r -c msys2 \
    switching==0.1 \
    ccomet-with-timeout==1.0.2 \
    rpy2 \
    ipyparallel

Next, activate the new environment and start the Jupyter notebook server with the following two commands. Replace <notebook-dir> with the directory that contains the .ipynb files after unzipping the downloaded file.

source activate discover-notebooks
jupyter notebook --notebook-dir=<notebook-dir>

On Windows, the first command should be replaced by:

activate discover-notebooks
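
With the environment active, it can be useful to confirm that the pinned versions were actually installed before running any notebook. The following is a minimal sketch, not part of the repository: the function name check_versions and the PINNED table are assumptions made for illustration, and the versions are copied from the conda create command above. Packages are listed by import name, so pytables appears as tables. The corclust and discover packages are left out because it is not known here whether they expose a __version__ attribute.

```python
import importlib

# Versions copied from the `conda create` command; keys are import names
# (the conda package "pytables" is imported as "tables").
PINNED = {
    "matplotlib": "1.5.1",
    "networkx": "1.11",
    "numpy": "1.10.4",
    "pandas": "0.17.1",
    "tables": "3.2.2",
    "scipy": "0.17.0",
    "statsmodels": "0.6.1",
}


def check_versions(pinned):
    """Return a dict mapping each import name to a short status string."""
    report = {}
    for name, expected in sorted(pinned.items()):
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        found = getattr(module, "__version__", None)
        if found is None:
            report[name] = "unknown version"
        elif found == expected:
            report[name] = "ok"
        else:
            report[name] = "expected %s, found %s" % (expected, found)
    return report


if __name__ == "__main__":
    for name, status in sorted(check_versions(PINNED).items()):
        print("%-12s %s" % (name, status))
```

Any line that does not report ok suggests the notebooks are being run outside the discover-notebooks environment, or that the environment was modified after creation.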

Simulated data analyses

  • Pairwise analyses of simulated data

    Compares the Binomial, Fisher's exact and DISCOVER tests on simulated data.

  • Group test

    Compares the DISCOVER group test to six alternative methods (CoMEt, MEGSA, MEMo, muex, mutex, and TiMEx) on simulated data.
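
As a point of reference for the pairwise comparison listed above, a one-sided Fisher's exact test for a pair of genes can be sketched with the standard library alone. This is not the notebooks' code and it does not implement the DISCOVER test; the function names and the toy counts are assumptions for illustration. With n tumours, of which a carry an alteration in gene A and b in gene B, the number of tumours carrying both follows a hypergeometric distribution under independence with fixed margins. The sketch uses math.comb, which requires Python 3.8 or later and may therefore be newer than the interpreter in the pinned environment.

```python
from math import comb


def overlap_pmf(k, n, a, b):
    """P(overlap == k) for a of n tumours altered in A and b of n altered in B."""
    return comb(a, k) * comb(n - a, b - k) / comb(n, b)


def _overlap_bounds(n, a, b):
    # Smallest and largest overlap that the margins allow.
    return max(0, a + b - n), min(a, b)


def fisher_exclusivity_pvalue(k, n, a, b):
    """One-sided p-value for mutual exclusivity: P(overlap <= k)."""
    lo, hi = _overlap_bounds(n, a, b)
    if not lo <= k <= hi:
        raise ValueError("observed overlap is impossible for these margins")
    return sum(overlap_pmf(i, n, a, b) for i in range(lo, k + 1))


def fisher_cooccurrence_pvalue(k, n, a, b):
    """One-sided p-value for co-occurrence: P(overlap >= k)."""
    lo, hi = _overlap_bounds(n, a, b)
    if not lo <= k <= hi:
        raise ValueError("observed overlap is impossible for these margins")
    return sum(overlap_pmf(i, n, a, b) for i in range(k, hi + 1))


if __name__ == "__main__":
    # Toy counts: 10 tumours, 5 altered in each gene, no tumour altered in both.
    print(fisher_exclusivity_pvalue(0, 10, 5, 5))
```

In the toy case only one of the 252 ways to place gene B's five alterations avoids all five tumours altered in gene A, so the exclusivity p-value is 1/252.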

Pan-cancer analyses

  • Download PanCan12 data

    Downloads the mutation and copy number data for the TCGA PANCAN12 studies.

  • Gene selection

    Selects the genes for use in the pairwise analyses.

  • Pairwise analysis

    Performs pairwise co-occurrence and mutual exclusivity analyses.

  • Within-chromosome co-occurrence analysis

    Tests for co-occurrences between genes located on the same chromosome, to assess whether the DISCOVER test detects these 'positive controls'.

  • STRING enrichment

    Determines the overlap of mutually exclusive gene pairs with the STRING functional interaction network.

  • MSigDb group tests

    Identifies significantly mutually exclusive gene sets based on predefined gene sets extracted from MSigDb.

  • De novo gene set identification

    Detects de novo mutually exclusive gene sets based on correlation clustering of pairwise mutual exclusivities.
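
The overlap computed in the STRING enrichment notebook comes down to comparing two collections of unordered gene pairs. The following is a rough sketch of that idea only, not the notebook's implementation: the function names are hypothetical, the gene labels are placeholders, and no enrichment statistic is computed. Pairs are stored as frozensets so that (GENE_A, GENE_B) and (GENE_B, GENE_A) count as the same pair.

```python
def normalise_pairs(pairs):
    """Turn an iterable of 2-tuples into a set of unordered pairs.

    Self-pairs such as ("GENE_A", "GENE_A") are dropped.
    """
    return {frozenset(pair) for pair in pairs if len(set(pair)) == 2}


def pair_overlap(exclusive_pairs, network_edges):
    """Return the shared pairs and the fraction of exclusive pairs in the network."""
    exclusive = normalise_pairs(exclusive_pairs)
    network = normalise_pairs(network_edges)
    shared = exclusive & network
    fraction = float(len(shared)) / len(exclusive) if exclusive else 0.0
    return shared, fraction


if __name__ == "__main__":
    # Placeholder inputs; real inputs would come from the pairwise analysis
    # results and from the STRING network.
    exclusive = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")]
    edges = [("GENE_B", "GENE_A"), ("GENE_E", "GENE_F")]
    shared, fraction = pair_overlap(exclusive, edges)
    print(sorted(sorted(pair) for pair in shared), fraction)
```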