Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

Manifold Regularized Causal Learning (MRCL)

R code for Manifold Regularized Causal Learning (MRCL) and scripts to run the analysis presented in:
Hill*, Oates*, Blythe & Mukherjee* (2019). Causal Learning via Manifold Regularization. Journal of Machine Learning Research 20(127):1-32.
*Equal contributions

MRCL source code

The file mrcl.R in the code directory contains the source code for running MRCL.
Functions are documented within this file.
This code was co-authored with Umberto Noè.

Required installs:
install.packages('kernlab')

Manuscript analysis scripts

R scripts and functions to reproduce the results in Section 3 of the manuscript are provided in the code directory.

To run the analysis scripts, the working directory in R must contain the code directory and the data directory.
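The working directory requirement can be checked before sourcing any script. The following is a minimal sketch: missingDirs is a hypothetical helper that is not part of this repository, and it only tests that the code and data directories exist.

```r
# Hypothetical helper (not part of the repository): return the names of the
# expected directories that are absent from a given root directory.
missingDirs <- function(root = getwd(), required = c("code", "data")) {
  present <- dir.exists(file.path(root, required))
  required[!present]
}

# Check the current working directory and report the outcome.
absent <- missingDirs()
if (length(absent) > 0) {
  message("Missing from working directory: ", paste(absent, collapse = ", "))
} else {
  message("Working directory contains the code and data directories.")
}
```

If a directory is reported missing, setwd() can be used to move to the root of the repository before running the scripts.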

Details of the origin of the three data files in the data directory are provided below.

The output directory contains files generated by running the scripts (processed data files, results files and plots in respective directories).
Only a subset of the results files generated by the scripts is provided in this repository: just those required to generate the figures in the manuscript.

Required installs:
install.packages(c('pROC', 'glmnet', 'caret', 'pcalg', 'dplyr',
                   'kernlab', 'doParallel', 'doRNG', 'ggplot2', 'RColorBrewer'))

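To avoid reinstalling packages that are already present, the list above can be compared against the installed packages first. This is a sketch only: notInstalled is a hypothetical helper, not part of the repository, and the install call is left commented out so that nothing is installed without a deliberate choice.

```r
# Packages listed above for the manuscript analysis scripts.
required <- c("pROC", "glmnet", "caret", "pcalg", "dplyr",
              "kernlab", "doParallel", "doRNG", "ggplot2", "RColorBrewer")

# Hypothetical helper: the subset of `pkgs` that is not currently installed.
notInstalled <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

toInstall <- notInstalled(required)
if (length(toInstall) > 0) {
  message("Not yet installed: ", paste(toInstall, collapse = ", "))
  # Uncomment to install the missing packages:
  # install.packages(toInstall)
} else {
  message("All required packages are installed.")
}
```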
Dataset D1: Yeast Gene Expression (Section 3.2 of manuscript)

Data

The yeast gene expression knockout data is due to Kemmeren et al. (2014) and is available to download at The Deleteome webpage. We use the file Kemmeren.RData on the Downloads/causal inference subpage.
In the file experimentFunctions.R, there is a function importYeastData that downloads Kemmeren.RData, extracts the required data and saves the resulting dataset in the data directory as file yeastData.RData.

Analysis

There are three yeast data experiments. To run each of these:

source("code/runMainExperimentsYeastTCPA.R")

with the variable experiment on line 5 of runMainExperimentsYeastTCPA.R set to one of "yeast_random", "yeast_row-wise" or "yeast_row-wise_GIES". These settings correspond to the results shown in Figures 1, 2 and 3 of the manuscript, respectively. The data used to generate these figures can be found in the files output/yeast_random/results/collatedResults.RData, output/yeast_row-wise/results/collatedResults.RData and output/yeast_row-wise_GIES/results/collatedResults.RData, respectively.
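A collatedResults.RData file can be inspected without overwriting anything in the global workspace by loading it into its own environment. The sketch below assumes only the file paths given above. loadResults is a hypothetical helper, not part of the repository, and because the names of the stored objects are not documented here, the example simply lists them.

```r
# Hypothetical helper: load an .RData file into a fresh environment and
# return that environment, leaving the global workspace untouched.
loadResults <- function(path) {
  if (!file.exists(path)) {
    stop("Results file not found: ", path)
  }
  env <- new.env()
  load(path, envir = env)
  env
}

# Inspect the collated results for the "yeast_random" experiment, if present.
resultsFile <- file.path("output", "yeast_random", "results",
                         "collatedResults.RData")
if (file.exists(resultsFile)) {
  res <- loadResults(resultsFile)
  print(ls(res))  # names of the objects stored in the file
}
```

The same helper applies to the other experiments by changing the directory name in resultsFile.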

Dataset D2: Cancer Cell Line Protein Time-Course Data (Section 3.3 of manuscript)

Data

The cancer cell line protein data is due to Hill et al. (2017) and is available to download from this GitHub page. In the file experimentFunctions.R, there is a function importCellLineData that downloads the required files, extracts the required data and saves the resulting data in the data directory as file cellLineData.RData. Running this function requires the R package R.matlab.

Analysis

To run the cell line data analysis corresponding to the results shown in Figure 4 in the manuscript:

source("code/runMainExperimentsCellLine.R")

The data used to generate Figure 4 can be found in the file output/cellLine/results/collatedResults.RData.

Dataset D3: Human Cancer Data (Section 3.4 of manuscript)

Data

These data are part of The Cancer Genome Atlas (TCGA) and are presented in Akbani et al. (2014). The dataset used for our analysis can be found in the file data/TCGA-PANCAN19-L4-BRCA-35phosphoSubset.RData. It is a subset of the data available at The Cancer Proteome Atlas (TCPA) Portal. In particular, we used a subset of the pan-cancer 19 level 4 data (portal data release version 4.0; note that the portal has now gone beyond this data release and, at the time of writing, only the most recent data release is available to download).
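Since the contents of the data file are not described here, a quick way to see what it holds is to summarise each stored object by class and size. This is a minimal sketch: describeRData is a hypothetical helper, not part of the repository, and it makes no assumption about the object names in the file.

```r
# Hypothetical helper: describe every object stored in an .RData file by its
# first class and its dimensions (or its length if it has no dimensions).
describeRData <- function(path) {
  env <- new.env()
  load(path, envir = env)
  vapply(ls(env), function(name) {
    obj <- get(name, envir = env)
    size <- if (is.null(dim(obj))) length(obj) else paste(dim(obj), collapse = " x ")
    paste0(class(obj)[1], " [", size, "]")
  }, character(1))
}

# Summarise the TCGA/TCPA subset used in the analysis, if the file is present.
dataFile <- file.path("data", "TCGA-PANCAN19-L4-BRCA-35phosphoSubset.RData")
if (file.exists(dataFile)) {
  print(describeRData(dataFile))
}
```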

Analysis

To run the TCPA data analysis corresponding to the results shown in Figure 5 in the manuscript:

source("code/runMainExperimentsYeastTCPA.R")

with the variable experiment on line 5 of runMainExperimentsYeastTCPA.R set to "TCPA". The data used to generate Figure 5 can be found in the file output/TCPA/results/collatedResults.RData.

Generation of manuscript figures

All the figures in the manuscript can be generated as follows:

source("code/generateFigures.R")

Figures are saved in the output directory. For example, for Dataset D3 (TCPA), figures can be found in output/TCPA/plots.
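After the figures have been generated, the files written for a given experiment can be listed from R. The sketch below assumes the output/<experiment>/plots layout described above. listPlots is a hypothetical helper, not part of the repository.

```r
# Hypothetical helper: list the plot files under <outputDir>/<experiment>/plots.
# Returns an empty character vector if that directory does not exist.
listPlots <- function(experiment, outputDir = "output") {
  plotDir <- file.path(outputDir, experiment, "plots")
  if (!dir.exists(plotDir)) {
    return(character(0))
  }
  list.files(plotDir, full.names = TRUE)
}

# Show the figures generated for Dataset D3 (TCPA).
print(listPlots("TCPA"))
```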