Code for Tensor Gaussian Process Regression for predicting drug combination synergy, developed for the AstraZeneca-Sanger Drug Combination Prediction DREAM Challenge 2015: https://www.synapse.org/#!Synapse:syn4231880/wiki/235645

More details of the method are available here: https://www.synapse.org/#!Synapse:syn5586109/wiki/394920

Required R packages: rstan, data.table, dplyr, doMC (or at least foreach), and irlba (only needed if you want to run the convex PCA component).

Data required:
- Both the Drug_synergy_data and Sanger_molecular_data directories must be in the main directory. ch1_LB.csv and ch2_LB.csv must also be in the synergy data folder.
- CCLE expression data file: CCLE_Expression_Entrez_2012-09-29.gct, available from http://www.broadinstitute.org/ccle/data/browseData?conversationPropagation=begin

Key files:
- molecular_data.R processes the raw cell line data and produces distance matrices. It also performs imputation: gene expression (GE) for two cell lines, using the CCLE data, and methylation.
- mono_pca_all.R performs a nuclear-norm regularized (convex) PCA on the monotherapy data.
- train_gp.R is run to produce results for all challenge tasks, invoked as `Rscript train_gp.R <run> <setup> <sub> <cores> <usetissue> <max_its>`.

With the full training data (we use all of Ch 1 and Ch 2 for both), memory consumption is high (around 40 GB per thread), so I run only two cores on our 120 GB machines. Ch 1A was trained (`<run>` = 1) using `Rscript train_gp.R 1 final2 A 2 1 30`, and predictions were made (`<run>` = 0) using `Rscript train_gp.R 0 final2 A 1 1 30`. Ch 1B works the same way (change A to B).

For Ch 2, the model trained for Ch 1A can be reused: pick the seed with the highest likelihood, then run sub2_predict.R. In principle this step could be done as a Stan run, but in practice it is faster to do it manually.
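The Ch 1A and Ch 1B runs described above can be scripted as below. This is a minimal sketch, not part of the repository: it assumes it is launched from the main directory containing train_gp.R and reuses exactly the argument values given in this README. `train_and_predict` and the `EXECUTE` variable are hypothetical names introduced here. Because each training thread can need around 40 GB, the sketch only prints the commands by default; set `EXECUTE=1` to actually run them. Ch 2 is left out because the arguments to sub2_predict.R (including how the top-likelihood seed is selected) are not documented here.

```shell
#!/usr/bin/env bash
# Sketch (not part of the repo): train and predict for sub-challenges 1A and 1B
# using the argument values from this README. Run from the main directory.
# EXECUTE is a hypothetical switch: by default commands are only printed.
set -euo pipefail

run() {
  if [[ "${EXECUTE:-0}" == "1" ]]; then
    "$@"          # actually execute the command
  else
    echo "$*"     # dry run: print the command line instead
  fi
}

train_and_predict() {
  local sub="$1"
  # Training (<run> = 1) on 2 cores; memory use is around 40 GB per thread.
  run Rscript train_gp.R 1 final2 "$sub" 2 1 30
  # Prediction (<run> = 0) on 1 core with the same setup.
  run Rscript train_gp.R 0 final2 "$sub" 1 1 30
}

# Sub-challenges are processed sequentially to keep peak memory bounded.
for sub in A B; do
  train_and_predict "$sub"
done
```

For example, `EXECUTE=1 bash run_ch1.sh` (the file name is arbitrary) runs training and then prediction for A, then for B. Running without `EXECUTE=1` first is a cheap way to check the exact command lines before committing to the memory-heavy jobs.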
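For reference, one standard form of the nuclear-norm regularized (convex) PCA that mono_pca_all.R is described as performing is sketched below. This is an assumption, not a transcription of the script: how the monotherapy data are arranged as a matrix, whether unobserved entries are masked out, and how $\lambda$ is chosen are all unconfirmed here. Let $X$ be the monotherapy data matrix, $\Omega$ its set of observed entries, and $\|Z\|_*$ the nuclear norm (sum of singular values):

$$\hat{Z} = \operatorname*{arg\,min}_{Z} \; \frac{1}{2} \sum_{(i,j) \in \Omega} \left( X_{ij} - Z_{ij} \right)^2 + \lambda \|Z\|_*, \qquad \|Z\|_* = \sum_k \sigma_k(Z)$$

When every entry is observed, the solution is singular value soft-thresholding of the SVD $X = U \operatorname{diag}(\sigma_k) V^\top$:

$$\hat{Z} = U \operatorname{diag}\big(\max(\sigma_k - \lambda, 0)\big) V^\top$$

With missing entries, this thresholding step is typically iterated after filling the unobserved entries with the current estimate. Only the leading singular values and vectors are needed at each step, which is the kind of truncated SVD that irlba computes.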
davidaknowles/tensor_gp