Chemosensitivity Project Scripts and Files
GENE EXPRESSION FILES SHOULD BE DOWNLOADED FROM LINKS BELOW
There are plenty of projects that predict sensitivity on a small scale, using only one drug and/or one cancer type. We sought to go as big as possible and build pan-cancer models to predict sensitivity for as many chemotherapy drugs as possible. One factor we considered was that it had to be able to be sequenced fairly inexpensively. The standard here would be Nanostring, which limits us to ~800 genes.
The flowchart for the study design can be found here.
- Briefly, we sought to use only publicly available gene expression data to build our predictive models. The best sources of gene expression data paired with sensitivity data are the Genomics of Drug Sensitivity in Cancer database and the Cancer Cell Line Encyclopedia.
- Generalized linear models (GLMs) were developed per drug and tested against testing data. Figure
- Finalized pan-cancer models were then applied to withheld testing data and AUC was measured.
- Further, pan-cancer models were tested against specific tissue types in the testing data to see how well pan-cancer models predicted sensitivity to individual tissue types.
- Finally, drug models were tested on human tumor datasets from The Cancer Genome Atlas.
- Pan-cancer models were tested against available TCGA class/drug combinations where n >= 10. Sensitivity measure used was RFS.
- Recurrence-free survival curves split based on predicted labels were generated and measured for significance by multivariate Cox regression analysis, adjusting for age, sex, and tumor stage. An example can be found here