A systematic assessment of deep learning methods for drug response prediction: from in-vitro to clinical application

│  README.md
│
├─dependencies # Environment required
│      crdnn_env.yml
│      drugcell_env.yml
│      moli_env.yml
│      paccmann_env.yml
│      tgsa_env.yml
│      vaen_env.yml
│      vaen_r_env.yml
│
├─hyper_grid # Hyperparameters search grid
│      hypergrid_crdnn.json
│      hypergrid_drugcell.json
│      hypergrid_moli.json
│      hypergrid_paccamnn.json
│      hypergrid_tgsa.json
│      hypergrid_vaen.json
│
└─utils # Utility scripts
        benchmark_metrics.py # Implement of benchmark metrics
        GDSCrel82_BinaryIC50.R # Script for binarization of IC50s
        generate_NWPC_weights.py # Generate NWPC weights
        smiles2graph.py # Generate molecular graphs from SMILES strings

Methods for assessments

The environments and hyperparameter grids used in this study are available on \dependencies and \hyper_grid.

	Paradigm	Tumor feature	Drug feature	Repository
DrugCell¹	MDL	M	Morgan FP	https://github.com/idekerlab/DrugCell
PaccMann²	MDL	E^a	SMILES	https://github.com/PaccMann/paccmann_predictor
TGSA³	MDL	E^a, M, C	Molecular graph	https://github.com/violet-sto/TGSA
CRDNN⁴	SDL	E^a	-	https://github.com/TeoSakel/deep-drug-response
VAEN⁵	SDL	E^b	-	https://github.com/bsml320/VAEN
MOLI⁶	SDL	E^a, M, C	-	https://github.com/hosseinshn/MOLI

MDL: Multi Drug Learning; SDL: Single Drug Learning
E: expression profiles; M: mutation status; C: Copy number variation; FP: Fingerprint; SMILES: Simplified Molecular-Input Line-Entry System
a: z-score standardization
b: rank normalization

Dataset

GDSC
- Cell line genomic data were downloaded from Cell Model Passports
- Drug response data were downloaded from GDSC bulk data dowload (release-8.2)
CCLE
- Cell line genomic data and drug response data were downloaded from DepMap portal (2021Q4)
TCGA
- Clinical response were curated by Ding et al.⁷ and downloaded from it portal.
- Cell line genomic data were downloaded from GDC client.
Bortezomib
- Patients with relapsed myeloma enrolled in phase 2 and phase 3 clinical trials of bortezomib.⁸
- Gene expression data were downloaded from GEO:GSE9782.

The curated datasets are available on https://zenodo.org/record/7060305#.YxnOEHZByUn

Benchmark Metrics

The nine metrics are adapted from Chen et al.⁹ and the implement is available on utils\benchmark_metrics.py

Metrics	Range	Based on	Formula
Root-mean-square error (RMSE)	R+	Value	$$RMSE(\boldsymbol{y,\hat{y}}) = \sqrt{\frac{\sum_{i}(y_i-\hat{y}_i)^2}{n}}$$
L-RMSE, R-RMSE, LR-RMSE	R+	Value
Pearson Correlation Coefficient (PCC)	[-1,1]	Value	$$PCC(\boldsymbol{y,\hat{y}})=\frac{\sum_i(\hat{y}_i - \mu(\boldsymbol{\hat{y}}))(y_i - \mu(\boldsymbol{y}))}{\sqrt{\sum_i(\hat{y}_i - \mu(\boldsymbol{\hat{y}}))^2}\sqrt{\sum_i(y_i - \mu(\boldsymbol{y}))^2}}$$
Spearman Correlation Coefficient (SCC)	[-1,1]	Rank	$$SCC(\boldsymbol{y,\hat{y}})=PCC(\boldsymbol{r(y),r(\hat{y})})$$
Normalized Discounted Cumulative Gain (NDCG)	[0,1]	Rank	$$DCG(\boldsymbol{y,\hat{y}}) = \sum_{i=1}^{n}\frac{2^{-y_i}}{\log_2 (r(\hat{y}_i)+1 )}$$ $$NDCG(\boldsymbol{y,\hat{y}}) = \frac{DCG(\boldsymbol{y,\hat{y}})}{DCG(\boldsymbol{y,y})}$$
Probabilistic C-index (PC), Normalized Weighted Probabilistic C-index (NWPC)	[0,1]	Rank	$$erf(a) = \frac{2}{\sqrt{\pi}}\int_0^a e^{-t^2} dt$$ $$WPC(M) = \frac{\sum_{d} w_d \cdot PC_d}{\sum_{d} w_d}$$ $$NWPC = \frac{WPC - WPC_{min}}{WPC_{max} - WPC_{min}}$$
ROC-AUC	[0,1]	Value	ROC-AUC provides an aggregate measure of performance across all possible classification thresholds

$y$: the observed response values; $\hat{y}$: the predicted response values; $n$: the number of samples; $r(\hat{y}_i)$ is the position of $\hat{y}_i$ on the sorted $\hat{y}$ in ascending order.

Pipeline

Assessment on cell line data

1. Data Preparation

Transcriptome profiles: GDSC_EXP.csv
- For PaccMann, TGSA, CRDNN and MOLI, gene expression matrix was z-score standardization.
- For VAEN, gene expression matrix was rank normalization.
Mutation profiles: GDSC_MUT.csv
Copy number profiles: GDSC_CNV.csv

Chemical representations:

SMILES: GDSC_drugannot.txt CanonicalSMILESrdkit

Morgan Fingerprints:

from rdkit.Chem import AllChem
mfp_ls =
[
    list(
        AllChem.GetMorganFingerprintAsBitVect(
            Chem.MolFromSmiles(smiles),
            2,
            nBits=2048
        )
    ) for smiles in mfp['CanonicalSMILESrdkit']
]

Molecular Graph:

from utils import smiles2graph
smiles2graph.save_drug_graph(
    'GDSC_drugannot.txt',
    'GDSC'
)

Drug Response: GDSC_DR.txt
- The script for binarization of IC50s: utils\GDSCrel82_BinaryIC50.R.

2. Model Construction

Multi Drug Learning (MDL)

Input:

$D$, a set of labeled tuples;
$t$, the number of repeats (default: 5);
$k$, the number of folds for cross-validation (default: 5);
$p$, the number of hyperparameters combination (default: 30);
a MDL model.

for i = 1 to t do
	stratify D on the tissue origin into k folds, $D = (D_1,...,D_k)$
	for j = 1 to k do
		take the $D_j$ as test set
		take the $D_{(j+1) \% k}$ as validation set
		take the remaining folds as training set
		apply max-min transformation of IC50s on training, validation and test set
		for n = 1 to p do
			select a random combination from hyperparameter grid
			use training set to derive a model, $M_n$
			use validation set to evaluate $M_n$
			retain the evaluation score, Pearson correlation coefficient
		done
		select the combination of hyperparameter with the highest evaluation score
		use training and validation set to fit the model
		return the predicted results on test set
	done
done

Single Drug Learning (SDL)

Input:

$D$, a set of labeled tuples;
$t$, the number of repeats (default: 5);
$k$, the number of folds for cross-validation (default: 5);
$d$, the number of drugs in the dataset;
$p$, the number of hyperparameters combination (default: 30);
a SDL model.

for i = 1 to t do
	stratify D on the tissue origin into k folds, $D = (D_1,...,D_k)$
	for j = 1 to k do
		take the $D_j$ as test set
		take the $D_{(j+1) \% k}$ as validation set
		take the remaining folds as training set
		apply max-min transformation of IC50s on training, validation and test set
		for m = 1 to d do
			sample test set with drug m to obtain drug specific test set
			sample validation set with drug m to obtain drug specific validation set
			sample training set with drug m to obtain drug specific training set
			for n = 1 to p do
                select a random combination from hyperparameter grid
                use training set to derive a model, $M_n$
                use validation set to evaluate $M_n$
                retain the evaluation score, Pearson correlation coefficient
            done
            select the combination of hyperparameter with the highest evaluation score
            use training and validation set to fit the model
            return the predicted results on drug specific test set
         done
	done
done

3. Model Assessment

Concatenate each fold to generate the list for assessment in the format of ['drug','true', 'predict','binary_response','repeat_fold','cell_line','repeat']
Overall performance

import pandas as pd
from utils import benchmark_metrics as bcm
df # result dataframe in the form of ['drug','true', 'predict','binary_response','repeat_fold','cell_line','repeat']
overall_performance = pd.DataFrame(
    {
    'PCC': bcm.pearson_corr(df[['repeat_fold','true', 'predict']]), 
    'SCC': bcm.spearman_corr(df[['repeat_fold','true', 'predict']]), 
    'RMSE': bcm.rmse(df[['repeat_fold','true', 'predict']]), 
    'L_RMSE': bcm.l_rmse(df[['repeat_fold','true', 'predict']]), 
    'R_RMSE': bcm.r_rmse(df[['repeat_fold','true', 'predict']]), 
    'LR_RMSE':bcm.lr_rmse(df[['repeat_fold','true', 'predict']]), 
    'NDCG':bcm.ndcg(df[['repeat_fold','true', 'predict']]),
    'NWPC': bcm.norm_wpCIdx(df[['repeat_fold','true', 'predict']])
    }
)

Performance on single-drug level

import pandas as pd
from utils import benchmark_metrics as bcm
df # result dataframe in the form of ['drug','true', 'predict','binary_response','repeat_fold','cell_line','repeat']
singledrug_performance = {
    'PCC': df.groupby('repeat')[['drug','true', 'predict']].apply(bcm.pearson_corr), 
    'SCC': df.groupby('repeat')[['drug','true', 'predict']].apply(bcm.spearman_corr), 
    'RMSE': df.groupby('repeat')[['drug','true', 'predict']].apply(bcm.rmse), 
    'L_RMSE': df.groupby('repeat')[['drug','true', 'predict']].apply(bcm.l_rmse), 
    'R_RMSE': df.groupby('repeat')[['drug','true', 'predict']].apply(bcm.r_rmse), 
    'LR_RMSE':df.groupby('repeat')[['drug','true', 'predict']].apply(bcm.lr_rmse), 
    'ROC_AUC':df.groupby('repeat')[['drug','binary_response', 'predict']].apply(bcm.roc_auc),
    'NDCG':df.groupby('repeat')[['drug','true', 'predict']].apply(bcm.ndcg),
    'PC': df.groupby('repeat')[['drug','true', 'predict']].apply(bcm.pCIdx)
}

Assessment on clinical data

1. Data Preparation

1.1 Cell line data

Transcriptome profiles: GDSC_EXP.csv
- For PaccMann, TGSA, CRDNN and MOLI, gene expression matrix was z-score standardization.
- For VAEN, gene expression matrix was rank normalization.
Mutation profiles: GDSC_MUT.csv
Copy number profiles: GDSC_CNV.csv

Chemical representations:

SMILES: GDSC_drugannot.txt CanonicalSMILESrdkit

Morgan Fingerprints:

from rdkit.Chem import AllChem
mfp_ls =
[
    list(
        AllChem.GetMorganFingerprintAsBitVect(
            Chem.MolFromSmiles(smiles),
            2,
            nBits=2048
        )
    ) for smiles in mfp['CanonicalSMILESrdkit']
]

Molecular Graph:

from utils import smiles2graph
smiles2graph.save_drug_graph(
    'GDSC_drugannot.txt',
    'GDSC'
)

Drug Response: GDSC_DR.txt
- The script for binarization of IC50s: utils\GDSCrel82_BinaryIC50.R.

1.2 Patient data

Transcriptome profiles: TCGA_EXP.csv
- For PaccMann, TGSA, CRDNN and MOLI, gene expression matrix was z-score standardization.
- For VAEN, gene expression matrix was rank normalization.
- ComBat ¹⁰ was used to adjust expression profiles of patients with the GDSC cell line dataset.
Mutation profiles: TCGA_MUT.csv
Copy number profiles: TCGA_CNV.csv

Chemical representations:

SMILES: TCGA_drugannot.txt CanonicalSMILESrdkit

Morgan Fingerprints:

from rdkit.Chem import AllChem
mfp_ls =
[
    list(
        AllChem.GetMorganFingerprintAsBitVect(
            Chem.MolFromSmiles(smiles),
            2,
            nBits=2048
        )
    ) for smiles in mfp['CanonicalSMILESrdkit']
]

Molecular Graph:

from utils import smiles2graph
smiles2graph.save_drug_graph(
    'TCGA_drugannot.txt',
    'TCGA'
)

Drug Response: TCGA_DR.txt

2. Model Construction

Multi Drug Learning (MDL)

Input:

$D$, a set of labeled tuples;
$h$, a set of hyperparameters with the highest evaluation score;
a MDL model.

use D, h to fit the model M 
return the predicted results on patient data

Single Drug Learning (SDL)

Input:

$D$, a set of labeled tuples;
$h$, sets of hyperparameters with the highest evaluation score for each drug
$d$, the number of drugs in the dataset;
a SDL model.

for m = 1 to d do
	sample D to obtain drug specific dataset, $D_m$
	use $D_m$, $h_m$ to fit the model $M_m$
	return the predicted results on patient data
done

3. Model Assessment

Concatenate the result of each drug to generate the list for assessment in the format of ['drug', 'response_RS', 'response_01', 'predict']
Performance

import pandas as pd
from utils import benchmark_metrics as bcm
df # result dataframe in the form of ['drug', 'response_RS', 'response_01', 'predict']
singledrug_performance = {
    'pval': df.groupby('drug')[['response_RS', 'predict']].apply(bcm.wilcox_pval), 
    'effectsize': df.groupby('drug')[['response_RS', 'predict']].apply(bcm.effect_size), 
    'roc_auc':bcm.roc_auc(df[['drug','response_01','predict']])
}

Reference:

Kuenzi, B. M., Park, J., Fong, S. H., Sanchez, K. S., Lee, J., Kreisberg, J. F., Ma, J., & Ideker, T. (2020). Predicting Drug Response and Synergy Using a Deep Learning Model of Human Cancer Cells. Cancer Cell, 38(5), 672-684.e676. https://doi.org/10.1016/j.ccell.2020.09.014 ↩
Manica, M., Oskooei, A., Born, J., Subramanian, V., Saez-Rodriguez, J., & Rodriguez Martinez, M. (2019). Toward Explainable Anticancer Compound Sensitivity Prediction via Multimodal Attention-Based Convolutional Encoders. Mol Pharm, 16(12), 4797-4806. https://doi.org/10.1021/acs.molpharmaceut.9b00520 ↩
Zhu, Y., Ouyang, Z., Chen, W., Feng, R., Chen, D. Z., Cao, J., & Wu, J. (2021). TGSA: Protein-Protein Association-Based Twin Graph Neural Networks for Drug Response Prediction with Similarity Augmentation. Bioinformatics. https://doi.org/10.1093/bioinformatics/btab650 ↩
Sakellaropoulos, T., Vougas, K., Narang, S., Koinis, F., Kotsinas, A., Polyzos, A., Moss, T. J., Piha-Paul, S., Zhou, H., Kardala, E., Damianidou, E., Alexopoulos, L. G., Aifantis, I., Townsend, P. A., Panayiotidis, M. I., Sfikakis, P., Bartek, J., Fitzgerald, R. C., Thanos, D., Mills Shaw, K. R., Petty, R., Tsirigos, A., & Gorgoulis, V. G. (2019). A Deep Learning Framework for Predicting Response to Therapy in Cancer. Cell Rep, 29(11), 3367-3373 e3364. https://doi.org/10.1016/j.celrep.2019.11.017 ↩
Jia, P., Hu, R., Pei, G., Dai, Y., Wang, Y. Y., & Zhao, Z. (2021). Deep generative neural network for accurate drug response imputation. Nat Commun, 12(1), 1740. https://doi.org/10.1038/s41467-021-21997-5 ↩
Sharifi-Noghabi, H., Zolotareva, O., Collins, C. C., & Ester, M. (2019). MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics, 35(14), i501-i509. https://doi.org/10.1093/bioinformatics/btz318 ↩
Ding, Z., Zu, S., & Gu, J. (2016). Evaluating the molecule-based prediction of clinical drug responses in cancer. Bioinformatics, 32(19), 2891-2895. https://doi.org/10.1093/bioinformatics/btw344 ↩
Mulligan, G., Mitsiades, C., Bryant, B., Zhan, F., Chng, W. J., Roels, S., Koenig, E., Fergus, A., Huang, Y., Richardson, P., Trepicchio, W. L., Broyl, A., Sonneveld, P., Shaughnessy, J. D., Jr., Bergsagel, P. L., Schenkein, D., Esseltine, D. L., Boral, A., & Anderson, K. C. (2007). Gene expression profiling and correlation with outcome in clinical trials of the proteasome inhibitor bortezomib. Blood, 109(8), 3177-3188. https://doi.org/10.1182/blood-2006-09-044974 ↩
Chen, J., & Zhang, L. (2020). A survey and systematic assessment of computational methods for drug response prediction. Brief Bioinform. https://doi.org/10.1093/bib/bbz164 ↩
Johnson, W. E., Li, C., & Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8(1), 118-127. ↩

wuys13/Suppl-data-Benchmark

Methods for assessments

Dataset

Benchmark Metrics

Pipeline

Assessment on cell line data

1. Data Preparation

2. Model Construction

Multi Drug Learning (MDL)

Single Drug Learning (SDL)

3. Model Assessment

Assessment on clinical data

1. Data Preparation

1.1 Cell line data

1.2 Patient data

2. Model Construction

Multi Drug Learning (MDL)

Single Drug Learning (SDL)

3. Model Assessment

Footnotes