This package implements Yuanfang's winning algorithm in the AstraZeneca-Sanger Drug Combination Prediction DREAM Challenge.
background: Drug Combination Prediction
see also: Yuanfang Guan's 1st Place Solution and Original Code
Please contact gyuanfan@umich.edu or hyangl@umich.edu if you have any questions or suggestions.
Git clone a copy of TAIJI:
git clone https://github.com/GuanLab/TAIJI.git
- python (3.6.5)
- numpy (1.13.3). It comes pre-packaged in Anaconda.
- scikit-learn (0.19.0). A popular machine learning package. It can be installed with:
pip install -U scikit-learn
To run TAIJI, the monotherapy data are required to extract features. Two types of files are needed:
(1) summary table (e.g. data/test_monotherapy.csv)
CELL_LINE | COMPOUND_A | COMPOUND_B | MAX_CONC_A | MAX_CONC_B | IC50_A | H_A | Einf_A |
---|---|---|---|---|---|---|---|
MFM-223 | ADAM17 | AKT | 3 | 1 | 0.226 | 1.9 | 17.47 |
UM-UC-3 | AKT | ATR_4 | 3 | 3 | 0.977 | 2.7 | 46.19 |
IC50_B | H_B | Einf_B | SYNERGY_SCORE | QA | COMBINATION_ID | FILENAME |
---|---|---|---|---|---|---|
0.562 | 1.2 | 70.23 | NA | 1 | ADAM17.AKT | ADAM17.AKT.MFM-223.Rep1.csv |
1.12 | 1.9 | 81.79 | NA | 1 | AKT.ATR_4 | AKT.ATR_4.UM-UC-3.Rep1.csv |
- CELL_LINE: Normalized cell line name.
- COMPOUND_A: Name of drug/compound A described by its target.
- COMPOUND_B: Name of drug/compound B described by its target.
- MAX_CONC_A: Maximum concentration (uM) of drug A.
- MAX_CONC_B: Maximum concentration (uM) of drug B.
- IC50_A: Concentration where half of the maximum kill is obtained with drug A.
- H_A: Slope of the dose-response curve for drug A.
- Einf_A: Maximum cells killed (percentage) with drug A.
- IC50_B: Concentration where half of the maximum kill is obtained with drug B.
- H_B: Slope of the dose-response curve for drug B.
- Einf_B: Maximum cells killed (percentage) with drug B.
- SYNERGY_SCORE: To be predicted
- QA: Quality assessment flag of combination assays (1: good, -1: bad)
- COMBINATION_ID: Name for drug A and drug B combination
- FILENAME: the filename of the corresponding monotherapy spreadsheets
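A quick way to confirm that a summary table has the columns listed above is to load it with Python's standard csv module. The sketch below is illustrative and is not part of TAIJI. It assumes the file is comma-separated with a header row, and the function name read_summary is hypothetical. The two data rows are copied from the example table.

```python
import csv
import io

# Column names taken from the summary table described above.
REQUIRED_COLUMNS = [
    "CELL_LINE", "COMPOUND_A", "COMPOUND_B", "MAX_CONC_A", "MAX_CONC_B",
    "IC50_A", "H_A", "Einf_A", "IC50_B", "H_B", "Einf_B",
    "SYNERGY_SCORE", "QA", "COMBINATION_ID", "FILENAME",
]


def read_summary(handle):
    """Read a summary table from an open text file and return rows as dicts.

    Hypothetical helper (not a TAIJI function). Raises ValueError if any
    required column is missing from the header.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    return list(reader)


# In-memory stand-in for data/test_monotherapy.csv (assumed comma-separated).
example = io.StringIO(
    "CELL_LINE,COMPOUND_A,COMPOUND_B,MAX_CONC_A,MAX_CONC_B,IC50_A,H_A,Einf_A,"
    "IC50_B,H_B,Einf_B,SYNERGY_SCORE,QA,COMBINATION_ID,FILENAME\n"
    "MFM-223,ADAM17,AKT,3,1,0.226,1.9,17.47,0.562,1.2,70.23,NA,1,"
    "ADAM17.AKT,ADAM17.AKT.MFM-223.Rep1.csv\n"
    "UM-UC-3,AKT,ATR_4,3,3,0.977,2.7,46.19,1.12,1.9,81.79,NA,1,"
    "AKT.ATR_4,AKT.ATR_4.UM-UC-3.Rep1.csv\n"
)
rows = read_summary(example)
print(rows[0]["COMBINATION_ID"], rows[0]["FILENAME"])
# prints: ADAM17.AKT ADAM17.AKT.MFM-223.Rep1.csv
```

To check a real file, pass an open file handle instead of the in-memory example, for instance `read_summary(open("data/test_monotherapy.csv", newline=""))`.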
(2) monotherapy spreadsheets (e.g. data/monotherapy_spread_csv/drug1.drug2.cell_line.Rep1.csv)
0 | 0.03 | 0.1 | 0.3 | 1 | 3 | (=Agent 2) | |
0 | 100 | 113.8 | 106.4 | 98.1 | 100.2 | 91.7 | |
0.01 | 104.5 | NA | NA | NA | NA | NA | |
0.03 | 108.5 | NA | NA | NA | NA | NA | |
0.1 | 73 | NA | NA | NA | NA | NA | |
0.3 | 34.5 | NA | NA | NA | NA | NA | |
1 | 18.8 | NA | NA | NA | NA | NA | |
(=Agent1) | |||||||
Agent1 | DRUG1 | ||||||
Agent2 | DRUG2 | ||||||
Unit1 | \muM | ||||||
Unit2 | \muM | ||||||
Title | CELL_LINE |
- the 1st row is the 5 concentrations (0.03, 0.1, 0.3, 1, 3) of drug A (Agent1)
- the 2nd row is the corresponding 5 responses (113.8, 106.4, 98.1, 100.2, 91.7) to drug A
- the 1st column is the 5 concentrations (0.01, 0.03, 0.1, 0.3, 1) of drug B (Agent2)
- the 2nd column is the corresponding 5 responses (104.5, 108.5, 73, 34.5, 18.8) to drug B
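The layout described above can be read with a few lines of Python. The sketch below is illustrative and is not TAIJI's own parser. It assumes the spreadsheet is a comma-separated file whose first row starts with an empty cell, and the names parse_monotherapy, row_conc, row_resp, col_conc and col_resp are hypothetical. The values are those of the example spreadsheet.

```python
import csv
import io


def _to_float(cell):
    """Return the cell as a float, or None for blanks, 'NA', and text labels."""
    try:
        return float(cell)
    except ValueError:
        return None


def parse_monotherapy(handle):
    """Extract the two single-agent dose-response series from one spreadsheet.

    Hypothetical helper (not a TAIJI function).
    """
    rows = list(csv.reader(handle))
    # 1st row: keep positive numbers only (drops blank cell, zero dose, label).
    row_conc = [v for v in map(_to_float, rows[0]) if v is not None and v > 0]
    # 2nd row: the last len(row_conc) numbers are the matching responses.
    second = [v for v in map(_to_float, rows[1]) if v is not None]
    row_resp = second[-len(row_conc):]
    # 1st and 2nd columns, from the 3rd row down to the first metadata row.
    col_conc, col_resp = [], []
    for row in rows[2:]:
        if len(row) < 2:
            break
        conc, resp = _to_float(row[0]), _to_float(row[1])
        if conc is None or resp is None:
            break  # reached the "(=Agent1)" / "Agent1" / "Unit1" rows
        col_conc.append(conc)
        col_resp.append(resp)
    return {"row_conc": row_conc, "row_resp": row_resp,
            "col_conc": col_conc, "col_resp": col_resp}


# In-memory copy of the example spreadsheet (leading empty cell is assumed).
example = io.StringIO(
    ",0,0.03,0.1,0.3,1,3,(=Agent 2)\n"
    "0,100,113.8,106.4,98.1,100.2,91.7,\n"
    "0.01,104.5,NA,NA,NA,NA,NA,\n"
    "0.03,108.5,NA,NA,NA,NA,NA,\n"
    "0.1,73,NA,NA,NA,NA,NA,\n"
    "0.3,34.5,NA,NA,NA,NA,NA,\n"
    "1,18.8,NA,NA,NA,NA,NA,\n"
    "(=Agent1),,,,,,,\n"
    "Agent1,DRUG1,,,,,,\n"
    "Agent2,DRUG2,,,,,,\n"
)
parsed = parse_monotherapy(example)
print(parsed["row_conc"], parsed["row_resp"])
print(parsed["col_conc"], parsed["col_resp"])
```

The helper names the series by position (row and column) rather than by drug, so it makes no claim about which agent each series belongs to.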
Example monotherapy data for 100 query drug combinations are provided. Note that these data are simulated; they are NOT real data.
Once you have prepared and formatted the input monotherapy data, put them in the corresponding data folder (and remove the example data). Then run TAIJI with:
python TAIJI.py
TAIJI automatically generates a prediction file, prediction.csv. With the simulated example data of 100 drug combinations, a run takes about 5 minutes.
You can also provide the input file/directory and output directory:
python TAIJI.py -i1 data/test_monotherapy.csv -i2 data/monotherapy_spread_csv/ -o ../
The -i1 argument is the monotherapy summary table, -i2 is the directory of monotherapy spreadsheets, and -o is the output directory.
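When using custom paths, it can help to check first that every FILENAME in the summary table has a matching spreadsheet in the input directory. The sketch below is a hypothetical pre-flight check and is not part of TAIJI. It assumes the summary table is comma-separated with a FILENAME column, and the function name missing_spreadsheets is made up for illustration.

```python
import csv
import os


def missing_spreadsheets(summary_path, spreadsheet_dir):
    """Return the FILENAME entries that have no file in spreadsheet_dir.

    Hypothetical helper (not a TAIJI function).
    """
    with open(summary_path, newline="") as handle:
        names = [row["FILENAME"] for row in csv.DictReader(handle)]
    return sorted(
        {name for name in names
         if not os.path.isfile(os.path.join(spreadsheet_dir, name))}
    )


# Paths from the example command; the check is skipped if they do not exist.
summary = "data/test_monotherapy.csv"
folder = "data/monotherapy_spread_csv/"
if os.path.exists(summary):
    missing = missing_spreadsheets(summary, folder)
    print("missing spreadsheets:", missing if missing else "none")
```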
To run TAIJI with advanced options, you can check the descriptions by running:
python TAIJI.py -h
It returns:
-i1 INPUT1, --input1 INPUT1
Input summary table (example: data/test_monotherapy.csv)
-i2 INPUT2, --input2 INPUT2
Directory of the input monotherapy spread sheets (example: data/monotherapy_spread_csv/)
-w1 WEIGHT1, --weight1 WEIGHT1
Weight for the monotherapy component
-w2 WEIGHT2, --weight2 WEIGHT2
Weight for the cell line-drug component
-w3 WEIGHT3, --weight3 WEIGHT3
Weight for the molecular profile component
-o OUTPUTDIR, --outputdir OUTPUTDIR
Output directory
TAIJI has three prediction components based on different inputs: (1) monotherapy treatment effect, (2) cell line-drug frequency, and (3) molecular profiles. The default weights of these three components are 1:1:2, which are based on Yuanfang's winning solution.
These weights can be adjusted based on the input data. For example, if the user believes that their monotherapy input data is less reliable (e.g. due to large noise), they can adjust these weights (e.g. from 1:1:2 to 0.8:2:4) by running TAIJI like this:
python TAIJI.py -w1 0.8 -w2 2 -w3 4
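To see what such a change means in relative terms, the weights can be converted to shares of the total. The sketch below is only an illustration. This README does not state how TAIJI applies the weights internally, so the combine function assumes a normalized weighted average of three component scores, and the names weight_shares and combine are hypothetical.

```python
def weight_shares(weights):
    """Convert raw weights (e.g. 1:1:2) into fractions that sum to 1."""
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def combine(component_scores, weights=(1.0, 1.0, 2.0)):
    """Weighted average of the three component scores.

    ASSUMPTION: a normalized weighted average; TAIJI's actual combination
    rule may differ.
    """
    shares = weight_shares(weights)
    return sum(s * c for s, c in zip(shares, component_scores))


default_shares = weight_shares((1, 1, 2))
adjusted_shares = weight_shares((0.8, 2, 4))
print(default_shares)                          # [0.25, 0.25, 0.5]
print([round(s, 3) for s in adjusted_shares])  # [0.118, 0.294, 0.588]
```

Under this reading, moving from 1:1:2 to 0.8:2:4 lowers the monotherapy component's share from 25% to about 12%, while the other two components rise to about 29% and 59%.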