AstraZeneca/chemicalx

Create data loader

benedekrozemberczki opened this issue · 2 comments

  • Load drug side features.
  • Load triples.
  • Document.
  • Test generation.

Here are 5 papers with datasets worth looking into, suggested by @debplana:

Mathews Griner LA, Guha R, Shinn P, Young RM, Keller JM, Liu D, Goldlust IS, Yasgar A, McKnight C, Boxer MB, Duveau DY, Jiang JK, Michael S, Mierzwa T, Huang W, Walsh MJ, Mott BT, Patel P, Leister W, Maloney DJ, et al. 2014. High-throughput combinatorial screening identifies drugs that cooperate with ibrutinib to kill activated B-cell-like diffuse large B-cell lymphoma cells. PNAS 111:2349–2354. DOI: https://doi.org/10.1073/pnas.1311846111, PMID: 24469833

This dataset only has a handful of drug synergy pairs. It could be manually curated for use in evaluation, but there is not enough for training.

O’Neil J, Benita Y, Feldman I, Chenard M, Roberts B, Liu Y, Li J, Kral A, Lejnine S, Loboda A, Arthur W. An unbiased oncology compound screen to identify novel combination strategies. Molecular Cancer Therapeutics. 2016;15(6):1155–1162. https://doi.org/10.1158/1535-7163.MCT-15-0843

This is the OncoPolyPharmacology dataset in TDC.

Borisy AA, Elliott PJ, Hurst NW, Lee MS, Lehar J, Price ER, Serbedzija G, Zimmermann GR, Foley MA, Stockwell BR, Keith CT. 2003. Systematic discovery of multicomponent therapeutics. PNAS 100:7977–7982. DOI: https://doi.org/10.1073/pnas.1337088100, PMID: 12799470

Could not find the supplementary information.

DREAM Challenge

Bansal M, Yang J, Karan C, Menden MP, Costello JC, Tang H, Xiao G, Li Y, Allen J, Zhong R, Chen B, Kim M, Wang T, Heiser LM, Realubit R, Mattioli M, Alvarez MJ, Shen Y, Gallahan D, Singer D, et al. 2014. A community computational challenge to predict the activity of pairs of compounds. Nature Biotechnology 32:1213–1222. DOI: https://doi.org/10.1038/nbt.3052, PMID: 25419740

It appears the website linked by this paper, http://www.the-dream-project.org/challenges/nci-dream-drug-sensitivity-prediction-challenge, is down.

AstraZeneca-Sanger Drug Combination DREAM Consortium, Menden MP, Wang D, Mason MJ, Szalai B, Bulusu KC, Guan Y, Yu T, Kang J, Jeon M, Wolfinger R, Nguyen T, Zaslavskiy M, Jang IS, Ghazoui Z, Ahsen ME, Vogel R, Neto EC, Norman T, Tang EKY, Garnett MJ, et al. 2019. Community assessment to advance computational prediction of Cancer drug combinations in a pharmacogenomic screen. Nature Communications 10:2674. DOI: https://doi.org/10.1038/s41467-019-09799-2, PMID: 3120923

Therapeutic Data Commons


After chatting with Deb, it's clear we should be really careful to only compare within cell lines. This also opens us up to an interesting evaluation where you train on data from one cell line and test on another. The second important thing we need to be careful about is concentration. Last, we also need to provide some meaningful baselines, because it's a good bet the ML people are way off base compared to what's actually useful in the field.
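
To make the cell-line point concrete, here is a minimal sketch of a cross-cell-line split plus a trivial mean baseline. It is not the chemicalx API: the `Triple` record, the function names, and the toy values are all hypothetical and only illustrate the idea of keeping every measurement from one cell line out of training.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    """Hypothetical record: a drug pair measured in one cell line."""
    drug_1: str
    drug_2: str
    cell_line: str
    synergy: float


def group_by_cell_line(triples: List[Triple]) -> Dict[str, List[Triple]]:
    """Bucket triples so that scores are only compared within a cell line."""
    groups: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        groups[triple.cell_line].append(triple)
    return dict(groups)


def leave_one_cell_line_out(
    triples: List[Triple], held_out: str
) -> Tuple[List[Triple], List[Triple]]:
    """Train on every other cell line, test only on the held-out one."""
    train = [t for t in triples if t.cell_line != held_out]
    test = [t for t in triples if t.cell_line == held_out]
    if not test:
        raise ValueError(f"No triples found for cell line {held_out!r}")
    return train, test


def mean_baseline(train: List[Triple]) -> float:
    """Simplest possible baseline: predict the mean training synergy."""
    return sum(t.synergy for t in train) / len(train)


# Made-up toy data, for illustration only.
TOY = [
    Triple("drug_x", "drug_y", "cell_A", 1.0),
    Triple("drug_x", "drug_z", "cell_A", 3.0),
    Triple("drug_x", "drug_y", "cell_B", 10.0),
    Triple("drug_y", "drug_z", "cell_B", 20.0),
]

train, test = leave_one_cell_line_out(TOY, held_out="cell_B")
prediction = mean_baseline(train)
```

If scikit-learn is available, `sklearn.model_selection.LeaveOneGroupOut` with the cell line as the group label gives the same kind of split across all cell lines. Concentration is not handled here; it would need its own column and its own grouping rule.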

Added basic loaders for two datasets - I will close this for now, but these comments are extremely good. The person who worked on the AZ-Sanger dataset (Krishna Bulusu) works closely with us.