mims-harvard/TDC

New DrugComb data

TangYiChing opened this issue · 4 comments

Describe the problem
The DrugComb database has released new drug combination and monotherapy screening datasets covering cancer, malaria, and COVID-19.
Reference: [https://doi.org/10.1093/nar/gkab438]

Describe the solution you'd like
Replace the current TDC/data/drugcomb.pkl with the new data from https://drugcomb.org/download/, and add new columns ['Study name', 'Disease'] to distinguish between cancer, malaria, and COVID-19 studies.
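A minimal sketch of the proposed column addition, assuming each downloaded record is a dict carrying a `study_name` field (the `drug_row`, `drug_col`, `cell_line_name`, and `study_name` field names are assumptions about the download format). The study identifiers and the `STUDY_TO_DISEASE` mapping are hypothetical placeholders, not the real DrugComb study names; `add_study_columns` is likewise a hypothetical helper.

```python
# Sketch: tag each DrugComb record with 'Study name' and 'Disease'.
# ASSUMPTION: records are dicts with a 'study_name' key. The study names and
# the mapping below are hypothetical placeholders, not real DrugComb IDs.

STUDY_TO_DISEASE = {
    "STUDY_A": "cancer",
    "STUDY_B": "malaria",
    "STUDY_C": "COVID-19",
}


def add_study_columns(records, study_to_disease=STUDY_TO_DISEASE):
    """Return copies of the records with 'Study name' and 'Disease' added."""
    tagged = []
    for rec in records:
        study = rec["study_name"]
        new_rec = dict(rec)  # copy so the caller's records are not mutated
        new_rec["Study name"] = study
        # Studies missing from the mapping are flagged instead of dropped.
        new_rec["Disease"] = study_to_disease.get(study, "unknown")
        tagged.append(new_rec)
    return tagged


if __name__ == "__main__":
    rows = [
        {"drug_row": "d1", "drug_col": "d2", "cell_line_name": "CL1", "study_name": "STUDY_A"},
        {"drug_row": "d3", "drug_col": "d4", "cell_line_name": "CL2", "study_name": "STUDY_C"},
    ]
    for r in add_study_columns(rows):
        print(r["Study name"], r["Disease"])
```

Keeping the mapping as an explicit table makes it easy to audit which studies end up labeled as which disease once the real study list is known.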

Additional context
N/A.

Thank you! That's a great idea! Would you like to make a PR for it?

DrugComb provides an API for quick access to both drug and cell line information, and it already includes SMILES strings and cell line IDs. To add new drug-drug-cell line triplets to the current TDC dataset, what still needs to be added is the gene expression values from the CellMiner database. What would you like me to do to facilitate the process?
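To illustrate what assembling a triplet involves, here is a sketch that joins one combination record with drug SMILES and a cell line expression profile. The lookup tables `smiles_by_drug` and `expression_by_cell_line`, the helper `build_triplet`, and the output keys `Drug1`/`Drug2`/`CellLine` are all hypothetical; in practice the SMILES would come from the DrugComb API and the expression vectors from whichever expression source is chosen.

```python
# Sketch: assemble one (drug 1, drug 2, cell line) triplet with features.
# ASSUMPTION: smiles_by_drug and expression_by_cell_line are hypothetical
# lookup tables; record field names and output keys are illustrative.


def build_triplet(record, smiles_by_drug, expression_by_cell_line):
    """Return a feature dict for one combination, or None if any piece is missing."""
    d1, d2 = record["drug_row"], record["drug_col"]
    cl = record["cell_line_name"]
    if d1 not in smiles_by_drug or d2 not in smiles_by_drug:
        return None  # no structure available for one of the drugs
    if cl not in expression_by_cell_line:
        return None  # the gap that new expression data would need to fill
    return {
        "Drug1": smiles_by_drug[d1],
        "Drug2": smiles_by_drug[d2],
        "CellLine": expression_by_cell_line[cl],
    }


if __name__ == "__main__":
    smiles = {"aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O", "ethanol": "CCO"}
    expression = {"CL1": [0.1, 2.3, 5.0]}  # made-up expression values
    records = [
        {"drug_row": "aspirin", "drug_col": "ethanol", "cell_line_name": "CL1"},
        {"drug_row": "aspirin", "drug_col": "ethanol", "cell_line_name": "CL9"},
    ]
    triplets = [t for r in records if (t := build_triplet(r, smiles, expression))]
    print(len(triplets), "complete triplet(s)")
```

Returning `None` for incomplete records makes it easy to count how many new combinations are blocked only by missing cell line expression.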

Thank you! Are the gene expression values available only in CellMiner? I saw in the paper that they can be retrieved from public databases such as DepMap, Cell Model Passports, etc. https://academic.oup.com/view-large/figure/267020980/gkab438fig1.jpg

Yes, these are the commonly used sources nowadays, and they all provide RNA-seq data (i.e., expression values in TPM). We might need a new data-processing workflow.
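One possible starting point for such a workflow is sketched below: apply a log2(TPM + 1) transform to each profile, then restrict every cell line to the genes shared by all sources so the vectors line up. This is a common choice for TPM data, not something the thread has settled on; the helper names and gene symbols are hypothetical.

```python
# Sketch of a TPM processing step: log2(TPM + 1), then align on shared genes.
# ASSUMPTION: each profile is a dict {gene: TPM}; helper names are hypothetical
# and this is one common normalization, not an agreed-upon TDC workflow.
import math


def log_transform_tpm(tpm_by_gene, pseudocount=1.0):
    """Map each gene's TPM to log2(TPM + pseudocount)."""
    out = {}
    for gene, tpm in tpm_by_gene.items():
        if tpm < 0:
            raise ValueError(f"negative TPM for {gene}: {tpm}")
        out[gene] = math.log2(tpm + pseudocount)
    return out


def shared_gene_order(*profiles):
    """Sorted genes present in every profile, e.g. across different sources."""
    common = set(profiles[0])
    for p in profiles[1:]:
        common &= set(p)
    return sorted(common)


def to_vector(expr_by_gene, gene_order):
    """Fixed-order feature vector; raises KeyError if a gene is missing."""
    return [expr_by_gene[g] for g in gene_order]


if __name__ == "__main__":
    source_a = {"GENE1": 0.0, "GENE2": 3.0, "GENE3": 15.0}  # made-up TPMs
    source_b = {"GENE2": 7.0, "GENE3": 1.0, "GENE4": 2.0}
    genes = shared_gene_order(source_a, source_b)
    for profile in (source_a, source_b):
        print(to_vector(log_transform_tpm(profile), genes))
```

Intersecting gene lists avoids imputing values for genes one source does not measure, at the cost of dropping those genes entirely.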