This repository documents the code used to generate the results for our PNAS article. The updated package, which is continuously being developed, can be found at this repository. Please submit an issue or email samsl@mit.edu with any questions.
python train_DTI.py --exp-id ExperimentName --config configs/default_config.yaml
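The sample command passes an experiment name and the path to a YAML config. As a rough illustration only, the sketch below shows how a script could accept these two flags using Python's standard `argparse` module. The function name `parse_args`, the help strings, and the choice to make both flags required are assumptions made for this example; they are not taken from `train_DTI.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags from the sample command (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Illustrative entry point; not the actual train_DTI.py parser"
    )
    # --exp-id: free-form experiment name, e.g. "ExperimentName"
    # (assumed required here; the real script may define a default)
    parser.add_argument("--exp-id", required=True, help="Experiment name")
    # --config: path to a YAML configuration file
    parser.add_argument("--config", required=True, help="Path to YAML config")
    return parser.parse_args(argv)


# Parse the same arguments that appear in the sample command.
args = parse_args(
    ["--exp-id", "ExperimentName", "--config", "configs/default_config.yaml"]
)
# argparse exposes "--exp-id" as the attribute "exp_id".
print(args.exp_id, args.config)
```

Passing `argv` explicitly keeps the sketch testable without touching `sys.argv`; calling `parse_args()` with no arguments falls back to the real command line.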
- src: Python files containing protein and molecular featurizers, prediction architectures, and data loading
- scripts: Bash files to run benchmarking tasks
  - CMD_BENCHMARK_DAVIS.sh -- Run DTI classification benchmarks on DAVIS data set. Can be easily modified for other classification data sets
  - CMD_BENCHMARK_TDC_DTI_DG.sh -- Run benchmarks for TDC DTI-DG regression task
  - CMD_BENCHMARK_DUDE_CROSSTYPE.sh -- Evaluate trained model on DUDe decoy performance for kinase and GPCR targets
  - CMD_BENCHMARK_DUDE_WITHINTYPE.sh -- The same as above, but with half of kinase, GPCR, protease, and nuclear targets
- models: Pre-trained protein language models
- dataset: Data sets to benchmark on, most are from MolTrans
  - DAVIS
  - BindingDB
  - BIOSNAP
  - DUDe
- nb: Jupyter notebooks for data generation and exploration
- train_DTI.py -- Main training script to run DTI classification benchmarks
- DUDE_evaluate_decoys.py -- Compare predictions of a trained model between a target and known true binders/decoys. Visualize embedding space
- DUDE_summarize_decoys.py -- Given a directory of protein targets, summarize active/decoy discriminative performance by target type
- Described in our PNAS paper
- Previously appeared in NeurIPS MLSB 2021 and NeurIPS MLSB 2022, and on bioRxiv.