/TocoDecoy

Primary LanguagePythonMIT LicenseMIT

TocoDecoy: a new approach to design unbiased datasets for training and benchmarking machine-learning scoring functions

Models used in this study:

CRNN model for decoys generation:https://github.com/pcko1/Deep-Drug-Coder

IGN model used for MLSFs construction:https://github.com/zjujdj/InteractionGraphNet/tree/master

XGBoost model used for MLSFs construction:https://github.com/tqchen/xgboost

Datasets used in this study:

TocoDecoy:https://zenodo.org/record/5290011#.YSmecN--vVg

LIT-PCBA:http://drugdesign.unistra.fr/LIT-PCBA

Architecture of the codes:

1.dataset_generation:

1.1 molecular generation: generate decoys with similar physicochemical properties based on the 'seed' active ligands using the CRNN model.

1.2 postpreprocessing: select decoys with dissimilar topology structures to the active ligands and do the grid filter.

2.model_training:

2.1 IGN: scripts used for training and testing IGN

2.2 XGB: scripts used for training and testing XGBoost

3.utilities: scripts for various utilities