Harmonizing QSAR machine learning-based models and docking approaches for the identification of novel HDAC2 inhibitors
by Dao Quang Tung, Do Thi Mai Dung, Nguyen Thanh Cong, Dao Ngoc Nam Hai, Daniel Baecker, Phan Thi Phuong Dung, Nguyen Hai Nam, Nguyen Ngoc An*
*Correspondence: ngocan@vnu.edu.vn (N.N.A)
This repository contains the code and data supporting our paper, which has been submitted for publication in Link.
Please use the data in https://github.com/Lelvels/qsar_ml_find_hdac2_inhibitor/blob/main/data_for_modeling/train_test_data/HDAC2_train_test_data_final.xlsx if you need to run further tests.
You will need a working Python environment to run the code. The recommended way to set up your environment is through the Anaconda Python distribution, which provides the conda package manager. Anaconda can be installed in your user directory and does not interfere with the system Python installation. The required dependencies are specified in the *.yml files in the env folder. We recommend running the commands below in a Linux terminal.
Run the following commands in the repository folder (where the env/*.yml files are located) to create two separate environments and install all required dependencies in them:
conda env create -f my-rdkit-env.yml -n your_env_name
conda env create -f tmap-env.yml -n your_other_env_name
Then verify that the new environments were installed correctly:
conda env list
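Listing the environments only confirms that they exist. As an optional extra check, the short Python sketch below reports which expected packages cannot be imported from the currently active environment. The function name missing_packages is hypothetical, and the package name rdkit is an assumption inferred from the file name my-rdkit-env.yml; adjust the list to match the contents of the .yml file you installed.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: "rdkit" is inferred from the environment file name only.
    expected = ["rdkit"]
    missing = missing_packages(expected)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages are importable.")
```

Run it after activating the environment with conda activate your_env_name. It only checks top-level package names and does not verify versions.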
Our screening dataset is stored in a PostgreSQL database; installation instructions are available on the official PostgreSQL website. The screening dataset is available in this link (8 GB after decompression). After importing the database, create a copy of the file env/env.example, rename it to .env, and then fill in the database URL in that file:
DATABASE_URL = postgresql://<host_url>/<database_name>
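To check that the .env file is filled in as expected before running anything against the database, the value can be read and split into its parts. The following is a minimal sketch using only the Python standard library; it is not necessarily how the code in src loads the setting. The helper names read_database_url and describe_database_url are hypothetical, and the host, port, and database name in the sample are placeholders.

```python
from urllib.parse import urlsplit


def read_database_url(env_text):
    """Return the DATABASE_URL value from the text of a .env file, or None."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "DATABASE_URL":
            # Tolerate spaces around "=" and optional surrounding quotes.
            return value.strip().strip('"').strip("'")
    return None


def describe_database_url(url):
    """Split a database URL into scheme, host, port, and database name."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sample = "DATABASE_URL = postgresql://localhost:5432/example_db\n"
    print(describe_database_url(read_database_url(sample)))
```

To check a real file, pass its contents instead of the sample, for example read_database_url(open(".env").read()).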
- The source code is available in the src folder.
- The results of our work are in the results folder.
- The train, test, and validation data are available in the data_for_modeling/train_test_data folder.
- The screening dataset is available in the data_for_modeling/screening_dataset folder. If you want the raw data from the database, it is available in this link (8 GB after decompression).