Hierarchical-QSAR-Modeling

Implementation of Hierarchical H-QSAR Modeling Approach for Integrating Binary/Multi Classification and Regression Models of Acute Oral Systemic Toxicity

The above figure shows the overall workflow for building hierarchical QSAR models. Base regression, binary, and multiclass models (60 models in total) are built with diverse combinations of machine learning algorithms and chemical descriptors/fingerprints. Out-of-fold predictions of the base models are generated through 10-fold cross-validation. These out-of-fold predictions are concatenated and used as input (meta features) for building the hierarchical regression, binary, and multiclass models.
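
The out-of-fold step described above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the code from the notebooks: the three base learners, the array names, and the second-level Ridge model are all assumptions chosen to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # synthetic stand-in for descriptors (assumption)
y_reg = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=200)
y_bin = (y_reg > 0).astype(int)  # synthetic binary label (assumption)

# The same 10 folds are reused for every base model.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Out-of-fold predictions: each row is predicted by a model that never saw it.
oof_ridge = cross_val_predict(Ridge(alpha=1.0), X, y_reg, cv=cv)
oof_knn = cross_val_predict(KNeighborsRegressor(n_neighbors=5), X, y_reg, cv=cv)
oof_proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y_bin, cv=cv, method="predict_proba"
)[:, 1]

# Concatenate the out-of-fold predictions column-wise into meta features.
meta_features = np.column_stack([oof_ridge, oof_knn, oof_proba])

# A second-level (hierarchical) regressor trained on the meta features.
meta_model = Ridge(alpha=1.0).fit(meta_features, y_reg)
```

Because every meta feature comes from a model that did not train on that row, the second-level model is not fitted on predictions that have already seen the label.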

Data

The rat acute oral toxicity data used in this study were collected by the National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM) and the U.S. EPA National Center for Computational Toxicology (NCCT) from a number of publicly available datasets and resources. The full description and the actual dataset are available here. The whole dataset, comprising 11,992 compounds, was semi-randomly split by the organizers of the project into a training set (75%) and an external test set (25%) with equivalent coverage of the LD50 distribution.
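
The organizers' exact splitting procedure is not described here. One common way to obtain a 75/25 split with similar LD50 coverage in both parts is to stratify on quantile bins of the endpoint; the sketch below shows that idea on synthetic values. The number of bins, the variable names, and the data are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for log-transformed LD50 values (not real data).
log_ld50 = rng.normal(loc=2.5, scale=1.0, size=1000)

# Bin the continuous endpoint into quantile bins to stratify on.
n_bins = 10  # assumed value
edges = np.quantile(log_ld50, np.linspace(0, 1, n_bins + 1)[1:-1])
bins = np.digitize(log_ld50, edges)

# Split row indices 75/25 while keeping the bin proportions equal.
indices = np.arange(log_ld50.size)
train_idx, test_idx = train_test_split(
    indices, test_size=0.25, stratify=bins, random_state=0
)
```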

The curated training and test data are available in the train_test_sets folder.

Code

The code is provided as Jupyter notebooks in the notebooks folder. All code was developed on an Ubuntu 18.04 workstation.

  1. Prepare labels for modeling: labels.ipynb.
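
The label definitions used in labels.ipynb are not listed here. As a rough sketch of how regression, binary, and multiclass labels can be derived from a single LD50 column, the example below uses a log10 transform, a "very toxic" cutoff of 50 mg/kg, and EPA-style category bounds of 50, 500, and 5000 mg/kg. All of these choices, and the sample values, are assumptions for illustration.

```python
import numpy as np

# Hypothetical LD50 values in mg/kg (made up for illustration).
ld50_mg_per_kg = np.array([3.0, 50.0, 320.0, 2500.0, 7000.0])

# Regression label: log-transformed LD50 (assumed transform).
y_regression = np.log10(ld50_mg_per_kg)

# Binary label: 1 if "very toxic" (assumed cutoff: LD50 < 50 mg/kg).
y_binary = (ld50_mg_per_kg < 50.0).astype(int)

# Multiclass label: EPA-style categories with upper bounds 50, 500, 5000 mg/kg.
# right=True puts a value equal to a bound into the lower (more toxic) class.
epa_bounds = np.array([50.0, 500.0, 5000.0])
y_multiclass = np.digitize(ld50_mg_per_kg, epa_bounds, right=True)
```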

Base models

  1. Compute chemical descriptors/fingerprints for base models: descriptors.ipynb.
  2. Descriptor Selection: descriptors_selections.ipynb.
  3. Hyperparameter tuning of base models: Base_models_selection.ipynb.
  4. Building base models with optimal hyperparameters: Base_models.ipynb.
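
Steps 3 and 4 above (tuning, then building with the best settings) can be illustrated with a cross-validated grid search. The search strategy and grids used in the notebooks are not shown here, so the estimator, the parameter grid, and the scoring metric below are assumptions, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))  # synthetic descriptors (assumption)
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=120)

# Small illustrative grid; a real search would cover more values.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=cv,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

# With the default refit=True, best_estimator_ is already retrained on all
# of X, y using the best hyperparameters found.
best_model = search.best_estimator_
```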

Hierarchical Models

  1. Meta features: Hierarchical_features.ipynb.
  2. Hyperparameter tuning of hierarchical models: Hierarchical_models_selection.ipynb.
  3. Build hierarchical models with optimal hyperparameters: Hierarchical_models.ipynb.

Model Evaluation

  1. Evaluate cross-validation and test set performance: Model_evaluation.ipynb.
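
The metrics reported by Model_evaluation.ipynb are not listed here. As a hedged example, regression models are often scored with RMSE and R2, and classifiers with balanced accuracy; the sketch below computes these with scikit-learn on small made-up arrays. The same calls apply to out-of-fold predictions and to test set predictions.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, mean_squared_error, r2_score

# Made-up regression targets and predictions.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
r2 = r2_score(y_true, y_pred)

# Made-up multiclass labels and predictions.
cls_true = np.array([0, 0, 1, 1, 2, 2])
cls_pred = np.array([0, 1, 1, 1, 2, 0])

# Balanced accuracy is the mean of per-class recall, which makes it less
# sensitive to class imbalance than plain accuracy.
bal_acc = balanced_accuracy_score(cls_true, cls_pred)
```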

Results

GUI

Installation

conda env create -f hqsar_env.yml
pip install streamlit

Run the GUI:

streamlit run GUI.py