Implementation of Hierarchical H-QSAR Modeling Approach for Integrating Binary/Multi Classification and Regression Models of Acute Oral Systemic Toxicity
The figure above shows the overall workflow for building hierarchical QSAR models. Base regression, binary, and multiclass models (60 in total) are built from diverse combinations of machine learning algorithms and chemical descriptors/fingerprints. Out-of-fold predictions from the base models are generated through 10-fold cross-validation, then concatenated and used as input (meta features) for building the hierarchical regression, binary, and multiclass models.
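The out-of-fold stacking step can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the code from the notebooks: the synthetic data, the two base models, and the Ridge meta-model are all assumptions chosen for brevity, and a single feature matrix stands in for the different descriptor/fingerprint sets that the real base models use.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-in for a descriptor matrix and toxicity endpoint (assumption).
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=0)

# Two hypothetical base models; the project itself builds 60.
base_models = {
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
    "ridge": Ridge(alpha=1.0),
}

# 10-fold cross-validation: each sample is predicted by a model
# that did not see it during fitting.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
oof_columns = [
    cross_val_predict(model, X, y, cv=cv) for model in base_models.values()
]

# Concatenate the out-of-fold predictions column-wise into meta features.
meta_features = np.column_stack(oof_columns)

# Fit a hierarchical (second-level) model on the meta features.
meta_model = Ridge(alpha=1.0)
meta_model.fit(meta_features, y)
```

For binary or multiclass base models, the same pattern would apply with `cross_val_predict(..., method="predict_proba")`, so that class probabilities rather than hard labels become meta features.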
The rat acute oral toxicity data used in this study were collected by the National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM) and the U.S. EPA National Center for Computational Toxicology (NCCT) from a number of publicly available datasets and resources. The full description and the dataset itself are available here. The whole dataset, comprising 11,992 compounds, was split semi-randomly by the project organizers into a training set (75%) and an external test set (25%) with equivalent coverage of the LD50 distribution.
The curated training and test data are available in the train_test_sets folder.
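The organizers' exact splitting procedure is not described here. As an illustration only, one common way to obtain a 75/25 split with similar endpoint coverage in both sets is to bin the endpoint into quantiles and stratify on the bins. The synthetic `log_ld50` values and the choice of 10 bins below are assumptions, not details of the actual split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for log-transformed LD50 values (assumption).
rng = np.random.default_rng(0)
log_ld50 = rng.normal(loc=2.5, scale=1.0, size=1000)

# Assign each compound to one of 10 quantile bins of the endpoint.
bins = pd.qcut(log_ld50, q=10, labels=False)

# Stratify on the bins so both sets cover the distribution similarly.
indices = np.arange(log_ld50.size)
train_idx, test_idx = train_test_split(
    indices, test_size=0.25, stratify=bins, random_state=0
)
```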
The code is provided as Jupyter notebooks in the notebooks folder. All code was developed on an Ubuntu 18.04 workstation.
- Prepare labels for modeling: labels.ipynb.
- Compute chemical descriptors/fingerprints for base models: descriptors.ipynb.
- Descriptor Selection: descriptors_selections.ipynb.
- Hyperparameter tuning of base models: Base_models_selection.ipynb.
- Building base models with optimal hyperparameters: Base_models.ipynb.
- Meta features: Hierarchical_features.ipynb.
- Hyperparameter tuning of hierarchical models: Hierarchical_models_selection.ipynb.
- Build hierarchical models with optimal hyperparameters: Hierarchical_models.ipynb.
- Evaluate cross-validation and test set performance: Model_evaluation.ipynb.
Create the conda environment and install Streamlit:
conda env create -f hqsar_env.yml
pip install streamlit
Run the GUI:
streamlit run GUI.py