An integrated Python package for molecular descriptor generation, data processing, model training, and hyper-parameter optimization.

Primary language: Python. License: MIT.

SPOC

SPOC supports molecular descriptor generation, data processing, model training, and hyperparameter optimization:

  1. Descriptor generation: wraps the molecular descriptor generation methods provided by different tools and packages, including RDKit, CDK, Open Babel, PubChem, and DeepChem, among others, and makes batch generation easy.
  2. Data pre-processing and splitting.
  3. Model training and hyperparameter optimization built on Scikit-Learn, XGBoost, and LightGBM. More machine learning and neural network methods will be included or wrapped in the future.
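
As a rough illustration of items 2 and 3, here is a minimal sketch of a reproducible data split followed by a grid search over one hyperparameter. It does not use SPOC's API, which is not shown in this README. It relies only on the Python standard library, and every name in it (split_indices, fit_ridge_1d, grid_search, the toy descriptor values) is hypothetical. In SPOC itself these steps are delegated to Scikit-Learn, XGBoost, and LightGBM.

```python
import random


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle range(n) reproducibly and split it into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(round(n * test_fraction)))
    return idx[n_test:], idx[:n_test]


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit of y ~ w * x (no intercept) for a single descriptor."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


def mse(xs, ys, w):
    """Mean squared error of the prediction w * x against y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(xs, ys, alphas, seed=0):
    """Return (alpha, validation_error, weight) with the lowest held-out error."""
    train, valid = split_indices(len(xs), test_fraction=0.25, seed=seed)
    best = None
    for alpha in alphas:
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], alpha)
        err = mse([xs[i] for i in valid], [ys[i] for i in valid], w)
        if best is None or err < best[1]:
            best = (alpha, err, w)
    return best


# Toy data (hypothetical): one descriptor value per molecule, target = 2 * x.
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x for x in xs]

train_idx, test_idx = split_indices(len(xs), test_fraction=0.2, seed=42)
best_alpha, valid_err, weight = grid_search(xs, ys, alphas=[0.0, 1.0, 10.0, 100.0])
```

Because the toy target is exactly linear, the unregularized setting wins the search. With real descriptors the preferred value would depend on the data, which is the reason to search over it at all.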

Dependencies

SPOC currently supports Python >= 3.6 and requires these packages in all cases.

Installation

Method 1: conda

# Clone project
git clone git@github.com:WhitestoneYang/spoc.git # or other released or tagged version.

# conda installation
bash -i conda_installation.sh

Method 2: docker

# docker build
docker build --progress=plain -t spoc .

# docker run
docker run -v "$(pwd)":/workspace/ --network host -it spoc

Usage

  1. Please refer to the tests for descriptor generation examples, covering both single and multiple molecular descriptor generation.
  2. Please refer to the examples for the full workflow: 1) molecular descriptor generation; 2) data processing; 3) model training; 4) hyperparameter optimization.

Citing SPOC

If you have used SPOC in your research, please cite our paper.