Smart Process Analytics (SPA) is a software package for automatic machine learning. Given user-input data (and optional user preferences), SPA automatically cross-validates and tests machine learning (ML) and deep learning (DL) models. Model types are selected based on the properties of the data, minimizing the risk of data-specific variance.

Primary language: Python. License: MIT.

Smart Process Analytics

Smart Process Analytics (SPA) is a Python package for predictive modeling. The original version is associated with the paper "Smart process analytics for predictive modeling" by Weike Sun and Richard D. Braatz. Since 2022, it has been updated by Pedro Seber. This fork differs enough from the original version that it should be considered its own project.

To run SPA on your computer, download the source code of the most recent release and unzip it somewhere convenient. Open a terminal in that folder, optionally create a new conda environment or activate an existing one, and run pip install -e . (note the dot after -e). SPA can then be used by running import SPA and calling the SPA.main_SPA() function. If you have trouble installing ace-cream, comment out its line in the setup.py file and try again. Most of SPA will work without ace-cream.
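A quick way to confirm that the installation worked is to check that the SPA module can be found and to inspect its entry point. The sketch below assumes only what is stated above (a module named SPA exposing main_SPA); the helper name describe_spa_entry_point is hypothetical, and no arguments of main_SPA are assumed, since they are described in the documentation rather than here.

```python
import importlib.util
import inspect


def describe_spa_entry_point():
    """Return the signature of SPA.main_SPA as text, or None if SPA is not installed."""
    # find_spec returns None when no top-level module named "SPA" can be located.
    if importlib.util.find_spec("SPA") is None:
        return None
    import SPA  # imported lazily so this helper also runs where SPA is absent
    return str(inspect.signature(SPA.main_SPA))


signature = describe_spa_entry_point()
if signature is None:
    print("SPA is not installed in this environment.")
else:
    print("SPA.main_SPA" + signature)
```

Printing the signature lists the parameter names and defaults accepted by main_SPA in the installed release, which is a reasonable starting point before reading the documentation.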

SPA.py comes with default hyperparameters for its models, but all hyperparameters are customizable by the user. To learn how to do so, please read its documentation. You may also check the Examples folder and the README within.

The major files in SPA are:

  1. SPA.py: the main file and what should be called by the user. Calls the files below depending on what inputs have been passed by the user or the properties of the data.
  2. cv_final.py: performs cross-validation (or IC calculations) to automatically determine the best hyperparameters. Also trains the final model after validation.
  3. regression_models.py: called multiple times by cv_final.py; runs a model once based on one combination of hyperparameters.
  4. dataset_property_new.py: functions for data interrogation to determine whether the data exhibit nonlinearity, multicollinearity, and/or dynamics. Mostly ignored if the user manually selects one or more model architectures.
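To give a feel for what data interrogation can look like, here is a minimal sketch of two such checks using only the standard library. It is not the code in dataset_property_new.py: the function names, the thresholds, and the choice of Pearson correlation and lag-1 autocorrelation as tests are all assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def has_multicollinearity(columns, threshold=0.9):
    """Flag the data if any pair of predictor columns is strongly correlated."""
    return any(
        abs(pearson(columns[i], columns[j])) > threshold
        for i in range(len(columns))
        for j in range(i + 1, len(columns))
    )


def has_dynamics(series, threshold=0.5):
    """Flag the data if the series is strongly correlated with itself at lag 1."""
    return abs(pearson(series[:-1], series[1:])) > threshold


# Hypothetical predictors: x2 is nearly a multiple of x1, x3 is unrelated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]
x3 = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
print(has_multicollinearity([x1, x2, x3]))
print(has_dynamics(x1))
```

A real implementation would typically use more robust statistics and significance tests; the point here is only that each property reduces to a yes/no answer that can steer the choice of model type.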

A typical run of SPA.py automatically calls cv_final.py once to determine the best hyperparameters and return the best model. During validation, cv_final.py calls regression_models.py once per hyperparameter combination. If the user has not supplied a model type (or types), SPA.py also calls dataset_property_new.py to determine the most adequate model(s) for the data.
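The call pattern described above (one validation routine that runs a model once per hyperparameter combination and keeps the best) can be sketched in a few lines. This is an illustration under stated assumptions, not SPA's implementation: run_model_once and cross_validate are hypothetical names, the model is a one-feature ridge regression chosen for brevity, and the interleaved 3-fold split is arbitrary.

```python
from itertools import product
from statistics import mean


def run_model_once(x_train, y_train, x_val, y_val, alpha, center):
    """Fit a one-feature ridge model with one hyperparameter combination; return validation MSE."""
    x_mean = mean(x_train) if center else 0.0
    y_mean = mean(y_train) if center else 0.0
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_train, y_train))
    den = sum((x - x_mean) ** 2 for x in x_train) + alpha
    slope = num / den
    errors = [(y_mean + slope * (x - x_mean) - y) ** 2 for x, y in zip(x_val, y_val)]
    return mean(errors)


def cross_validate(x, y, grid, n_folds=3):
    """Try every hyperparameter combination and return the one with the lowest mean MSE."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for k in range(n_folds):
            # Every n_folds-th sample, starting at k, forms the validation fold.
            val_idx = set(range(k, len(x), n_folds))
            pairs = list(zip(x, y))
            x_tr, y_tr = zip(*(p for i, p in enumerate(pairs) if i not in val_idx))
            x_va, y_va = zip(*(p for i, p in enumerate(pairs) if i in val_idx))
            fold_scores.append(run_model_once(x_tr, y_tr, x_va, y_va, **params))
        score = mean(fold_scores)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic data for illustration only: y = 2x + 1 without noise.
x = [float(i) for i in range(12)]
y = [2.0 * v + 1.0 for v in x]
grid = {"alpha": [0.0, 1.0, 10.0], "center": [False, True]}
best_params, best_score = cross_validate(x, y, grid)
print(best_params, best_score)
```

With three values of alpha and two of center, run_model_once is called for six combinations, three folds each. After the best combination is chosen, a final model would normally be retrained on all of the training data with those hyperparameters.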

The final result is stored in the selected_model and fitting_result variables returned by SPA.py. It is also saved as .json and .p files.
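Saved results of this kind can be read back with the standard library. The sketch below assumes that the .p file is a Python pickle, which the source does not state explicitly; the file stem SPA_results, the helper names, and the example dictionary are hypothetical and are not SPA's actual file names or output.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_results(results, folder, stem="SPA_results"):
    """Write the same dictionary as both a .json file and a .p (pickle) file."""
    folder = Path(folder)
    json_path = folder / f"{stem}.json"
    pickle_path = folder / f"{stem}.p"
    json_path.write_text(json.dumps(results, indent=2))
    with open(pickle_path, "wb") as f:
        pickle.dump(results, f)
    return json_path, pickle_path


def load_results(path):
    """Load results from a .json file, or from a pickle file for any other suffix."""
    path = Path(path)
    if path.suffix == ".json":
        return json.loads(path.read_text())
    with open(path, "rb") as f:
        return pickle.load(f)  # only unpickle files from a trusted source


# Round trip with made-up values, written to a temporary folder.
example = {"model": "hypothetical_model", "test_mse": 0.25}
with tempfile.TemporaryDirectory() as tmp:
    json_path, pickle_path = save_results(example, tmp)
    from_json = load_results(json_path)
    from_pickle = load_results(pickle_path)
print(from_json == from_pickle)
```

The JSON copy is human-readable and portable, while a pickle can hold arbitrary Python objects such as fitted models, which is a common reason for keeping both.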

Please contact Pedro Seber (pseber[at]mit{dot}edu) or Richard Braatz (braatz[at]mit{dot}edu) for any inquiries.