Dear folks, we are offering challenging opportunities in Beijing for both professionals and students who are keen on AutoML/NAS. Come be a part of DataCanvas! Please send your CV to yangjian@zetyun.com. (Application deadline: TBD.)
HyperGBM is a full pipeline automated machine learning (AutoML) toolkit designed for tabular data. It covers the complete end-to-end ML processing stages, consisting of data cleaning, preprocessing, feature generation and selection, model selection and hyperparameter optimization.
HyperGBM optimizes the end-to-end ML processing stages within a single search space, which differs from most existing AutoML approaches that tackle only some of the stages, for instance, hyperparameter optimization. This full-pipeline optimization process closely resembles a sequential decision process (SDP). Therefore, HyperGBM utilizes reinforcement learning, Monte Carlo Tree Search and an evolutionary algorithm, combined with a meta-learner, to efficiently solve the pipeline optimization problem.
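To illustrate what searching the full pipeline "within one search space" means, here is a minimal, stdlib-only sketch in which preprocessing choices and model hyperparameters are sampled jointly. The space, option names and the plain random search are all hypothetical simplifications, not HyperGBM's actual search-space API or searchers:

```python
import random

# Hypothetical sketch of a *single* search space spanning the whole
# pipeline: preprocessing choices and model hyperparameters are sampled
# together, so one candidate is a complete pipeline configuration.
# (Illustrative only -- not HyperGBM's actual search-space API.)
SEARCH_SPACE = {
    "scaler":        ["none", "standard", "minmax"],      # preprocessing step
    "feature_top_k": [5, 10, 20],                         # feature selection
    "model":         ["xgboost", "lightgbm", "catboost"],
    "max_depth":     [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
}

def sample_pipeline(rng):
    """Draw one full pipeline configuration from the joint search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def random_search(evaluate, n_trials=20, seed=0):
    """Plain random search over the joint space; HyperGBM instead uses
    smarter searchers (RL, MCTS, evolutionary algorithms)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_pipeline(rng)
        score = evaluate(cfg)  # e.g. a cross-validated metric of the pipeline
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

The key point is that `evaluate` scores a whole pipeline configuration, so preprocessing and model choices are optimized jointly rather than stage by stage.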
HyperGBM, as the name indicates, involves several gradient boosting tree models (GBMs), namely XGBoost, LightGBM and CatBoost. Moreover, it builds on Hypernets, a general automated machine learning framework, and inherits its advanced capabilities in data cleaning, feature engineering and model ensembling. The search space representation and search algorithms inside HyperGBM are also provided by Hypernets.
Install HyperGBM with conda from the conda-forge channel:
conda install -c conda-forge hypergbm
On Windows, it is recommended to install pyarrow (required by Hypernets) version 4.0 or earlier together with HyperGBM:
conda install -c conda-forge hypergbm "pyarrow<=4.0"
Install HyperGBM with different pip options:
- Typical installation:
pip install hypergbm
- To run HyperGBM in JupyterLab/Jupyter notebook, install with command:
pip install hypergbm[notebook]
- To support datasets with simplified Chinese in feature generation, install the jieba package before running HyperGBM, or install with command:
pip install hypergbm[zhcn]
- Install all above with one command:
pip install hypergbm[all]
- Use HyperGBM with Python
Users can quickly create and run an experiment with make_experiment, which requires only one input parameter, train_data. The example below uses the blood dataset from hypernets.tabular as train_data. If the target column of the dataset is not y, it must be set manually through the argument target.
Example code:
from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils
train_data = dsutils.load_blood()
experiment = make_experiment(train_data, target='Class')
estimator = experiment.run()
print(estimator)
This training experiment returns a pipeline with two default steps, data_clean and estimator. In particular, the estimator step returns a final model, which is an ensemble of several models. The output:
Pipeline(steps=[('data_clean',
                 DataCleanStep(...)),
                ('estimator',
                 GreedyEnsemble(...))])
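The GreedyEnsemble in the output refers to greedy ensemble selection: repeatedly add the candidate model whose inclusion most improves the averaged prediction on validation data. Below is a rough, stdlib-only sketch of that idea; the function names and the MSE metric are illustrative assumptions, not HyperGBM's actual implementation:

```python
# Sketch of greedy ensemble selection (illustrative, NOT HyperGBM's code):
# in each round, add the model whose inclusion minimizes the validation
# error of the running-average prediction; models may be picked repeatedly.

def mse(pred, y):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

def greedy_ensemble(model_preds, y, rounds=10):
    """model_preds: dict mapping model name -> list of validation predictions."""
    selected = []                   # names chosen so far (repeats allowed)
    ensemble = [0.0] * len(y)       # running-average ensemble prediction
    for _ in range(rounds):
        best_name, best_score = None, float("inf")
        for name, pred in model_preds.items():
            n = len(selected) + 1
            # Ensemble prediction if this model were added next
            candidate = [(e * len(selected) + p) / n
                         for e, p in zip(ensemble, pred)]
            score = mse(candidate, y)
            if score < best_score:
                best_name, best_score = name, score
        selected.append(best_name)
        n = len(selected)
        ensemble = [(e * (n - 1) + p) / n
                    for e, p in zip(ensemble, model_preds[best_name])]
    return selected, ensemble
```

Because selection is done with replacement, a strong model can be chosen multiple times, which effectively weights it more heavily in the final average.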
To see more examples, please read Quick Start and Examples.
- Use HyperGBM with Command line tools
HyperGBM also provides command line tools to perform model training, evaluation and prediction. Run the following command to view the command line help:
hypergbm -h
usage: hypergbm [-h] [--log-level LOG_LEVEL] [-error] [-warn] [-info] [-debug]
[--verbose VERBOSE] [-v] [--enable-dask ENABLE_DASK] [-dask]
[--overload OVERLOAD]
{train,evaluate,predict} ...
An example of training a model on the dataset blood.csv is shown below:
hypergbm train --train-file=blood.csv --target=Class --model-file=model.pkl
For more details, please read Quick Start.
- Hypernets: A general automated machine learning (AutoML) framework.
- HyperGBM: A full pipeline AutoML tool integrating various GBM models.
- HyperDT/DeepTables: An AutoDL tool for tabular data.
- HyperKeras: An AutoDL tool for Neural Architecture Search and Hyperparameter Optimization on Tensorflow and Keras.
- Cooka: Lightweight interactive AutoML system.
HyperGBM is an open source project created by DataCanvas.