State-of-the-art Automated Machine Learning Python library for Tabular Data
- Binary Classification
- Regression
- Multiclass Classification (in progress...)
- Automated Data Cleaning (Auto Clean)
- Automated Feature Engineering (Auto FE)
- Smart Hyperparameter Optimization (HPO)
- Feature Generation
- Feature Selection
- Model Selection
- Cross Validation
- Optimization Time Limit and Early Stopping
- Save and Load (Predict new data)
pip install automl
Classifier:
from automl import AutoMLClassifier
model = AutoMLClassifier()
model.fit(X_train, y_train, timeout=600)
predicts = model.predict(X_test)
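The snippet above assumes that `X_train`, `y_train` and `X_test` already exist. A minimal sketch of one way to produce them with a shuffled holdout split, using only the standard library. The helper `train_test_split_rows` and the toy data are hypothetical and not part of this library; which container type `fit` expects is not stated here, so convert as needed.

```python
import random


def train_test_split_rows(rows, targets, test_fraction=0.25, seed=42):
    """Shuffle row indices and split features/targets into train and test parts."""
    if len(rows) != len(targets):
        raise ValueError("rows and targets must have the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [rows[i] for i in train_idx]
    y_train = [targets[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_test = [targets[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy tabular data (illustrative only): two numeric columns, binary target.
rows = [[i, i % 3] for i in range(20)]
targets = [i % 2 for i in range(20)]
X_train, X_test, y_train, y_test = train_test_split_rows(rows, targets)
```

Keeping `y_test` aside lets you score `predicts` afterwards.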
Regression:
from automl import AutoMLRegressor
model = AutoMLRegressor()
model.fit(X_train, y_train, timeout=600)
predicts = model.predict(X_test)
DataPrepare:
from automl import DataPrepare
de = DataPrepare()
X_train = de.fit_transform(X_train)
X_test = de.transform(X_test)
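The `fit_transform` / `transform` pair above follows the usual convention: statistics are learned from the training data only and then reused on the test data. A minimal sketch of that pattern, assuming a hypothetical `MeanImputer` class; it is not how `DataPrepare` is implemented, only an illustration of the idea.

```python
class MeanImputer:
    """Hypothetical transformer: fills missing values (None) with column means."""

    def fit(self, rows):
        # Learn one mean per column, ignoring missing values.
        n_cols = len(rows[0])
        self.means_ = []
        for col in range(n_cols):
            values = [row[col] for row in rows if row[col] is not None]
            self.means_.append(sum(values) / len(values) if values else 0.0)
        return self

    def transform(self, rows):
        # Reuse the means learned in fit(); nothing is re-estimated here.
        return [
            [self.means_[c] if v is None else v for c, v in enumerate(row)]
            for row in rows
        ]

    def fit_transform(self, rows):
        return self.fit(rows).transform(rows)


train = [[1.0, 10.0], [3.0, None], [None, 30.0]]
test = [[None, None], [5.0, 50.0]]

imputer = MeanImputer()
train_clean = imputer.fit_transform(train)  # [[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]]
test_clean = imputer.transform(test)        # [[2.0, 20.0], [5.0, 50.0]]
```

Calling `fit_transform` on the test set instead would leak test statistics into preprocessing.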
Simple Models Wrapper:
from automl import LightGBMClassifier
model = LightGBMClassifier()
model.fit(X_train, y_train)
predicts = model.predict_proba(X_test)
model.opt(X_train, y_train,
    timeout=600,  # optimization time in seconds
    )
predicts = model.predict_proba(X_test)
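To illustrate what a `timeout` combined with early stopping means in practice, here is a minimal sketch of a time-bounded random search. The function `timed_random_search`, its parameters (`patience`, `max_trials`, `seed`) and the toy `loss` are assumptions made for this example; the library's own `opt` may work quite differently (it integrates Optuna).

```python
import random
import time


def timed_random_search(objective, low, high, timeout=1.0,
                        patience=50, max_trials=10_000, seed=0):
    """Minimize objective(x) over [low, high] until the time budget runs out,
    the trial cap is reached, or `patience` trials pass without improvement."""
    rng = random.Random(seed)
    deadline = time.monotonic() + timeout
    best_x, best_score = None, float("inf")
    stale = 0   # consecutive trials without improvement
    trials = 0
    while trials < max_trials and time.monotonic() < deadline:
        x = rng.uniform(low, high)
        score = objective(x)
        trials += 1
        if score < best_score:
            best_x, best_score = x, score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stopping
    return best_x, best_score, trials


def loss(x):
    """Toy objective with its minimum at x = 3."""
    return (x - 3.0) ** 2


best_x, best_score, n_trials = timed_random_search(loss, -10.0, 10.0, timeout=0.5)
```

A larger `timeout` only helps while the search keeps improving; early stopping returns sooner once it stalls.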
It integrates many popular frameworks:
- scikit-learn
- XGBoost
- LightGBM
- CatBoost
- Optuna
- ...
- Categorical Features
- Numerical Features
- Binary Features
- Text
- Datetime
- Timeseries
- Image
- Large datasets require a lot of memory! The library creates many new features, so a large dataset with many features (more than 100) may need a lot of memory.
Run the Optuna dashboard:
$ optuna-dashboard sqlite:///db.sqlite3
- Feature Generation
- Save/Load and Predict on New Samples
- Advanced Logging
- Add opt Pruners
- Docs Site
- DL Encoders
- Add More libs (NNs)
- Multiclass Classification
- Build pipelines
Contact: kaushikeva0026@gmail.com