An exploratory top-layer package for scikit-learn
This top-layer package for scikit-learn finds the machine learning model and hyperparameters best suited to a dataset and to the properties the user has set. It is intended to help find an appropriate model for the dataset, not to substitute for actual model engineering.
The find_model function has the following parameters:
dataset
takes a pandas DataFrame as its value
train_size
is a float in the open interval (0, 1) giving the fraction of the dataset used for training; the remainder is used for testing
problem
has three options: classification, regression, and clustering
label
takes the name of a column in the pandas DataFrame and treats that column as the label for classification or the target value for regression
dim_reduction
takes True or False as its value; when True, dimensionality reduction is applied to the dataset; False by default
features
is the number of components to keep after dimensionality reduction; auto by default
contains_negative
takes True or False as its value; True uses principal component analysis (PCA) for dimensionality reduction, while False uses non-negative matrix factorization (NMF), which requires a dataset with no negative values; True by default
ensembling
takes True or False as its value; True combines the base estimators using ensemble methods, while False disables ensembling
priority
has two options: accuracy and time; accuracy makes the module search for better hyperparameters for each algorithm under consideration, while time uses the default hyperparameters
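The train_size, dim_reduction, features, and contains_negative parameters together describe a preprocessing step. The sketch below shows one way such a step could be assembled from scikit-learn; it is not skxplore's own code. The helper name prepare_data is hypothetical, and the rule that features="auto" keeps half of the original columns is an assumption, since this README does not say what "auto" resolves to.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import NMF, PCA
from sklearn.model_selection import train_test_split


def prepare_data(dataset, label, train_size, dim_reduction=False,
                 features="auto", contains_negative=True):
    """Hypothetical helper: split a DataFrame and optionally reduce its dimensions."""
    X = dataset.drop(columns=[label])
    y = dataset[label]
    # train_size is the fraction of rows used for training; the rest is for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_size, random_state=0)

    if dim_reduction:
        # Assumption: "auto" keeps half of the original columns (at least one).
        n = max(1, X.shape[1] // 2) if features == "auto" else features
        if contains_negative:
            # PCA works with negative values.
            reducer = PCA(n_components=n)
        else:
            # NMF requires non-negative input.
            reducer = NMF(n_components=n, init="nndsvda", max_iter=500, random_state=0)
        # Fit on the training split only, then apply the same transform to the test split.
        X_train = reducer.fit_transform(X_train)
        X_test = reducer.transform(X_test)
    return X_train, X_test, y_train, y_test


# Example: iris has four non-negative feature columns plus a "target" column.
iris = load_iris(as_frame=True).frame
X_train, X_test, y_train, y_test = prepare_data(
    iris, label="target", train_size=0.8, dim_reduction=True,
    features=2, contains_negative=False)
print(X_train.shape, X_test.shape)
```

Fitting the reducer on the training split only and reusing it on the test split keeps the test data out of the fitted transform.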
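The priority and ensembling parameters can be illustrated with scikit-learn classifiers. In this minimal sketch, priority="time" fits each candidate with its default hyperparameters, priority="accuracy" tunes a small grid with GridSearchCV, and ensembling=True adds a hard-voting VotingClassifier over the base estimators. The function pick_classifier, the search grids, and the choice of voting as the ensemble method are assumptions for illustration, not skxplore's implementation. The XGBoost and LightGBM classifiers are left out so the example needs only scikit-learn, although their XGBClassifier and LGBMClassifier wrappers follow the same fit/score interface.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical candidate set and search grids, chosen only for illustration.
CANDIDATES = {
    "naive_bayes": (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "svc": (SVC(), {"C": [0.1, 1.0, 10.0]}),
}


def pick_classifier(X_train, X_test, y_train, y_test, priority="time", ensembling=False):
    """Fit every candidate and return the name with the best test accuracy."""
    fitted = {}
    for name, (model, grid) in CANDIDATES.items():
        if priority == "accuracy":
            # Tune hyperparameters with 5-fold cross-validation on the training set.
            search = GridSearchCV(model, grid, cv=5).fit(X_train, y_train)
            fitted[name] = search.best_estimator_
        else:
            # "time": keep the library defaults and fit once.
            fitted[name] = clone(model).fit(X_train, y_train)
    if ensembling:
        # Hard (majority) voting over the base estimators; VotingClassifier refits clones.
        fitted["voting"] = VotingClassifier(list(fitted.items())).fit(X_train, y_train)
    # .score() on a classifier returns mean accuracy.
    scores = {name: model.score(X_test, y_test) for name, model in fitted.items()}
    best = max(scores, key=scores.get)
    return best, scores


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=0)
best, scores = pick_classifier(X_train, X_test, y_train, y_test,
                               priority="accuracy", ensembling=True)
print(best, round(scores[best], 3))
```

Choosing the winner by test accuracy mirrors the train/test partition set by train_size; a separate validation split would give a less optimistic estimate of the winner's performance.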
skxplore considers the following algorithms:
- Classification: Naive Bayes algorithm, K-nearest neighbors algorithm, support vector machine classifier, eXtreme gradient boosting (XGBoost) classifier, and Light gradient boosting machine (LightGBM) classifier
- Regression: Lasso regression, Ridge regression, elastic net regression, linear regression, support vector machine regressor, eXtreme gradient boosting (XGBoost) regressor, and Light gradient boosting machine (LightGBM) regressor
- Clustering: K-means clustering, spectral clustering, Gaussian mixture model, density-based spatial clustering of applications with noise (DBSCAN) algorithm, and ordering points to identify the clustering structure (OPTICS) algorithm
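For regression, a comparable sketch ranks the scikit-learn regressors from the list by the coefficient of determination (R²) on a held-out split and also reports mean squared error. The function rank_regressors and the diabetes example data are illustrative choices, not part of skxplore; the XGBoost and LightGBM regressors are omitted so the example needs only scikit-learn.

```python
from sklearn.base import clone
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Candidates use default hyperparameters (the "time" behaviour described above).
REGRESSORS = {
    "lasso": Lasso(),
    "ridge": Ridge(),
    "elastic_net": ElasticNet(),
    "linear": LinearRegression(),
    "svr": SVR(),
}


def rank_regressors(X_train, X_test, y_train, y_test):
    """Hypothetical helper: return (name, R², MSE) tuples, best R² first."""
    results = []
    for name, model in REGRESSORS.items():
        # Standardize features before fitting a fresh copy of each model.
        pipe = make_pipeline(StandardScaler(), clone(model)).fit(X_train, y_train)
        pred = pipe.predict(X_test)
        results.append((name, r2_score(y_test, pred), mean_squared_error(y_test, pred)))
    return sorted(results, key=lambda r: r[1], reverse=True)


X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=0)
results = rank_regressors(X_train, X_test, y_train, y_test)
for name, r2, mse in results:
    print(f"{name:12s} R2={r2:.3f} MSE={mse:.1f}")
```

Each model is wrapped in a pipeline with StandardScaler because the penalties in Lasso, Ridge, and elastic net, as well as the RBF kernel in SVR, depend on feature scale; plain linear regression predictions are unaffected by it.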
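Clustering has no label column, so candidates have to be compared with an internal metric. A minimal sketch, assuming the silhouette score as that metric and assuming the number of clusters (K = 3) is known for k-means, spectral clustering, and the Gaussian mixture model; DBSCAN and OPTICS infer the number of clusters themselves. The function score_clusterers, the DBSCAN eps value, and the synthetic make_blobs data are illustrative, not taken from skxplore.

```python
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Assumption: K is supplied to the methods that need a cluster count.
K = 3
CLUSTERERS = {
    "kmeans": KMeans(n_clusters=K, n_init=10, random_state=0),
    "spectral": SpectralClustering(n_clusters=K, random_state=0),
    "gmm": GaussianMixture(n_components=K, random_state=0),
    "dbscan": DBSCAN(eps=0.3, min_samples=5),
    "optics": OPTICS(min_samples=10),
}


def score_clusterers(X):
    """Hypothetical helper: silhouette score of each method that found 2+ clusters."""
    scores = {}
    for name, model in CLUSTERERS.items():
        labels = model.fit_predict(X)
        # DBSCAN and OPTICS mark noise as -1; leave those points out of the score.
        mask = labels != -1
        n_clusters = len(np.unique(labels[mask]))
        # Silhouette is only defined for 2 <= n_clusters <= n_samples - 1.
        if 2 <= n_clusters < mask.sum():
            scores[name] = silhouette_score(X[mask], labels[mask])
    return scores


X, _ = make_blobs(n_samples=300, centers=K, cluster_std=0.6, random_state=0)
X = StandardScaler().fit_transform(X)
scores = score_clusterers(X)
best = max(scores, key=scores.get)
print(best, {name: round(s, 3) for name, s in scores.items()})
```

Noise points are excluded before scoring, and a method that yields fewer than two clusters is skipped because the silhouette score is undefined in that case.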