Contact: Jason Bryer, Ph.D.
Website: https://jbryer.github.io/mldash/
The goal of mldash is to provide a framework for evaluating the
performance of many predictive models across many datasets. The package
includes common predictive modeling procedures and datasets. Details on
how to contribute additional datasets and models are outlined below. Both
datasets and models are defined in the Debian Control File (DCF) format.
This provides a convenient format for storing both metadata about the
datasets and models and R code snippets for retrieving data, training
models, and getting predictions. The run_models() function handles
executing each model for each dataset (as appropriate to the predictive
model type, i.e., classification or regression), splitting data into
training and validation sets, and calculating the desired performance
metrics using the yardstick package.
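To make the DCF idea concrete, here is a minimal sketch of how metadata and R code snippets can live in one file and be read back with base R's read.dcf(). The field names (name, type, packages, train, predict) and the file contents are illustrative assumptions, not the schema mldash actually uses.

```r
# Hypothetical model definition. The field names below are illustrative
# assumptions and are not taken from the mldash package.
dcf_text <- c(
  "name: lm_example",
  "type: regression",
  "packages: stats",
  "train: function(formula, data) { stats::lm(formula, data = data) }",
  "predict: function(model, newdata) { stats::predict(model, newdata = newdata) }"
)
dcf_file <- tempfile(fileext = ".dcf")
writeLines(dcf_text, dcf_file)

# read.dcf() returns a character matrix: one row per record,
# one column per field.
spec <- read.dcf(dcf_file)

# Turn the stored code snippets into callable R functions.
train_fun   <- eval(parse(text = spec[1, "train"]))
predict_fun <- eval(parse(text = spec[1, "predict"]))

# Use the parsed functions like any other R function.
fit   <- train_fun(mpg ~ wt, data = mtcars)
preds <- predict_fun(fit, newdata = head(mtcars, 3))

print(spec[1, c("name", "type")])
print(preds)
```

Keeping code in text fields means a new model can be added by dropping in a file, at the cost of evaluating code from that file, so such files should come from a trusted source.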
You can install the development version of mldash using the remotes
package like so:
remotes::install_github('jbryer/mldash')
The mldash package makes use of predictive models implemented in R,
Python, and Java. As a result, there are numerous system requirements
necessary to run all the models. We have included instructions in the
installation vignette:
vignette('installation', package = 'mldash')
To begin, we read in the datasets using the read_ml_datasets()
function. There are two parameters:
dir: the directory containing the metadata files. The default is to look in the package’s installation directory.
cache_dir: the directory where datasets can be stored locally.
The following call reads in the datasets currently included in the package.
ml_datasets <- mldash::read_ml_datasets(dir = 'inst/datasets',
                                        cache_dir = 'inst/datasets')
# head(ml_datasets, n = 4)
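The role of a local cache directory can be sketched as follows. The helper cached_dataset() and its arguments are hypothetical and are not part of mldash; the sketch only illustrates the general fetch-once, reuse-later pattern that a cache_dir parameter suggests.

```r
# Hypothetical helper (not part of mldash): fetch a dataset once and
# reuse the locally cached copy on later calls.
cached_dataset <- function(name, cache_dir, fetch) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  cache_file <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))  # cache hit: skip the fetch entirely
  }
  data <- fetch()                # cache miss: retrieve the data
  saveRDS(data, cache_file)      # store it for next time
  data
}

# A fresh temporary directory stands in for cache_dir.
cache_dir <- tempfile("mldash_cache_demo_")

# Count how often the (stand-in) download step actually runs.
fetch_count <- 0
fetch_mtcars <- function() {
  fetch_count <<- fetch_count + 1
  mtcars
}

first  <- cached_dataset("mtcars", cache_dir, fetch_mtcars)
second <- cached_dataset("mtcars", cache_dir, fetch_mtcars)
print(fetch_count)
```

The second call is served from the .rds file, so the fetch function runs only once.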
Similarly, the read_ml_models() function reads in the models. The dir
parameter defines where to look for model files.
ml_models <- mldash::read_ml_models(dir = 'inst/models')
#> Warning in mldash::read_ml_models(dir = "inst/models"): The following packages
#> are not installed but required by the models: FCNN4R, mxnet
# head(ml_models, n = 4)
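A warning like the one above can be reproduced in spirit with a small check of which required packages are available. The helper missing_packages() is a hypothetical illustration in base R, not the check mldash itself performs.

```r
# Hypothetical helper (not part of mldash): report which of the given
# packages cannot be loaded in the current R library.
# Note: requireNamespace() loads the namespace of each package it finds.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "notARealPackage12345" is a made-up name used to trigger a miss.
required <- c("stats", "utils", "notARealPackage12345")
not_installed <- missing_packages(required)

if (length(not_installed) > 0) {
  message("Packages required but not installed: ",
          paste(not_installed, collapse = ", "))
}
```

Models that depend on a missing package can then be skipped or the packages installed before running the full set.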
Once the datasets and models have been loaded, the run_models()
function trains and evaluates each model on each dataset, as appropriate
for the model type.
ml_results <- mldash::run_models(datasets = ml_datasets,
                                 models = ml_models,
                                 seed = 2112)
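Conceptually, a run of this kind pairs each dataset with the models of the matching type, splits the data, fits on the training portion, and scores on the validation portion. The sketch below illustrates that loop in base R only. The list structures, the 70/30 split, and the hand-computed RMSE are assumptions for illustration; mldash's internals and its use of yardstick may differ.

```r
set.seed(2112)  # plays the same role as a seed argument: a reproducible split

# Stand-ins for dataset and model metadata (illustrative structures only).
datasets <- list(
  mtcars_mpg = list(type = "regression", data = mtcars,
                    formula = mpg ~ wt + hp)
)
models <- list(
  lm = list(
    type    = "regression",
    train   = function(formula, data) stats::lm(formula, data = data),
    predict = function(model, newdata) stats::predict(model, newdata = newdata)
  ),
  logistic = list(
    type    = "classification",
    train   = function(formula, data) stats::glm(formula, data = data,
                                                 family = binomial),
    predict = function(model, newdata) stats::predict(model, newdata = newdata,
                                                      type = "response")
  )
)

results <- data.frame()
for (d in names(datasets)) {
  ds <- datasets[[d]]
  # Assumed 70/30 training/validation split.
  train_rows <- sample(nrow(ds$data), size = floor(0.7 * nrow(ds$data)))
  train_set  <- ds$data[train_rows, ]
  valid_set  <- ds$data[-train_rows, ]
  for (m in names(models)) {
    model <- models[[m]]
    if (model$type != ds$type) next  # skip models of the wrong type
    fit      <- model$train(ds$formula, train_set)
    estimate <- model$predict(fit, valid_set)
    truth    <- valid_set[[all.vars(ds$formula)[1]]]  # response column
    rmse     <- sqrt(mean((truth - estimate)^2))      # base R stand-in metric
    results  <- rbind(results,
                      data.frame(dataset = d, model = m, rmse = rmse))
  }
}
print(results)
```

Here the classification model is skipped for the regression dataset, so the result has a single row for the lm model.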
The metrics parameter to run_models() takes a list of metrics from
the yardstick package (Kuhn & Vaughan, 2021). The full list of metrics
is available here:
https://yardstick.tidymodels.org/articles/metric-types.html
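As a rough illustration of what a list of yardstick metrics can look like, the sketch below applies two of yardstick's vector metric functions, rmse_vec() and mae_vec(), to toy values. The helper score() and the named-list shape are assumptions; whether run_models() expects exactly this form is not confirmed here. The call is guarded so the sketch does nothing where yardstick is not installed.

```r
# Hypothetical helper: apply a named list of yardstick vector metrics
# to a pair of truth/estimate vectors.
score <- function(truth, estimate) {
  metrics <- list(rmse = yardstick::rmse_vec,
                  mae  = yardstick::mae_vec)
  vapply(metrics,
         function(f) f(truth = truth, estimate = estimate),
         numeric(1))
}

# Toy observed and predicted values.
truth    <- c(3.0, 5.0, 2.5, 7.0)
estimate <- c(2.5, 5.0, 3.0, 8.0)

if (requireNamespace("yardstick", quietly = TRUE)) {
  scores <- score(truth, estimate)
  print(scores)
}
```

Classification metrics follow the same pattern but take factor vectors for truth and estimate.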
There are 27 datasets included in the mldash package. You can view the
datasets in the datasets vignette.
vignette('datasets', package = 'mldash')
Each model is defined in a Debian Control File (DCF) format, the details
of which are described below. Models whose names begin with tm_ are
implemented with the tidymodels R package; models whose names begin with
weka_ are implemented with the RWeka package, which is a wrapper to the
Weka collection of machine learning algorithms.
There are 413 models included in the mldash package. You can view the
models in the models vignette.
vignette('models', package = 'mldash')
Please note that the mldash project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.