Refactor run_ml() API for future expandability
Closed this issue · 1 comments
kelly-sovacool commented
From openjournals/joss-reviews#3073 (comment):
The run_ml() function currently implements 5 off-the-shelf ML algorithms, while providing 12 other parameters for training criteria, hyperparameters, feature importance, etc. If in the future it were to support more algorithms, custom metrics, or training parameters, I'd imagine there would be limitations imposed by the function arguments. I'd suggest that the function take in 3 objects, e.g. run_ml(dataset, model, metrics, [args]), where a metrics object allows the user to select standard metrics or define their own metric functions given the model output and true labels.
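To make the suggested three-object shape concrete, here is a minimal base R sketch. Everything in it is hypothetical: run_ml_sketch(), majority_class_model(), and the metric functions are illustrative names and are not part of mikropml. The "model" is a stand-in that always predicts the most common class, and the "metrics" object is assumed to be a named list of functions that take the predicted and true labels.

```r
# Hypothetical sketch only: none of these names exist in mikropml.

# A "metrics" object: a named list of functions(predicted, truth).
accuracy <- function(predicted, truth) mean(predicted == truth)
error_rate <- function(predicted, truth) 1 - accuracy(predicted, truth)

# A stand-in "model": returns the majority class for every row.
majority_class_model <- function(dataset, outcome_colname) {
  counts <- table(dataset[[outcome_colname]])
  rep(names(counts)[which.max(counts)], nrow(dataset))
}

# The suggested three-object interface: dataset, model, metrics.
run_ml_sketch <- function(dataset, model, metrics, outcome_colname = "outcome") {
  predicted <- model(dataset, outcome_colname)
  truth <- as.character(dataset[[outcome_colname]])
  # Apply every metric function to the same predictions and labels.
  vapply(metrics, function(metric) metric(predicted, truth), numeric(1))
}

toy <- data.frame(outcome = c("a", "a", "a", "b"), feature = c(1, 2, 3, 4))
scores <- run_ml_sketch(
  toy,
  majority_class_model,
  metrics = list(accuracy = accuracy, error_rate = error_rate)
)
print(scores)
```

With this shape, adding a custom metric only means adding another function to the list; the function signature itself does not grow.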
zenalapp commented
run_ml() already takes these objects:
- dataset: The input dataset.
- method: The ML model to be used. While we only officially support 5 models, all of the models supported by caret (https://topepo.github.io/caret/available-models.html) should work in our package. If caret supports additional models in the future, these should also work in mikropml. We realize that the model options are not as generalizable as e.g. PyTorch, since users must choose from options supported by caret. However, our code heavily relies on caret to perform the underlying model training. Additionally, as mikropml is oriented toward beginner practitioners, we believe that it does not need to provide the option to include custom models.
- perf_metric_function and perf_metric_name: The performance metric to be used. We chose sensible defaults, but the user can provide their own performance metrics if they would like.
- hyperparameters: The values of hyperparameters in the model that the user would like to tune.
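
For illustration, a call that sets each of these arguments explicitly might look like the sketch below. It assumes that mikropml is installed and that its bundled otu_mini_bin example dataset (with outcome column dx) is available. The hyperparameter values, the choice of caret::multiClassSummary with "Kappa", and the cv_times and seed settings are assumptions picked for the example, not recommendations from this thread.

```r
library(mikropml)

# Assumed values for illustration: L2-regularized regression (alpha = 0)
# with a small, arbitrary grid of lambda values to tune over.
custom_hyperparameters <- list(alpha = 0, lambda = c(0.01, 0.1, 1))

results <- run_ml(
  dataset = otu_mini_bin,                 # example dataset bundled with mikropml
  method = "glmnet",                      # model name passed through to caret
  outcome_colname = "dx",
  hyperparameters = custom_hyperparameters,
  perf_metric_function = caret::multiClassSummary,  # computes several metrics
  perf_metric_name = "Kappa",             # which of those metrics to optimize
  cv_times = 5,                           # fewer CV repeats to keep the example fast
  seed = 2019
)

# Inspect the performance summary returned alongside the trained model.
print(results$performance)
```

Swapping method for another model name that caret supports, or swapping perf_metric_function for another caret-style summary function, should leave the rest of the call unchanged.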