SchlossLab/mikropml

Refactor run_ml() API for future expandability

Closed this issue · 1 comment

From openjournals/joss-reviews#3073 (comment):

The run_ml() function currently implements 5 off-the-shelf ML algorithms, while providing 12 other parameters for training criteria, hyperparameters, feature importance, etc. If in the future it were to support more algorithms, custom metrics, or training parameters, I'd imagine there would be limitations imposed by the function arguments. I'd suggest the function take in 3 objects, e.g. run_ml(dataset, model, metrics, [args]), where a metrics object can allow the user to select standard metrics or define their own metric functions given the model output and true labels.

run_ml() already takes these objects:

  1. dataset: The input dataset.
  2. method: The ML model to be used. While we officially support only 5 models, all of the models supported by caret (https://topepo.github.io/caret/available-models.html) should work in our package, and any models caret adds in the future should work in mikropml as well. We realize that the model options are not as general as those of, e.g., PyTorch, since users must choose from the options caret supports; however, our code relies heavily on caret to perform the underlying model training. Additionally, since mikropml is oriented toward beginner practitioners, we believe it does not need to provide an option for custom models.
  3. perf_metric_function and perf_metric_name: The performance metric to be used. We chose sensible defaults, but the user can provide their own performance metrics if they would like.
  4. hyperparameters: The values of hyperparameters in the model that the user would like to tune.
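To make the correspondence with the reviewer's proposed run_ml(dataset, model, metrics, [args]) concrete, here is a hedged sketch of a call that passes all four of the objects listed above. It assumes the otu_mini_bin example dataset and its dx outcome column shipped with the package; the glmnet method, the caret::twoClassSummary metric function, and the hyperparameter values are illustrative choices, not package defaults.

```r
library(mikropml)

results <- run_ml(
  dataset = otu_mini_bin,                       # 1. the input dataset
  method = "glmnet",                            # 2. any caret-supported model name
  outcome_colname = "dx",
  perf_metric_function = caret::twoClassSummary,  # 3. user-supplied metric function
  perf_metric_name = "ROC",                       #    ...and the metric to optimize
  hyperparameters = list(alpha = 0,               # 4. hyperparameter values to tune
                         lambda = c(0.1, 1)),
  seed = 2019
)
```

Because perf_metric_function accepts any function with caret's summary-function interface, users can plug in their own metric computed from the model output and true labels, which is the flexibility the review asks for.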