SchlossLab/mikropml

Implement progress bar for long-running functions

Closed this issue · 3 comments

From openjournals/joss-reviews#3073 (comment):

When testing preprocess_data() and run_ml() on a larger dataset with 1917 rows × 14426 columns, both functions ran for an indeterminate amount of time. They should display a progress bar showing the iteration number and the estimated duration of each iteration, similar to Python's tqdm.

And openjournals/joss-reviews#3073 (comment):

I agree with @JonnyTran's suggestion to display time-to-completion estimates and relevant metrics (e.g. loss or the chosen metrics for the current iteration) during training. The average user will find this helpful. This would also be the case when parallelizing run_ml() execution.

I'll probably use https://github.com/r-lib/progress and make it optional.
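For reference, here is a rough sketch of what an optional progress bar built on r-lib/progress could look like. `map_with_progress()` and its `progress` argument are hypothetical names made up for this example and are not part of mikropml; the sketch also falls back to running silently when the progress package is not installed.

```r
# Hypothetical helper, not part of mikropml: applies `f` to each element of `xs`
# and ticks a progress bar after each element when `progress = TRUE`.
map_with_progress <- function(xs, f, progress = TRUE) {
  # Only use a bar if requested, there is work to do, and the package is available.
  use_bar <- progress && length(xs) > 0 &&
    requireNamespace("progress", quietly = TRUE)
  if (use_bar) {
    pb <- progress::progress_bar$new(
      format = "[:bar] :current/:total eta: :eta",
      total = length(xs)
    )
  }
  lapply(xs, function(x) {
    result <- f(x)
    if (use_bar) pb$tick() # advance the bar by one completed iteration
    result
  })
}

# Toy workload standing in for a slow per-iteration step.
squares <- map_with_progress(1:5, function(i) {
  Sys.sleep(0.1)
  i^2
})
```

Passing `progress = FALSE` skips the bar entirely, which is one way to keep the feature optional. Note that progress only draws the bar in interactive sessions by default.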

Unfortunately the progress bar is nearly useless for run_ml() since the bulk of the execution time is spent running caret::train(). We won't be able to fix that without writing our own train function, which is probably not worth it.
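To illustrate the problem with made-up numbers: a bar can only advance when a step returns, so if one step dominates the run time, the bar sits still for almost the whole run. The stage names and durations below are assumptions chosen for illustration, not measurements of run_ml().

```r
# Hypothetical stage durations in seconds; the numbers are invented for illustration.
stage_seconds <- c(setup = 1, train = 96, evaluate = 3)

# The bar can only tick when a stage finishes, so these are the only values it can show.
bar_after_stage <- seq_along(stage_seconds) / length(stage_seconds)

# Share of the total wall-clock time that has actually elapsed at each tick.
time_after_stage <- cumsum(stage_seconds) / sum(stage_seconds)

round(data.frame(bar = bar_after_stage, elapsed = time_after_stage), 2)
```

With these numbers the bar jumps to one third after 1% of the run time and then does not move again until 97% of the run time has passed, so it says little about how long the training step has left.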

The progress bar is more informative for preprocess_data(), so I implemented it there.

Wonderful, @kelly-sovacool! Would it also be possible to implement a progress bar for feature importance, since computing it can take a long time?

Nice idea, @JonnyTran. I'll implement it for feature importance too in #257.