Progress bar misleading when ICE = TRUE
notiv opened this issue · 6 comments
When using partial with ICE = FALSE and progress = 'time', the progress bar seems to produce a reasonable estimate of the time to completion. When ICE = TRUE, though, the estimate is far off and the function runs for quite some time after the progress bar has reached 100%. The number of features and the number of records are relatively high (> 700 and > 300K, respectively). Is there an explanation for that?
@notiv I can't imagine why that would happen, but it may be an error with either the progress or plyr packages. Do you have a reproducible example I can run on my end?
I checked the source code and I didn't find a reason why that would happen either. I also tried to create a reprex, but there was no problem with small datasets. Do you expect different run times depending on the value of ICE? I thought that one should first calculate the ICE and then get the average, i.e. running times should be comparable.
I'll check further and I'll also try to create a reproducible example with a larger dataset.
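To make the "ICE first, then average" reasoning concrete, here is a minimal base-R sketch. It is not pdp's implementation; the toy data, the lm model, and the names train, fit, grid, ice_wide, and pd are all assumptions made up for illustration.

```r
# Toy data and model (illustrative only; not taken from the issue).
set.seed(101)
train <- data.frame(x1 = runif(50), x2 = runif(50))
train$y <- 2 * train$x1 + sin(3 * train$x2) + rnorm(50, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = train)

# Grid of values for the feature of interest (x1).
grid <- seq(0, 1, length.out = 5)

# ICE curves in wide form: one row per observation, one column per grid value.
ice_wide <- sapply(grid, function(v) {
  tmp <- train
  tmp$x1 <- v  # hold x1 fixed at the grid value for every row
  predict(fit, newdata = tmp)
})

# The partial dependence curve is the column-wise mean of the ICE curves.
pd <- colMeans(ice_wide)
```

In this sketch the expensive part (the predict calls) is identical whether or not the ICE curves are kept, which is why comparable run times seem like a reasonable expectation.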
@notiv Correct, theoretically, ICE curves should be faster since they are computed first. However, you'll notice they take longer in pdp because they get post-processed (e.g., converted from wide to long format; initially, each ICE curve is in a different row) to make them easier to plot.
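As a rough illustration of that post-processing step, the sketch below reshapes a small wide matrix of ICE curves (one curve per row) into long format using base R only. It is an assumption about the general shape of the operation, not the code pdp actually runs; the column names curve_id, x, and yhat are hypothetical.

```r
# Two hypothetical ICE curves evaluated at three grid values (wide format).
ice_wide <- matrix(c(1.0, 1.5, 2.0,
                     0.5, 0.7, 0.9), nrow = 2, byrow = TRUE)
grid <- c(10, 20, 30)

n <- nrow(ice_wide)  # number of curves (observations)
k <- ncol(ice_wide)  # number of grid values

# as.vector() unrolls the matrix column by column, so the curve index
# cycles fastest and each grid value is repeated once per curve.
ice_long <- data.frame(
  curve_id = rep(seq_len(n), times = k),
  x        = rep(grid, each = n),
  yhat     = as.vector(ice_wide)
)
```

The long table has n * k rows, so with hundreds of thousands or millions of records this step can plausibly dominate the run time, and it happens after the prediction loop that the progress bar tracks.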
I'm having the same issue as @notiv. The progress bar is accurate for the PDP but not for the ICE plot. The dataset I'm dealing with is big (6 million data points), but plotting the PDP only takes a few minutes, whereas ICE takes forever, even after the progress bar reaches 100%. I assume it's the wide-to-long format conversion that's taking so long.
I can try reimplementing the wide-to-long conversion, or even try to eliminate it whenever ice = TRUE. My task is removing the plyr dependency, so I’ll take a hard look at this soon.
Fix available on this branch if anyone cares to test: https://github.com/bgreenwell/pdp/tree/foreach/R.