bgreenwell/pdp

Progress bar misleading when ICE = TRUE

notiv opened this issue · 6 comments

notiv commented

When using partial with ICE = FALSE and progress = 'time', the progress bar seems to produce a reasonable estimate of the time to completion. When ICE = TRUE though, the estimate is far off and the function runs for quite some time after the progress bar has reached 100%. The number of features and the number of records are relatively high (> 700, > 300K respectively). Is there an explanation for that?

@notiv I can't imagine why that would happen, but it may be an error with either the progress or plyr packages. Do you have a reproducible example I can run on my end?

notiv commented

I checked the source code and I didn't find a reason why that would happen either. I also tried to create a reprex, but there was no problem with small datasets. Do you expect different run times depending on the value of ICE? I thought that one should first calculate the ICE and then get the average, i.e. running times should be comparable.

I'll check further and I'll also try to create a reproducible example with a larger dataset.

@notiv Correct, theoretically, ICE curves should be faster since they are computed first. However, you'll notice they take longer in pdp because they get post-processed (e.g., converted from wide to long format; initially, each ICE curve is in a different row) to make them easier to plot.
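To make the point above concrete, here is a minimal base-R sketch of the relationship between ICE curves, the partial dependence curve, and the wide-to-long step. It is not pdp's actual implementation: `predict_fun`, the toy data, and the column names `yhat` and `yhat.id` are assumptions made for illustration only.

```r
# Toy stand-in for a fitted model's predict(); hypothetical, not pdp code.
predict_fun <- function(data) 2 * data$x1 + data$x2^2

set.seed(101)
train <- data.frame(x1 = rnorm(5), x2 = rnorm(5))
grid <- c(-1, 0, 1)  # values of x1 at which the curves are evaluated

# Wide format: one ICE curve per row, one column per grid value.
ice_wide <- sapply(grid, function(g) {
  temp <- train
  temp$x1 <- g        # hold x1 fixed at g for every observation
  predict_fun(temp)
})

# The partial dependence curve is the column-wise average of the ICE curves.
pd <- data.frame(x1 = grid, yhat = colMeans(ice_wide))

# Long format for plotting: one row per (observation, grid value) pair.
ice_long <- data.frame(
  x1 = rep(grid, each = nrow(ice_wide)),
  yhat = as.vector(ice_wide),  # column-major order matches rep(..., each =)
  yhat.id = rep(seq_len(nrow(ice_wide)), times = length(grid))
)
```

The long table has one row per observation per grid value, so its size grows with the product of the two. With more than 300K records and, say, a hypothetical 50 grid values, that is over 15 million rows to build after the predictions themselves have finished, which would be consistent with extra run time after the progress bar completes.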

I'm having the same issue as @notiv. The progress bar is accurate for the PDP but not for the ICE plot. The dataset I'm dealing with is big (6 million data points), but plotting the PDP takes only a few minutes, whereas ICE takes forever, even after the progress bar reaches 100%. I assume it's the wide-to-long format conversion that's taking so long.

I can try reimplementing the wide-to-long conversion, or even eliminating it whenever ice = TRUE. My task is removing the plyr dependency, so I’ll take a hard look at this soon.

A fix is available on this branch, if anyone cares to test it: https://github.com/bgreenwell/pdp/tree/foreach/R