erikbern/uncertainty

Percentile bootstrapped confidence intervals are problematic

yanirs opened this issue · 6 comments

Thanks for a great post, @erikbern. I keep referring back to it. After publishing an internal analysis, my colleague mentioned that using percentile bootstrapped confidence intervals is problematic. The issue is summarised on Wikipedia:

See Davison and Hinkley (1997, equ. 5.18 p. 203) and Efron and Tibshirani (1993, equ 13.5 p. 171). This method can be applied to any statistic. It will work well in cases where the bootstrap distribution is symmetrical and centered on the observed statistic [29] and where the sample statistic is median-unbiased and has maximum concentration (or minimum risk with respect to an absolute value loss function). In other cases, the percentile bootstrap can be too narrow.[citation needed] When working with small sample sizes (i.e., less than 50), the percentile confidence intervals for (for example) the variance statistic will be too narrow, so that with a sample of 20 points, a 90% confidence interval will include the true variance only 78% of the time. [30] Some give a general warning against using the percentile bootstrap in favour of the basic bootstrap; according to Rice, "Although this direct equation of quantiles of the bootstrap sampling distribution with confidence limits may seem initially appealing, its rationale is somewhat obscure." [31][32]
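The small-sample coverage claim in the quote can be checked with a short simulation. The following is only a sketch under assumptions that are not in the quote: the data are standard normal (so the true variance is 1), the statistic is the unbiased sample variance, and the function name `percentile_ci_coverage` and its defaults are made up for illustration. The exact number it prints depends on the seed and the number of trials.

```python
import numpy as np

def percentile_ci_coverage(n=20, level=0.90, n_trials=500, n_boot=1000, seed=0):
    """Estimate how often a percentile bootstrap CI for the variance
    contains the true variance (assumed: standard normal data)."""
    rng = np.random.default_rng(seed)
    true_var = 1.0
    alpha = 1.0 - level
    hits = 0
    for _ in range(n_trials):
        sample = rng.normal(0.0, 1.0, size=n)
        # Draw n_boot resamples (with replacement) as rows of an index matrix.
        idx = rng.integers(0, n, size=(n_boot, n))
        boot_vars = sample[idx].var(axis=1, ddof=1)
        # Percentile interval: quantiles of the bootstrap distribution itself.
        lo, hi = np.quantile(boot_vars, [alpha / 2, 1 - alpha / 2])
        hits += int(lo <= true_var <= hi)
    return hits / n_trials

print(f"Estimated coverage of a nominal 90% interval: {percentile_ci_coverage():.2f}")
```

If the percentile method were well calibrated here, the printed coverage would be close to 0.90.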

Any reason not to use the basic bootstrap method (aka pivotal confidence intervals)? I found its explanation here to be pretty accessible, and it's just as easy to use as the percentile method.
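For reference, the two methods differ only in the last step. Here is a minimal sketch, assuming NumPy and a hypothetical helper name `bootstrap_intervals` (not from the blog post): the percentile interval takes the bootstrap quantiles directly, while the basic (pivotal) interval reflects them around the observed statistic.

```python
import numpy as np

def bootstrap_intervals(sample, statistic, n_boot=2000, alpha=0.10, seed=0):
    """Return (observed statistic, percentile interval, basic interval)."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = len(sample)
    theta_hat = statistic(sample)
    # Resample with replacement and recompute the statistic each time.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_stats = np.array([statistic(sample[row]) for row in idx])
    q_lo, q_hi = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    percentile = (q_lo, q_hi)
    # Basic / pivotal: reflect the bootstrap quantiles around theta_hat.
    basic = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)
    return theta_hat, percentile, basic

# Illustrative data only: a skewed sample of 50 points.
data = np.random.default_rng(42).exponential(size=50)
theta, pct, bas = bootstrap_intervals(data, lambda x: np.var(x, ddof=1))
print("observed:", theta, "percentile:", pct, "basic:", bas)
```

Both intervals have the same width by construction. They differ only in where they sit relative to the observed statistic, so they coincide when the bootstrap distribution is symmetric around it.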

That’s very interesting! I’m not planning to update the blog post other than fixing any errors (there are so many things that could be mentioned), but maybe this almost deserves its own topic.

After reading this paper and a few other resources, I think that implementing bootstrapping manually should be discouraged. I'd also avoid the basic pivotal bootstrap method. Maybe I'll post about it one day...

Seems good. I think bootstrap is a good technique to know since it’s so easy to understand, but I agree that there are plenty of pitfalls. I think I did raise that point in my blog post.

I've now read the full version of the paper cited above. It gives more accurate sample sizes for when confidence intervals can be trusted. Neither the percentile method nor the basic bootstrap (reverse percentile) method looks good for practical applications:

The sample sizes needed for different intervals to satisfy the “reasonably accurate” (off by no more than 10% on each side) criterion are: n ≥ 101 for the bootstrap t, 220 for the skewness-adjusted t statistic, 2235 for expanded percentile, 2383 for percentile, 4815 for ordinary t (which I have rounded up to 5000 above), 5063 for t with bootstrap standard errors and something over 8000 for the reverse percentile method.
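The bootstrap t, which needs the smallest sample in that list, is only slightly more work than the percentile method when the statistic is the mean. The sketch below is an illustration under assumptions, not the paper's code: the function name `bootstrap_t_interval_mean` is hypothetical, the standard error is the usual s/√n, and the data are assumed to be continuous so that no resample has zero standard deviation.

```python
import numpy as np

def bootstrap_t_interval_mean(sample, n_boot=5000, alpha=0.10, seed=0):
    """Bootstrap-t confidence interval for the mean (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample, dtype=float)
    n = len(x)
    mean_hat = x.mean()
    se_hat = x.std(ddof=1) / np.sqrt(n)
    # Each row is one resample drawn with replacement.
    resamples = x[rng.integers(0, n, size=(n_boot, n))]
    boot_means = resamples.mean(axis=1)
    # Assumes no resample is constant; otherwise this SE would be zero.
    boot_ses = resamples.std(axis=1, ddof=1) / np.sqrt(n)
    # Studentize each resample around the observed mean.
    t_star = (boot_means - mean_hat) / boot_ses
    q_lo, q_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    # Quantiles swap sides: the upper t quantile sets the lower limit.
    return mean_hat - q_hi * se_hat, mean_hat - q_lo * se_hat

# Illustrative data only: a skewed sample of 100 points.
data = np.random.default_rng(1).exponential(size=100)
print(bootstrap_t_interval_mean(data))
```

Unlike the percentile interval, this one uses the bootstrap to estimate the distribution of the t statistic rather than of the mean itself, which is why it can adapt to skewness in the data.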

The blog post talks about a sample size of 50, which is probably too low with percentile intervals on non-normal data. Sorry for misleading you with the basic bootstrap comment above – the past few weeks have been full of discoveries about bootstrap limitations...

Thanks! Will take a look. Sounds like bootstrap could be dangerous

Sounds like bootstrap could be dangerous

Indeed. I'll just leave my summary of bootstrapping pitfalls here for future reference: https://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/