Sensible Default bin size

Question

Closed this issue 8 years ago · 8 comments

Currently the kde methods require the user to provide either the number of bins or the midpoints, or else default to 2048. This can be a problem for small datasets. It would be nice to have a sensible default like the one in http://stats.stackexchange.com/questions/798/calculating-optimal-number-of-bins-in-a-histogram or some other rule of thumb.

```
bin_size = 2 * IQR(data) * length(data)^(-1/3)

midpoints = minimum(data):bin_size:maximum(data)
```
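For anyone who wants to try the rule of thumb above, here is a minimal runnable sketch. It assumes Julia with only the `Statistics` standard library, and it computes the IQR from quantiles. The names `fd_bin_width` and `fd_midpoints` are made up for illustration and are not part of any package.

```julia
using Statistics  # standard library: provides quantile

# Hypothetical helper: bin width h = 2 * IQR * n^(-1/3) (the Freedman-Diaconis rule).
function fd_bin_width(data::AbstractVector{<:Real})
    q25, q75 = quantile(data, [0.25, 0.75])
    return 2 * (q75 - q25) * length(data)^(-1/3)
end

# Hypothetical helper: evenly spaced points starting at the minimum and
# stopping at or before the maximum of the data.
fd_midpoints(data) = minimum(data):fd_bin_width(data):maximum(data)

data = collect(1.0:100.0)   # toy data, for illustration only
h = fd_bin_width(data)
pts = fd_midpoints(data)
println("bin width = ", h, ", number of points = ", length(pts))
```

Note that the range construction fails when the IQR is zero (for example, when most values are identical), so a real default would need a fallback for that case.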

Answer 1 · 2016-07-07T06:45:26.000Z

That seems like a good idea to me. Would you be interested in putting together a PR for this?

Answer 2 · 2016-07-07T06:47:01.000Z

Yep, I'm already working on it.

Answer 3 · 2016-07-07T08:17:35.000Z

Note that the choice of the number of bins here should be different from the choice for a histogram.

In a histogram, you choose the number of bins as a method of avoiding overfitting (i.e. regularization).

For a KDE, the number of bins just affects the numerical resolution of the resulting function, so you want to choose as many as your computational budget allows (up to the resolution of your screen, or whatever needs you have). Ideally it should also be a power of 2 to gain the most advantage from the FFTs. The regularization is handled by the kernel function.

The 2048 was admittedly a pretty arbitrary pick, based on scaling up R's choice (512) by a bit.
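To illustrate the point about resolution, here is a rough sketch, assuming Julia with only the `Statistics` standard library. It evaluates a Gaussian KDE directly (no FFT, unlike the package) on a coarse and a fine grid, both rounded up to a power of two with `nextpow`. The helper names `grid_size` and `gauss_kde`, the toy data, and the choice of Silverman's rule for the bandwidth are all assumptions made for this example.

```julia
using Statistics  # standard library: provides std and quantile

# Hypothetical helper: round a requested grid size up to the next power of two,
# the size that suits radix-2 FFTs best.
grid_size(requested::Integer) = nextpow(2, requested)

# Direct (non-FFT) Gaussian KDE at a single point x with bandwidth h.
function gauss_kde(data, h, x)
    total = sum(exp(-0.5 * ((x - d) / h)^2) for d in data)
    return total / (length(data) * h * sqrt(2pi))
end

data = [1.2, 1.9, 2.1, 2.4, 3.3, 3.8, 4.0, 5.1]  # toy data, for illustration only

# Bandwidth from Silverman's rule of thumb (an assumption for this sketch).
# It depends on the data only, not on the grid.
iqr_est = quantile(data, 0.75) - quantile(data, 0.25)
h = 0.9 * min(std(data), iqr_est / 1.34) * length(data)^(-1/5)

lo, hi = minimum(data) - 3h, maximum(data) + 3h
coarse = range(lo, hi; length = grid_size(500))    # 512 points
fine   = range(lo, hi; length = grid_size(2000))   # 2048 points

dens_coarse = [gauss_kde(data, h, x) for x in coarse]
dens_fine   = [gauss_kde(data, h, x) for x in fine]

# Both grids sample the same underlying curve; only the spacing differs.
println("area on coarse grid: ", sum(dens_coarse) * step(coarse))
println("area on fine grid:   ", sum(dens_fine) * step(fine))
```

The smoothing comes entirely from `h`. Changing the grid size changes only how densely the same curve is sampled.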

Answer 4 · 2016-07-07T22:20:39.000Z

Thanks for that comment, I hadn't noticed that. This ticket seems rather pointless, then. Unless there is another reason to do it?

Answer 5 · 2016-07-07T22:26:44.000Z

We could implement a different, more data-aware default than 2048. Perhaps there's some literature around that recommends something along those lines for kernel density estimation rather than histograms?

Answer 6 · 2016-07-07T22:50:36.000Z

I think Simon is right, it doesn't seem to make any difference for the resulting density other than sampling.

Answer 7 · 2016-07-07T22:54:13.000Z

> I think Simon is right

Agreed. After all, when isn't he right? 😄

Answer 8 · 2016-07-07T23:04:17.000Z

Whenever he is talking to his wife/girlfriend ;)