Pg. 124, "a few bulk points"?

Question

Closed this issue 4 months ago · 2 comments

On pg. 124, there is this line explaining why we want a shifted exponential:

In my experience, this is fine for data with none to moderate outliers, but for data with extreme outliers (or data with a few bulk points), like in Anscombe’s third dataset, it is better to avoid such low values.

What does "a few bulk points" mean? Is this implying:

A dataset with a small number of data points in the "expected" regime, with emphasis on the small number part? If so, we might update the sentence to say "or data with *only* a few bulk points".
Or, is "bulk points" referring to the "not expected" part of the data, talking perhaps about a cluster of "outliers" (but they're all close together)?
Or, something else?

Answer 1 · 2024-07-01T19:29:17.000Z

A few more questions from this paragraph:

The defaults are good starting points ...

Does "defaults" refer to the values you picked for Code 4.6, or something else?

Other common priors are Gamma(2, 0.1) and Gamma(mu=20, sigma=15) ...

Are these supposed to be for the "nu_" parameter, or something else? I initially thought these referred to 2 different priors in the same model since the input numbers are so different, but after looking at the PyMC docs, I now think they are probably both for "nu_", with the 1st using the alpha/beta parameterization and the 2nd using the mu/sigma parameterization. Does this sound right?

Answer 2 · 2024-07-15T14:46:20.000Z

Regarding "a few bulk points", I am talking about the first: a small number of data points in the "expected" regime.

This is a general recommendation, not just about this example: "Take this, as well as other prior recommendations, with a pinch of salt. The defaults are good starting points, but there’s no need to stick to them."

Those gamma priors are priors for "nu", and you are right, they are just two different parametrizations.
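
For anyone who wants to check how close the two priors are, here is a minimal sketch in plain Python. It assumes the standard Gamma relations with beta as a rate (mean = alpha / beta, sd = sqrt(alpha) / beta), which is also how the mu/sigma form maps onto alpha/beta. The helper names are made up for illustration and are not part of PyMC.

```python
import math


def gamma_mean_sd(alpha, beta):
    """Mean and standard deviation of a Gamma with shape alpha and rate beta."""
    return alpha / beta, math.sqrt(alpha) / beta


def gamma_alpha_beta(mu, sigma):
    """Shape and rate of the Gamma that has mean mu and standard deviation sigma."""
    return mu**2 / sigma**2, mu / sigma**2


# Gamma(2, 0.1), read as alpha=2, beta=0.1
mean_a, sd_a = gamma_mean_sd(2, 0.1)

# Gamma(mu=20, sigma=15), converted to shape and rate
alpha_b, beta_b = gamma_alpha_beta(20, 15)

print(f"Gamma(2, 0.1): mean={mean_a:.2f}, sd={sd_a:.2f}")
print(f"Gamma(mu=20, sigma=15): alpha={alpha_b:.3f}, beta={beta_b:.4f}")
```

Both priors have a mean of 20. The standard deviations differ slightly (about 14.1 versus 15), so the two are similar priors written in different parameterizations rather than exactly the same distribution.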