mplatzer/BTYDplus

PGGG: lower the upper bound for k slice sampling

gBokiau opened this issue · 1 comment

When drawing pggg parameters I tend to get a small cluster of cases (0.5 %) with mean k's between 985 and 999, and absolutely no mean k between 50 and 985 (the expected aggregate k is around 0.8).

The only thing that seems to set these cases apart is that they are somewhat regular but rather short-lived; they are probably somewhat overrepresented and must be confusing the algorithm.

The upper bound for k slice sampling is set at 1000, which at first sight doesn't seem like a realistic value in any scenario. I suspect a limit of around 100 would be safer, and would adjust the algorithm accordingly.
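For reference, this is a minimal sketch of how I spot the cluster; the bundled groceryElog data and the short MCMC settings are placeholders for my own setup:

```r
library(BTYDplus)

# Stand-in data: my actual dataset is proprietary, so the bundled
# grocery event log is used here for illustration only.
data("groceryElog")
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31")

# Draw PGGG parameters (short chains, for illustration).
param.draws <- pggg.mcmc.DrawParameters(cbs, mcmc = 1000, burnin = 500,
                                        thin = 10, chains = 2)

# Posterior mean of k per customer; `level_1` holds one coda::mcmc.list
# per customer with draws of k, lambda, mu, tau.
k.means <- sapply(param.draws$level_1,
                  function(draws) mean(as.matrix(draws)[, "k"]))

mean(k.means > 900)   # share of customers in the cluster near the bound
hist(log10(k.means), breaks = 50, main = "Per-customer mean k (log10)")
```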

Speaking of assumptions, it occurred to me that k's aggregate distribution is more likely to follow a lognormal than a gamma distribution. Even in the clumpiest of scenarios, extremely low k's remain less likely than values around 0.5, while a few higher-k cases always remain quite likely; that is a situation the gamma distribution doesn't allow for.
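A quick illustration of the difference, moment-matching both distributions to a hypothetical mean of 0.8 (my expected aggregate k) and standard deviation of 1:

```r
# Moment-match a gamma and a lognormal to the same mean and variance
# (values hypothetical, for illustration only).
m <- 0.8; v <- 1^2
shape <- m^2 / v; rate <- m / v                    # gamma from moments
sdlog <- sqrt(log(1 + v / m^2))                    # lognormal from moments
meanlog <- log(m) - sdlog^2 / 2

# With shape < 1 the gamma density blows up as k -> 0, whereas the
# lognormal vanishes at 0 and keeps a heavier right tail.
curve(dgamma(x, shape = shape, rate = rate), from = 0.01, to = 4,
      xlab = "k", ylab = "density")
curve(dlnorm(x, meanlog, sdlog), add = TRUE, lty = 2)
legend("topright", legend = c("gamma", "lognormal"), lty = 1:2)
```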

Interesting! I haven't seen such cases myself. Could you share the plotted timing patterns of these customers? And what are your estimated t and gamma parameters?
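Something along these lines would do; `suspect.ids` is a placeholder for the ids of the affected customers, and `param.draws` for the output of `pggg.mcmc.DrawParameters`:

```r
# Hypothetical: `elog` is your event log, `suspect.ids` the customers
# whose posterior mean k landed between 985 and 999.
suspect.elog <- subset(elog, cust %in% suspect.ids)
plotTimingPatterns(suspect.elog, n = 20)

# Posterior summary of the level-2 draws, including t and gamma,
# the hyperparameters of the gamma prior on k.
summary(param.draws$level_2)
```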

As I haven't run into similar problems, I also haven't had the need to lower the upper limit for k. I doubt that it would help, though, since the gamma prior should cap such outliers anyway. If there is a way for you to share a dataset that reproduces the behavior, it would be very helpful.
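In the meantime, a simulation along these lines could serve as a starting point for a reproducible example; the parameter values are made up, chosen so that E[k] = t / gamma = 0.8:

```r
library(BTYDplus)
set.seed(1)

# Hypothetical level-2 parameters with E[k] = t / gamma = 0.8.
params <- list(t = 4, gamma = 5, r = 0.9, alpha = 10, s = 0.8, beta = 12)
sim <- pggg.GenerateData(n = 1000, T.cal = 52, T.star = 52, params = params)

# Re-estimate and check whether any posterior mean k escapes to the bound.
sim.draws <- pggg.mcmc.DrawParameters(sim$cbs)
k.means <- sapply(sim.draws$level_1,
                  function(draws) mean(as.matrix(draws)[, "k"]))
range(k.means)
```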

And yes, a lognormal might also be a good candidate for the heterogeneity, and also for lambda and mu. Abe's model uses the lognormal, for example.
