PGGG: lower the upper bound of k slice sampling
gBokiau opened this issue · 1 comments
When drawing pggg parameters I tend to get a limited (0.5 %) cluster of cases with mean k's between 985 and 999, with absolutely no mean k between 50 and 985 (expected aggregate k is around 0.8).
The only thing that seems to set these cases apart is that they are somewhat regular but rather short-lived. They are probably somewhat overrepresented and must be confusing the algorithm.
The upper bound for k slice sampling is set at 1000, which at first sight I don't think is a realistic expectation in any scenario. I would suspect a limit of around 100 to be safer and would adjust the algorithm accordingly.
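To make the effect of the bound concrete, here is a minimal sketch of a univariate slice sampler restricted to an interval. It is not the package's implementation: `slice_sample`, `toy_log_density`, and `mean_k` are hypothetical names, and the toy log density (one that keeps rising in k) is only an assumed stand-in for whatever conditional posterior these regular, short-lived customers produce.

```python
import math
import random


def slice_sample(log_density, x0, lower, upper, width, n_draws, rng):
    """Univariate slice sampler (stepping out + shrinkage) on [lower, upper]."""
    draws = []
    x = x0
    for _ in range(n_draws):
        # Auxiliary slice height in log space; 1 - random() lies in (0, 1].
        log_y = log_density(x) + math.log(1.0 - rng.random())
        # Random initial interval of length `width` around x, clipped to the bounds.
        left0 = x - width * rng.random()
        left = max(lower, left0)
        right = min(upper, left0 + width)
        # Step out until each end leaves the slice or hits a bound.
        while left > lower and log_density(left) > log_y:
            left = max(lower, left - width)
        while right < upper and log_density(right) > log_y:
            right = min(upper, right + width)
        # Shrink towards x until a point inside the slice is found.
        while True:
            candidate = left + (right - left) * rng.random()
            if log_density(candidate) >= log_y:
                x = candidate
                break
            if candidate < x:
                left = candidate
            else:
                right = candidate
        draws.append(x)
    return draws


def toy_log_density(k):
    # Hypothetical stand-in: a log density that keeps increasing in k.
    return 0.1 * k


def mean_k(upper, seed=1):
    rng = random.Random(seed)
    draws = slice_sample(toy_log_density, 1.0, 0.1, upper, 10.0, 2200, rng)
    kept = draws[200:]  # discard burn-in
    return sum(kept) / len(kept)


mean_upper_1000 = mean_k(1000.0)
mean_upper_100 = mean_k(100.0)
print(mean_upper_1000, mean_upper_100)
```

Under this assumption the draws pile up just below whatever upper bound is chosen, so lowering the bound from 1000 to 100 would move the cluster rather than remove it.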
Speaking of assumptions, it occurred to me that k's aggregate distribution is more likely to follow a lognormal than a gamma distribution. Even in the clumpiest of scenarios, extremely low k's remain less likely than values around 0.5, with a few higher k cases always remaining quite likely, a situation the gamma distribution doesn't allow for.
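The comparison can be sketched numerically. The snippet below matches a gamma and a lognormal to the same mean (0.8, the aggregate k quoted above) and the same mode (0.5, which is an assumption: the text only says values "around 0.5" are more likely than very low ones). All function names are hypothetical helpers, and only the standard library is used.

```python
import math


def gamma_logpdf(k, shape, rate):
    return (shape * math.log(rate) + (shape - 1.0) * math.log(k)
            - rate * k - math.lgamma(shape))


def lognormal_logpdf(k, mu, sigma):
    z = (math.log(k) - mu) / sigma
    return -math.log(k * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


def gamma_from_mean_mode(mean, mode):
    # mean = shape / rate, mode = (shape - 1) / rate; requires mode < mean.
    rate = 1.0 / (mean - mode)
    return mean * rate, rate


def lognormal_from_mean_mode(mean, mode):
    # mean = exp(mu + sigma^2 / 2), mode = exp(mu - sigma^2).
    sigma2 = math.log(mean / mode) / 1.5
    return math.log(mode) + sigma2, math.sqrt(sigma2)


shape, rate = gamma_from_mean_mode(0.8, 0.5)
mu, sigma = lognormal_from_mean_mode(0.8, 0.5)

# Print both densities side by side at a few values of k.
for k in (0.01, 0.1, 0.5, 2.0, 5.0, 10.0):
    g = math.exp(gamma_logpdf(k, shape, rate))
    ln = math.exp(lognormal_logpdf(k, mu, sigma))
    print(f"k={k:>5}: gamma={g:.3e}  lognormal={ln:.3e}")
```

With these matched parameters the lognormal puts less density on very small k and more on large k than the gamma does, which is the shape described above. Whether real data look like this is a separate question the sketch does not answer.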
Interesting! I haven't seen such cases myself. Could you share the plotted
timing patterns of these customers? And what are your estimated t and gamma
parameters?
As I haven't run into similar problems, I also haven't had the need to
lower the upper limit for k. I doubt that it would help though, since the
gamma should cap such outliers anyway. If there is a way for you to share
a dataset which reproduces the behavior, it would be very helpful.
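How strongly the gamma caps large k depends on the estimated parameters, which is why they matter here. As a rough sketch, assuming t acts as the shape and gamma as the rate of the heterogeneity distribution for k, the log prior penalty for k = 1000 relative to k = 1 can be computed as below. The function name and both parameter pairs are hypothetical; each pair has mean t / gamma = 0.8.

```python
import math


def gamma_log_prior_ratio(k_high, k_low, shape, rate):
    """log p(k_high) - log p(k_low) under Gamma(shape, rate).

    The normalising constant cancels in the difference.
    """
    return ((shape - 1.0) * (math.log(k_high) - math.log(k_low))
            - rate * (k_high - k_low))


# Hypothetical "tight" estimates: k = 1000 is penalised very heavily.
tight = gamma_log_prior_ratio(1000.0, 1.0, shape=2.0, rate=2.5)

# Hypothetical "diffuse" estimates with the same mean: the penalty is mild.
diffuse = gamma_log_prior_ratio(1000.0, 1.0, shape=0.0008, rate=0.001)

print(tight, diffuse)
```

If the estimated rate is very small, the penalty shrinks to a few nats and a sufficiently peaked likelihood could push individual draws towards the upper bound.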
And yes, lognormal might also be a good candidate for the heterogeneity,
including for lambda and mu. Abe's model uses the lognormal, for example.
On 18.11.2016 at 16:00, "gBokiau" notifications@github.com wrote:
When drawing pggg parameters I tend to get a limited (0.005 %) cluster of cases with mean k's between 985 and 999, with absolutely no mean k between 50 and 985 (Expected aggregate k is around 0.8).
The only thing seemingly setting these cases apart is that they're somewhat regular but rather short-lived, they're probably somewhat overrepresented and must be confusing the algorithm.
The upper bound for k slice sampling is set at 1000, which at first sight I don't think is a realistic expectation in any scenario? I would suspect a limit of around 100 to be safer and would adjust the algorithm accordingly.
Speaking of assumptions, it occurred to me that k's aggregate distribution is more likely to follow a lognormal than a gamma distribution. Even in the clumpiest of scenarios, extremely low k's remain less likely than values around 0.5, with a few higher k cases always remaining quite likely, a situation the gamma distribution doesn't allow for.