stan sampling fails when specifying censored types

Question

Closed this issue a year ago · 10 comments

minimal working example

This works:

stanfit <- CausalQueries:::stanmodels$simplexes

model <-  make_model("X->Y") 

data_compact <- 
  collapse_data(
  data.frame(X=c(1,1), Y=c(1,1)), 
  model)

stan_data_0 <- 
  CausalQueries:::prep_stan_data(
    model, data_compact)

updated <- rstan::sampling(stanfit, stan_data_0, refresh = 1)

This fails:

stan_data_1 <- 
  CausalQueries:::prep_stan_data(
    model, data_compact,
    censored_types = c("X1Y0", "X0Y0", "X0Y1"))

stan_data_1$parmap

# This blows up
bad <-  rstan::sampling(stanfit, stan_data_1, refresh = 1, iter = 10000)

diagnosis thus far

Contrary to what we initially expected, this issue is unrelated to normalization with censored types.

Answer 1 · 2023-09-15T16:31:04.000Z

I wonder if the 0s in the multinomial cause problems; what if they were tiny numbers instead of 0s?

Answer 2 · 2023-09-15T16:38:30.000Z

I'll try adding a tiny offset to the multinomial inputs asap today and test.

Answer 3 · 2023-09-15T16:40:13.000Z

This seemed to work, but I don't like it:

// Ensure weights sum to 1
w = w + 0.000000001;
w = w / sum(w);
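
For reference, the same offset can be tried out in plain R. This is only a sketch of the arithmetic, not the package's Stan code: the vector `w` below is a hypothetical set of event probabilities in which three of the four events are censored, and `eps` is a name introduced here for the offset.

```r
# Hypothetical event probabilities: three of four events are censored (probability 0)
w <- c(0, 0, 0, 1)

# The offset from the comment above: nudge every entry away from 0, then renormalize
eps <- 1e-9
w_offset <- (w + eps) / sum(w + eps)

print(w_offset)
all(w_offset > 0)                     # TRUE if no entry is exactly 0 any more
isTRUE(all.equal(sum(w_offset), 1))   # TRUE if the vector still sums to 1
```

The offset puts a small amount of probability mass on events that are meant to be impossible, which is the reason the fix feels unsatisfying.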

Answer 4 · 2023-09-15T16:41:55.000Z

Great that it works, but yeah, sort of a dodgy fix. Does it cause any numerical stability issues?

Answer 5 · 2023-09-15T16:52:05.000Z

What if we just specify initial values for this case? If the problem is the 0s evaluating to -inf when the MCMC tries to start sampling in log space, then that could maybe work, right?
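
A quick way to see what the hypothesis above amounts to is to evaluate a multinomial log kernel by hand in R. This is an illustration of the conjecture only, not a confirmed account of what Stan does internally; `counts` and `w` are hypothetical values.

```r
# Hypothetical counts and event probabilities: the censored events have
# zero counts and zero probability
counts <- c(0, 0, 0, 2)
w      <- c(0, 0, 0, 1)

log(w)                           # -Inf for the zero-probability events

# A naive log kernel multiplies 0 by -Inf, which is NaN in floating point
naive <- sum(counts * log(w))
is.nan(naive)

# Base R's dmultinom() drops zero-probability cells that have zero counts,
# so it returns a finite log probability for the same inputs
dmultinom(counts, prob = w, log = TRUE)
```

Whether a given implementation hits the NaN or sidesteps it depends on how it treats the zero-count, zero-probability cells.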

Answer 6 · 2023-09-15T17:07:32.000Z

I'll have to think through that. The other thing I was thinking about is just making the w vector shorter, so it only holds event probabilities for non-0 events. The Stan team might also have a suggestion, since this must be new.

Answer 7 · 2023-09-15T17:16:37.000Z

Something vaguely related to this popped up on the Stan forum 5 years ago

https://discourse.mc-stan.org/t/error-occurred-during-calling-the-sampler-sampling-not-done/6061

The thing that gets me about this though is that everything worked fine until the update. And the changes to the code (except the new array syntax) don't look massive.

I wonder: did we just always get some sort of floating point voodoo with the old version that prevented things from evaluating exactly to 0?

Answer 8 · 2023-09-15T20:01:24.000Z

In the current function, even if we specify initial values for w, the likelihood always sends them straight to zero for excluded cases. In ancient history there was something similar with restrictions on nodal types: should we force their probability to 0 or prune the whole model to exclude them? The latter worked better.

Answer 9 · 2023-09-17T18:24:55.000Z

I have a branch in which w is kept intact and is pruned only when the multinomial is run. This has the advantage of letting us preserve the true w vector (event probabilities without censoring) while still updating with the censored data. I'm doing tests now and will then fold it in, but it's worth a second pair of eyes. Hopefully this deals with the CRAN bug we had.

I also noticed that w is being preserved but not used; it should be used whenever we do make_data(...posterior_draw).
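
The idea of pruning only at the likelihood step can be sketched in plain R. This is not the code from the branch: the vectors `w`, `counts`, and `censored` are hypothetical, and renormalizing the retained probabilities is an assumption made here so that `dmultinom()` receives a proper probability vector.

```r
# Hypothetical inputs: full event probabilities, observed counts, and a
# logical flag marking which events are censored
w        <- c(X0Y0 = 0.25, X1Y0 = 0.25, X0Y1 = 0.25, X1Y1 = 0.25)
counts   <- c(X0Y0 = 0,    X1Y0 = 0,    X0Y1 = 0,    X1Y1 = 2)
censored <- names(w) %in% c("X1Y0", "X0Y0", "X0Y1")

# Prune only for the likelihood; w itself is left untouched.
# Renormalizing over the retained events is an assumption of this sketch.
w_obs  <- w[!censored] / sum(w[!censored])
loglik <- dmultinom(counts[!censored], prob = w_obs, log = TRUE)

length(w)   # the full vector still has one entry per event
loglik      # log probability of the uncensored counts
```

Because the censored cells never enter the multinomial, no zero probability is passed to the likelihood, and the full `w` remains available for anything that needs uncensored event probabilities.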

Answer 10 · 2023-09-19T21:47:54.000Z

Since the fix is merged, I'll test the previous CRAN test failure and close the issues.