reach-lab/MixWildGUI

Stage 2 model issue (missing replications)

Opened this issue · 2 comments

Hi Don and Rachel,

I also noticed that the current Fortran EXE may have some difficulty fitting the logistic regression in the Stage 2 model. As you can see in the DEF file, the resampling parameter is 500. However, the Stage 2 output ended up with only 91 replications. The screenshot shows that it generated many "102" entries when estimating a subject-level model in Stage 2. It seems that most replications were incomplete or invalid (91 of 500 completed, so 409 are missing).

Stage2_102
data_and_def_(stage2).zip

Please help me check this issue as well. Thanks!

Best,
Wei-Lin

Hi Wei-Lin,
I suspect that the problem is the inclusion of the HSG variable (in particular, the scaling of this variable) in this model, together with the small size of the data set. Here, you are trying to run logistic regressions with 72 observations and 6 regressors, which is not a lot of data for estimating the regression coefficients. What makes it computationally more difficult is that the HSG variable is coded with min=3.4 and max=5.6, and there are 3 model terms involving HSG:
Main effect of HSG
Location by HSG interaction
Scale by HSG interaction

So there are three parameters in the model that represent effects when HSG=0:
Model intercept
Location main effect
Scale main effect

The range of HSG doesn't come close to 0, so these parameters are extrapolations outside the data range. For example, in the output that you sent, the intercept was close to -8, which is an extremely low value for a logit and indicates a probability of essentially zero. Why is it so low? It represents the logit when HSG=0, but HSG is never close to zero. So I suspect that only 91 of the 500 models converged because of this treatment of the HSG variable. Try centering it around its mean (so that 0 represents a meaningful place in the data surface); I would expect more models to converge if you do this.
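
As a rough illustration of the centering suggestion, here is a minimal Python sketch. The HSG values are made up to span the reported range (min=3.4, max=5.6), and the helper names `center` and `inv_logit` are hypothetical; this is not part of MixWILD, and how you write the centered column back into your data file depends on your own workflow.

```python
import math


def center(values):
    """Shift values so that their mean is 0."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def inv_logit(x):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical HSG values spanning the reported range (min=3.4, max=5.6).
hsg = [3.4, 3.9, 4.4, 4.9, 5.6]
hsg_centered = center(hsg)

# Before centering, 0 lies far below the observed range;
# after centering, 0 sits inside it (at the sample mean).
print(min(hsg), max(hsg))
print(min(hsg_centered), max(hsg_centered))

# A logit near -8 corresponds to a probability well below 0.001.
print(inv_logit(-8))
```

With the centered variable, the intercept and the location and scale main effects describe a subject at the average HSG rather than at an unobserved HSG of 0.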

Dear Don,

I centered HSG at its mean, and it works. The output now has 500 replications for the Stage 2 logistic regression. Thank you so much for your help!

Best,
Wei-Lin