oguzhanogreden/dcurver

Create a document to verify DC estimation works under several conditions

oguzhanogreden opened this issue · 6 comments

We need to verify that DC estimation works under several conditions. The initial conditions are as follows:

  1. Estimate DC for simulated data from a bimodal distribution described in Woods & Lin (2009)
  2. Estimate DC for simulated data from a DC with parameters (77.32, 78.51, 76.33, 77.16)
  3. Imitate E-table using the density
  4. Estimate DC for an E-table, using the example table here.
    1. without standardizing the quadrature points
    2. after standardizing the quadrature points with standardizeQuadrature()

So the DC estimation as described in the literature does work under many circumstances, except when the quadrature points are standardized.

I isolated this step. That means I simply estimated DC for a vector, out of the context of EM. Here are the plots:

  1. Plot1
  2. Plot2
  3. Imitate E-table using the density: skipped, not very informative
  4. Estimate DC for an E-table: by using the E-table as weights for the LL and the gradient.
    i. Plot4i
    ii. Plot4ii
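
For anyone following along, "using the E-table as weights for the LL" in item 4 can be sketched roughly as below. This is only an illustration under loud assumptions: `q` and `rr` are made-up quadrature points and expected counts (not the real E-table), and `dnorm()` stands in for the DC density, so `weighted_nll` is a hypothetical helper and not the code in the repo.

```r
# Hypothetical stand-in for an E-table: quadrature points and expected counts.
q  <- seq(-4, 4, length.out = 41)
rr <- dnorm(q, mean = 0.5, sd = 1.2)
rr <- rr / sum(rr) * 1000          # rescale so the expected counts sum to N = 1000

# Weighted negative log-likelihood: the log-density at each quadrature point
# is weighted by the expected count at that point.
# ASSUMPTION: dnorm() is a placeholder for the DC density; par = c(mean, log(sd)).
weighted_nll <- function(par, q, w) {
  -sum(w * dnorm(q, mean = par[1], sd = exp(par[2]), log = TRUE))
}

fit <- optim(c(0, 0), weighted_nll, q = q, w = rr, method = "BFGS")
est <- c(mean = fit$par[1], sd = exp(fit$par[2]))

# For a normal density the weighted ML estimates have a closed form,
# which gives something to compare the optimizer against.
w_mean <- sum(rr * q) / sum(rr)
w_sd   <- sqrt(sum(rr * (q - w_mean)^2) / sum(rr))
print(rbind(optim = est, closed_form = c(w_mean, w_sd)))
```

With a DC density in place of `dnorm()`, the same weights would multiply the per-point gradient terms as well.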

Has the code on the repo been synced to reflect these two approaches? I'd like to walk through the code to see what's going on internally as well.

  • Plot 1. doesn't look that bad, but why is there a Gaussian-looking line in the most informative condition (bottom right)? For comparison? I also imagine these will get better as the number of items increases (not sure what the test lengths are here).
  • Plot 2. looks good to me.
  • Plot 4ii looks astoundingly terrible; almost too good to be true. Are you certain this isn't some kind of simulation specification error or implementation error? All the plots have literally the same density function, which in and of itself seems extremely unlikely. I also don't understand how these could be possible. All standardization does is change the spread of the points (except maybe the points which are extrapolated, but those should have little weight anyway). So, for the DC density estimates this transformation should change basically nothing and work as it does for the other examples.

When I meant isolated, I meant it fully :) There is no test here. For 1 and 2, I sampled N observations from a bimodal density. For 4i and 4ii I used the same E-table I've put online before. The code is now available here.

I don't know the reason behind the Gaussian-looking ones. Some samples from the target density do not carry any information that the spline estimation can capture. I know that how often this happens also depends on the target density. I didn't dwell on it since it is easy to notice: the spline parameters do not move, and I can detect this.

... for the DC density estimates this transformation should change basically nothing and work as it does for the other examples.

I'm not sure I agree. I have yet to see a set of DC parameters that gives a density similar in spread to the standardized E-table. I'm about to make a Shiny app to check this systematically :-) So when I see the published plots, I'm curious what the DC parameters are there. I asked Woods if she can send me an example set of parameters, but this may not be easy after so many years.

There could be another implementation trick here, which decouples DC parameters from the curve. I have some ideas but I'll sleep on them a little.

Consider the two plots generated from your E-table data file:

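# Load the E-table objects (gTheta: quadrature points, rr: expected counts)
load("~/../Downloads/e-table.Rdata")

# Weighted mean and SD of the quadrature points, using rr as weights
fTheta <- gTheta[[1]] * rr
obs_m <- sum(fTheta) / sum(rr)
obs_sd <- sqrt(sum(rr * (gTheta[[1]] - obs_m)^2) / sum(rr))
gTheta_std <- (gTheta[[1]] - obs_m) / obs_sd

# Standardize, then rescale the second column of the result to sum to sum(rr)
std_qp <- standardizeQuadrature(gTheta[[1]], gTheta_std, rr)
rr_std <- std_qp[,2] / sum(std_qp[,2]) * sum(rr)

# Compare the weights before and after standardization
plot(rr)
plot(rr_std)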

rplot
rplot01

How is it possible for the DC density function to adequately reflect the first plot, yet for the second plot it always returns a Gaussian distribution regardless of the number of DC parameters fitted? The shape of the distribution doesn't change, though as I said before the extrapolated tails are somewhat different.

This seems like a rather small (but obvious) bug that should be tracked down as I just don't see how plot 4ii would be possible given such clear distributional shapes before and after standardization.
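
To make the "shape doesn't change" point concrete, here is a minimal sketch with made-up bimodal weights (hypothetical values, not the E-table above). It only applies the affine rescaling of the points; it does not reproduce the inter-/extrapolation back onto the original grid that `standardizeQuadrature()` does.

```r
# Hypothetical bimodal weights on an equally spaced grid (NOT the real E-table).
q  <- seq(-4, 4, length.out = 41)
rr <- 0.6 * dnorm(q, -1, 0.7) + 0.4 * dnorm(q, 1.5, 0.6)
rr <- rr / sum(rr) * 1000

# Weighted mean and SD, computed the same way as obs_m and obs_sd above.
w_mean <- sum(q * rr) / sum(rr)
w_sd   <- sqrt(sum(rr * (q - w_mean)^2) / sum(rr))

# Affine rescaling of the points; the weights themselves are left untouched.
q_std <- (q - w_mean) / w_sd

# After rescaling, the weighted mean is 0 and the weighted SD is 1.
std_mean <- sum(q_std * rr) / sum(rr)
std_sd   <- sqrt(sum(rr * (q_std - std_mean)^2) / sum(rr))
print(c(mean = std_mean, sd = std_sd))
```

Plotting `rr` against `q` and against `q_std` gives the same curve with a relabelled x-axis, so any change in shape has to come from the redistribution step rather than from the rescaling.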

Sorry for the several notifications; some irrelevant code went in together with the relevant bit.


I went through my notes. They reminded me that I didn't understand yet another statement by Woods.

While describing a term in the inter-/extrapolation functions, she wrote this: "delta is the distance between any two q or, equivalently, between any two q*" (p. 67 of the 2015 chapter). I was surprised by the statement, since the two are not numerically equivalent, nor would the resulting functions be equivalent.
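
A quick numeric check of that point, assuming q* is the affine standardization of q used earlier in the thread (as in `gTheta_std`); the mean `m` and SD `s` here are hypothetical values chosen only for illustration:

```r
# Equally spaced original quadrature points.
q     <- seq(-4, 4, length.out = 41)
delta <- q[2] - q[1]

# ASSUMPTION: q* = (q - m) / s, with hypothetical m and s.
m <- 0.3
s <- 0.8
q_star     <- (q - m) / s
delta_star <- q_star[2] - q_star[1]

# The spacing of q* is delta / s, so it equals delta only when s = 1.
print(c(delta = delta, delta_star = delta_star, delta_over_s = delta / s))
```

So the two spacings differ by the factor `s`, and which one is plugged into the inter-/extrapolation functions changes the result whenever the observed SD is not 1.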

This is the difference in code. Here are the plots, which are promising.

I'll reflect these in the mirt fork and test again.

See the issue in the mirt fork for an update.