Decreasing log-likelihood with commondispersion = FALSE

Question

Decreasing log-likelihood with commondispersion = FALSE

fedeago opened this issue 5 years ago · 3 comments

Hi, I encountered a problem using zinbwave. If I set commodispersion parameters equal to FALSE the log-likelihood will decrese during the optimization of the dispersion parameters. In this little reproducible example there is a little effect but in real case application this effect is much bigger.

Example:

I am going to reproduce a problem with zinbwave model. At first I create a dataset sampling from negative binomial.

res <- NULL
conte <- for(i in 1:200){
  res_temp <- rnbinom(600,mu = 10, size = rchisq(1,100))
  res <-rbind(res,res_temp)
}

In order to be close to my utilization experience I artificially create a batch effect by adding a poisson distributed noise to some observation.

first_batch <- matrix(rpois(40000,5), nrow = 200)
secon_batch <-matrix(rpois(40000,10), nrow = 200)

res[,c(201:400)] <- res[,c(201:400)]+first_batch
res[,c(401:600)] <- res[,c(401:600)]+secon_batch

conte <- res
batch <- append(rep("type_1",200),rep("type_2",200))
batch <- append(batch ,rep("type_3",200))
conte_sce <- SingleCellExperiment(assays = list(counts = conte))
conte_sce$batch<- batch

Now I use zinbwave:

res_commondisp <- zinbwave::zinbwave(conte_sce,
                X = "~batch",BPPARAM = SerialParam(), verbose = T)

Create model:
ok
Initialize parameters:
ok
Optimize parameters:
Iteration 1
penalized log-likelihood = -444478.223994673
After dispersion optimization = -334822.451814143
user system elapsed
1.282 0.004 1.285
After right optimization = -333462.91717295
After orthogonalization = -333462.91717295
user system elapsed
2.398 0.019 2.418
After left optimization = -333443.084022314
After orthogonalization = -333443.084022314
Iteration 2
penalized log-likelihood = -333443.084022314
After dispersion optimization = -333431.229280787
user system elapsed
0.378 0.000 0.378
After right optimization = -333430.601113283
After orthogonalization = -333430.601113283
user system elapsed
0.759 0.056 0.815
After left optimization = -333430.564456441
After orthogonalization = -333430.564456441
Iteration 3
penalized log-likelihood = -333430.564456441
ok

res_tagwisedisp <- zinbwave::zinbwave(conte_sce,
                X = "~batch",BPPARAM = SerialParam(), verbose = T, commondispersion = F)

Create model:
ok
Initialize parameters:
ok
Optimize parameters:
Iteration 1
penalized log-likelihood = -444478.223994673
After dispersion optimization = -334864.84438229
user system elapsed
1.224 0.003 1.227
After right optimization = -333512.064447135
After orthogonalization = -333512.064447135
user system elapsed
2.371 0.010 2.380
After left optimization = -333492.694979865
After orthogonalization = -333492.694979865
Iteration 2
penalized log-likelihood = -333399.257494682
After dispersion optimization = -333467.350788734
user system elapsed
0.319 0.000 0.319
After right optimization = -333466.727984708
After orthogonalization = -333466.727984708
user system elapsed
0.922 0.003 0.924
After left optimization = -333466.269073984
After orthogonalization = -333466.269073984
Iteration 3
penalized log-likelihood = -333387.305793593
ok

We can see that with the commondispersion option set to FALSE in the second iteration the loglikelihood after the dispersion optimization is higher than the previous likelihood.

Answer 1 · 2020-03-06T15:17:51.000Z

Hi @fedeago,

while I was trying to understand what's going on I stumbled across this old issue, the discussion there could be relevant:
statOmics/zinbwaveZinger#2

In an effort to fix the above issue, I may have introduced a different error in the verbose mode. It might be possible that it's only a problem with the wrong value being computed in the verbose mode. I will try to investigate this more and get back to you.

Note that the actual penalized log-likelihood at each iteration increases as expected, it's only the intermediate values that decrease. Can you confirm that this is the case also in your real case and not just in this small example?

Answer 2 · 2020-03-06T16:07:35.000Z

Yes in the real case application only the intermediate values decrease.

Answer 3 · 2020-03-06T16:22:04.000Z

@fedeago

I found the bug. I was using the wrong value of the dispersion parameter to compute the likelihood in the verbose computations. Thanks for catching this!

Note that this did not affect the results, but only the printed values in verbose mode.

Commit 7dc0cbc fixes the problem. Please update to version 1.9.2 which is available via github and will be propagated to bioc-devel in the next couple of days.