can't get zinbFit to finish

Question

friedue opened this issue 6 years ago · 14 comments

Hi,

I've been trying to follow the strategy for DE using zinbwave as specified here and I keep on running into the same error. Since this takes a rather long time to compute and I couldn't immediately spot the function that may issue the error, I was wondering whether you could point me to the culprit.

zinb <- zinbFit(core, X = design, commondispersion = TRUE, epsilon=1e12, verbose = TRUE, maxiter.optimize = 2)
Create model:
ok
Initialize parameters:

ok
Optimize parameters:
Iteration 1
penalized log-likelihood = -149172106.416477
After dispersion optimization = -149172106.416477
    user   system  elapsed
68311.71    68.54 66309.15
After right optimization = -149172106.416477
After orthogonalization = -149172106.416477
     user    system   elapsed
73520.612    63.835  7400.819
After left optimization = -149172106.416477
After orthogonalization = -149172106.416477
Iteration 2
penalized log-likelihood = -128545935.423954
After dispersion optimization = -149172106.416477
Error in updt[[as.integer(i)]] <- .error(msg) :
  attempt to select more than one element in integerOneIndex
In addition: Warning message:
In value[[3L]](cond) : NAs introduced by coercion

core is a SummarizedExperiment object with the following specs

> dim(core)
[1] 15347 18329

> str(core)
Formal class 'SummarizedExperiment' [package "SummarizedExperiment"] with 5 slots
  ..@ colData        :Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
  .. .. ..@ rownames       : chr [1:18329] "D1" "D2" "D3" "D4" ...
  .. .. ..@ nrows          : int 18329
  .. .. ..@ listData       :List of 1
  .. .. .. ..$ condition: Ord.factor w/ 3 levels "W"<"H"<"D": 3 3 3 3 3 3 3 3 3 3 ...
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ assays         :Reference class 'ShallowSimpleListAssays' [package "SummarizedExperiment"] with 1 field
  .. ..$ data: NULL
  .. ..and 14 methods.
  ..@ NAMES          : chr [1:15347] "ENSMUSG00000033845" "ENSMUSG00000025903" "ENSMUSG00000033793" "ENSMUSG00000025907" ...
  ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
  .. .. ..@ rownames       : NULL
  .. .. ..@ nrows          : int 15347
  .. .. ..@ listData       : Named list()
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ metadata       : list()

drisso commented 6 years ago

Maybe 10?

Answer 1 · 2018-06-15T18:41:48.000Z

@drisso have you seen this before?

Answer 2 · 2018-06-15T19:23:40.000Z

I should add that zinbwave ran without an error, although not on all the genes, and with its default settings. The result did not contain the weights in the assay slot, though, which is why I wanted to go via zinbFit.

Answer 3 · 2018-06-15T19:36:40.000Z

Hi @friedue

I haven't seen that error before, but the current version of zinbwave computes the weights by default. So if you update to the current version, you should be able to run the zinbwave function and get the weights out of it.

I'm not sure that will solve it though, since zinbwave calls zinbFit internally. Can you re-run with the latest version and see if you still get the error?

Answer 4 · 2018-06-15T20:06:27.000Z

Just re-ran it with a much smaller matrix (1.5k x 1.5k) and the latest version directly cloned from git -- that seems to have worked.
Any advice on the maxiter.optimize setting? I reduced that to 2 because it took around 15h per iteration with the original matrix and I was getting impatient.
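
For a quick trial run like the 1.5k x 1.5k one mentioned above, one way to shrink the matrix is to keep the most variable genes and a random subset of cells. This is only a sketch in base R on simulated counts; the helper name `subset_counts`, the variance-based gene filter, and the sizes are assumptions for illustration, not something zinbwave prescribes.

```r
# Hypothetical helper (not part of zinbwave): keep the n_genes most variable
# rows and a random sample of n_cells columns of a genes-x-cells count matrix.
subset_counts <- function(counts, n_genes, n_cells, seed = 1) {
  stopifnot(is.matrix(counts), n_genes <= nrow(counts), n_cells <= ncol(counts))
  set.seed(seed)
  # Variance of log-transformed counts per gene, used to rank genes
  gene_var <- apply(log1p(counts), 1, var)
  keep_genes <- order(gene_var, decreasing = TRUE)[seq_len(n_genes)]
  keep_cells <- sample(ncol(counts), n_cells)
  counts[keep_genes, keep_cells, drop = FALSE]
}

# Simulated data standing in for the real count matrix: 200 genes x 100 cells
set.seed(42)
counts <- matrix(rnbinom(200 * 100, mu = 5, size = 0.5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:100)))
small <- subset_counts(counts, n_genes = 50, n_cells = 30)
dim(small)
```

A SummarizedExperiment supports the same two-dimensional indexing, so index vectors built this way could presumably be applied to `core` directly before calling zinbFit.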

Answer 5 · 2018-06-15T20:08:18.000Z

Not really... in principle it should be left to iterate until convergence, but I understand that it can take a while... 2 seems very low though...

Answer 6 · 2018-06-15T20:09:40.000Z

I wasn't really seeing any changes in the values it spat out after about 6 iterations, so once that failed and I thought I had found the culprit, I wanted to get a somewhat quicker insight into whether it was going to fail again.

Answer 7 · 2018-06-15T20:10:41.000Z

Sorry, this is getting a bit off-topic, but while I have your attention I'll just keep going:
What do the values reported in verbose mode represent, e.g. After right optimization = -149172106.416477?

Answer 8 · 2018-06-15T21:22:28.000Z

It's the negative (log?) likelihood of the model. The method is looking for the maximum likelihood solution.

Answer 9 · 2018-06-16T10:06:49.000Z

Although the value is negative, it is just the normal log-likelihood, penalized by the regularization. It is negative but should increase at each iteration.
From the output above, there seems to be a bug in the values reported in verbose mode. The first value of each iteration increases from one iteration to the next, as expected, but the other reported values ("After dispersion optimization", "After right optimization", etc.) do not seem to change. I ran the example in the vignette to check and observed the same phenomenon, see below. @drisso: have you seen that?

fluidigm_zinb <- zinbwave(fluidigm, K = 2, epsilon=1000, verb=TRUE)
Create model:
ok
Initialize parameters:
ok
Optimize parameters:
Iteration 1
penalized log-likelihood = -97286.4543228788
After dispersion optimization = -97286.4543228788
user system elapsed
0.938 0.068 1.007
After right optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
user system elapsed
0.734 0.062 0.796
After left optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
Iteration 2
penalized log-likelihood = -69344.9476810396
After dispersion optimization = -97286.4543228788
user system elapsed
0.687 0.035 0.724
After right optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
user system elapsed
0.654 0.056 0.710
After left optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
Iteration 3
penalized log-likelihood = -68115.327913982
After dispersion optimization = -97286.4543228788
user system elapsed
0.643 0.042 0.686
After right optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
user system elapsed
0.558 0.035 0.595
After left optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
Iteration 4
penalized log-likelihood = -67948.0547100049
After dispersion optimization = -97286.4543228788
user system elapsed
0.648 0.040 0.689
After right optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
user system elapsed
0.588 0.047 0.637
After left optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
Iteration 5
penalized log-likelihood = -67896.2706681621
After dispersion optimization = -97286.4543228788
user system elapsed
0.607 0.027 0.636
After right optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
user system elapsed
0.587 0.052 0.640
After left optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
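
The per-iteration values in the log above can be checked for the expected behaviour (a penalized log-likelihood that increases at every iteration) with a few lines of base R. The numbers are copied from the "penalized log-likelihood" lines of that log; the relative-change measure is only an illustrative convergence check, not necessarily the stopping rule zinbwave uses internally.

```r
# Penalized log-likelihood at the start of iterations 1-5, copied from the log
pll <- c(-97286.4543228788, -69344.9476810396, -68115.327913982,
         -67948.0547100049, -67896.2706681621)

# The value should increase (become less negative) at every iteration
increases <- diff(pll) > 0
all(increases)

# Relative change between consecutive iterations (illustrative measure only)
rel_change <- abs(diff(pll)) / abs(pll[-length(pll)])
signif(rel_change, 3)
```

The shrinking relative change is consistent with the earlier remark that the values barely moved after a handful of iterations.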

Answer 10 · 2018-06-16T14:34:15.000Z

As an additional observation: I got the error I initially reported again with the latest version. I had a feeling that it may have something to do with me compulsively loading the data.table package in almost every session (it tends to mask some functions and the error I got seemed to point towards some problem with subsetting). I have no idea whether that's the real issue, but once I restarted the same process in a new session without loading any other packages than the ones needed for zinbwave, it seemed to fare better. I've started it on my huge data set with 10 iterations, I'll report back once that's ended.
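
To see whether an extra package such as data.table is attached and masking other functions before starting a long run, base R offers `search()` and `conflicts()`. The sketch below only reports what is attached and masked; whether masking actually caused the error is not confirmed anywhere in this thread, and the helper name `attached_extras` is made up for illustration.

```r
# Hypothetical helper: list attached packages beyond R's default set
attached_extras <- function() {
  attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))
  defaults <- c("base", "stats", "graphics", "grDevices", "utils",
                "datasets", "methods")
  setdiff(attached, defaults)
}

attached_extras()

# Names defined in more than one attached environment, grouped by the
# search-path entry in which they are defined
masked <- conflicts(detail = TRUE)
names(masked)
```

Running this in a fresh session and again after loading the usual set of packages shows which names changed hands in between.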

Answer 11 · 2018-06-18T01:11:02.000Z

Yay, it finished with 7 iterations (your guess of 10 iterations seems to have been spot on! :) )
I'll superstitiously chalk it up to the loading of data.table, although I lack the nerve and time to test that thoroughly.

Answer 12 · 2018-06-18T13:53:16.000Z

Hi @jpvert

thanks for spotting this! I hadn't noticed, mostly because I never use the verbose option and I think we don't have unit tests for this.

I will have a look at what's going on (opening an issue in the zinbwave repo).

Answer 13 · 2018-06-25T18:13:33.000Z

@jpvert the verbose issue has been fixed