can't get zinbFit to finish
friedue opened this issue · 14 comments
Hi,
I've been trying to follow the strategy for DE using zinbwave as specified here, and I keep running into the same error. Since this takes a rather long time to compute and I couldn't immediately spot the function that issues the error, I was wondering whether you could point me to the culprit.
zinb <- zinbFit(core, X = design, commondispersion = TRUE, epsilon=1e12, verbose = TRUE, maxiter.optimize = 2)
Create model:
ok
Initialize parameters:
ok
Optimize parameters:
Iteration 1
penalized log-likelihood = -149172106.416477
After dispersion optimization = -149172106.416477
user system elapsed
68311.71 68.54 66309.15
After right optimization = -149172106.416477
After orthogonalization = -149172106.416477
user system elapsed
73520.612 63.835 7400.819
After left optimization = -149172106.416477
After orthogonalization = -149172106.416477
Iteration 2
penalized log-likelihood = -128545935.423954
After dispersion optimization = -149172106.416477
Error in updt[[as.integer(i)]] <- .error(msg) :
attempt to select more than one element in integerOneIndex
In addition: Warning message:
In value[[3L]](cond) : NAs introduced by coercion
core is a SummarizedExperiment object with the following specs:
> dim(core)
[1] 15347 18329
> str(core)
Formal class 'SummarizedExperiment' [package "SummarizedExperiment"] with 5 slots
..@ colData :Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
.. .. ..@ rownames : chr [1:18329] "D1" "D2" "D3" "D4" ...
.. .. ..@ nrows : int 18329
.. .. ..@ listData :List of 1
.. .. .. ..$ condition: Ord.factor w/ 3 levels "W"<"H"<"D": 3 3 3 3 3 3 3 3 3 3 ...
.. .. ..@ elementType : chr "ANY"
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ assays :Reference class 'ShallowSimpleListAssays' [package "SummarizedExperiment"] with 1 field
.. ..$ data: NULL
.. ..and 14 methods.
..@ NAMES : chr [1:15347] "ENSMUSG00000033845" "ENSMUSG00000025903" "ENSMUSG00000033793" "ENSMUSG00000025907" ...
..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
.. .. ..@ rownames : NULL
.. .. ..@ nrows : int 15347
.. .. ..@ listData : Named list()
.. .. ..@ elementType : chr "ANY"
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ metadata : list()
@drisso have you seen this before?
I could add that zinbwave ran without an error, although not on all the genes and with the default settings of zinbwave. It did not contain the weights in the assay slot, though, which is why I wanted to go via zinbFit.
Hi @friedue
I haven't seen that error before, but the current version of zinbwave computes the weights by default. So if you update to the current version, you should be able to run the zinbwave function and get the weights out of it.
I'm not sure that will solve it though, since zinbwave calls zinbFit internally. Can you re-run with the latest version and see if you still get the error?
Just re-ran it with a much smaller matrix (1.5k x 1.5k) and the latest version directly cloned from git -- that seems to have worked.
Any advice on the maxiter.optimize setting? I reduced that to 2 because it took around 15h per iteration with the original matrix and I was getting impatient.
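For reference, the smaller test run mentioned above can be set up roughly as follows. This is a minimal sketch on a toy count matrix standing in for the real data; the object names and the subset sizes are hypothetical, and the same `[rows, columns]` indexing also works on a SummarizedExperiment.

```r
set.seed(1)  # make the random subset reproducible

# Toy stand-in for the real data: a genes x cells count matrix.
# (Hypothetical sizes; the real object is 15347 x 18329.)
counts <- matrix(rpois(200 * 300, lambda = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200),
                                 paste0("cell", 1:300)))

# Hypothetical subset sizes; the test run above used about 1500 x 1500.
n_genes <- 50
n_cells <- 60

# Draw random row and column indices without replacement and subset.
keep_genes <- sample(nrow(counts), n_genes)
keep_cells <- sample(ncol(counts), n_cells)
small <- counts[keep_genes, keep_cells]

dim(small)
```

A random subset like this only tells you whether the call runs through; the fit itself will of course differ from the one on the full matrix.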
Not really... in principle it should be left to iterate until convergence, but I understand that it can take a while... 2 seems very low though...
Maybe 10?
I wasn't really seeing any changes in the values it spat out after about 6 iterations, so once that failed and I thought I had found the culprit, I wanted to get a somewhat quicker insight into whether it was going to fail again.
Sorry, this is getting a bit off-topic, but while I have your attention I'll just keep on going:
What do the values that it prints in verbose mode represent, e.g. After right optimization = -149172106.416477?
It is just the usual log-likelihood, penalized by the regularization. It is negative, but it should increase at each iteration.
From the message above there seems to be a bug in the values reported in "verbose" mode. While the first value of each iteration ("penalized log-likelihood") increases from one iteration to the next, as expected, the other reported values ("After dispersion optimization", "After right optimization", etc.) do not seem to change. I ran the example in the vignette to check and observed the same phenomenon, see below. @drisso: have you seen that?
fluidigm_zinb <- zinbwave(fluidigm, K = 2, epsilon=1000, verb=TRUE)
Create model:
ok
Initialize parameters:
ok
Optimize parameters:
Iteration 1
penalized log-likelihood = -97286.4543228788
After dispersion optimization = -97286.4543228788
user system elapsed
0.938 0.068 1.007
After right optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
user system elapsed
0.734 0.062 0.796
After left optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
Iteration 2
penalized log-likelihood = -69344.9476810396
After dispersion optimization = -97286.4543228788
user system elapsed
0.687 0.035 0.724
After right optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
user system elapsed
0.654 0.056 0.710
After left optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
Iteration 3
penalized log-likelihood = -68115.327913982
After dispersion optimization = -97286.4543228788
user system elapsed
0.643 0.042 0.686
After right optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
user system elapsed
0.558 0.035 0.595
After left optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
Iteration 4
penalized log-likelihood = -67948.0547100049
After dispersion optimization = -97286.4543228788
user system elapsed
0.648 0.040 0.689
After right optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
user system elapsed
0.588 0.047 0.637
After left optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
Iteration 5
penalized log-likelihood = -67896.2706681621
After dispersion optimization = -97286.4543228788
user system elapsed
0.607 0.027 0.636
After right optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
user system elapsed
0.587 0.052 0.640
After left optimization = -97286.4543228788
After orthogonalization = -97286.4543228788
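To make the expected behaviour concrete, here is a minimal base R sketch that checks the five "penalized log-likelihood" values copied from the output above. The variable names and the relative-change measure are my own illustration; I am not claiming this is the stopping rule zinbwave uses internally.

```r
# Penalized log-likelihood reported at the start of iterations 1 to 5
# (values copied from the verbose output above).
loglik <- c(-97286.4543228788, -69344.9476810396, -68115.327913982,
            -67948.0547100049, -67896.2706681621)

# Change between consecutive iterations; all positive means the
# objective increases at every iteration, as expected.
improvement <- diff(loglik)
all(improvement > 0)

# Relative change per iteration (illustrative measure only, not
# necessarily the convergence criterion used by zinbwave).
rel_change <- improvement / abs(head(loglik, -1))
signif(rel_change, 3)
```

The gains shrink quickly from one iteration to the next, which fits the impression that little changes after the first few iterations.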
As an additional observation: I got the error I initially reported again with the latest version. I had a feeling that it may have something to do with me compulsively loading the data.table package in almost every session (it tends to mask some functions and the error I got seemed to point towards some problem with subsetting). I have no idea whether that's the real issue, but once I restarted the same process in a new session without loading any packages other than the ones needed for zinbwave, it seemed to fare better. I've started it on my huge data set with 10 iterations, I'll report back once that's ended.
Yay, it finished with 7 iterations (your guess of 10 iterations seems to have been spot on! :) )
I'll superstitiously chalk it up to the loading of data.table, although I lack the nerves and time to test that thoroughly.
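If anyone wants to check the masking theory, a minimal sketch using only base R is below. It lists names that are defined in more than one attached package or environment. It only shows what is masked in the current session; it does not prove that masking caused the error above.

```r
# Run this after attaching your usual packages (e.g. library(data.table)).
# conflicts() reports object names that occur more than once on the
# search path; detail = TRUE groups them by search-path entry.
masked <- conflicts(detail = TRUE)

# Drop search-path entries that contribute no conflicting names.
masked <- masked[lengths(masked) > 0]
str(masked)

# Record the exact session state so that a failing and a working
# session can be compared side by side.
sessionInfo()
```

Comparing this output between the session that failed and a fresh one would show whether any of the masked names are ones that the fitting code relies on.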