Winnie09/Lamian

'chol2inv': not positive definite

chooliu opened this issue · 12 comments

Hi Dr. Hou!

I encountered the following choleski error when running lamian_test(): suspect it's a dataset-dependent issue, but wanted to reach out to see if you have any thoughts on troubleshooting / if you've seen this in any real-world Lamian applications.

Cheers!

Command in Line 1 & Error:

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a
method for function 'chol2inv': the leading minor of order 1 is not positive definite
Traceback:

1. lamian_test(expr = expression, cellanno = cellannot, pseudotime = pt, 
 .     design = dm, test.type = "variable", testvar = 2, permuiter = 50, 
 .     ncores = 1)
2. lapply(seq_len(permuiter + 1), function(i) fitfunc(iter = i, 
 .     diffType = "overall", test.type = test.type, testvar = testvar, 
 .     EMmaxiter = EMmaxiter, EMitercutoff = EMitercutoff, verbose.output = verbose.output, 
 .     expr = expr, cellanno = cellanno, pseudotime = pseudotime, 
 .     design = design))
3. FUN(X[[i]], ...)
4. fitfunc(iter = i, diffType = "overall", test.type = test.type, 
 .     testvar = testvar, EMmaxiter = EMmaxiter, EMitercutoff = EMitercutoff, 
 .     verbose.output = verbose.output, expr = expr, cellanno = cellanno, 
 .     pseudotime = pseudotime, design = design)
5. fitpt(expr, cellanno, pseudotime, design, testvar = testvar, 
 .     maxknotallowed = 10, EMmaxiter = EMmaxiter, EMitercutoff = EMitercutoff, 
 .     ncores = 1, model = mod.full)
6. sapply(0:maxknot, bicfunc)
7. lapply(X = X, FUN = FUN, ...)
8. FUN(X[[i]], ...)
9. sapply(names(sname), function(ss) {
 .     phiss <- phi[sname[[ss]], , drop = F]
 .     dif2 <- sexpr[[ss]] - sexpr[[ss]] %*% (phiss %*% chol2inv(chol(crossprod(phiss)))) %*% 
 .         t(phiss)
 .     dif2 <- rowSums(dif2 * dif2)
 .     s2 <- dif2/(length(sname[[ss]]) - ncol(phi))
 .     log(2 * pi * s2) * nrow(phiss) + dif2/s2
 . })
10. lapply(X = X, FUN = FUN, ...)
11. FUN(X[[i]], ...)
12. chol2inv(chol(crossprod(phiss)))
13. chol(crossprod(phiss))
14. chol(crossprod(phiss))
15. chol.default(crossprod(phiss))
16. .handleSimpleError(function (cond) 
  . .Internal(C_tryCatchHelper(addr, 1L, cond)), "the leading minor of order 1 is not positive definite", 
  .     base::quote(chol.default(crossprod(phiss))))
17. h(simpleError(msg, call))

Attempts to investigate:

  • check that this step runs correctly with tutorial expdata & same format as tutorial objects: OK
  • stricter filtering of genes in expression matrix (e.g., include genes with higher variance, less zeroes)
  • stricter filtering of cells in expression matrix (e.g., total counts/cell
  • change normalization method (Seurat NormalizeData() defaults, different log-transforms, etc)
  • change to test.method = "chisq" (also fails on chol2inv() invocation)

Dataset Info: 10X Multiome, n = 6 samples, n = 2/group x 3 groups; tried 100 to 3000 genes and different variations on design matrix

Software Versions: Lamian_0.99.2, installed via Github commit 7d5a8ff
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Same issue here:

Warning message:
In mclapply(seq_len(permuiter + 1), function(i) { :
  all scheduled cores encountered errors in user code
[1] "The length of fit is ..."
[1] 1 1 1 1 1 1
   Mode   FALSE 
logical       6 
[1] "The length of fit after removing null is ..."
[1] 1 1 1 1 1 1
   Mode   FALSE 
logical       6 
Error in fit[[1]]$fitres.full : $ operator is invalid for atomic vectors

r$> fit[[1]]                                                                                                                                               
[1] "Error in h(simpleError(msg, call)) : \n  error in evaluating the argument 'x' in selecting a method for function 'chol2inv': the leading minor of order 1 is not positive definite\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'chol2inv': the leading minor of order 1 is not positive definite>

Maybe related to these issues: #2 #11
Also, my expression matrix, cellanno, and pseudotime all have same number of cells (~150k) so I don't think this is the reason causing the error.

Hi @chooliu @chooliu Thank you for your strong interest in this software package! I am glad to help. It could be due to the pseudotime. Would you mind sharing with me your pseudotime and the design matrix?

It could be due to the skewed distribution of the pseudotime, or in some range of the pseudotime, there are no cells.

Thanks so much for the response & offering to look at the data! I'll take a peek in the next week to (1) confirm how the distribution of pseudotime varies with treatment and sample as well as (2) try alternative pseudotime estimation algorithms. I'll email those and the design matrix if this doesn't give insight.

Re: skewed distribution, does Lamian expect any scaling or transformation of pseudotime values (e.g., how some algorithms output pseudotime over range 0 to 1)?

@chooliu Sure! Lamian did not perform a transformation or scaling on pseudotime values. The default pseudotime values are integers, from 1, 2, ...., up to the number of cells, which are the output from the default pseudotime inference method TSCAN.

Hi! I've been encountering this same issue. All my cells have pseudotime values that are doubles ranging from 0.5 to ~25, with some cells having the same pseudotime value. If I rank the pseudotime instead, there may be some ties. Will that affect how the program runs or performs its calculations?

Hi @mehtar1 Yes. Pseudotime ties may affect some of the calculate. One way to get around with it is to average the counts of the cells that have the same pseudotime, or remove duplicated ones.

Hi @mehtar1 Yes. Pseudotime ties may affect some of the calculate. One way to get around with it is to average the counts of the cells that have the same pseudotime, or remove duplicated ones.

To test this, I selected just one cell from the tied pseudotime scores and it seems that I can run lamian_test without error now. However, our larger dataset will have many more pseudotime ties. Is there no way to include cells with the same pseudotime? Averaging the counts won't accurately represent our dataset while removing the duplicated ones will cause a large loss of data. Would love to hear if you have any potential fixes or workarounds!

Hi. Great package! I am running into a similar problem as above, and I was looking into perhaps if it has to do with the pseudotime distribution within samples and conditions as you mentioned beforehand. In this case, would the number of cells in each sample matter? As in if a couple of my samples have few numbers in comparison to others (ex: 500 cells vs 1000 cells) (extreme ex: 100 cells vs 1000 cells), could that potentially lead to the above lamian_test error? Would the number of cells within each condition also lead to an issue (ex: disease ~5000 cells and control ~9000 cells)?

Sorry for very delayed follow-up, but re: our discussion on Sept 17th: my pseudotime distributions seem qualitatively pretty similar across samples and conditions, with few ties -- so seems likely that there are multiple possible "failure modes" for lamian_test() / the cholesky decomposition.

My nuclei numbers (10X Multiome): 2,000 - 4,000 / sample, 2 samples per condition so maybe more balanced than the situation that Kyla discusses.

I haven't had the chance yet to read the final Lamian publication (congrats!) to see how these counts compare to the demonstrated use cases.

Hi, would it be helpful if you disfunction the knot selection? For example, set the number of knots as 3.

I had been through same issues such as
'chol2inv': the leading minor of order 1 is not positive definite Traceback:
and
Error in fit[[1]]$fitres.full : $ operator is invalid for atomic vectors

but those were disappeared after i filtered out samples with low cell number ( < 10).
I think maybe it has sth to do with the cell number if it's too low to calculate some statistics.