Error running scDblFinder
Closed this issue · 7 comments
Hi @plger,
While running the scDblFinder function, I ran into the following error:
seurat <- readRDS("data.rds")
dim(seurat)
[1] 30791 277065
# Doublet Identification
set.seed(12345)
sce <- as.SingleCellExperiment(seurat)
sce <- scDblFinder(sce, sample="sample", BPPARAM=MulticoreParam(8))
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE, :
convergence criterion below machine epsilon
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE, :
did not converge--results might be invalid!; try increasing work or maxit
Error: BiocParallel errors
1 remote errors, element index: 3
0 unevaluated and other errors
first remote error:
Error in value[[3L]](cond): An error occured while processing sample 'Stephenson et al., 2021':
Error in if (any(w <- knn$distance == 0)) knn$distance[w] <- min(knn$distance[knn$distance[, : missing value where TRUE/FALSE needed
I suspect that my dataset might be a little too large, since this error doesn't happen on smaller ones (e.g. < 100k cells). I tried increasing the number of CPU cores up to 16, but then I just run out of RAM (192 GB). Also, this dataset is a combination of several datasets of different sizes. I don't know whether that contributes to this error, but it seemed worth pointing out.
I would appreciate any support or ideas on this issue. Thanks.
Hi,
thanks for reporting. I doubt that's really related to the dataset size, as it's been run on much larger datasets, but could you report the sizes of each sample, i.e. table(sce$sample)?
In addition, it would be really helpful if you could run the following:
sce <- scDblFinder(sce[,which(sce$sample=="Stephenson et al., 2021")])
and then, when the error occurs, run traceback() and report the output.
Thanks,
plger
Hi @plger,
Just a minor correction. In my previous post, I said I was using the sample column, but I'm actually using the study one. Here are its sizes:
table(sce$study)
      name et al., 2019       name et al., 2019 Stephenson et al., 2021
                 159138                   13248                  104679
Regardless, both runs hit a similar error again:
> sce <- scDblFinder(sce[,which(sce$study=="Stephenson et al., 2021")])
Error in serialize(data, node$con, xdr = FALSE) :
error writing to connection
In addition: Warning message:
In scDblFinder(sce[, which(sce$study == "Stephenson et al., 2021")]) :
You are trying to run scDblFinder on a very large number of cells. If these are from different captures, please specify this using the `samples` argument.TRUE
Creating ~25000 artificial doublets...
Dimensional reduction
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE, :
convergence criterion below machine epsilon
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE, :
did not converge--results might be invalid!; try increasing work or maxit
Evaluating kNN...
Error in if (any(w <- knn$distance == 0)) knn$distance[w] <- min(knn$distance[knn$distance[, :
missing value where TRUE/FALSE needed
In addition: Warning message:
In rpois(nrow(x) * length(wAd), as.numeric(as.matrix(x[, wAd]))) :
NAs produced
Using traceback():
2: .evaluateKNN(pca, ctype, ado2, expected = ex, k = k)
1: scDblFinder(sce[, which(sce$study == "Stephenson et al., 2021")])
I'm starting to think that there's just something wrong with this Stephenson et al., 2021 dataset, since downstream analyses such as sctransform() also throw errors that I've never seen before. If you have any further insights into this issue, I would appreciate them. Otherwise, I'm inclined to remove this dataset and move on. Thanks for your help again.
Dimitri
Hi,
thanks for the extra info.
First, the samples argument should be given the individual captures, rather than the entire study (I'm assuming that the whole study, i.e. ~100k cells, was not done in a single capture). That in itself will already massively reduce the load. (While scDblFinder has been used with several hundreds of thousands of cells, it was typically 10x data, so individual captures were somewhere between 1-20k cells.) That being said, the error doesn't look like a memory issue.
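The per-capture setup described above can be sketched as follows. This is only an illustration under stated assumptions: the column name capture_id is hypothetical (it does not appear in the original post), the helper capture_sizes() is made up for this example, and the 20,000-cell threshold simply mirrors the 1-20k range mentioned above rather than any limit enforced by the package.

```r
# Sketch: pass one label per capture (not per study) to scDblFinder.
# `capture_id` is a hypothetical colData column; substitute whichever
# column identifies individual captures in your object.

# Summarise how many cells each capture contributes and flag captures
# larger than `max_cells` (illustrative threshold, not a package limit).
capture_sizes <- function(labels, max_cells = 20000) {
  n <- table(labels)
  data.frame(
    capture = names(n),
    cells = as.integer(n),
    oversized = as.integer(n) > max_cells,
    stringsAsFactors = FALSE
  )
}

# Toy labels standing in for sce$capture_id
labels <- rep(c("capA", "capB", "capC"), times = c(5, 3, 2))
print(capture_sizes(labels))

# With a real object (requires scDblFinder and BiocParallel); skipped
# when no SingleCellExperiment named `sce` is available.
if (exists("sce") && inherits(sce, "SingleCellExperiment") &&
    requireNamespace("scDblFinder", quietly = TRUE)) {
  sce <- scDblFinder::scDblFinder(
    sce,
    samples = "capture_id",
    BPPARAM = BiocParallel::MulticoreParam(8)
  )
}
```

Checking the capture sizes first makes it easy to spot a label that still lumps a whole study together before starting a long run.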
Could you report the quantiles of library sizes for this Stephenson study? (e.g. quantile(colSums(counts(sce)[,which(sce$study == "Stephenson et al., 2021")])) if they're not already stored somewhere...)
Thanks,
Pierre-Luc
Hi @plger,
I'm sorry for the late response. Here's what I got:
quantile(colSums(counts(sce)[,which(sce$study == "Stephenson et al., 2021")]))
       0%       25%       50%       75%      100%
 305.6683 1915.5127 2190.4214 2477.9051 3979.5989
Thanks once again
Hi,
worth checking issue 97, which had a similar error message. Specifically, check whether any(counts(sce)<0) is TRUE.
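A slightly broader version of that check can be sketched as below. It is not scDblFinder's own validation, only a hypothetical helper (check_counts() is a name made up here) that looks for negative, missing, and fractional values, the last of which seems worth a look given the non-integer library sizes reported above. It is written for a base R matrix or numeric vector; for a sparse dgCMatrix, one could presumably pass the non-zero values (the @x slot) instead of the whole matrix.

```r
# Sketch: summarise properties of count values that can break
# Poisson-based doublet simulation. Accepts a base matrix or vector.
check_counts <- function(x) {
  x <- as.numeric(x)          # flatten to a plain numeric vector
  finite <- x[is.finite(x)]   # ignore NA/NaN/Inf for the value checks
  c(
    any_na       = anyNA(x),
    any_negative = any(finite < 0),
    any_fraction = any(finite != round(finite))
  )
}

# Toy matrix with one negative and one fractional entry (made-up values)
m <- matrix(c(0, 3, -1, 2.5, 7, 0), nrow = 2)
print(check_counts(m))

# rpois() returns NA (with a warning) for a negative rate, which is one
# way negative counts could lead to an "NAs produced" warning.
suppressWarnings(rpois(1, lambda = -1))
```

If any of the three flags is TRUE, the matrix is probably not raw integer counts (for example, it may have been normalised or otherwise corrected beforehand).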
Thanks for pointing that out, I'm going to check it. I appreciate your assistance @plger
Best,
Dimitri
Please let me know whether you had the same issue.