plger/scDblFinder

"did not converge" Error on cellbender3


Dear scDblFinder developer,

This is the first time I am trying to use your tool. Unfortunately, I am getting an error and am not sure how to fix it.
Running on Linux (Ubuntu) with 250 GB RAM, 64 CPUs, and 3 TB of free disk space:

# 1.0 Validate assay version of the Seurat object -  Assay-v5
cell_bender_seurat[["RNA"]]  # Assay (v5) data with 36601 features for 77863 cell

# 1.1 Convert v5 to v3.
cell_bender_seurat[["RNA3"]] <- as(object = cell_bender_seurat[["RNA"]], Class = "Assay")
cell_bender_seurat
cell_bender_seurat[["RNA3"]]  # Assay  data with 36601 features for 77863 cells

# 1.2 Convert to sce
sce = as.SingleCellExperiment(cell_bender_seurat, assay ="RNA3")
sce

class: SingleCellExperiment
dim: 36601 75331
metadata(0):
assays(2): counts logcounts
rownames(36601): MIR1302-2HG FAM138A ... AC007325.4 AC007325.2
rowData names(0):
colnames(75331): L25_ACGCAGCCAAACAACA-1 L25_CGACCTTTCGATCCCT-1 ...
  S55_GTTAAGCGTCTAGGTT-1 S55_TGCTACCGTCGCGTGT-1
colData names(23): orig.ident nCount_RNA ... clonotype_id ident
reducedDimNames(5): PCA INTEGRATED.CCA INTEGRATED.RPCA UMAP.CCA
  UMAP.SCVI
mainExpName: RNA3
altExpNames(0):


# 1.3 Find doublets (multiple samples x8)
sce.standard <- scDblFinder(sce, samples = "orig.ident", BPPARAM=MulticoreParam(20))   # fails, error message below

Error in manager$availability[[as.character(result$node)]] <- TRUE :
  wrong args for environment subassignment
Error in serialize(data, node$con, xdr = FALSE) :
  error writing to connection

Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  did not converge--results might be invalid!; try increasing work or maxit
Stop worker failed with the error: wrong args for environment subassignment

I'd appreciate any suggestions.
Thank you

sessionInfo()

R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /data/bin/conda_env_location/PDX_manuscript_2023_v2/lib/libopenblasp-r0.3.26.so; LAPACK version 3.12.0

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] BiocParallel_1.36.0 scDblFinder_1.16.0
[3] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
[5] Biobase_2.62.0 GenomicRanges_1.54.1
[7] GenomeInfoDb_1.38.1 IRanges_2.36.0
[9] S4Vectors_0.40.2 BiocGenerics_0.48.1
[11] MatrixGenerics_1.14.0 matrixStats_1.2.0
[13] Seurat_5.0.1 SeuratObject_5.0.0
[15] sp_2.1-3

loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.3.2
[3] later_1.3.2 BiocIO_1.12.0
[5] bitops_1.0-7 tibble_3.2.1
[7] polyclip_1.10-6 XML_3.99-0.16.1
[9] fastDummies_1.7.3 lifecycle_1.0.4
[11] edgeR_4.0.2 globals_0.16.2
[13] lattice_0.22-5 MASS_7.3-60
[15] magrittr_2.0.3 limma_3.58.1
[17] plotly_4.10.4 yaml_2.3.8
[19] metapod_1.10.0 httpuv_1.6.14
[21] sctransform_0.4.1 spam_2.10-0
[23] spatstat.sparse_3.0-3 reticulate_1.35.0
[25] cowplot_1.1.3 pbapply_1.7-2
[27] RColorBrewer_1.1-3 abind_1.4-5
[29] zlibbioc_1.48.0 Rtsne_0.17
[31] purrr_1.0.2 RCurl_1.98-1.14
[33] GenomeInfoDbData_1.2.11 ggrepel_0.9.5
[35] irlba_2.3.5.1 listenv_0.9.1
[37] spatstat.utils_3.0-4 goftest_1.2-3
[39] RSpectra_0.16-1 dqrng_0.3.2
[41] spatstat.random_3.2-2 fitdistrplus_1.1-11
[43] parallelly_1.36.0 DelayedMatrixStats_1.24.0
[45] leiden_0.4.3.1 codetools_0.2-19
[47] DelayedArray_0.28.0 scuttle_1.12.0
[49] tidyselect_1.2.0 ScaledMatrix_1.10.0
[51] viridis_0.6.5 spatstat.explore_3.2-6
[53] GenomicAlignments_1.38.0 jsonlite_1.8.8
[55] BiocNeighbors_1.20.0 ellipsis_0.3.2
[57] progressr_0.14.0 ggridges_0.5.6
[59] survival_3.5-7 scater_1.30.1
[61] tools_4.3.2 ica_1.0-3
[63] Rcpp_1.0.12 glue_1.7.0
[65] gridExtra_2.3 SparseArray_1.2.2
[67] dplyr_1.1.4 fastmap_1.1.1
[69] bluster_1.12.0 fansi_1.0.6
[71] digest_0.6.34 rsvd_1.0.5
[73] R6_2.5.1 mime_0.12
[75] colorspace_2.1-0 scattermore_1.2
[77] tensor_1.5 spatstat.data_3.0-4
[79] utf8_1.2.4 tidyr_1.3.1
[81] generics_0.1.3 data.table_1.14.10
[83] rtracklayer_1.62.0 httr_1.4.7
[85] htmlwidgets_1.6.4 S4Arrays_1.2.0
[87] uwot_0.1.16 pkgconfig_2.0.3
[89] gtable_0.3.4 lmtest_0.9-40
[91] XVector_0.42.0 htmltools_0.5.7
[93] dotCall64_1.1-1 scales_1.3.0
[95] png_0.1-8 scran_1.30.0
[97] reshape2_1.4.4 rjson_0.2.21
[99] nlme_3.1-164 zoo_1.8-12
[101] stringr_1.5.1 KernSmooth_2.23-22
[103] parallel_4.3.2 miniUI_0.1.1.1
[105] vipor_0.4.7 restfulr_0.0.15
[107] pillar_1.9.0 grid_4.3.2
[109] vctrs_0.6.5 RANN_2.6.1
[111] promises_1.2.1 BiocSingular_1.18.0
[113] beachmat_2.18.0 xtable_1.8-4
[115] cluster_2.1.6 beeswarm_0.4.0
[117] locfit_1.5-9.8 cli_3.6.2
[119] compiler_4.3.2 Rsamtools_2.18.0
[121] rlang_1.1.3 crayon_1.5.2
[123] future.apply_1.11.1 plyr_1.8.9
[125] ggbeeswarm_0.7.2 stringi_1.8.3
[127] viridisLite_0.4.2 deldir_2.0-2
[129] munsell_0.5.0 Biostrings_2.70.1
[131] lazyeval_0.2.2 spatstat.geom_3.2-8
[133] Matrix_1.6-1.1 RcppHNSW_0.6.0
[135] patchwork_1.2.0 sparseMatrixStats_1.14.0
[137] future_1.33.1 ggplot2_3.4.4
[139] statmod_1.5.0 shiny_1.8.0
[141] ROCR_1.0-11 igraph_1.6.0
[143] xgboost_2.0.3.1

Hi,
I've never seen this error, but this could be a memory and/or multithreading issue.
I'd recommend checking the following:

  1. Monitor your RAM usage when running scDblFinder (e.g. using htop).
  2. The package per se is not very memory hungry (it's been run on much larger datasets), but the object itself can be. In particular, earlier versions of as.SingleCellExperiment had a bug that made the object huge (although this should be solved in the version you're using). So check the size of both cell_bender_seurat and sce, e.g. using format(object.size(x), units="Gb"). If you see that sce is much bigger, you can always skip the conversion and run scDblFinder with something like:
sce <- scDblFinder(GetAssayData(cell_bender_seurat, slot="counts", assay="RNA3"), 
                   samples=cell_bender_seurat$orig.ident)
  3. If htop suggests that the problem is memory-related, try reducing the number of threads (or, if needed, using a single one).

Thank you for the prompt response.

  1. RAM usage looks normal (below 1%).
  2. The object sizes seem OK:
format(object.size(cell_bender_seurat), units="Gb") #  "8 Gb"
format(object.size(sce), units="Gb")   #  "2.9 Gb"

A) Could it have something to do with how Seurat v5 uses layers (for example, 8 samples give 8 layers for counts), so that when I convert it to an Assay v3 it becomes a single 36601 x 75331 matrix?

B) Tried to run without threads:

sce.standard <- scDblFinder(sce, samples = "orig.ident")

Warning messages:
1: In rpois(nrow(x) * length(wAd), as.numeric(as.matrix(x[, wAd]))) :
  NAs produced
2: In value[[3L]](cond) :
  Error in calculating norm factors:Error in .local(x, ...): size factors should be positive

C) Tried this too

sce <- scDblFinder(GetAssayData(cell_bender_seurat, slot="counts", assay="RNA3"),
                   samples=cell_bender_seurat$orig.ident)
Error in .checkSCE(sce) :
  `sce` should be a SingleCellExperiment, a SummarizedExperiment, or an array (i.e. matrix, sparse matric, etc.) of counts.
In addition: Warning message:
The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

I'm not sure I understand your question A: the original Seurat object also has dimensions 36601 x 75331...

  • Can you check class(GetAssayData(cell_bender_seurat, layer="counts", assay="RNA3"))?
  • Can you check quantile(colSums(counts(sce)))
  • Can you try this:
    sce.standard <- scDblFinder(sce[VariableFeatures(cell_bender_seurat),], samples = "orig.ident")
class(GetAssayData(cell_bender_seurat, layer="counts", assay="RNA3"))
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"
quantile(colSums(counts(sce)))
   0%   25%   50%   75%  100%
  201   650  2209  5732 81977
  3. It has been running for 1 hr; I hope that is a good sign:
    sce.standard <- scDblFinder(sce[VariableFeatures(cell_bender_seurat),], samples = "orig.ident")

I'm unsure what the issue is here, but it appears to be related to 1) the fact that you have cells with a very low library size (your 201 is crap, personally I'd have filtered out many) and 2) the feature selection internal to scDblFinder, which might have resulted in some cells not having any reads in the selected features. This appears to have been solved by using the VariableFeatures (which is a perfectly decent way of doing things), and would most likely also be solved by filtering out cells with a low library size (e.g. keeping only those with >=400-500).

If you want, you can try again with multithreading, using either of these two solutions.
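
The library-size filter suggested above can be sketched as follows. This is only an illustration on a toy matrix, not something taken from the scDblFinder documentation: `toy_counts` and `min_lib` are made-up names, and the threshold is scaled down to fit the toy data (the suggestion above is around 400-500 for real data).

```r
library(Matrix)

# Toy count matrix (20 genes x 10 cells), for illustration only.
set.seed(1)
toy_counts <- Matrix(as.numeric(rpois(200, lambda = 2)),
                     nrow = 20, ncol = 10, sparse = TRUE)

# Library size = total counts per cell (i.e. per column).
lib_size <- colSums(toy_counts)

# Hypothetical threshold, scaled to the toy data.
min_lib <- 40

# Keep only the cells whose library size reaches the threshold.
keep <- lib_size >= min_lib
filtered <- toy_counts[, keep, drop = FALSE]

print(quantile(lib_size))
print(dim(filtered))
```

On the SingleCellExperiment from this thread, the equivalent would presumably be something like sce <- sce[, colSums(counts(sce)) >= 500], followed by re-running scDblFinder on the filtered object.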

How long does it take to run scDblFinder, on average?

  1. It's been ~5 hrs.
  2. I filtered out the cells with a low library size, but the run eventually crashed:
quantile(colSums(counts(sce)))
   0%   25%   50%   75%  100%
  451  1189  3332  6480 81977
sce.standard <- scDblFinder(sce, samples = "orig.ident", BPPARAM=MulticoreParam(8))
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  convergence criterion below machine epsilon
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  did not converge--results might be invalid!; try increasing work or maxit

Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  convergence criterion below machine epsilon
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  did not converge--results might be invalid!; try increasing work or maxit

Stop worker failed with the error: wrong args for environment subassignment

I figured out why I was getting that error; the cause was a few steps back in my analysis:

I had removed ambient RNA with CellBender v3, which generated negative values in the count matrix; that is why scDblFinder() was not able to process my data. The issue of CellBender generating negative counts is discussed here: https://github.com/broadinstitute/CellBender/issues/306. To fix it, run CellBender v2 and then re-run scDblFinder().

Everything works now, and quite quickly.
Cheers.
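
For anyone hitting similar symptoms, a quick sanity check of the count matrix before running a count-based tool might look like the sketch below. `check_counts` is a hypothetical helper written for this illustration (it is not part of scDblFinder), and the toy matrix only mimics the problem. Negative entries would also be consistent with the "NAs produced" warning from rpois() seen earlier, since rpois() returns NA for a negative rate.

```r
library(Matrix)

# Hypothetical helper: count the values that a count-based method
# cannot handle (negative, non-integer, or missing).
check_counts <- function(m) {
  # For a dgCMatrix, the non-zero entries are stored in the @x slot,
  # so the matrix does not need to be densified.
  vals <- if (inherits(m, "dgCMatrix")) m@x else as.vector(as.matrix(m))
  c(n_negative    = sum(vals < 0, na.rm = TRUE),
    n_non_integer = sum(vals != round(vals), na.rm = TRUE),
    n_na          = sum(is.na(vals)))
}

# Toy matrix with one negative and one non-integer entry.
toy <- sparseMatrix(i = c(1, 2, 3), j = c(1, 1, 2),
                    x = c(5, -1, 2.5), dims = c(3, 2))

res <- check_counts(toy)
print(res)
```

With the objects from this thread, the call would presumably be check_counts(counts(sce)); any non-zero n_negative would point to the upstream ambient-RNA correction rather than to scDblFinder itself.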

Hi,
Great that we have an explanation, thanks for coming back on this.
I've now added a check for that in the devel version, so that a more useful error message is provided.
Best,
Pierre-Luc