plger/scDblFinder

error in serialize(data)

Closed this issue · 6 comments

wbvguo commented

Hi,

Thanks for maintaining this tool. I ran into a problem when running it with MulticoreParam.

code:

library(scDblFinder)
library(BiocParallel)

sce = as.SingleCellExperiment(seurat_filtered)
sce = scDblFinder(sce, samples="sample_label", BPPARAM=MulticoreParam(4))

Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection
Error in manager$availability[[as.character(result$node)]] <- TRUE : 
  wrong args for environment subassignment
In addition: Warning messages:
1: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
3: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection

When I remove BPPARAM=MulticoreParam(4), the code runs without error (although slowly), so I guess the problem is related to multiprocessing. The object I am working with is 4.3 GB and the server has more than 140 GB of memory, so I don't think it is a memory issue. Do you have any idea what causes this problem and how it could be solved?

Thanks,

plger commented

What platform are you on? (You should always report your environment and sessionInfo().)
Do you have the same problem with BPPARAM=SnowParam(4)?
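
For readers who want to run the same comparison, here is a minimal sketch of swapping BiocParallel backends. It uses a toy function (`toy_task`, a hypothetical stand-in) rather than scDblFinder itself, and the worker count of 2 is arbitrary; the only point is that the `BPPARAM` object selects the backend while the rest of the call stays the same.

```r
library(BiocParallel)

# Toy task standing in for the per-sample work; not scDblFinder itself.
toy_task <- function(i) sum(seq_len(i))

# SerialParam runs everything in the current session.
# SnowParam starts separate R worker processes.
# MulticoreParam forks the current session (not supported on Windows).
backends <- list(
  serial = SerialParam(),
  snow   = SnowParam(2)
)
if (.Platform$OS.type != "windows") {
  backends$multicore <- MulticoreParam(2)
}

# Run the same task under each backend and compare the results.
results <- lapply(backends, function(bp) {
  bplapply(1:4, toy_task, BPPARAM = bp)
})
str(results)
```

If one backend fails while the others succeed, that points at the backend rather than at the function being run, which is the distinction the question above is trying to draw.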

wbvguo commented

Hi, thank you for the quick reply!

Here is my sessionInfo():

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocParallel_1.30.4 scDblFinder_1.10.0  magrittr_2.0.3      ggplot2_3.4.2       tidyr_1.3.0         tibble_3.2.1        dplyr_1.1.1        
[8] SeuratObject_4.1.3  Seurat_4.3.0       

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                  spatstat.explore_3.1-0      reticulate_1.28             tidyselect_1.2.0            htmlwidgets_1.6.2          
  [6] grid_4.2.1                  Rtsne_0.16                  munsell_0.5.0               ScaledMatrix_1.4.1          codetools_0.2-18           
 [11] ica_1.0-3                   statmod_1.5.0               scran_1.24.1                xgboost_1.7.5.1             future_1.32.0              
 [16] miniUI_0.1.1.1              withr_2.5.0                 spatstat.random_3.1-4       colorspace_2.1-0            progressr_0.13.0           
 [21] Biobase_2.56.0              knitr_1.42                  rstudioapi_0.14             stats4_4.2.1                SingleCellExperiment_1.18.1
 [26] ROCR_1.0-11                 tensor_1.5                  listenv_0.9.0               MatrixGenerics_1.8.1        labeling_0.4.2             
 [31] GenomeInfoDbData_1.2.8      polyclip_1.10-4             farver_2.1.1                parallelly_1.35.0           vctrs_0.6.1                
 [36] generics_0.1.3              xfun_0.38                   R6_2.5.1                    GenomeInfoDb_1.32.4         ggbeeswarm_0.7.1           
 [41] rsvd_1.0.5                  locfit_1.5-9.7              bitops_1.0-7                spatstat.utils_3.0-2        DelayedArray_0.22.0        
 [46] promises_1.2.0.1            BiocIO_1.6.0                scales_1.2.1                beeswarm_0.4.0              gtable_0.3.3               
 [51] beachmat_2.12.0             globals_0.16.2              goftest_1.2-3               rlang_1.1.0                 splines_4.2.1              
 [56] rtracklayer_1.56.1          lazyeval_0.2.2              spatstat.geom_3.1-0         yaml_2.3.7                  reshape2_1.4.4             
 [61] abind_1.4-5                 httpuv_1.6.9                tools_4.2.1                 ellipsis_0.3.2              RColorBrewer_1.1-3         
 [66] BiocGenerics_0.42.0         ggridges_0.5.4              Rcpp_1.0.10                 plyr_1.8.8                  sparseMatrixStats_1.8.0    
 [71] zlibbioc_1.42.0             purrr_1.0.1                 RCurl_1.98-1.12             deldir_1.0-6                pbapply_1.7-0              
 [76] viridis_0.6.2               cowplot_1.1.1               S4Vectors_0.34.0            zoo_1.8-11                  SummarizedExperiment_1.26.1
 [81] ggrepel_0.9.3               cluster_2.1.3               data.table_1.14.8           scattermore_0.8             lmtest_0.9-40              
 [86] RANN_2.6.1                  fitdistrplus_1.1-8          matrixStats_0.63.0          patchwork_1.1.2             mime_0.12                  
 [91] evaluate_0.20               xtable_1.8-4                XML_3.99-0.14               IRanges_2.30.1              gridExtra_2.3              
 [96] compiler_4.2.1              scater_1.24.0               KernSmooth_2.23-20          crayon_1.5.2                htmltools_0.5.5            
[101] later_1.3.0                 snow_0.4-4                  DBI_1.1.3                   MASS_7.3-58                 Matrix_1.5-4               
[106] cli_3.6.1                   parallel_4.2.1              metapod_1.4.0               igraph_1.4.2                GenomicRanges_1.48.0       
[111] pkgconfig_2.0.3             GenomicAlignments_1.32.1    sp_1.6-0                    plotly_4.10.1               scuttle_1.6.3              
[116] spatstat.sparse_3.0-1       vipor_0.4.5                 dqrng_0.3.0                 XVector_0.36.0              stringr_1.5.0              
[121] digest_0.6.31               sctransform_0.3.5           RcppAnnoy_0.0.20            spatstat.data_3.0-1         Biostrings_2.64.1          
[126] rmarkdown_2.21              leiden_0.4.3                uwot_0.1.14                 edgeR_3.38.4                DelayedMatrixStats_1.18.2  
[131] restfulr_0.0.15             shiny_1.7.4                 Rsamtools_2.12.0            rjson_0.2.21                lifecycle_1.0.3            
[136] nlme_3.1-162                jsonlite_1.8.4              BiocNeighbors_1.14.0        viridisLite_0.4.1           limma_3.52.4               
[141] fansi_1.0.4                 pillar_1.9.0                lattice_0.20-45             ggrastr_1.0.1               fastmap_1.1.1              
[146] httr_1.4.5                  survival_3.5-5              glue_1.6.2                  png_0.1-8                   bluster_1.6.0              
[151] stringi_1.7.12              BiocSingular_1.12.0         irlba_2.3.5.1               future.apply_1.10.0        

I tested with BPPARAM=SnowParam(4). It did not report an error, but it produced the following warning messages:

Warning messages:
1: <anonymous>: ... may be used in an incorrect context: 
     scDblFinder(sce[sel_features, x], clusters = clusters, dims = dims, 
         dbr = dbr, dbr.sd = dbr.sd, clustCor = clustCor, unident.th = unident.th, 
         knownDoublets = knownDoublets, knownUse = knownUse, artificialDoublets = artificialDoublets, 
         k = k, processing = processing, nfeatures = nfeatures, propRandom = propRandom, 
         includePCs = includePCs, propMarkers = propMarkers, trainingFeatures = trainingFeatures, 
         returnType = returnType, threshold = isSplitMode, score = ifelse(isSplitMode, 
             score, "weighted"), removeUnidentifiable = removeUnidentifiable, 
         verbose = FALSE, aggregateFeatures = aggregateFeatures, ...)
 
2: In serialize(data, node$con) :
  'package:stats' may not be available when loading
3: In serialize(data, node$con) :
  'package:stats' may not be available when loading
4: In serialize(data, node$con) :
  'package:stats' may not be available when loading

I have another question: is scDblFinder a deterministic tool? If we run it n times, will it always give the same result?

Thanks,

plger commented

No, it is not deterministic. See section 1.5.5 of the vignette for how to make it reproducible.

I'm afraid your first BiocParallel error isn't something I can help you with. Perhaps @LTLA has seen this before (I've never seen the manager$availability error)?

wbvguo commented

Thank you for the reply. In the non-multithreaded case (say, when no BPPARAM argument is given), will set.seed be sufficient to make the results reproducible?

I am closing this issue now, as there are alternative ways to get around it.

Thanks,

plger commented

Yes, without multithreading set.seed should be sufficient.

plger commented

Actually no: if you're using samples, you need to set the seed via BPPARAM=SerialParam(RNGseed = seed) (see #59).
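
A minimal sketch of the `SerialParam(RNGseed = seed)` pattern, for illustration only. The seed value and the toy function `draw` are assumptions made for the example; the scDblFinder call at the end is the one from this thread and is left commented out because it needs an existing `sce` object with a `sample_label` column.

```r
library(BiocParallel)

seed <- 123  # arbitrary illustrative value

# Toy stand-in for a per-sample random step; not scDblFinder itself.
draw <- function(i) rnorm(1)

# Two runs, each with a freshly created param carrying the same seed.
run1 <- bplapply(1:3, draw, BPPARAM = SerialParam(RNGseed = seed))
run2 <- bplapply(1:3, draw, BPPARAM = SerialParam(RNGseed = seed))
print(identical(run1, run2))

# The same pattern applied to the call from this thread
# (assumes `sce` exists and has a "sample_label" column):
# library(scDblFinder)
# sce <- scDblFinder(sce, samples = "sample_label",
#                    BPPARAM = SerialParam(RNGseed = seed))
```

The seed is passed through the param object because the per-sample work is dispatched through BiocParallel, so it is the param that carries the seed for the random number streams used there.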