plger/scDblFinder

Error in if (length(expected) > 1 && x > min(expected) && x < max(expected)) return(0): missing value where TRUE/FALSE needed

Closed this issue · 9 comments

twoneu commented

Hi! Thank you for this great tool. I am encountering the error in the title when running scDblFinder on a large dataset (CellRanger estimated ~20,000 cells):

Assuming the input to be a matrix of counts or expected counts.

Aggregating features...

Warning message:
"Quick-TRANSfer stage steps exceeded maximum (= 1905250)"
Creating ~11084 artificial doublets...

Dimensional reduction

Evaluating kNN...

Training model...

Error in if (length(expected) > 1 && x > min(expected) && x < max(expected)) return(0): missing value where TRUE/FALSE needed

I have not encountered this error in several other (much smaller) samples I have tried, so is this related to the dataset being too large?

Traceback:
1. scDblFinder(peak_assay, aggregateFeatures = TRUE, nfeatures = 25, 
 .     processing = "normFeatures")
2. .scDblscore(d, scoreType = score, addVals = pca[, includePCs, 
 .     drop = FALSE], threshold = threshold, dbr = dbr, dbr.sd = dbr.sd, 
 .     nrounds = nrounds, max_depth = max_depth, iter = iter, BPPARAM = BPPARAM, 
 .     features = trainingFeatures, verbose = verbose, metric = metric, 
 .     filterUnidentifiable = removeUnidentifiable, unident.th = unident.th)
3. which((d$type == "real" & doubletThresholding(d, dbr = dbr, dbr.sd = dbr.sd, 
 .     stringency = 0.7, perSample = perSample, returnType = "call") == 
 .     "doublet") | (d$type == "doublet" & d$score < unident.th & 
 .     filterUnidentifiable) | !d$include.in.training)
4. doubletThresholding(d, dbr = dbr, dbr.sd = dbr.sd, stringency = 0.7, 
 .     perSample = perSample, returnType = "call")
5. .optimThreshold(d, dbr = .gdbr(d, dbr), dbr.sd = dbr.sd, stringency = stringency)
6. optimize(totfn, c(0, 1), maximum = FALSE)
7. (function (arg) 
 . f(arg, ...))(0.381966011250105)
8. f(arg, ...)
9. .prop.dev(d$type, d$score, expected, x)

Session info

`R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/envs/NET_R_env/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.66.3                  
 [3] Biostrings_2.66.0                 XVector_0.38.0                   
 [5] CopyscAT_0.40                     MASS_7.3-60                      
 [7] jsonlite_1.8.4                    sp_1.6-1                         
 [9] rtracklayer_1.58.0                gplots_3.1.3                     
[11] tibble_3.2.1                      tidyr_1.3.0                      
[13] edgeR_3.40.2                      limma_3.54.2                     
[15] stringr_1.5.0                     mclust_6.0.0                     
[17] changepoint_2.2.4                 zoo_1.8-12                       
[19] data.table_1.14.8                 igraph_1.4.3                     
[21] FNN_1.1.3.2                       Rtsne_0.16                       
[23] biomaRt_2.54.0                    fastcluster_1.2.3                
[25] NMF_0.26                          cluster_2.1.4                    
[27] rngtools_1.5.2                    registry_0.5-1                   
[29] viridis_0.6.3                     viridisLite_0.4.2                
[31] dplyr_1.1.2                       RColorBrewer_1.1-3               
[33] scDblFinder_1.13.13               SingleCellExperiment_1.20.1      
[35] SummarizedExperiment_1.28.0       MatrixGenerics_1.10.0            
[37] matrixStats_0.63.0                glue_1.6.2                       
[39] ggplot2_3.4.1                     EnsDb.Hsapiens.v86_2.99.0        
[41] ensembldb_2.22.0                  AnnotationFilter_1.22.0          
[43] GenomicFeatures_1.50.2            AnnotationDbi_1.60.0             
[45] Biobase_2.58.0                    GenomicRanges_1.50.2             
[47] GenomeInfoDb_1.34.9               IRanges_2.32.0                   
[49] S4Vectors_0.36.2                  BiocGenerics_0.44.0              
[51] Signac_1.10.0                     SeuratObject_4.1.3               
[53] Seurat_4.3.0                     

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3            pbdZMQ_0.3-9             
  [3] scattermore_1.0           bit64_4.0.5              
  [5] irlba_2.3.5.1             DelayedArray_0.24.0      
  [7] KEGGREST_1.38.0           RCurl_1.98-1.12          
  [9] doParallel_1.0.17         generics_0.1.3           
 [11] ScaledMatrix_1.6.0        cowplot_1.1.1            
 [13] RSQLite_2.2.20            RANN_2.6.1               
 [15] future_1.32.0             bit_4.0.5                
 [17] spatstat.data_3.0-1       xml2_1.3.4               
 [19] httpuv_1.6.11             hms_1.1.3                
 [21] evaluate_0.21             promises_1.2.0.1         
 [23] fansi_1.0.4               restfulr_0.0.15          
 [25] progress_1.2.2            caTools_1.18.2           
 [27] dbplyr_2.3.1              DBI_1.1.3                
 [29] htmlwidgets_1.6.2         spatstat.geom_3.2-1      
 [31] purrr_1.0.1               ellipsis_0.3.2           
 [33] gridBase_0.4-7            deldir_1.0-9             
 [35] sparseMatrixStats_1.10.0  vctrs_0.6.2              
 [37] ROCR_1.0-11               abind_1.4-5              
 [39] cachem_1.0.8              withr_2.5.0              
 [41] progressr_0.13.0          sctransform_0.3.5        
 [43] GenomicAlignments_1.34.1  prettyunits_1.1.1        
 [45] scran_1.26.2              goftest_1.2-3            
 [47] IRdisplay_1.1             lazyeval_0.2.2           
 [49] crayon_1.5.2              spatstat.explore_3.2-1   
 [51] pkgconfig_2.0.3           nlme_3.1-162             
 [53] vipor_0.4.5               ProtGenerics_1.30.0      
 [55] rlang_1.1.0               globals_0.16.2           
 [57] lifecycle_1.0.3           miniUI_0.1.1.1           
 [59] filelock_1.0.2            BiocFileCache_2.6.0      
 [61] rsvd_1.0.5                polyclip_1.10-4          
 [63] lmtest_0.9-40             Matrix_1.5-4             
 [65] IRkernel_1.3.2            base64enc_0.1-3          
 [67] beeswarm_0.4.0            ggridges_0.5.4           
 [69] png_0.1-8                 rjson_0.2.21             
 [71] bitops_1.0-7              KernSmooth_2.23-21       
 [73] blob_1.2.3                DelayedMatrixStats_1.20.0
 [75] parallelly_1.35.0         spatstat.random_3.1-5    
 [77] beachmat_2.14.2           scales_1.2.1             
 [79] memoise_2.0.1             magrittr_2.0.3           
 [81] plyr_1.8.8                ica_1.0-3                
 [83] zlibbioc_1.44.0           compiler_4.2.2           
 [85] dqrng_0.3.0               BiocIO_1.8.0             
 [87] fitdistrplus_1.1-11       Rsamtools_2.14.0         
 [89] cli_3.6.1                 listenv_0.9.0            
 [91] patchwork_1.1.2           pbapply_1.7-0            
 [93] tidyselect_1.2.0          stringi_1.7.12           
 [95] yaml_2.3.7                BiocSingular_1.14.0      
 [97] locfit_1.5-9.7            ggrepel_0.9.3            
 [99] grid_4.2.2                fastmatch_1.1-3          
[101] tools_4.2.2               future.apply_1.10.0      
[103] parallel_4.2.2            uuid_1.1-0               
[105] bluster_1.8.0             foreach_1.5.2            
[107] metapod_1.6.0             gridExtra_2.3            
[109] digest_0.6.31             BiocManager_1.30.20      
[111] shiny_1.7.4               Rcpp_1.0.10              
[113] scuttle_1.8.4             later_1.3.1              
[115] RcppAnnoy_0.0.20          httr_1.4.5               
[117] colorspace_2.1-0          XML_3.99-0.14            
[119] tensor_1.5                reticulate_1.28          
[121] splines_4.2.2             uwot_0.1.14              
[123] RcppRoll_0.3.0            statmod_1.5.0            
[125] spatstat.utils_3.0-3      scater_1.26.1            
[127] xgboost_1.7.5.1           plotly_4.10.1            
[129] xtable_1.8-4              R6_2.5.1                 
[131] pillar_1.9.0              htmltools_0.5.5          
[133] mime_0.12                 fastmap_1.1.1            
[135] BiocParallel_1.32.6       BiocNeighbors_1.16.0     
[137] codetools_0.2-19          utf8_1.2.3               
[139] lattice_0.21-8            spatstat.sparse_3.0-1    
[141] curl_4.3.3                ggbeeswarm_0.7.2         
[143] leiden_0.4.3              gtools_3.9.4             
[145] survival_3.5-5            repr_1.1.6               
[147] munsell_0.5.0             GenomeInfoDbData_1.2.9   
[149] iterators_1.0.14          reshape2_1.4.4           
[151] gtable_0.3.3             `
plger commented

Hi,
20k cells certainly shouldn't be a problem, how many peaks do you have? Do you have mbkmeans installed?
Can you reproduce the error without the aggregation (i.e. just scDblFinder(peak_assay) ) ?
I've never seen this error, so if it'd be possible to share the object (ideally smaller if you can still reproduce the error, genes/samples can obviously be scrambled) it'll make debugging easier.
Pierre-Luc

plger commented

Hi @twoneu ,
please respond or I'll close the issue.

I also have same error. Did you solve it? I have also just 20000 cell( in cellranger websummary). but My another data have also 20000cell. but it ran fine without error.

plger commented

Then please provide the extra info requested above.

Owner
counts <- Read10X_h5("/data/jrgong/AD_multiome/fastq/aggr_2023_0616/aggr_2023_0616/outs/filtered_feature_bc_matrix.h5")
fragpath <- "/data/jrgong/AD_multiome/fastq//aggr_2023_0616/aggr_2023_0616/outs/atac_fragments.tsv.gz"

#get gene annotations for hg 38
annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)

seqlevelsStyle(annotation) <- "UCSC"

#create a Seurat object containing the RNA data

AH_Mul <- CreateSeuratObject(counts = counts$Gene Expression, assay = "RNA")

#create a Seurat object containing the ATAC data
AH_Mul[["ATAC"]] <- CreateChromatinAssay(counts = counts$Peaks, sep = c(":","-"), fragments = fragpath, annotation = annotation)

AH.PBMC.7 <- subset(AH_Mul, subset=aggr_number=="7")

ce <- scDblFinder(SingleCellExperiment(list(counts=AH.PBMC.7@assays$RNA@counts)))
AH.PBMC.7$doublet_scores <- sce$scDblFinder.score
AH.PBMC.7$doublet_class <- sce$scDblFinder.class

and then I saw the error like this :

#Creating ~16000 artificial doublets...
#Dimensional reduction
#Evaluating kNN...
#Training model...
#Error in if (length(expected) > 1 && x > min(expected) && x < max(expected)) return(0) :
#missing value where TRUE/FALSE needed

could you let me know how to solve it?

plger commented
twoneu commented

Hi @plger, sorry for the delay! What is the best way to share the data with you?

plger commented

If you don't have a drive/platform where you can put it, send me an email at pierre-luc.germain [at ] hest.ethz.ch and I'll give you a link.

twoneu commented

Thank you @plger for helping me solve this issue! I was able to successfully run scDblFinder by:

  1. installing the mbkmeans package
  2. Rerunning scDblFinder once the package was installed.