Error in if (length(expected) > 1 && x > min(expected) && x < max(expected)) return(0): missing value where TRUE/FALSE needed

Question


Closed this issue a year ago · 9 comments

Hi! Thank you for this great tool. I am encountering the error in the title when running scDblFinder on a large dataset (CellRanger estimated ~20,000 cells):

Assuming the input to be a matrix of counts or expected counts.

Aggregating features...

Warning message:
"Quick-TRANSfer stage steps exceeded maximum (= 1905250)"
Creating ~11084 artificial doublets...

Dimensional reduction

Evaluating kNN...

Training model...

Error in if (length(expected) > 1 && x > min(expected) && x < max(expected)) return(0): missing value where TRUE/FALSE needed

I have not encountered this error with several other (much smaller) samples, so could it be related to the size of the dataset?

Traceback:
1. scDblFinder(peak_assay, aggregateFeatures = TRUE, nfeatures = 25, 
 .     processing = "normFeatures")
2. .scDblscore(d, scoreType = score, addVals = pca[, includePCs, 
 .     drop = FALSE], threshold = threshold, dbr = dbr, dbr.sd = dbr.sd, 
 .     nrounds = nrounds, max_depth = max_depth, iter = iter, BPPARAM = BPPARAM, 
 .     features = trainingFeatures, verbose = verbose, metric = metric, 
 .     filterUnidentifiable = removeUnidentifiable, unident.th = unident.th)
3. which((d$type == "real" & doubletThresholding(d, dbr = dbr, dbr.sd = dbr.sd, 
 .     stringency = 0.7, perSample = perSample, returnType = "call") == 
 .     "doublet") | (d$type == "doublet" & d$score < unident.th & 
 .     filterUnidentifiable) | !d$include.in.training)
4. doubletThresholding(d, dbr = dbr, dbr.sd = dbr.sd, stringency = 0.7, 
 .     perSample = perSample, returnType = "call")
5. .optimThreshold(d, dbr = .gdbr(d, dbr), dbr.sd = dbr.sd, stringency = stringency)
6. optimize(totfn, c(0, 1), maximum = FALSE)
7. (function (arg) 
 . f(arg, ...))(0.381966011250105)
8. f(arg, ...)
9. .prop.dev(d$type, d$score, expected, x)
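The last frames of the traceback show that the failing if() sits in .prop.dev(), called from optimize() with x = 0.381966011250105. Because that x is a valid number, one plausible reading is that expected contained an NA. This is an assumption: the thread never identifies which value was missing. The minimal base-R sketch below reproduces the same message with a hypothetical expected vector whose values are made up and not taken from scDblFinder.

```r
# Hypothetical values: 'expected' stands in for the range of expected doublet
# proportions, with an NA assumed for illustration; 'x' is the threshold that
# optimize() tried first according to the traceback.
expected <- c(0.05, NA)
x <- 0.381966011250105

# min() and max() propagate the NA, so both comparisons are NA, and
# TRUE && NA evaluates to NA rather than TRUE or FALSE.
cond <- length(expected) > 1 && x > min(expected) && x < max(expected)
print(cond)

# if() cannot branch on NA; capture the resulting error message.
msg <- tryCatch(if (cond) 0 else 1, error = function(e) conditionMessage(e))
print(msg)
```

Wrapping the condition in isTRUE() would avoid the crash, but it would hide the NA rather than explain where it came from. In this thread the error went away once mbkmeans was installed.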

Session info

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/envs/NET_R_env/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.66.3                  
 [3] Biostrings_2.66.0                 XVector_0.38.0                   
 [5] CopyscAT_0.40                     MASS_7.3-60                      
 [7] jsonlite_1.8.4                    sp_1.6-1                         
 [9] rtracklayer_1.58.0                gplots_3.1.3                     
[11] tibble_3.2.1                      tidyr_1.3.0                      
[13] edgeR_3.40.2                      limma_3.54.2                     
[15] stringr_1.5.0                     mclust_6.0.0                     
[17] changepoint_2.2.4                 zoo_1.8-12                       
[19] data.table_1.14.8                 igraph_1.4.3                     
[21] FNN_1.1.3.2                       Rtsne_0.16                       
[23] biomaRt_2.54.0                    fastcluster_1.2.3                
[25] NMF_0.26                          cluster_2.1.4                    
[27] rngtools_1.5.2                    registry_0.5-1                   
[29] viridis_0.6.3                     viridisLite_0.4.2                
[31] dplyr_1.1.2                       RColorBrewer_1.1-3               
[33] scDblFinder_1.13.13               SingleCellExperiment_1.20.1      
[35] SummarizedExperiment_1.28.0       MatrixGenerics_1.10.0            
[37] matrixStats_0.63.0                glue_1.6.2                       
[39] ggplot2_3.4.1                     EnsDb.Hsapiens.v86_2.99.0        
[41] ensembldb_2.22.0                  AnnotationFilter_1.22.0          
[43] GenomicFeatures_1.50.2            AnnotationDbi_1.60.0             
[45] Biobase_2.58.0                    GenomicRanges_1.50.2             
[47] GenomeInfoDb_1.34.9               IRanges_2.32.0                   
[49] S4Vectors_0.36.2                  BiocGenerics_0.44.0              
[51] Signac_1.10.0                     SeuratObject_4.1.3               
[53] Seurat_4.3.0                     

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3            pbdZMQ_0.3-9             
  [3] scattermore_1.0           bit64_4.0.5              
  [5] irlba_2.3.5.1             DelayedArray_0.24.0      
  [7] KEGGREST_1.38.0           RCurl_1.98-1.12          
  [9] doParallel_1.0.17         generics_0.1.3           
 [11] ScaledMatrix_1.6.0        cowplot_1.1.1            
 [13] RSQLite_2.2.20            RANN_2.6.1               
 [15] future_1.32.0             bit_4.0.5                
 [17] spatstat.data_3.0-1       xml2_1.3.4               
 [19] httpuv_1.6.11             hms_1.1.3                
 [21] evaluate_0.21             promises_1.2.0.1         
 [23] fansi_1.0.4               restfulr_0.0.15          
 [25] progress_1.2.2            caTools_1.18.2           
 [27] dbplyr_2.3.1              DBI_1.1.3                
 [29] htmlwidgets_1.6.2         spatstat.geom_3.2-1      
 [31] purrr_1.0.1               ellipsis_0.3.2           
 [33] gridBase_0.4-7            deldir_1.0-9             
 [35] sparseMatrixStats_1.10.0  vctrs_0.6.2              
 [37] ROCR_1.0-11               abind_1.4-5              
 [39] cachem_1.0.8              withr_2.5.0              
 [41] progressr_0.13.0          sctransform_0.3.5        
 [43] GenomicAlignments_1.34.1  prettyunits_1.1.1        
 [45] scran_1.26.2              goftest_1.2-3            
 [47] IRdisplay_1.1             lazyeval_0.2.2           
 [49] crayon_1.5.2              spatstat.explore_3.2-1   
 [51] pkgconfig_2.0.3           nlme_3.1-162             
 [53] vipor_0.4.5               ProtGenerics_1.30.0      
 [55] rlang_1.1.0               globals_0.16.2           
 [57] lifecycle_1.0.3           miniUI_0.1.1.1           
 [59] filelock_1.0.2            BiocFileCache_2.6.0      
 [61] rsvd_1.0.5                polyclip_1.10-4          
 [63] lmtest_0.9-40             Matrix_1.5-4             
 [65] IRkernel_1.3.2            base64enc_0.1-3          
 [67] beeswarm_0.4.0            ggridges_0.5.4           
 [69] png_0.1-8                 rjson_0.2.21             
 [71] bitops_1.0-7              KernSmooth_2.23-21       
 [73] blob_1.2.3                DelayedMatrixStats_1.20.0
 [75] parallelly_1.35.0         spatstat.random_3.1-5    
 [77] beachmat_2.14.2           scales_1.2.1             
 [79] memoise_2.0.1             magrittr_2.0.3           
 [81] plyr_1.8.8                ica_1.0-3                
 [83] zlibbioc_1.44.0           compiler_4.2.2           
 [85] dqrng_0.3.0               BiocIO_1.8.0             
 [87] fitdistrplus_1.1-11       Rsamtools_2.14.0         
 [89] cli_3.6.1                 listenv_0.9.0            
 [91] patchwork_1.1.2           pbapply_1.7-0            
 [93] tidyselect_1.2.0          stringi_1.7.12           
 [95] yaml_2.3.7                BiocSingular_1.14.0      
 [97] locfit_1.5-9.7            ggrepel_0.9.3            
 [99] grid_4.2.2                fastmatch_1.1-3          
[101] tools_4.2.2               future.apply_1.10.0      
[103] parallel_4.2.2            uuid_1.1-0               
[105] bluster_1.8.0             foreach_1.5.2            
[107] metapod_1.6.0             gridExtra_2.3            
[109] digest_0.6.31             BiocManager_1.30.20      
[111] shiny_1.7.4               Rcpp_1.0.10              
[113] scuttle_1.8.4             later_1.3.1              
[115] RcppAnnoy_0.0.20          httr_1.4.5               
[117] colorspace_2.1-0          XML_3.99-0.14            
[119] tensor_1.5                reticulate_1.28          
[121] splines_4.2.2             uwot_0.1.14              
[123] RcppRoll_0.3.0            statmod_1.5.0            
[125] spatstat.utils_3.0-3      scater_1.26.1            
[127] xgboost_1.7.5.1           plotly_4.10.1            
[129] xtable_1.8-4              R6_2.5.1                 
[131] pillar_1.9.0              htmltools_0.5.5          
[133] mime_0.12                 fastmap_1.1.1            
[135] BiocParallel_1.32.6       BiocNeighbors_1.16.0     
[137] codetools_0.2-19          utf8_1.2.3               
[139] lattice_0.21-8            spatstat.sparse_3.0-1    
[141] curl_4.3.3                ggbeeswarm_0.7.2         
[143] leiden_0.4.3              gtools_3.9.4             
[145] survival_3.5-5            repr_1.1.6               
[147] munsell_0.5.0             GenomeInfoDbData_1.2.9   
[149] iterators_1.0.14          reshape2_1.4.4           
[151] gtable_0.3.3

Answer 1 · 2023-06-04T10:19:31.000Z

Hi,
20k cells certainly shouldn't be a problem. How many peaks do you have? Do you have mbkmeans installed?
Can you reproduce the error without the aggregation (i.e. just scDblFinder(peak_assay))?
I've never seen this error, so if you could share the object (ideally a smaller one, if it still reproduces the error; gene and sample names can of course be scrambled), it would make debugging easier.
Pierre-Luc

Answer 2 · 2023-06-19T08:55:37.000Z

Hi @twoneu,
please respond or I'll close the issue.

Answer 3 · 2023-06-19T15:05:13.000Z

I have the same error. Did you solve it? My dataset also has only ~20,000 cells (according to the Cell Ranger web summary), but another dataset of mine also has ~20,000 cells and it ran fine without any error.

Answer 4 · 2023-06-19T15:09:45.000Z

Then please provide the extra info requested above.

Answer 5 · 2023-06-22T07:33:43.000Z

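library(Seurat)
library(Signac)
library(EnsDb.Hsapiens.v86)
library(GenomeInfoDb)
library(SingleCellExperiment)
library(scDblFinder)

# for multiome data, Read10X_h5 returns a list with one matrix per feature type
counts <- Read10X_h5("/data/jrgong/AD_multiome/fastq/aggr_2023_0616/aggr_2023_0616/outs/filtered_feature_bc_matrix.h5")
fragpath <- "/data/jrgong/AD_multiome/fastq//aggr_2023_0616/aggr_2023_0616/outs/atac_fragments.tsv.gz"

# get gene annotations for hg38
annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevelsStyle(annotation) <- "UCSC"

# create a Seurat object containing the RNA data
# (the list element name contains a space, so it must be backquoted)
AH_Mul <- CreateSeuratObject(counts = counts$`Gene Expression`, assay = "RNA")

# add the ATAC data as a chromatin assay
AH_Mul[["ATAC"]] <- CreateChromatinAssay(counts = counts$Peaks, sep = c(":", "-"), fragments = fragpath, annotation = annotation)

# keep one sample of the aggregated run (aggr_number is a metadata column not created in this snippet)
AH.PBMC.7 <- subset(AH_Mul, subset = aggr_number == "7")

# run scDblFinder on the RNA counts and copy the results into the Seurat metadata
sce <- scDblFinder(SingleCellExperiment(list(counts = AH.PBMC.7@assays$RNA@counts)))
AH.PBMC.7$doublet_scores <- sce$scDblFinder.score
AH.PBMC.7$doublet_class <- sce$scDblFinder.class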

and then I got the following error:

#Creating ~16000 artificial doublets...
#Dimensional reduction
#Evaluating kNN...
#Training model...
#Error in if (length(expected) > 1 && x > min(expected) && x < max(expected)) return(0) :
#missing value where TRUE/FALSE needed

Could you let me know how to solve it?

Answer 6 · 2023-06-22T08:42:47.000Z

Can you please report your session info (as one should always do)?

Answer 7 · 2023-07-04T02:28:17.000Z

Hi @plger, sorry for the delay! What is the best way to share the data with you?

Answer 8 · 2023-07-04T05:20:58.000Z

If you don't have a drive/platform where you can put it, send me an email at pierre-luc.germain [at] hest.ethz.ch and I'll give you a link.

Answer 9 · 2023-07-09T18:23:50.000Z

Thank you @plger for helping me solve this issue! I was able to run scDblFinder successfully by:

1. Installing the mbkmeans package.
2. Rerunning scDblFinder once the package was installed.
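Before rerunning, it can help to confirm that mbkmeans is actually available. Below is a minimal sketch; the helper name missing_pkgs is hypothetical and not part of scDblFinder. Note that the "Quick-TRANSfer stage steps exceeded maximum" warning in the original log is emitted by base R's stats::kmeans, which is consistent with (but does not prove) the clustering step having fallen back to kmeans because mbkmeans was missing.

```r
# Hypothetical helper: return the subset of 'pkgs' whose namespace cannot be loaded.
missing_pkgs <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

needed <- missing_pkgs("mbkmeans")
if (length(needed) > 0) {
  # mbkmeans is a Bioconductor package; it can be installed with, e.g.:
  #   BiocManager::install("mbkmeans")
  message("Not installed: ", paste(needed, collapse = ", "))
} else {
  message("mbkmeans is available; scDblFinder can be rerun.")
}
```

The check only reports; it deliberately does not install anything, so it is safe to run inside an analysis script.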