samuel-marsh/scCustomize

MAD_stats mad_num variable doesn't change value

Closed this issue · 5 comments

Hi!! The mad_num variable in MAD_stats doesn't work.

MAD_Stats(seurat_object = pbmc3k, group_by_var = "orig.ident", 
          mad_num = 1, default_var = TRUE)
MAD_Stats(seurat_object = pbmc3k, group_by_var = "orig.ident", 
          mad_num = 2, default_var = TRUE)
MAD_Stats(seurat_object = pbmc3k, group_by_var = "orig.ident", 
          mad_num = 3, default_var = TRUE)
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.9 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.15.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pbmc3k.SeuratData_3.1.4     SeuratData_0.2.2.9001       scCustomize_2.1.2           wesanderson_0.3.7          
 [5] cowplot_1.1.3               ggvenn_0.1.10               fossil_0.4.0                shapefiles_0.7.2           
 [9] foreign_0.8-86              maps_3.4.2                  ggpointdensity_0.1.0        reshape2_1.4.4             
[13] readxl_1.4.3                ComplexHeatmap_2.18.0       scDblFinder_1.14.0          SingleCellExperiment_1.24.0
[17] SummarizedExperiment_1.32.0 Biobase_2.62.0              GenomicRanges_1.54.1        GenomeInfoDb_1.38.8        
[21] IRanges_2.36.0              S4Vectors_0.40.2            BiocGenerics_0.48.1         MatrixGenerics_1.14.0      
[25] matrixStats_1.3.0           harmony_1.2.0               Rcpp_1.0.12                 patchwork_1.2.0            
[29] SeuratWrappers_0.3.5        miQC_1.10.0                 sctransform_0.4.1           Matrix_1.6-5               
[33] devtools_2.4.5              usethis_2.2.3               data.table_1.15.4           alevinQC_1.16.1            
[37] fishpond_2.6.2              ProjecTILs_3.3.1            scRepertoire_2.0.0          Seurat_5.0.2               
[41] SeuratObject_5.0.1          sp_2.1-3                    lubridate_1.9.3             forcats_1.0.0              
[45] stringr_1.5.1               dplyr_1.1.4                 purrr_1.0.2                 readr_2.1.5                
[49] tidyr_1.3.1                 tibble_3.2.1                ggplot2_3.5.0               tidyverse_2.0.0            

loaded via a namespace (and not attached):
  [1] R.methodsS3_1.8.2         urlchecker_1.0.1          nnet_7.3-19               goftest_1.2-3            
  [5] DT_0.33                   Biostrings_2.70.3         vctrs_0.6.5               spatstat.random_3.2-3    
  [9] shape_1.4.6.1             digest_0.6.35             png_0.1-8                 ggrepel_0.9.5            
 [13] deldir_2.0-4              parallelly_1.37.1         MASS_7.3-60.0.1           foreach_1.5.2            
 [17] httpuv_1.6.15             withr_3.0.0               ggrastr_1.0.2             xfun_0.43                
 [21] ellipsis_0.3.2            survival_3.5-8            memoise_2.0.1             ggbeeswarm_0.7.2         
 [25] janitor_2.2.0             MatrixModels_0.5-3        profvis_0.3.8             GlobalOptions_0.1.2      
 [29] zoo_1.8-12                gtools_3.9.5              pbapply_1.7-2             R.oo_1.26.0              
 [33] GGally_2.2.1              rematch2_2.1.2            promises_1.3.0            evmix_2.12               
 [37] httr_1.4.7                restfulr_0.0.15           globals_0.16.3            fitdistrplus_1.1-11      
 [41] ps_1.7.6                  rstudioapi_0.16.0         miniUI_0.1.1.1            generics_0.1.3           
 [45] processx_3.8.4            ggalluvial_0.12.5         curl_5.2.1                zlibbioc_1.48.2          
 [49] ScaledMatrix_1.10.0       ggraph_2.2.1              polyclip_1.10-6           GenomeInfoDbData_1.2.11  
 [53] SparseArray_1.2.4         desc_1.4.3                doParallel_1.0.17         xtable_1.8-4             
 [57] pracma_2.4.4              evaluate_0.23             S4Arrays_1.2.1            hms_1.1.3                
 [61] irlba_2.3.5.1             colorspace_2.1-0          ROCR_1.0-11               reticulate_1.35.0        
 [65] spatstat.data_3.0-4       snakecase_0.11.1          flexmix_2.3-19            magrittr_2.0.3           
 [69] lmtest_0.9-40             later_1.3.2               viridis_0.6.5             modeltools_0.2-23        
 [73] lattice_0.22-6            spatstat.geom_3.2-9       future.apply_1.11.2       SparseM_1.81             
 [77] scattermore_1.2           XML_3.99-0.16.1           scuttle_1.12.0            RcppAnnoy_0.0.22         
 [81] pillar_1.9.0              nlme_3.1-164              iterators_1.0.14          compiler_4.3.2           
 [85] beachmat_2.18.1           RSpectra_0.16-1           stringi_1.8.3             tensor_1.5               
 [89] GenomicAlignments_1.38.2  plyr_1.8.9                crayon_1.5.2              abind_1.4-5              
 [93] BiocIO_1.12.0             scater_1.30.1             ggdendro_0.2.0            locfit_1.5-9.9           
 [97] graphlayouts_1.1.1        scGate_1.6.0              codetools_0.2-20          BiocSingular_1.18.0      
[101] paletteer_1.6.0           GetoptLong_1.0.5          plotly_4.10.4             mime_0.12                
[105] splines_4.3.2             circlize_0.4.16           fastDummies_1.7.3         quantreg_5.97            
[109] sparseMatrixStats_1.14.0  cellranger_1.1.0          knitr_1.46                utf8_1.2.4               
[113] clue_0.3-65               fs_1.6.3                  listenv_0.9.1             evd_2.3-6.1              
[117] DelayedMatrixStats_1.24.0 pkgbuild_1.4.4            gsl_2.1-8                 callr_3.7.6              
[121] statmod_1.5.0             tzdb_0.4.0                tweenr_2.0.3              pkgconfig_2.0.3          
[125] pheatmap_1.0.12           tools_4.3.2               cachem_1.0.8              viridisLite_0.4.2        
[129] fastmap_1.1.1             rmarkdown_2.26            scales_1.3.0              ica_1.0-3                
[133] shinydashboard_0.7.2      Rsamtools_2.18.0          ggprism_1.0.5             BiocManager_1.30.22      
[137] ggstats_0.6.0             dotCall64_1.1-1           RANN_2.6.1                farver_2.1.1             
[141] tidygraph_1.3.1           yaml_2.3.8                VGAM_1.1-10               rtracklayer_1.62.0       
[145] cli_3.6.2                 UCell_2.6.2               leiden_0.4.3.1            lifecycle_1.0.4          
[149] uwot_0.1.16               bluster_1.12.0            sessioninfo_1.2.2         tximport_1.30.0          
[153] BiocParallel_1.36.0       timechange_0.3.0          gtable_0.3.4              rjson_0.2.21             
[157] ggridges_0.5.6            progressr_0.14.0          cubature_2.1.0            parallel_4.3.2           
[161] limma_3.58.1              jsonlite_1.8.8            edgeR_4.0.16              RcppHNSW_0.6.0           
[165] bitops_1.0-7              svMisc_1.2.3              xgboost_1.7.7.1           Rtsne_0.17               
[169] spatstat.utils_3.0-4      BiocNeighbors_1.20.2      metapod_1.10.1            dqrng_0.3.2              
[173] R.utils_2.12.3            truncdist_1.0-2           lazyeval_0.2.2            shiny_1.8.1.1            
[177] htmltools_0.5.8.1         iNEXT_3.0.1               rappdirs_0.3.3            glue_1.7.0               
[181] spam_2.10-0               XVector_0.42.0            RCurl_1.98-1.14           scran_1.30.2             
[185] gridExtra_2.3             igraph_2.0.3              R6_2.5.1                  labeling_0.4.3           
[189] STACAS_2.2.2              cluster_2.1.6             pkgload_1.3.4             stringdist_0.9.12        
[193] DelayedArray_0.28.0       tidyselect_1.2.1          vipor_0.4.7               ggforce_0.4.2            
[197] future_1.33.2             rsvd_1.0.5                munsell_0.5.1             KernSmooth_2.23-22       
[201] htmlwidgets_1.6.4         RColorBrewer_1.1-3        rlang_1.1.3               spatstat.sparse_3.0-3    
[205] spatstat.explore_3.2-7    remotes_2.5.0             fansi_1.0.6               beeswarm_0.4.0  

Also dropping what I got as output of the code above. Thanks!!

Screenshot 2024-05-08 at 10 04 50 AM

Yup, you’re correct. Sorry about the error.
I will work on bug fix later today and comment here when live.

Thanks!
Sam

I think the fix is to change to:

    mad_by_group <- meta_data %>% group_by(.data[[group_by_var]]) %>% 
        summarise(across(all_of(all_variables), mad)) %>%
        mutate(across(all_of(all_variables), ~ .x * mad_num))                      # <--- this line was added
    mad_overall <- meta_data %>% summarise(across(all_of(all_variables), 
        mad)) %>%
        mutate(across(all_of(all_variables), ~ .x * mad_num))                      # <--- this line was added

@samuel-marsh can you also comment here about how one uses this in the context of filtering potentially poor quality cells? For example, scater computes the MAD (* a scalar, default=3) then finds outliers based on that (isOutlier). What is the equivalent next step after MAD_Stats is called here?

Yup that totally works.
I just implemented slightly differently but end result is the same.

  mad_by_group <- meta_data %>%
    group_by(.data[[group_by_var]]) %>%
    summarise(across(all_of(all_variables), mad)*mad_num)
  
  # Calculate overall medians
  mad_overall <- meta_data %>%
    summarise(across(all_of(all_variables), mad)*mad_num)

I also added a check that was in original version I wrote some time ago but got lost somewhere along the way to ensure that mad_num is greater than 0.

Both of those fixes are now live in "release/2.2.0" branch (v2.1.2.9056 or greater). If you run into any issues after updating to that branch please let me know and I'll reopen the issue.

Best,
Sam

@jeremymsimon heading into meeting currently, but will return and offer what I have on the question of filtering here as well.