MAD_stats mad_num variable doesn't change value
Closed this issue · 5 comments
Hi!! The mad_num variable in MAD_stats doesn't work.
MAD_Stats(seurat_object = pbmc3k, group_by_var = "orig.ident",
mad_num = 1, default_var = TRUE)
MAD_Stats(seurat_object = pbmc3k, group_by_var = "orig.ident",
mad_num = 2, default_var = TRUE)
MAD_Stats(seurat_object = pbmc3k, group_by_var = "orig.ident",
mad_num = 3, default_var = TRUE)
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.9 (Green Obsidian)
Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.15.so; LAPACK version 3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] pbmc3k.SeuratData_3.1.4 SeuratData_0.2.2.9001 scCustomize_2.1.2 wesanderson_0.3.7
[5] cowplot_1.1.3 ggvenn_0.1.10 fossil_0.4.0 shapefiles_0.7.2
[9] foreign_0.8-86 maps_3.4.2 ggpointdensity_0.1.0 reshape2_1.4.4
[13] readxl_1.4.3 ComplexHeatmap_2.18.0 scDblFinder_1.14.0 SingleCellExperiment_1.24.0
[17] SummarizedExperiment_1.32.0 Biobase_2.62.0 GenomicRanges_1.54.1 GenomeInfoDb_1.38.8
[21] IRanges_2.36.0 S4Vectors_0.40.2 BiocGenerics_0.48.1 MatrixGenerics_1.14.0
[25] matrixStats_1.3.0 harmony_1.2.0 Rcpp_1.0.12 patchwork_1.2.0
[29] SeuratWrappers_0.3.5 miQC_1.10.0 sctransform_0.4.1 Matrix_1.6-5
[33] devtools_2.4.5 usethis_2.2.3 data.table_1.15.4 alevinQC_1.16.1
[37] fishpond_2.6.2 ProjecTILs_3.3.1 scRepertoire_2.0.0 Seurat_5.0.2
[41] SeuratObject_5.0.1 sp_2.1-3 lubridate_1.9.3 forcats_1.0.0
[45] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
[49] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.0 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] R.methodsS3_1.8.2 urlchecker_1.0.1 nnet_7.3-19 goftest_1.2-3
[5] DT_0.33 Biostrings_2.70.3 vctrs_0.6.5 spatstat.random_3.2-3
[9] shape_1.4.6.1 digest_0.6.35 png_0.1-8 ggrepel_0.9.5
[13] deldir_2.0-4 parallelly_1.37.1 MASS_7.3-60.0.1 foreach_1.5.2
[17] httpuv_1.6.15 withr_3.0.0 ggrastr_1.0.2 xfun_0.43
[21] ellipsis_0.3.2 survival_3.5-8 memoise_2.0.1 ggbeeswarm_0.7.2
[25] janitor_2.2.0 MatrixModels_0.5-3 profvis_0.3.8 GlobalOptions_0.1.2
[29] zoo_1.8-12 gtools_3.9.5 pbapply_1.7-2 R.oo_1.26.0
[33] GGally_2.2.1 rematch2_2.1.2 promises_1.3.0 evmix_2.12
[37] httr_1.4.7 restfulr_0.0.15 globals_0.16.3 fitdistrplus_1.1-11
[41] ps_1.7.6 rstudioapi_0.16.0 miniUI_0.1.1.1 generics_0.1.3
[45] processx_3.8.4 ggalluvial_0.12.5 curl_5.2.1 zlibbioc_1.48.2
[49] ScaledMatrix_1.10.0 ggraph_2.2.1 polyclip_1.10-6 GenomeInfoDbData_1.2.11
[53] SparseArray_1.2.4 desc_1.4.3 doParallel_1.0.17 xtable_1.8-4
[57] pracma_2.4.4 evaluate_0.23 S4Arrays_1.2.1 hms_1.1.3
[61] irlba_2.3.5.1 colorspace_2.1-0 ROCR_1.0-11 reticulate_1.35.0
[65] spatstat.data_3.0-4 snakecase_0.11.1 flexmix_2.3-19 magrittr_2.0.3
[69] lmtest_0.9-40 later_1.3.2 viridis_0.6.5 modeltools_0.2-23
[73] lattice_0.22-6 spatstat.geom_3.2-9 future.apply_1.11.2 SparseM_1.81
[77] scattermore_1.2 XML_3.99-0.16.1 scuttle_1.12.0 RcppAnnoy_0.0.22
[81] pillar_1.9.0 nlme_3.1-164 iterators_1.0.14 compiler_4.3.2
[85] beachmat_2.18.1 RSpectra_0.16-1 stringi_1.8.3 tensor_1.5
[89] GenomicAlignments_1.38.2 plyr_1.8.9 crayon_1.5.2 abind_1.4-5
[93] BiocIO_1.12.0 scater_1.30.1 ggdendro_0.2.0 locfit_1.5-9.9
[97] graphlayouts_1.1.1 scGate_1.6.0 codetools_0.2-20 BiocSingular_1.18.0
[101] paletteer_1.6.0 GetoptLong_1.0.5 plotly_4.10.4 mime_0.12
[105] splines_4.3.2 circlize_0.4.16 fastDummies_1.7.3 quantreg_5.97
[109] sparseMatrixStats_1.14.0 cellranger_1.1.0 knitr_1.46 utf8_1.2.4
[113] clue_0.3-65 fs_1.6.3 listenv_0.9.1 evd_2.3-6.1
[117] DelayedMatrixStats_1.24.0 pkgbuild_1.4.4 gsl_2.1-8 callr_3.7.6
[121] statmod_1.5.0 tzdb_0.4.0 tweenr_2.0.3 pkgconfig_2.0.3
[125] pheatmap_1.0.12 tools_4.3.2 cachem_1.0.8 viridisLite_0.4.2
[129] fastmap_1.1.1 rmarkdown_2.26 scales_1.3.0 ica_1.0-3
[133] shinydashboard_0.7.2 Rsamtools_2.18.0 ggprism_1.0.5 BiocManager_1.30.22
[137] ggstats_0.6.0 dotCall64_1.1-1 RANN_2.6.1 farver_2.1.1
[141] tidygraph_1.3.1 yaml_2.3.8 VGAM_1.1-10 rtracklayer_1.62.0
[145] cli_3.6.2 UCell_2.6.2 leiden_0.4.3.1 lifecycle_1.0.4
[149] uwot_0.1.16 bluster_1.12.0 sessioninfo_1.2.2 tximport_1.30.0
[153] BiocParallel_1.36.0 timechange_0.3.0 gtable_0.3.4 rjson_0.2.21
[157] ggridges_0.5.6 progressr_0.14.0 cubature_2.1.0 parallel_4.3.2
[161] limma_3.58.1 jsonlite_1.8.8 edgeR_4.0.16 RcppHNSW_0.6.0
[165] bitops_1.0-7 svMisc_1.2.3 xgboost_1.7.7.1 Rtsne_0.17
[169] spatstat.utils_3.0-4 BiocNeighbors_1.20.2 metapod_1.10.1 dqrng_0.3.2
[173] R.utils_2.12.3 truncdist_1.0-2 lazyeval_0.2.2 shiny_1.8.1.1
[177] htmltools_0.5.8.1 iNEXT_3.0.1 rappdirs_0.3.3 glue_1.7.0
[181] spam_2.10-0 XVector_0.42.0 RCurl_1.98-1.14 scran_1.30.2
[185] gridExtra_2.3 igraph_2.0.3 R6_2.5.1 labeling_0.4.3
[189] STACAS_2.2.2 cluster_2.1.6 pkgload_1.3.4 stringdist_0.9.12
[193] DelayedArray_0.28.0 tidyselect_1.2.1 vipor_0.4.7 ggforce_0.4.2
[197] future_1.33.2 rsvd_1.0.5 munsell_0.5.1 KernSmooth_2.23-22
[201] htmlwidgets_1.6.4 RColorBrewer_1.1-3 rlang_1.1.3 spatstat.sparse_3.0-3
[205] spatstat.explore_3.2-7 remotes_2.5.0 fansi_1.0.6 beeswarm_0.4.0
Yup, you’re correct. Sorry about the error.
I will work on bug fix later today and comment here when live.
Thanks!
Sam
I think the fix is to change to:
mad_by_group <- meta_data %>% group_by(.data[[group_by_var]]) %>%
summarise(across(all_of(all_variables), mad)) %>%
mutate(across(all_of(all_variables), ~ .x * mad_num)) # <--- this line was added
mad_overall <- meta_data %>% summarise(across(all_of(all_variables),
mad)) %>%
mutate(across(all_of(all_variables), ~ .x * mad_num)) # <--- this line was added
@samuel-marsh can you also comment here about how one uses this in the context of filtering potentially poor quality cells? For example, scater
computes the MAD (* a scalar, default=3) then finds outliers based on that (isOutlier
). What is the equivalent next step after MAD_Stats
is called here?
Yup that totally works.
I just implemented slightly differently but end result is the same.
mad_by_group <- meta_data %>%
group_by(.data[[group_by_var]]) %>%
summarise(across(all_of(all_variables), mad)*mad_num)
# Calculate overall medians
mad_overall <- meta_data %>%
summarise(across(all_of(all_variables), mad)*mad_num)
I also added a check that was in original version I wrote some time ago but got lost somewhere along the way to ensure that mad_num
is greater than 0.
Both of those fixes are now live in "release/2.2.0" branch (v2.1.2.9056 or greater). If you run into any issues after updating to that branch please let me know and I'll reopen the issue.
Best,
Sam
@jeremymsimon heading into meeting currently, but will return and offer what I have on the question of filtering here as well.