samuel-marsh/scCustomize

Using Clustered_DotPlot function


I am attempting to do hierarchical clustering for these markers:

markers.IC.plot <- c("Dmrt2", "Adgrf5", "Slc4a1", "Clnk", "2610016A17Rik", "Pygm", "Gm9871", "Aqp6", "Pdlim3", "Tldc2", "Hmx2", "Hmx3", "Insrr", "Slc26a4", "Sult2a3", "Kit", "Atp6v1c2", "Atp6v0d2", "Foxi1", "Slc26a4", "Slc4a9", "Hepacam2", "Atp6v1g3", "Pam", "Eps8")

Using this code:
Clustered_DotPlot(seurat_object = KS_filt, features = markers.IC.plot)

However, I keep running into this error:
Error in kmeans(data, centers = i) :
more cluster centers than distinct data points.

How can I solve it? I've searched online but found nothing. If I can't solve it, is there another way to perform hierarchical clustering, maybe with a heatmap?
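The message itself comes from base R's stats::kmeans(). When the randomly chosen starting centers contain duplicates, kmeans() falls back to the distinct rows of its input and stops if there are fewer of them than requested centers. The base R sketch below reproduces this on toy data (four rows, two distinct values, three centers). It does not call Clustered_DotPlot. The link to this issue is an assumption based on the traceback, which shows kmeans(data, centers = i) being called for a range of i: repeated features that produce identical rows would reduce the number of distinct points available.

```r
# Toy data: 4 rows, but only 2 distinct values (each value appears twice).
toy <- matrix(c(1, 1, 2, 2), ncol = 1)

# Asking for 3 centers: any 3 sampled rows must contain a duplicate, so
# kmeans() falls back to unique(toy), finds only 2 distinct rows, and stops.
res <- tryCatch(
  stats::kmeans(toy, centers = 3),
  error = function(e) conditionMessage(e)
)
print(res)

# With no more centers than distinct rows, the same data clusters fine.
ok <- stats::kmeans(toy, centers = 2)
print(ok$size)
```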

Hi @isaiao01,

I'll look into this. To get a better idea of a couple of things, could you please post the output of:

length(x = Cells(KS_filt))

all(markers.IC.plot %in% Features(KS_filt))

levels(Idents(KS_filt))

If you could also post the output of sessionInfo(), that would be great.

Thanks!
Sam

As already noted, KS_filt is the Seurat object. Here is the output of the lines of code you requested:

levels(Idents(KS_filt))
[1] "1" "2" "3" "4" "5" "6" "7" "8"

all(markers.IC.plot %in% Features(KS_filt))
[1] FALSE

length(x = Cells(KS_filt))
[1] 1550

sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] ComplexHeatmap_2.16.0 scCustomize_2.0.1 openxlsx_4.2.5.2 RColorBrewer_1.1-3 SeuratWrappers_0.3.1
[6] ggplot2_3.4.4 reshape2_1.4.4 tidyr_1.3.0 Matrix_1.6-1.1 R.utils_2.12.2
[11] R.oo_1.25.0 R.methodsS3_1.8.2 patchwork_1.1.3 Seurat_5.0.0 SeuratObject_5.0.0
[16] sp_2.0-0 dplyr_1.1.3 monocle3_1.4.5 SingleCellExperiment_1.22.0 SummarizedExperiment_1.30.2
[21] GenomicRanges_1.52.1 GenomeInfoDb_1.36.4 IRanges_2.34.1 S4Vectors_0.38.2 MatrixGenerics_1.12.3
[26] matrixStats_1.0.0 Biobase_2.60.0 BiocGenerics_0.46.0

loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.21 splines_4.3.1 later_1.3.1 bitops_1.0-7 tibble_3.2.1 polyclip_1.10-6
[7] janitor_2.2.0 fastDummies_1.7.3 lifecycle_1.0.3 doParallel_1.0.17 globals_0.16.2 lattice_0.21-8
[13] MASS_7.3-60 magrittr_2.0.3 plotly_4.10.3 rmarkdown_2.25 yaml_2.3.7 remotes_2.4.2.1
[19] httpuv_1.6.12 glmGamPoi_1.12.2 sctransform_0.4.1 spam_2.10-0 zip_2.3.0 spatstat.sparse_3.0-2
[25] reticulate_1.34.0 cowplot_1.1.1 pbapply_1.7-2 minqa_1.2.6 lubridate_1.9.3 abind_1.4-5
[31] zlibbioc_1.46.0 Rtsne_0.16 purrr_1.0.2 RCurl_1.98-1.12 circlize_0.4.15 GenomeInfoDbData_1.2.10
[37] ggrepel_0.9.4 irlba_2.3.5.1 listenv_0.9.0 spatstat.utils_3.0-3 goftest_1.2-3 RSpectra_0.16-1
[43] spatstat.random_3.1-6 fitdistrplus_1.1-11 parallelly_1.36.0 DelayedMatrixStats_1.22.6 leiden_0.4.3 codetools_0.2-19
[49] DelayedArray_0.26.7 shape_1.4.6 tidyselect_1.2.0 farver_2.1.1 lme4_1.1-34 spatstat.explore_3.2-3
[55] jsonlite_1.8.7 GetoptLong_1.0.5 ellipsis_0.3.2 progressr_0.14.0 iterators_1.0.14 ggridges_0.5.4
[61] survival_3.5-5 foreach_1.5.2 tools_4.3.1 ica_1.0-3 Rcpp_1.0.11 glue_1.6.2
[67] gridExtra_2.3 xfun_0.40 withr_2.5.0 BiocManager_1.30.22 fastmap_1.1.1 boot_1.3-28.1
[73] fansi_1.0.4 digest_0.6.33 rsvd_1.0.5 timechange_0.2.0 R6_2.5.1 mime_0.12
[79] ggprism_1.0.4 colorspace_2.1-0 Cairo_1.6-1 scattermore_1.2 tensor_1.5 spatstat.data_3.0-1
[85] utf8_1.2.3 generics_0.1.3 data.table_1.14.8 httr_1.4.7 htmlwidgets_1.6.2 S4Arrays_1.0.6
[91] uwot_0.1.16 pkgconfig_2.0.3 gtable_0.3.4 lmtest_0.9-40 XVector_0.40.0 htmltools_0.5.6.1
[97] dotCall64_1.1-0 clue_0.3-65 scales_1.2.1 png_0.1-8 snakecase_0.11.1 knitr_1.45
[103] rstudioapi_0.15.0 rjson_0.2.21 nlme_3.1-162 nloptr_2.0.3 GlobalOptions_0.1.2 zoo_1.8-12
[109] stringr_1.5.0 KernSmooth_2.23-21 parallel_4.3.1 miniUI_0.1.1.1 vipor_0.4.5 ggrastr_1.0.2
[115] pillar_1.9.0 vctrs_0.6.3 RANN_2.6.1 promises_1.2.1 xtable_1.8-4 cluster_2.1.4
[121] paletteer_1.5.0 beeswarm_0.4.0 evaluate_0.21 cli_3.6.1 compiler_4.3.1 rlang_1.1.1
[127] crayon_1.5.2 leidenbase_0.1.25 future.apply_1.11.0 labeling_0.4.3 rematch2_2.1.2 forcats_1.0.0
[133] plyr_1.8.9 ggbeeswarm_0.7.2 stringi_1.7.12 viridisLite_0.4.2 deldir_1.0-9 assertthat_0.2.1
[139] munsell_0.5.0 lazyeval_0.2.2 spatstat.geom_3.2-5 pacman_0.5.1 RcppHNSW_0.5.0 sparseMatrixStats_1.12.2
[145] future_1.33.0 shiny_1.7.5 ROCR_1.0-11 igraph_1.5.1

Hi @isaiao01,

Thanks for the additional info. Can you also send the output of:

length(intersect(markers.IC.plot, Features(KS_filt)))

Thanks!
Sam
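Comparing length(markers.IC.plot) with the intersect count is informative because intersect() returns unique values: a smaller number means the vector repeats genes, names genes absent from the object, or both. A small base R helper can tell the cases apart. check_features() is a hypothetical name, and the toy vectors (including the made-up gene GeneX) stand in for markers.IC.plot and Features(KS_filt).

```r
# Hypothetical helper: summarize why a feature vector may not match an object.
# features  - the genes you plan to plot
# available - all genes in the object (e.g. Features(KS_filt) in Seurat)
check_features <- function(features, available) {
  list(
    n_requested = length(features),
    n_unique    = length(unique(features)),
    # intersect() drops duplicates, so this counts unique genes that are present
    n_present   = length(intersect(features, available)),
    duplicated  = unique(features[duplicated(features)]),
    missing     = setdiff(features, available)
  )
}

# Toy example: "Slc26a4" is listed twice and "GeneX" is not in the object.
toy_features  <- c("Kit", "Slc26a4", "Foxi1", "Slc26a4", "GeneX")
toy_available <- c("Kit", "Slc26a4", "Foxi1", "Aqp6")
report <- check_features(toy_features, toy_available)
str(report)
```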

Hey,

Thank you for your quick responses. It seems the Clustered_DotPlot function fails when the feature list contains duplicates. I went back and removed the duplicate genes from my markers.IC.plot list, and that worked!
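A minimal base R sketch of that fix: in markers.IC.plot, Slc26a4 is the gene listed twice, and unique() keeps its first occurrence so the remaining genes stay in their original order. The Clustered_DotPlot() call and the optional filter on Features() are left as comments because they need the KS_filt object loaded.

```r
markers.IC.plot <- c("Dmrt2", "Adgrf5", "Slc4a1", "Clnk", "2610016A17Rik", "Pygm", "Gm9871", "Aqp6", "Pdlim3", "Tldc2", "Hmx2", "Hmx3", "Insrr", "Slc26a4", "Sult2a3", "Kit", "Atp6v1c2", "Atp6v0d2", "Foxi1", "Slc26a4", "Slc4a9", "Hepacam2", "Atp6v1g3", "Pam", "Eps8")

# Drop repeated genes; unique() keeps the first occurrence of each.
markers.IC.plot <- unique(markers.IC.plot)
print(length(markers.IC.plot))

# Guard against duplicates creeping back in (anyDuplicated() returns 0 if none).
stopifnot(anyDuplicated(markers.IC.plot) == 0)

# With the Seurat object loaded, optionally keep only genes present in it,
# since all(markers.IC.plot %in% Features(KS_filt)) returned FALSE:
# markers.IC.plot <- markers.IC.plot[markers.IC.plot %in% Features(KS_filt)]
# Clustered_DotPlot(seurat_object = KS_filt, features = markers.IC.plot)
```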

I am wondering: is it possible to use the function to show only percent expressing in the clustered dot plot, rather than both percent expressing and average expression?

Hi @isaiao01,

Glad you got the solution working. Clustered_DotPlot doesn't support showing only percent expressing. I also think that if you only want to show percent expressing, dot plots in general are not the best visualization, because subtle differences in percent expressing are hard to judge from dot sizes.

What I would suggest is simply pulling that data and either presenting it as a table or using it to create your own plot (perhaps a stacked barplot with an expressing/not-expressing split for each gene: https://r-graph-gallery.com/stacked-barplot.html).

You can easily obtain this data using scCustomize's Percent_Expressing function. You can provide both group.by and split.by variables, and it will return a data.frame with all of the information.
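To illustrate what "percent expressing" means, here is a base R sketch on a made-up genes-by-cells matrix. For each group it computes the share of cells whose value is above a threshold, then builds the expressing/not-expressing table a stacked barplot needs. toy_expr, groups and threshold are hypothetical, and this is not scCustomize's implementation. With real data, Percent_Expressing is the function suggested above for obtaining these values.

```r
# Toy expression matrix: genes x cells (values are made up for illustration).
toy_expr <- matrix(
  c(0, 1, 2, 0,   # Kit
    3, 0, 0, 0,   # Foxi1
    1, 1, 1, 1),  # Aqp6
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Kit", "Foxi1", "Aqp6"), paste0("cell", 1:4))
)
groups <- c("1", "1", "2", "2")  # hypothetical cluster label for each cell
threshold <- 0                   # a cell "expresses" a gene if value > threshold

# Percent of cells per group with expression above the threshold (genes x groups).
pct <- sapply(split(seq_along(groups), groups), function(idx) {
  100 * rowMeans(toy_expr[, idx, drop = FALSE] > threshold)
})
print(pct)

# Stacked-barplot input for group "1": expressing vs not expressing per gene.
stack_g1 <- rbind(expressing = pct[, "1"], not_expressing = 100 - pct[, "1"])
print(stack_g1)
# barplot(stack_g1, legend.text = TRUE) would draw it with base graphics.
```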

Best,
Sam