Repel labels in reduced dimensions plots
scater::plotReducedDim provides the text_by= argument, which is extremely useful in cases where there are more clusters than unique colours.
However, long labels of nearby clusters overlap, making the label text impossible to read. I tried to use ggrepel::geom_text_repel(), but I get the error "following missing aesthetics: label".
The workaround I could use is inspired by here (from when text_by was not yet available): plot the dimensional reduction, look where each label is, try to figure out where I could "repel" the overlapping labels, manually create a data.frame with all the coordinates, plot with personalised labels, and then start another round to correct the labels whose position I misjudged and that still overlap. This is extremely tedious, and I was wondering if there is a better way to achieve the same result.
Ideally the function could accept ggrepel arguments (maybe it already does and I simply do not know how to use it properly?). If not, could it at least automate this process somehow, by extracting the initial coordinates or something similar?
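For reference, geom_text_repel() requires a label aesthetic, which is presumably why adding it directly to the plot fails. Below is a rough sketch of automating the manual workaround. It assumes simulated coordinates standing in for the reduced dimensions, and the per-cluster median as the label position; both are assumptions for illustration, not a description of scater's internals.

```r
library(ggplot2)
library(ggrepel)

set.seed(1)
# Simulated 2-D embedding standing in for reducedDim(sce, "UMAP") (hypothetical data).
n_per <- 50
cells <- data.frame(
  x = rnorm(3 * n_per, mean = rep(c(0, 0.5, 4), each = n_per)),
  y = rnorm(3 * n_per, mean = rep(c(0, 0.5, 4), each = n_per)),
  cluster = rep(c("long label cluster A",
                  "long label cluster B",
                  "long label cluster C"), each = n_per)
)

# One row per cluster, holding the median position of its cells.
label_df <- aggregate(cbind(x, y) ~ cluster, data = cells, FUN = median)

# The label layer gets its own data and an explicit label aesthetic.
p <- ggplot(cells, aes(x, y, colour = cluster)) +
  geom_point(size = 0.8) +
  geom_text_repel(
    data = label_df,
    mapping = aes(x = x, y = y, label = cluster),
    colour = "black",
    inherit.aes = FALSE
  ) +
  theme(legend.position = "none")
```

Printing p draws the points with one repelled label per cluster, so the coordinates no longer need to be adjusted by hand.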
I suppose this sounds reasonable enough. Might be a relatively simple augmentation: replacing annotate with geom_text_repel() or whatever it's called. Note that the line will point to some arbitrary middle point that might not actually exist in the dataset. I suppose we could map it to the closest real point.
thoughts @alanocallaghan?
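Mapping the middle point to the closest real point could look roughly like the sketch below. It uses simulated coordinates, takes the per-cluster median as the "middle point", and uses a helper, nearest_to_median(), that is hypothetical and written only for this illustration.

```r
set.seed(1)
# Simulated embedding with three clusters (hypothetical data).
cells <- data.frame(
  x = rnorm(150, mean = rep(c(0, 3, 6), each = 50)),
  y = rnorm(150, mean = rep(c(0, 3, 0), each = 50)),
  cluster = rep(c("A", "B", "C"), each = 50)
)

# Hypothetical helper: return the observed cell closest to the cluster median.
nearest_to_median <- function(df) {
  mx <- median(df$x)
  my <- median(df$y)
  d2 <- (df$x - mx)^2 + (df$y - my)^2  # squared Euclidean distance
  df[which.min(d2), c("x", "y", "cluster")]
}

# One anchor per cluster, each guaranteed to be an actual data point.
anchors <- do.call(rbind, lapply(split(cells, cells$cluster), nearest_to_median))
```

The anchors data frame can then be used as the data for the label layer, so that any connecting segment ends on a real cell.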
fwiw I think pals has some good palettes for large n (which I may at some point pilfer for bioccolors).
I think switching to repel text makes sense; IIRC most of the time it won't make any difference unless there's likely to be overlap
@NadineBestard can you test out text_by on the repel_text branch and see if that suits? Can switch to the closest point as Aaron suggests if it's not ideal, but I don't think it'll matter most of the time.
Thanks Alan,
I'll have a look at the pals package; it sounds better than manually creating combinations of discrete palettes (which might end up having similar colours again). But I think the text_by option is still very useful to quickly locate the clusters, so thanks for adding the repel functionality to the package!
I am not able to install that branch at the moment (there seems to be an error related to my package versions). Next week I'll try to update R, so that I can update all the Bioconductor packages, and give this a try.
Error
devtools::install_github("Alanocallaghan/scater", ref = "repel_text")
* installing source package 'scater' ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
Warning messages:
1: package 'matrixStats' was built under R version 4.0.5
2: package 'BiocGenerics' was built under R version 4.0.5
3: package 'GenomeInfoDb' was built under R version 4.0.5
4: package 'ggplot2' was built under R version 4.0.5
Error: object 'realizeFileBackedMatrix' is not exported by 'namespace:beachmat'
Execution halted
Session info
BiocManager::version()
[1] ‘3.12’
sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] scater_1.18.6 ggplot2_3.3.5 SingleCellExperiment_1.12.0
[4] SummarizedExperiment_1.20.0 Biobase_2.50.0 GenomicRanges_1.42.0
[7] GenomeInfoDb_1.26.7 IRanges_2.24.1 S4Vectors_0.28.1
[10] BiocGenerics_0.36.1 MatrixGenerics_1.2.1 matrixStats_0.59.0
loaded via a namespace (and not attached):
[1] viridis_0.6.1 pkgload_1.2.1 BiocSingular_1.6.0
[4] viridisLite_0.4.0 DelayedMatrixStats_1.12.3 scuttle_1.0.4
[7] assertthat_0.2.1 BiocManager_1.30.16 GenomeInfoDbData_1.2.4
[10] vipor_0.4.5 remotes_2.4.0 sessioninfo_1.1.1
[13] pillar_1.6.1 lattice_0.20-44 glue_1.4.2
[16] beachmat_2.6.4 XVector_0.30.0 colorspace_2.0-2
[19] Matrix_1.3-4 pkgconfig_2.0.3 devtools_2.4.2
[22] zlibbioc_1.36.0 purrr_0.3.4 scales_1.1.1
[25] processx_3.5.2 BiocParallel_1.24.1 tibble_3.1.2
[28] generics_0.1.0 usethis_2.0.1 ellipsis_0.3.2
[31] cachem_1.0.5 withr_2.4.2 cli_3.0.1
[34] magrittr_2.0.1 crayon_1.4.1 memoise_2.0.0
[37] ps_1.6.0 fs_1.5.0 fansi_0.5.0
[40] beeswarm_0.4.0 pkgbuild_1.2.0 tools_4.0.4
[43] prettyunits_1.1.1 lifecycle_1.0.0 munsell_0.5.0
[46] DelayedArray_0.16.3 irlba_2.3.3 callr_3.7.0
[49] compiler_4.0.4 rsvd_1.0.5 rlang_0.4.10
[52] grid_4.0.4 RCurl_1.98-1.3 BiocNeighbors_1.8.2
[55] rstudioapi_0.13 bitops_1.0-7 testthat_3.0.3
[58] gtable_0.3.0 DBI_1.1.1 curl_4.3.2
[61] R6_2.5.0 gridExtra_2.3 dplyr_1.0.7
[64] fastmap_1.1.0 utf8_1.2.1 rprojroot_2.0.2
[67] desc_1.3.0 ggbeeswarm_0.6.0 Rcpp_1.0.6
[70] vctrs_0.3.8 tidyselect_1.1.1 sparseMatrixStats_1.2.1
Ah, yes, you'll need R 4.1 for devel, I believe.
Hi Alan,
I finally tested this!
This solved the problem of labels overlapping. ggrepel also repels:
- "away from edges of the plotting area": This is a fantastic positive side effect. I also had some labels popping out of the frame and now they're back in.
- "away from data points": In this case the labels seem to be pulled towards areas with a lower density of points. I think this is still nice; in most cases it even improves readability. For small clusters, for example, the label is pulled completely to the side, making the cluster visible and the letters readable. I prefer to let you know, though, since you mentioned "IIRC most of the time it won't make any difference unless there's likely to be overlap", whereas in practice the label is often no longer centred. (If you still want to modify this behaviour, I've seen that it can be tweaked with some of the geom_text_repel() arguments, such as point.padding and point.size.)
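As a sketch of the tweak mentioned in the second point: ggrepel's documentation describes point.size = NA as switching off repulsion away from data points, while labels still avoid each other and the plot edges. The example uses simulated data and hand-picked label positions (both hypothetical). Whether scater passes such arguments through to the label layer is not established in this thread. Note also that, to my understanding, geom_text_repel() only repels from points in its own layer data, here the two label anchors.

```r
library(ggplot2)
library(ggrepel)

set.seed(1)
# Simulated cells and two nearby label anchors (hypothetical data).
cells <- data.frame(x = rnorm(200), y = rnorm(200))
label_df <- data.frame(
  x = c(0, 0.2),
  y = c(0, 0.1),
  label = c("cluster one", "cluster two")
)

p <- ggplot(cells, aes(x, y)) +
  geom_point(colour = "grey70") +
  # point.size = NA: do not push labels away from their anchor points;
  # overlapping labels are still moved apart.
  geom_text_repel(data = label_df, aes(label = label), point.size = NA)
```

With this setting a label that does not overlap anything should stay on its anchor, which is closer to the centred placement of plain text labels.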
Thanks for taking the time to change this! It was only "slightly annoying" for exploratory analysis but it will definitely be very helpful when creating the graphs for publication.
Great, no worries. I think for the 2nd point I'll just live with it unless people complain. It's probably still clearer than regular geom_text