text is misplaced with position_dodge()
slowkow opened this issue · 14 comments
In the example below, I would expect all of the text labels to be positioned perfectly on top of the data points. Instead, some of the text labels are not positioned correctly.
I think the issue is due to position_dodge(). I'm not sure exactly where to look to find the relevant code.
In the last example, I use ggrepel to help illustrate the problem more clearly. You can see the blue labels 34 and 290 are not pointing to the correct positions. It seems like they're pointing to the "undodged" positions instead of the "dodged" positions.
This issue was originally reported by @raviselker in ggrepel issues: slowkow/ggrepel#122
library(tidyverse)
library(ggrepel)
# remotes::install_github("thomasp85/patchwork")
library(patchwork)
set.seed(1337)
df <- tibble(
x = rnorm(500),
g1 = factor(sample(c("A", "B"), 500, replace = TRUE)),
g2 = factor(sample(c("A", "B"), 500, replace = TRUE)),
rownames = 1:500
)
is_outlier <- function(x) {
return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}
df_outliers <- df %>% group_by(g1, g2) %>% mutate(outlier = is_outlier(x))
p1 <- ggplot(df_outliers, aes(x = g1, y = x, fill = g2)) +
geom_boxplot(width = 0.3, position = position_dodge(0.5))
p2 <- p1 +
geom_text(
data = . %>% filter(outlier),
mapping = aes(label = rownames),
position = position_dodge(0.5)
)
p1 + p2
ggplot(df_outliers, aes(x = g1, y = x, fill = g2)) +
geom_boxplot(width = 0.3, position = position_dodge(0.5)) +
ggrepel::geom_label_repel(
min.segment.length = 0,
data = . %>% filter(outlier),
mapping = aes(label = rownames),
position = position_dodge(0.5)
)
Created on 2018-12-02 by the reprex package (v0.2.1)
The underlying principle is that dodging doesn't work as one might expect when some data groupings don't exist.
library(ggplot2)
df <- data.frame(
x = c("A", "A", "B"),
type = c("a", "b", "a")
)
ggplot(df, aes(x, 1, color = type)) +
geom_point(position = position_dodge(width = .5), size = 5)
Created on 2018-12-02 by the reprex package (v0.2.1)
I'm not sure this can be fixed with the current positioning approach, because the position adjustments never see the entire dataset. The question is whether we can come up with some delicate surgery that fixes this problem without completely changing how position adjustments work.
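One way to check this numerically rather than by eye is to pull the dodged positions out of the built plot. The sketch below reuses the three-row example and reads the adjusted x values with layer_data() from ggplot2. I would expect the two points at x = "A" to sit on either side of 1 and the lone point at x = "B" to stay centered at 2, though the exact numbers may depend on the ggplot2 version.

```r
library(ggplot2)

df <- data.frame(
  x = c("A", "A", "B"),
  type = c("a", "b", "a")
)

p <- ggplot(df, aes(x, 1, color = type)) +
  geom_point(position = position_dodge(width = 0.5), size = 5)

# layer_data() returns the layer's data after the position adjustment has
# run, so the x column holds the dodged positions on the continuous scale.
dodged <- layer_data(p, 1)
dodged[, c("x", "colour", "group")]
```

If the point at "B" comes back undodged, that matches the description above: with only one group present at that x, there is nothing for the dodge to spread out.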
It appears that the various position functions do receive the entire dataset, at least the dataset per panel
I'm afraid not. Position$compute_panel() is called from Position$compute_layer(), and Position$compute_layer() is called from Layer$compute_position(), which is called per layer with each layer's data. So, it doesn't know the other layers' data.
Line 77 in 23a23cd
BTW, I feel this description is not quite right. Maybe, "once per panel per layer"?
Lines 20 to 21 in 5e4a6ef
But that should still be good enough to get the dodging right within each layer and panel. I think the other problem is that we're not using an explicit dodging aesthetic. position_dodge() simply finds all distinct groups at each x position and spreads them out. If we gave it an explicit aesthetic, e.g. aes(dodge = type), or maybe as an optional argument to position_dodge(), e.g. position_dodge(dodge_by = type), then the position adjustment could make smarter decisions about where to place which data points.
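To make the idea of an explicit dodging variable concrete, here is a rough sketch in plain R. Note that neither aes(dodge = ...) nor a dodge_by argument exists in ggplot2; they are only proposals in this thread. The helper dodge_by_level() below is hypothetical, and the even-spacing formula is an assumption chosen for illustration. It reserves one slot per level of the dodging variable, whether or not that level occurs at a given x.

```r
# Hypothetical helper (not part of ggplot2): compute dodged x positions from
# an explicit dodging variable, so each level always gets the same slot.
dodge_by_level <- function(x, by, width = 0.5) {
  by <- as.factor(by)
  n <- nlevels(by)        # slots are reserved for all levels, present or not
  idx <- as.integer(by)   # slot index of each observation
  # Discrete x values are mapped to 1, 2, ... in sorted order.
  x_num <- if (is.numeric(x)) x else as.integer(as.factor(x))
  x_num + width * ((idx - 0.5) / n - 0.5)
}

df <- data.frame(
  x = c("A", "A", "B"),
  type = c("a", "b", "a")
)
df$x_dodged <- dodge_by_level(df$x, df$type, width = 0.5)
df
```

With this scheme type = "a" lands left of center at both "A" and "B", because its slot no longer depends on which other groups happen to be present at that x.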
Here is another example, building on Claus' code.
It seems that color and fill are not treated the same way by ggplot2. I found this surprising and unexpected -- perhaps this is intended behavior?
library(ggplot2)
df <- data.frame(
x = c("A", "A", "B"),
type = c("a", "b", "a")
)
pos <- position_dodge(width = 0.5)
p <- ggplot(df) +
geom_point(position = pos, shape = 21, size = 10, stroke = 1) +
geom_text(aes(label = type), color = "black", position = pos)
p + aes(x, 1, color = type)
p + aes(x, 1, color = type, group = type)
p + aes(x, 1, fill = type)
Created on 2018-12-02 by the reprex package (v0.2.1)
Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#> setting value
#> version R version 3.5.1 (2018-07-02)
#> os macOS High Sierra 10.13.6
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2018-12-02
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#> package     * version    date       lib source
#> assertthat    0.2.0      2017-04-11 [1] CRAN (R 3.5.0)
#> backports     1.1.2      2017-12-13 [1] CRAN (R 3.5.0)
#> base64enc     0.1-3      2015-07-28 [1] CRAN (R 3.5.0)
#> bindr         0.1.1      2018-03-13 [1] CRAN (R 3.5.0)
#> bindrcpp      0.2.2      2018-03-29 [1] CRAN (R 3.5.0)
#> callr         3.0.0      2018-08-24 [1] CRAN (R 3.5.0)
#> cli           1.0.1      2018-09-25 [1] CRAN (R 3.5.0)
#> colorspace    1.3-2      2016-12-14 [1] CRAN (R 3.5.0)
#> crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.0)
#> curl          3.2        2018-03-28 [1] CRAN (R 3.5.0)
#> desc          1.2.0      2018-05-01 [1] CRAN (R 3.5.0)
#> devtools      2.0.1      2018-10-26 [1] CRAN (R 3.5.1)
#> digest        0.6.18     2018-10-10 [1] CRAN (R 3.5.0)
#> dplyr         0.7.8      2018-11-10 [1] CRAN (R 3.5.0)
#> evaluate      0.12       2018-10-09 [1] CRAN (R 3.5.0)
#> fs            1.2.6      2018-08-23 [1] CRAN (R 3.5.0)
#> ggplot2     * 3.1.0.9000 2018-12-02 [1] Github (tidyverse/ggplot2@23a23cd)
#> glue          1.3.0      2018-07-17 [1] CRAN (R 3.5.0)
#> gtable        0.2.0      2016-02-26 [1] CRAN (R 3.5.0)
#> htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.0)
#> httr          1.3.1      2017-08-20 [1] CRAN (R 3.5.0)
#> knitr         1.20       2018-02-20 [1] CRAN (R 3.5.0)
#> labeling      0.3        2014-08-23 [1] CRAN (R 3.5.0)
#> lazyeval      0.2.1      2017-10-29 [1] CRAN (R 3.5.0)
#> magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.0)
#> memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.0)
#> mime          0.6        2018-10-05 [1] CRAN (R 3.5.0)
#> munsell       0.5.0      2018-06-12 [1] CRAN (R 3.5.0)
#> pillar        1.3.0      2018-07-14 [1] CRAN (R 3.5.0)
#> pkgbuild      1.0.2      2018-10-16 [1] CRAN (R 3.5.0)
#> pkgconfig     2.0.2      2018-08-16 [1] CRAN (R 3.5.0)
#> pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.0)
#> plyr          1.8.4      2016-06-08 [1] CRAN (R 3.5.0)
#> prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.0)
#> processx      3.2.0      2018-08-16 [1] CRAN (R 3.5.0)
#> ps            1.2.1      2018-11-06 [1] CRAN (R 3.5.0)
#> purrr         0.2.5      2018-05-29 [1] CRAN (R 3.5.0)
#> R6            2.3.0      2018-10-04 [1] CRAN (R 3.5.0)
#> Rcpp          1.0.0      2018-11-07 [1] CRAN (R 3.5.0)
#> remotes       2.0.2      2018-10-30 [1] CRAN (R 3.5.0)
#> rlang         0.3.0.1    2018-10-25 [1] CRAN (R 3.5.0)
#> rmarkdown     1.10       2018-06-11 [1] CRAN (R 3.5.0)
#> rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.0)
#> scales        1.0.0      2018-08-09 [1] CRAN (R 3.5.0)
#> sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.0)
#> stringi       1.2.4      2018-07-20 [1] CRAN (R 3.5.0)
#> stringr       1.3.1      2018-05-10 [1] CRAN (R 3.5.0)
#> testthat      2.0.1      2018-10-13 [1] CRAN (R 3.5.0)
#> tibble        1.4.2      2018-01-22 [1] CRAN (R 3.5.0)
#> tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.0)
#> usethis       1.4.0      2018-08-14 [1] CRAN (R 3.5.0)
#> withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.0)
#> xml2          1.2.0      2018-01-24 [1] CRAN (R 3.5.0)
#> yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library
@slowkow What you're seeing is color = "black" shadowing the color aesthetic in the text layer. Apparently the label aesthetic is not considered when groups are calculated.
library(ggplot2)
df <- data.frame(
x = c("A", "A", "B"),
type = c("a", "b", "a")
)
pos <- position_dodge(width = 0.5)
p <- ggplot(df) +
geom_point(position = pos, shape = 21, size = 10, stroke = 1) +
geom_text(aes(label = type), position = pos)
p + aes(x, 1, color = type)
Created on 2018-12-02 by the reprex package (v0.2.1)
Yes, labels are not considered when calculating grouping, and that is done by design. (Presumably because it's not uncommon for labels to be all different even within a group.)
Lines 7 to 10 in 1c09bae
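Since label is excluded from the grouping and a fixed color = "black" takes color out of the text layer's grouping, one option is to state the group explicitly in that layer. The sketch below is only an illustration of that idea, not a fix inside ggplot2. It keeps the black text and then compares the dodged x positions of the two layers with layer_data().

```r
library(ggplot2)

df <- data.frame(
  x = c("A", "A", "B"),
  type = c("a", "b", "a")
)
pos <- position_dodge(width = 0.5)

p <- ggplot(df, aes(x, 1, color = type)) +
  geom_point(position = pos, shape = 21, size = 10, stroke = 1) +
  # color = "black" removes color from this layer's grouping, and label is
  # ignored for grouping, so the group is given explicitly instead.
  geom_text(aes(label = type, group = type), color = "black", position = pos)

# Dodged positions of the point layer (1) and the text layer (2).
points_x <- as.numeric(layer_data(p, 1)$x)
text_x <- as.numeric(layer_data(p, 2)$x)
all.equal(sort(points_x), sort(text_x))
```

This only lines the layers up because both layers contain the same rows. It does not help when one layer is missing a group at some x, which is the original problem in this issue.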
to get the dodging right within each layer and panel.
Sorry, I don't quite follow yet. We are talking about the inconsistency of positions between layers, not within each layer, right?
Letting positions have aesthetics sounds cool to me, as you also suggested in #2977 (comment).
I am talking about dodging within each layer. I think there should be an option that guarantees that dodging always looks the same across all x values. In the example here, we would want type = "a" to always be dodged to the left and type = "b" to always be dodged to the right, regardless of whether the other type is present at a given x or not. As a side effect, this would fix the original problem.
On a related note, see this closed PR that wasn't merged, and the issue of violins moving to the wrong spot under preserve = "single": #2813
It's the same problem. The dodging doesn't know about the variable that it is dodging by, and therefore it does strange things.
Thanks, I see what you mean. It's still unclear to me how to map groups to dodged positions without training over all layers, but I think I'll figure it out later :)
In case this is still useful, here's another reprex, which I believe is minimal for this issue:
library(ggplot2)
d <- data.frame(x = c("x", "x"), g = c("a", "b"), stringsAsFactors = FALSE)
pos <- position_dodge(width = .5)
ggplot(mapping = aes(x, 0, colour = g, label = g)) +
geom_point(data = d, size = 5, position = pos) +
geom_label(data = d[2, ], size = 5, position = pos)
Created on 2018-12-03 by the reprex package (v0.2.1)
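A possible workaround for this minimal case, offered as a sketch rather than a fix: keep every row in the label layer so that the dodge sees both groups, and blank out the labels that should not be drawn. The column lab is a hypothetical helper introduced here. I am assuming that rows with a missing label are only dropped when the layer is drawn, after the position adjustment has run, so na.rm = TRUE should just silence the warning.

```r
library(ggplot2)

d <- data.frame(x = c("x", "x"), g = c("a", "b"), stringsAsFactors = FALSE)
pos <- position_dodge(width = 0.5)

# Keep both rows in the label layer, but set the label to NA for the row
# that should not be labelled. `lab` is a helper column for this sketch only.
d$lab <- ifelse(d$g == "b", d$g, NA)

p <- ggplot(d, aes(x, 0, colour = g)) +
  geom_point(size = 5, position = pos) +
  geom_label(aes(label = lab), size = 5, position = pos, na.rm = TRUE)

# Dodged x of group "b" in the point layer and in the label layer.
pts <- layer_data(p, 1)
lbl <- layer_data(p, 2)
point_b_x <- as.numeric(pts$x[pts$colour == unique(lbl$colour[which(lbl$label == "b")])])
label_b_x <- as.numeric(lbl$x[which(lbl$label == "b")])
all.equal(point_b_x, label_b_x)
```

The cost is that the label layer has to carry rows it never draws, which is why an explicit dodging variable still seems like the cleaner long-term answer.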
I think there should be an option that guarantees that dodging always looks the same across all x values. In the example here, we would want type = "a" to always be dodged to the left and type = "b" to always be dodged to the right, regardless of whether the other type is present at a given x or not. As a side effect, this would fix the original problem.
This has been requested before in #2076 and I agree that it would be a nice feature to have, though if I remember correctly it would require some significant refactoring. We'd also have to think through how geoms with different widths across groups would get placed (e.g. box plots with varwidth = TRUE). For this reason I don't know that fixing this would solve the original problem unless the position calculation knew about other layers. One of the things that's tricky about dodging points and labels in particular is that they have no width in the data space, so the position calculations that calculate where things go based on width don't work right.
yes I think so







