Marginal histogram bar heights vary depending on color aesthetic / groupFill
NikKrieger opened this issue · 14 comments
I would think that the bar heights in the marginal histograms of the three plots below should all be the same. They should only differ in fill color.
All three plots use the same underlying data mtcars
, and I standardized the bin widths across the three plots using the breaks
parameter.
Notably, this problem is not observed on the Shiny demo---you can toggle the "Show groups as 'fill'" checkbox, and all the histogram bar heights remain the same.
Is there something I don't know?
library(magrittr)
library(ggplot2)
library(ggExtra)
ggMarginal(
ggplot(mtcars, aes(disp, drat, color = factor(cyl))) + geom_point(),
type = "histogram",
xparams = list(breaks = seq(90, 500, 10)),
yparams = list(breaks = seq(2.6, 5, .1))
) %>% plot()
ggMarginal(
ggplot(mtcars, aes(disp, drat, color = factor(cyl))) + geom_point(),
type = "histogram",
xparams = list(breaks = seq(90, 500, 10)),
yparams = list(breaks = seq(2.6, 5, .1)),
groupFill = TRUE
) %>% plot()
ggMarginal(
ggplot(mtcars, aes(disp, drat, color = factor(am))) + geom_point(),
type = "histogram",
xparams = list(breaks = seq(90, 500, 10)),
yparams = list(breaks = seq(2.6, 5, .1)),
groupFill = TRUE
) %>% plot()
Created on 2020-04-10 by the reprex package (v0.3.0)
What you're seeing is not because of the colour aesthetic. It's because of the (incorrect? strange?) use of xparams. If you remove the xparams and yparams then the plots will look correct.
xparams and yparams are meant to pass parameters to use for the marginal plots, but passing a breaks
parameter is not an intended usecase. It's meant more for things like colour or binsize or things like that. The axis itself shouldn't be modified, it should mirror the axis on the main plot.
I can see that removing the breaks
parameters corrects the bar heights. The thing is, those parameters are normal geom_histogram()
parameters--it's another way to specify the bin width, much like the binwidth
parameter that you alluded to. In other words, these breaks
are distinct from specifying the axis breaks. Nonetheless, I'll try to find another way to get the bin width and placement that I want.
Nonetheless,
I looked into this some more and I do still think the problem is because you're effectively changing the axis of the marginal plots by specifying break points that don't match with the min/max of the original plot. For example, min(mtcars$disp) is 71.1 but your breaks start at 90 (and the actual x axis starts a bit before 71.1 because ggplot leaves a small buffer). I'm closing this as I'm fairly certain now that this is not an actual bug, it's working correctly
It turns out that it's not just a problem with specifying breaks
---it also occurs when specifying bins
or binwidth
:
library(magrittr)
library(ggplot2)
library(ggExtra)
ggMarginal(
ggplot(mtcars, aes(disp, drat, color = factor(cyl))) + geom_point(),
type = "histogram",
xparams = list(binwidth = 10),
yparams = list(binwidth = .1)
) %>% plot()
ggMarginal(
ggplot(mtcars, aes(disp, drat, color = factor(cyl))) + geom_point(),
type = "histogram",
xparams = list(binwidth = 10),
yparams = list(binwidth = .1),
groupFill = TRUE
) %>% plot()
ggMarginal(
ggplot(mtcars, aes(disp, drat, color = factor(am))) + geom_point(),
type = "histogram",
xparams = list(binwidth = 10),
yparams = list(binwidth = .1),
groupFill = TRUE
) %>% plot()
Created on 2020-04-13 by the reprex package (v0.3.0)
It seems that binwidth
and bins
also alter the axes of geom_histograms()
. This can be standardized by standardizing the limits
parameter in scale_*_continuous()
:
library(ggplot2)
plot(
ggplot(mtcars, aes(disp)) +
geom_histogram(bins = 30) +
scale_x_continuous(limits = c(40,450)) +
theme(axis.text.y = element_blank()) # Added so that axis text doesn't
) # change the width of the plot
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
#> Warning: Removed 2 rows containing missing values (geom_bar).
plot(
ggplot(mtcars, aes(disp)) +
geom_histogram(binwidth = 17) +
scale_x_continuous(limits = c(40,450)) +
theme(axis.text.y = element_blank()) # Added so that axis text doesn't
) # change the width of the plot
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
#> Warning: Removed 2 rows containing missing values (geom_bar).
plot(
ggplot(mtcars, aes(disp)) +
geom_histogram(breaks = c(117, 203, 333,600)) +
scale_x_continuous(limits = c(40,450)) +
theme(axis.text.y = element_blank()) # Added so that axis text doesn't
) # change the width of the plot
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
#> Warning: Removed 1 rows containing missing values (geom_bar).
Created on 2020-04-13 by the reprex package (v0.3.0)
But is there a way to do this for ggMarginal()
plots?
You're right, it does seem to be off.
After some investigation, it looks like the culprit may be this line
If I change it to ggplot2::geom_histogram
then it seems to work as expected.
I can trace back the change from geom_histogram
to geom_histogram2
in this commit
@crew102 This is almost 2 years old now, but do you happen to remember why histograms were changed from being plain geom_histogram to being geom_histogram(ggplot2::aes(y = ..density..)
?
I think geom_histogram(ggplot2::aes(y = ..density..)
was added in an effort to support densigram plots, as a way to try to match the histogram bar heights to the density curve height. But I'm not sure why it was needed for regular histogram marginal plots (when densigram isn't used).
Yep, that' why it was added. I didn't consider the fact that some people would want to be setting params of stat bin that aren't supported by the density stat (e.g., breaks, binwidth). I'll fix sometime this week.
@NikKrieger can you confirm this is working now as expected? And if it does, please thank @crew102 :)
@crew102 Hi I had the same problem here. Here is my script.
If i do not add "groupFill = TRUE", then the histogram matched the scatter plot, except that the filled colors are not varied according to groups:
p <- ggplot(data2, aes(x = peptide_length, y = mz, color = factor(charge))) + geom_point() + xlab("Peptide length") + ylab("m.z") + theme_classic() + labs(color='Charge')
ggMarginal(p, type = "histogram", xparams = list(binwidth = 1))
However, if I add "groupFill = TRUE", the marginal histogram is wrong.
p <- ggplot(data2, aes(x = peptide_length, y = mz, color = factor(charge))) + geom_point() + xlab("Peptide length") + ylab("m.z") + theme_classic() + labs(color='Charge')
ggMarginal(p, type = "histogram", xparams = list(binwidth = 1), groupFill = TRUE)
Hey @WJH58 , can you provide some sample data so that I can reproduce things on my end? Better yet, can you reproduce issue using a dataset that I would have access to (e.g., built in dataset, or dataset that ships with ggplot).
Hi this is my "data2".
data2.txt
And the code corresponding to the first figure is:
p <- ggplot(data2, aes(x = peptide_length, y = mz, color = factor(charge))) + geom_point() + xlab("Peptide length") + ylab("m.z") + theme_classic() + labs(color='Charge')
ggMarginal(p, type = "histogram", xparams = list(binwidth = 1))
The code corresponding to the second figure is:
p <- ggplot(data2, aes(x = peptide_length, y = mz, color = factor(charge))) + geom_point() + xlab("Peptide length") + ylab("m.z") + theme_classic() + labs(color='Charge')
ggMarginal(p, type = "histogram", groupFill = TRUE, xparams = list(binwidth = 1))
Thanks if you can help!
This reminds me that a new version should go out to CRAN. It will happen within the month.