daattali/ggExtra

Marginal histogram bar heights vary depending on color aesthetic / groupFill

NikKrieger opened this issue · 14 comments

I would think that the bar heights in the marginal histograms of the three plots below should all be the same. They should only differ in fill color.

All three plots use the same underlying data mtcars, and I standardized the bin widths across the three plots using the breaks parameter.

Notably, this problem is not observed on the Shiny demo---you can toggle the "Show groups as 'fill'" checkbox, and all the histogram bar heights remain the same.

Is there something I don't know?

library(magrittr)
library(ggplot2)
library(ggExtra)
ggMarginal(
  ggplot(mtcars, aes(disp, drat, color = factor(cyl))) + geom_point(),
  type = "histogram",
  xparams = list(breaks = seq(90, 500, 10)),
  yparams = list(breaks = seq(2.6, 5, .1))
) %>% plot()

ggMarginal(
  ggplot(mtcars, aes(disp, drat, color = factor(cyl))) + geom_point(),
  type = "histogram",
  xparams = list(breaks = seq(90, 500, 10)),
  yparams = list(breaks = seq(2.6, 5, .1)),
  groupFill = TRUE
) %>% plot()

ggMarginal(
  ggplot(mtcars, aes(disp, drat, color = factor(am))) + geom_point(),
  type = "histogram",
  xparams = list(breaks = seq(90, 500, 10)),
  yparams = list(breaks = seq(2.6, 5, .1)),
  groupFill = TRUE
) %>% plot()

Created on 2020-04-10 by the reprex package (v0.3.0)

What you're seeing is not because of the colour aesthetic. It's because of the (incorrect? strange?) use of xparams. If you remove the xparams and yparams then the plots will look correct.

xparams and yparams are meant to pass parameters to use for the marginal plots, but passing a breaks parameter is not an intended usecase. It's meant more for things like colour or binsize or things like that. The axis itself shouldn't be modified, it should mirror the axis on the main plot.

I can see that removing the breaks parameters corrects the bar heights. The thing is, those parameters are normal geom_histogram() parameters--it's another way to specify the bin width, much like the binwidth parameter that you alluded to. In other words, these breaks are distinct from specifying the axis breaks. Nonetheless, I'll try to find another way to get the bin width and placement that I want.

Nonetheless,

I looked into this some more and I do still think the problem is because you're effectively changing the axis of the marginal plots by specifying break points that don't match with the min/max of the original plot. For example, min(mtcars$disp) is 71.1 but your breaks start at 90 (and the actual x axis starts a bit before 71.1 because ggplot leaves a small buffer). I'm closing this as I'm fairly certain now that this is not an actual bug, it's working correctly

It turns out that it's not just a problem with specifying breaks---it also occurs when specifying bins or binwidth:

library(magrittr)
library(ggplot2)
library(ggExtra)
ggMarginal(
  ggplot(mtcars, aes(disp, drat, color = factor(cyl))) + geom_point(),
  type = "histogram",
  xparams = list(binwidth = 10),
  yparams = list(binwidth = .1)
) %>% plot()

ggMarginal(
  ggplot(mtcars, aes(disp, drat, color = factor(cyl))) + geom_point(),
  type = "histogram",
  xparams = list(binwidth = 10),
  yparams = list(binwidth = .1),
  groupFill = TRUE
) %>% plot()

ggMarginal(
  ggplot(mtcars, aes(disp, drat, color = factor(am))) + geom_point(),
  type = "histogram",
  xparams = list(binwidth = 10),
  yparams = list(binwidth = .1),
  groupFill = TRUE
) %>% plot()

Created on 2020-04-13 by the reprex package (v0.3.0)

It seems that binwidth and bins also alter the axes of geom_histograms(). This can be standardized by standardizing the limits parameter in scale_*_continuous():

library(ggplot2)
plot(
  ggplot(mtcars, aes(disp)) +
    geom_histogram(bins = 30) +
    scale_x_continuous(limits = c(40,450)) +
    theme(axis.text.y = element_blank()) # Added so that axis text doesn't
)                                        # change the width of the plot
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
#> Warning: Removed 2 rows containing missing values (geom_bar).

plot(
  ggplot(mtcars, aes(disp)) +
    geom_histogram(binwidth = 17) +
    scale_x_continuous(limits = c(40,450)) +
    theme(axis.text.y = element_blank()) # Added so that axis text doesn't
)                                        # change the width of the plot
#> Warning: Removed 2 rows containing non-finite values (stat_bin).

#> Warning: Removed 2 rows containing missing values (geom_bar).

plot(
  ggplot(mtcars, aes(disp)) +
    geom_histogram(breaks = c(117, 203, 333,600)) +
    scale_x_continuous(limits = c(40,450)) +
    theme(axis.text.y = element_blank()) # Added so that axis text doesn't
)                                        # change the width of the plot
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
#> Warning: Removed 1 rows containing missing values (geom_bar).

Created on 2020-04-13 by the reprex package (v0.3.0)

But is there a way to do this for ggMarginal() plots?

You're right, it does seem to be off.

After some investigation, it looks like the culprit may be this line

If I change it to ggplot2::geom_histogram then it seems to work as expected.

I can trace back the change from geom_histogram to geom_histogram2 in this commit

@crew102 This is almost 2 years old now, but do you happen to remember why histograms were changed from being plain geom_histogram to being geom_histogram(ggplot2::aes(y = ..density..)?

I think geom_histogram(ggplot2::aes(y = ..density..) was added in an effort to support densigram plots, as a way to try to match the histogram bar heights to the density curve height. But I'm not sure why it was needed for regular histogram marginal plots (when densigram isn't used).

Yep, that' why it was added. I didn't consider the fact that some people would want to be setting params of stat bin that aren't supported by the density stat (e.g., breaks, binwidth). I'll fix sometime this week.

@NikKrieger can you confirm this is working now as expected? And if it does, please thank @crew102 :)

WJH58 commented

@crew102 Hi I had the same problem here. Here is my script.
If i do not add "groupFill = TRUE", then the histogram matched the scatter plot, except that the filled colors are not varied according to groups:

p <- ggplot(data2, aes(x = peptide_length, y = mz, color = factor(charge))) + geom_point() + xlab("Peptide length") + ylab("m.z") + theme_classic() + labs(color='Charge') 
ggMarginal(p, type = "histogram", xparams = list(binwidth = 1))

image

However, if I add "groupFill = TRUE", the marginal histogram is wrong.

p <- ggplot(data2, aes(x = peptide_length, y = mz, color = factor(charge))) + geom_point() + xlab("Peptide length") + ylab("m.z") + theme_classic() + labs(color='Charge') 
ggMarginal(p, type = "histogram", xparams = list(binwidth = 1), groupFill = TRUE)

Screenshot 2022-03-01 at 6 59 34 PM

And, removing " xparams = list(binwidth = 1) " doesn't make a difference. Maybe this issue is not really solved.

Hey @WJH58 , can you provide some sample data so that I can reproduce things on my end? Better yet, can you reproduce issue using a dataset that I would have access to (e.g., built in dataset, or dataset that ships with ggplot).

WJH58 commented

Hi this is my "data2".
data2.txt

And the code corresponding to the first figure is:

p <- ggplot(data2, aes(x = peptide_length, y = mz, color = factor(charge))) + geom_point() + xlab("Peptide length") + ylab("m.z") + theme_classic() + labs(color='Charge') 
ggMarginal(p, type = "histogram", xparams = list(binwidth = 1))

The code corresponding to the second figure is:

p <- ggplot(data2, aes(x = peptide_length, y = mz, color = factor(charge))) + geom_point() + xlab("Peptide length") + ylab("m.z") + theme_classic() + labs(color='Charge') 
ggMarginal(p, type = "histogram", groupFill = TRUE, xparams = list(binwidth = 1))

Thanks if you can help!

Hey @WJH58 , this was actually fixed in #154. If you download the dev version of ggExtra, you should see that the plot looks as expected.

WJH58 commented

@crew102 Thank you! I updated ggExtra and found it worked!

This reminds me that a new version should go out to CRAN. It will happen within the month.