Labels offset from mizani.breaks.breaks_date output.
In this plot, both the x data values and the label breakpoints are generated by a mizani.breaks.breaks_date object applied to the same range. They should be the same values, but the labels are offset from the data points by 4 hours. For example, the data x values are at hours 0, 6, 12, 18 while the labels are at hours 4, 10, 16, 22. All the values are specified as UTC, so I think this is unexpected output.
This is with mizani-0.11.4, plotnine-0.13.6. I understand a new release is imminent and would be happy to test again post-release.
Thanks for taking a look. I really appreciate the work you have put into plotnine.
These are the values returned by breaker below:
0 2022-12-19 00:00:00+00:00
1 2022-12-19 06:00:00+00:00
2 2022-12-19 12:00:00+00:00
3 2022-12-19 18:00:00+00:00
4 2022-12-20 00:00:00+00:00
This code will reproduce the attached plot. Note that the data points and the label locations are not lined up.
import datetime

import mizani.breaks
import pandas as pd
import plotnine as gg

limits = [
    datetime.datetime(2022, 10, 19, 0, 0, 0, tzinfo=datetime.timezone.utc),
    datetime.datetime(2022, 10, 20, 0, 0, 0, tzinfo=datetime.timezone.utc),
]
# Generate the x values with the same breaker the scale is expected to use
breaker = mizani.breaks.breaks_date('6 hours')
x = breaker(limits)
y = range(len(x))
p_df = pd.DataFrame({'x': x, 'y': y})
plot = (
    gg.ggplot(p_df, gg.aes(x='x', y='y'))
    + gg.geom_point()
    + gg.scale_x_datetime(
        date_breaks='6 hours',
        date_labels='%m-%d %H:%M',
        limits=limits,
    )
    + gg.theme(axis_text_x=gg.element_text(angle=30, hjust=1))
)
plot.show()
By default, plotnine generates breaks for an expanded coordinate system. That means the real limits are wider than those passed to the scale. That is how you get space on either end of the data limits!
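To make the effect concrete, here is a minimal sketch of how expansion widens the limits. The helper `expand_limits` and the 5% multiplicative factor are assumptions for illustration only; this is not plotnine's or mizani's actual code, and the real default for datetime scales may differ.

```python
import datetime


def expand_limits(limits, mult=0.05):
    """Pad both ends of (lo, hi) by a fraction of the range.

    Hypothetical helper; the 5% factor is an assumed value.
    """
    lo, hi = limits
    pad = (hi - lo) * mult  # timedelta * float is supported by the stdlib
    return lo - pad, hi + pad


utc = datetime.timezone.utc
limits = (
    datetime.datetime(2022, 10, 19, tzinfo=utc),
    datetime.datetime(2022, 10, 20, tzinfo=utc),
)
expanded = expand_limits(limits)
print(expanded[0].isoformat())  # lower limit, now before midnight on the 19th
print(expanded[1].isoformat())  # upper limit, now after midnight on the 20th
```

Breaks spaced by a fixed width over the wider range need not coincide with breaks computed over the original limits, which is consistent with the offset seen in the plot.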
You can turn off the expansion in one of two ways.
- Using the scale.
    + scale_x_datetime(
        date_breaks='6 hours',
        date_labels='%m-%d %H:%M',
        limits=limits,
        expand=(0, 0)  # Effectively turns off the expansion
    )
- Through the coordinates.
    + coord_cartesian(expand=False)
Interesting. Is there a way to specify where the labels will appear? The default expansion looks good but what if I want to align my breaks with the start of each day without adding a +/- date_breaks to the limits of the graph?
> but what if I want to align my breaks with the start of each day
The start of each day is a good location, so you can ask for as many breaks as there are days in your range.
More generally, you can try passing the exact number of breaks that you want, e.g.
+ scale_x_datetime(
    date_breaks=5,
    date_labels='%m-%d %H:%M',
    limits=limits,
)
This tells the algorithm to generate at most n breaks. If n breaks can be placed at "good" locations, they will be generated. Otherwise you get fewer breaks, still at "good" locations. It won't always work, but it tries to be smarter about where to place the breaks than when you specify the width.
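If you want to derive that number from the limits rather than hard-code it, the following sketch counts the midnights that fall inside a range. The helper `count_day_starts` is hypothetical and only does the arithmetic; it assumes timezone-aware limits in a single timezone and says nothing about how many breaks the scale will actually place.

```python
import datetime


def count_day_starts(limits):
    """Count the midnights inside [lo, hi], inclusive (hypothetical helper)."""
    lo, hi = limits
    # First midnight at or after the lower limit
    first = lo.replace(hour=0, minute=0, second=0, microsecond=0)
    if first < lo:
        first += datetime.timedelta(days=1)
    if first > hi:
        return 0
    return (hi - first).days + 1


utc = datetime.timezone.utc
limits = (
    datetime.datetime(2022, 10, 19, tzinfo=utc),
    datetime.datetime(2022, 10, 20, tzinfo=utc),
)
n_days = count_day_starts(limits)
print(n_days)
```

The result could then be passed as `date_breaks=n_days`, keeping in mind that it is only an upper bound on the breaks you get.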
If that doesn't generalise well and you always know your limits and width, you can write a small function to generate the breaks for you:
def my_breaks(start, n=10, **td_kwargs):
    return [start + i*datetime.timedelta(**td_kwargs) for i in range(n)]

(
    ...
    + scale_x_datetime(
        breaks=my_breaks(limits[0], hours=6),
        date_labels='%m-%d %H:%M',
    )
)
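Note that with the default n=10 and a 6-hour step, my_breaks produces breaks up to 54 hours after the start, which is well past a one-day range. A possible variant, sketched below, stops at the upper limit instead of taking a count. The name `breaks_between` is hypothetical and not part of plotnine or mizani; it uses only the standard library.

```python
import datetime


def breaks_between(start, end, **td_kwargs):
    """Evenly spaced datetimes from start up to and including end.

    Hypothetical helper; td_kwargs are passed to datetime.timedelta.
    """
    step = datetime.timedelta(**td_kwargs)
    if step <= datetime.timedelta(0):
        raise ValueError("step must be positive")
    breaks = []
    current = start
    while current <= end:
        breaks.append(current)
        current += step
    return breaks


utc = datetime.timezone.utc
limits = [
    datetime.datetime(2022, 10, 19, tzinfo=utc),
    datetime.datetime(2022, 10, 20, tzinfo=utc),
]
breaks = breaks_between(limits[0], limits[1], hours=6)
for b in breaks:
    print(b.isoformat())
```

The resulting list can be passed as `breaks=breaks` in the same way as the output of my_breaks.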