Support for using datetime values for x axis labels
Hi again @olavolav. If I am not wrong, there is currently support only for numeric data types on both the x and y axes. However, when performing time series analysis and plotting trends over time, we often need to give the time (month, date, etc.) as a string. I am looking for something similar to this, but in the terminal.
I don't think you need to perform any sorting of the dates internally; we can probably require that the user sorts the list prior to plotting. Can this be supported?
Hi @gourisariah that's a good point, indeed plotting data with time stamps and/or categorical data would be great.
I suppose a simple workaround for now would be to ask users to map the timestamps to "number of days since X" or similar before plotting. But I agree with you that what you showed would be much nicer.
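A minimal sketch of that workaround, using only NumPy. The sample dates and the reference date are made up for illustration; any reference point would do.

```python
import numpy as np

# Hypothetical sample data: three daily timestamps.
timestamps = np.array(
    ["2021-03-01", "2021-03-02", "2021-03-05"], dtype="datetime64[D]"
)

# The "X" in "number of days since X"; chosen arbitrarily here.
reference = np.datetime64("2021-01-01")

# Subtracting datetime64 values gives timedelta64 values; dividing by a
# one-day timedelta turns them into plain floats.
days_since_reference = (timestamps - reference) / np.timedelta64(1, "D")

print(days_since_reference)  # [59. 60. 63.]
```

The resulting float array can then be passed as the xs of a numeric plot, at the cost of axis labels that show day counts rather than dates.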
I'll think about it, and will also refresh my memory on how matplotlib and other plotting libraries handle this.
Can't promise anything right now, but I'll look into it. Thanks for the input!
Sure @olavolav. That would be great! Thanks a lot.
Adding my vote for this. Using datetime.datetime would be helpful for tons of data visualizations.
Awesome, thanks @leighleighleigh ! I will have a detailed look once I have a bit more time.
@leighleighleigh Quick update: Inspired by your work, I have started work on this. Check out this branch if you are interested: https://github.com/olavolav/uniplot/tree/os_from_leighleighleighz-datetime_support
It is similar to your approach, but differs in a few areas to keep with uniplot's (as of yet undocumented) design goals:
- No additional dependencies, just Python and NumPy
- No required configuration, it should "just work". In building uniplot I tried to take on complexity as part of the library, such that the user could just throw data at the plot function and uniplot would figure out the rest.
- Nice & friendly axis labels: This is 99% missing right now, I'm just trying to find any date labels right now.
Example of what I have so far:
import numpy as np
from uniplot import plot
dates = np.arange('2002-10-27T04:30', 4*60, 60, dtype='M8[m]')
plot(xs=dates, ys=[1,2,3,2])
yields:
[Terminal plot output; y-axis ticks: 1, 2, 3; x-axis labels: 04:52:21.900, 05:37:27.300, 06:22:32.700, 07:07:38.100]
I'll update this thread as work progresses.
@leighleighleigh I've opened a pull request with a first working version, feel free to check it out and let me know what you think.
Just checked it out - looks great! The auto-scaling axis labels are really nice.
Have confirmed our test suite still runs perfectly fine with this new version, too.
uniplot has been fantastic for understanding our unit-test failures, by the way. Many thanks!
Just keeping this open until the feature is complete
And we are live as of version v0.12.1.
Thanks everyone for your support and your patience!
Just for the record, I wanted to see what the original request from 2021 would look like now. Taking the stock price of Meta from Yahoo as a CSV, we have:
import pandas as pd
from uniplot import plot
data = pd.read_csv("META.csv")
data["Timestamp"] = pd.to_datetime(data["Date"])
plot(xs=data.Timestamp, ys=data.Close, title="Meta stock price", y_unit=" $", lines=True)
which yields
[Terminal plot output, titled "Meta stock price"; line chart with y-axis ticks 100 $, 200 $, 300 $, 400 $; x-axis labels: 2013, 2014, 2016, 2017, 2019, 2020, 2021, 2023]
There is room for improvement, but it works for the moment, I would say.
To everyone here, as of v0.14.1 (released today) we have support for datetime plotting that actually works and looks nice. Feel free to test it out and let me know if it works for you.
Only 3 years in the making!
For the record, this is how the example listed above looks today:
>>> data = pd.read_csv("~/Downloads/META.csv")
>>> data["Timestamp"] = pd.to_datetime(data["Date"])
>>> plot(xs=data.Timestamp, ys=data.Close, title="Meta stock price", y_unit=" $", lines=True)
[Terminal plot output, titled "Meta stock price"; line chart with y-axis ticks 100 $, 200 $, 300 $, 400 $, 500 $; x-axis labels: 2016, 2020, 2024]
This has long been waiting to be realised! Uncommenting it still does not work, but it's not far off :-)
I get the following error:
   62     # In the end, the dimensions of xs and ys need to match
 ❱ 63     assert len(self.xs) == len(self.ys)
   64     assert [len(xs_row) for xs_row in self.xs] == [
   65         len(ys_row) for ys_row in self.ys
   66     ]

locals:
  self = <uniplot.multi_series.MultiSeries object at 0x7a0a2567d250>
  xs = DatetimeIndex(['2020-01-01 00:00:00+00:00', '2020-01-01 01:00:00+00:00',
                      '2020-01-01 02:00:00+00:00', '2020-01-01 03:00:00+00:00',
                      '2020-01-01 04:00:00+00:00', '2020-01-01 05:00:00+00:00',
                      '2020-01-01 06:00:00+00:00', '2020-01-01 07:00:00+00:00',
                      '2020-01-01 08:00:00+00:00', '2020-01-01 09:00:00+00:00',
                      '2020-01-01 10:00:00+00:00', '2020-01-01 11:00:00+00:00',
                      '2020-01-01 12:00:00+00:00', '2020-01-01 13:00:00+00:00',
                      '2020-01-01 14:00:00+00:00', '2020-01-01 15:00:00+00:00',
                      '2020-01-01 16:00:00+00:00', '2020-01-01 17:00:00+00:00',
                      '2020-01-01 18:00:00+00:00', '2020-01-01 19:00:00+00:00',
                      '2020-01-01 20:00:00+00:00', '2020-01-01 21:00:00+00:00',
                      '2020-01-01 22:00:00+00:00', '2020-01-01 23:00:00+00:00',
                      '2020-01-02 00:00:00+00:00'],
                     dtype='datetime64[ns, UTC]', freq='h')
  ys = [
      <xarray.DataArray (time: 25)> Size: 100B
      array([ 0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,
              0.      ,  0.      , 55.747078, 43.124203, 32.00251 , 24.715824,
             24.944908, 32.525303, 43.76436 , 56.4261  ,  0.      ,  0.      ,
              0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,
              0.      ], dtype=float32)
      Coordinates:
        * time (time) object 200B 1577836800000000000 ... 1577923200000000000,
      <xarray.DataArray (time: 25)> Size: 100B
      array([-66.35813  , -60.644665 , -51.892258 , -41.882614 , -31.464993 ,
             -21.09161  , -11.080342 ,  -1.6082226,   6.6248665,  13.403036 ,
              18.310144 ,  20.869581 ,  20.797829 ,  18.107506 ,  13.092004 ,
               6.235771 ,  -2.101055 , -11.5761795, -21.608566 , -31.98757  ,
             -42.39148  , -52.355972 , -60.999275 , -66.48172  , -66.3329   ],
            dtype=float32)
      Coordinates:
        * time (time) object 200B 1577836800000000000 ... 1577923200000000000,
      <xarray.DataArray (time: 25)> Size: 100B
      array([ 18.413588,  46.935772,  66.35138 ,  80.34414 ,  91.67556 ,
             101.85798 , 111.785736, 122.226585, 133.25078 , 145.36688 ,
             158.80125 , 173.51736 , 187.32233 , 201.94618 , 215.31096 ,
             227.3541  , 238.38051 , 248.74635 , 258.671   , 268.88892 ,
             280.31454 , 294.50522 , 314.29495 , 343.2883  ,  18.266356],
            dtype=float32)
      Coordinates:
        * time (time) object 200B 1577836800000000000 ... 1577923200000000000
  ]
AssertionError
The reason for the failure is that ys is a list of length 3, each item being an array of 25 values, while the timestamps are a single DatetimeIndex of length 25. Wouldn't it make sense to handle this without having to replicate the timestamps once per series in ys for the xs input? What do you think?
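To make the mismatch concrete, here is a small sketch that mirrors only the two assertions visible in the traceback, not uniplot's actual preprocessing. The helper name `shapes_match` and the stand-in data are hypothetical; only the shapes (25 timestamps, 3 series of 25 values) follow the report.

```python
import numpy as np

# Stand-in data with the shapes from the report: 25 hourly timestamps and
# 3 series of 25 values each (the values themselves are made up).
timestamps = np.arange("2020-01-01T00", "2020-01-02T01", dtype="datetime64[h]")
y_series = [np.zeros(25), np.ones(25), np.full(25, 2.0)]


def shapes_match(xs, ys) -> bool:
    """Hypothetical helper mirroring the two assertions in the traceback."""
    # First assertion: same number of series on both sides.
    if len(xs) != len(ys):
        return False
    # Second assertion: each xs row is as long as its ys row.
    return [len(row) for row in xs] == [len(row) for row in ys]


# A single timestamp array has length 25, but ys has length 3.
single = shapes_match(timestamps, y_series)

# One timestamp entry per series: 3 rows of 25 on both sides.
replicated = shapes_match([timestamps] * len(y_series), y_series)

print(single, replicated)  # False True
```

The first check compares the number of timestamps (25) with the number of series (3), which is why passing the bare timestamps fails before the per-row comparison is reached.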
I tried to quickly fix this by adding more tests in uniplot/multi_series.py, but then it messes things up further down in
def _safe_maxs(series: List) -> float:
    return max([_safe_max(row) for row in series if len(row) > 0])
Just for the sake of it, doing
timestamps_series = [timestamps] * len(y_series)  # Create a list of the same DatetimeIndex for each series in ys
plot(
    xs=timestamps_series,
    ys=y_series,
    legend_labels=legend_labels,
    lines=lines,
    title=title if title else supertitle,
    y_unit=" " + str(unit),
)
gives the expected plot.
However, wouldn't it be better not to replicate the timestamps, since they will always be the same?
Hi @NikosAlexandris, glad to hear the plotting with timestamps worked for you, at least once you manually duplicated the xs.
You are right that we could add such logic, but to be honest I would rather not do that, as that part of the code is already complicated enough in order to support what we have today, including adding data from pandas or polars.
In your case, [timestamps] * len(y_series) is probably the best solution, exactly as you proposed.
I understand. I still wish this were done for the user, also to save memory (?). I am working with quite large time series. Cheers!
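On the memory question, a short sketch of what list multiplication does on the caller's side: it repeats references to the same object rather than copying the data. The stand-in array below is an assumption; whether uniplot copies or converts the data internally is not stated in this thread.

```python
import numpy as np

# Stand-in for the shared timestamps (25 hourly values).
timestamps = np.arange("2020-01-01T00", "2020-01-02T01", dtype="datetime64[h]")
n_series = 3

# List multiplication repeats the reference, not the underlying array.
timestamps_series = [timestamps] * n_series

# Every entry is the very same object as the original.
all_same_object = all(entry is timestamps for entry in timestamps_series)
distinct_objects = len({id(entry) for entry in timestamps_series})

print(all_same_object, distinct_objects)  # True 1
```

So building the replicated list costs only a few extra references, however long the time series is.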