olavolav/uniplot

Support for using datetime values for x axis labels

Opened this issue · 18 comments

Hi again @olavolav. If I am not mistaken, only numeric data types are currently supported on both the x and y axes. However, when performing time series analysis and plotting trends over time, we often need to specify the time (month, date, etc.) as a string. I am looking for something similar to this, but in the terminal.
image

I don't think you need to sort the dates internally; we can probably require that users sort the list themselves before plotting. Can this be supported?

Hi @gourisariah, that's a good point. Indeed, plotting data with timestamps and/or categorical data would be great.

I suppose a simple workaround for now would be to ask the users to map the timestamps to "number of day since X" or similar before plotting. But I agree with you that what you showed would be much nicer.
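
The workaround mentioned above can be sketched roughly as follows. This is only an illustration using the standard library; the helper name `days_since` and the sample data are made up for this example and are not part of uniplot.

```python
from datetime import datetime


def days_since(timestamps, origin):
    """Convert datetimes to fractional days elapsed since `origin`."""
    seconds_per_day = 24 * 60 * 60
    return [(t - origin).total_seconds() / seconds_per_day for t in timestamps]


# Made-up sample data, only for illustration.
timestamps = [datetime(2021, 1, 1), datetime(2021, 1, 2, 12), datetime(2021, 1, 4)]
values = [10.0, 12.5, 11.0]

xs = days_since(timestamps, origin=timestamps[0])
print(xs)  # [0.0, 1.5, 3.0]
```

The resulting numeric `xs` can then be passed to the plotting call together with `values`, for example `plot(xs=xs, ys=values)`.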

I'll think about it, and will also refresh my memory on how matplotlib and other plotting libraries handle this.

Can't promise anything right now, but I'll look into it. Thanks for the input! 😄

Sure @olavolav. That would be great! Thanks a lot.

Adding my vote for this. Support for datetime.datetime would help with tons of data visualizations.

I've got a proof-of-concept implementation of this here, which I haven't PR'd due to it being tangled up with other changes.
I'd appreciate your thoughts on it, @olavolav - it feels a little hacky to me, but it does work :)

Awesome, thanks @leighleighleigh! I will have a detailed look once I have a bit more time.

@leighleighleigh Quick update: Inspired by your work, I have started work on this. Check out this branch if you are interested: https://github.com/olavolav/uniplot/tree/os_from_leighleighleighz-datetime_support

It is similar to your approach, but differs in a few areas to keep with uniplot's (as of yet undocumented) design goals:

  1. No additional dependencies, just Python and NumPy
  2. No required configuration, it should "just work". In building uniplot I tried to take on complexity as part of the library, such that the user could just throw data at the plot function and uniplot would figure out the rest.
  3. Nice & friendly axis labels: This is 99% missing right now; for the moment I'm just trying to get any date labels at all.

Example of what I have so far:

import numpy as np
from uniplot import plot

# Four timestamps, one hour apart, starting at 2002-10-27T04:30
dates = np.arange('2002-10-27T04:30', 4*60, 60, dtype='M8[m]')
plot(xs=dates, ys=[1,2,3,2])

yields:

┌────────────────────────────────────────────────────────────┐
│                                       ▝                    │ 3
│                                                            │
│                                                            │
│                                                            │
│                                                            │
│                                                            │
│                                                            │
│                                                            │
│                    ▘                                      ▝│ 2
│                                                            │
│                                                            │
│                                                            │
│                                                            │
│                                                            │
│                                                            │
│                                                            │
│▖                                                           │ 1
└────────────────────────────────────────────────────────────┘
  04:52:21.900   05:37:27.300   06:22:32.700   07:07:38.100

I'll update this thread as work progresses.

@leighleighleigh I've opened a pull request with a first working version, feel free to check it out and let me know what you think.

Just checked it out - looks great! The auto-scaling axis labels are really nice.
Have confirmed our test suite still runs perfectly fine with this new version, too.
uniplot has been fantastic for understanding our unit-test failures by the way, many thanks!

Just keeping this open until the feature is complete

And we are live as of version v0.12.1 🚀

Thanks everyone for your support and your patience 😄

Just for the record, I wanted to see what the original request from 2021 would look like now. Taking the stock price of Meta from Yahoo as a CSV, we have:

import pandas as pd
from uniplot import plot

data = pd.read_csv("META.csv")
data["Timestamp"] = pd.to_datetime(data["Date"])
plot(xs=data.Timestamp, ys=data.Close, title="Meta stock price", y_unit=" $", lines=True)

which yields

                       Meta stock price
┌────────────────────────────────────────────────────────────┐
│                                                           ▐│
│                                                           ▞│
│                                                           ▌│ 400 $
│                                              ▗▖           ▌│
│                                              ▌▌          ▐ │
│                                             ▞ ▚▞▖      ▗ ▛ │
│                                         ▗  ▐    ▌      ▞▞▘ │ 300 $
│                                         ▐▙▜▞    ▌     ▗▘   │
│                                         ▌ ▝▘    ▌     ▞    │
│                                        ▞        ▜▖    ▌    │
│                           ▖▗ ▟   ▐▟▚▐▀▙▘         ▜   ▐     │ 200 $
│                         ▗▀▝▘▛ ▀▖▐▞▘ ▘ ▝           ▙▖ ▞     │
│                     ▄▖▗▀▀      ▝▌                  ▙▗▘     │
│                 ▄▚▀▀ ▝▘                            ▐▛      │ 100 $
│          ▗▄▄▄▄▀▀                                   ▝       │
│      ▄▄▀▀▘                                                 │
│▙▄▄▄▄▞▘                                                     │
└────────────────────────────────────────────────────────────┘
  2013    2014   2016    2017   2019    2020   2021    2023

There is room for improvement, but it works for the moment, I would say.

To everyone here, as of v0.14.1 (released today) we have support for datetime plotting that actually works and looks nice. Feel free to test it out and let me know if it works for you.

Only 3 years in the making 😄

cc @ryanwwest @NikosAlexandris @jladdjr @leighleighleigh

For the record, this is what the example listed above looks like today:

>>> data = pd.read_csv("~/Downloads/META.csv")
>>> data["Timestamp"] = pd.to_datetime(data["Date"])
>>> plot(xs=data.Timestamp, ys=data.Close, title="Meta stock price", y_unit=" $", lines=True)
                       Meta stock price
┌────────────────────────────────────────────────────────────┐
│                                                          ▗▐│ 500 $
│                                                         █▐▌│
│                                                         █▛ │
│                                                        ▗▘▘ │
│                                             ▖          ▐   │ 400 $
│                                            ▐▌          ▞   │
│                                           ▄▘▚▟       ▖ ▌   │
│                                        ▖ ▗▘   ▌      █▟    │ 300 $
│                                       ▐▙▀▟    ▌     ▗▘     │
│                                       ▞  ▝    ▌     ▌      │
│                                    ▄▖▞▘       ▛▖   ▐       │ 200 $
│                        ▗▄▄▙▗▚▖  ▟▀▚▘▝▌         ▐   ▞       │
│                       ▄▟   ▘ ▝▄▛▘    ▘          ▀▖▗▘       │
│                 ▗▄▄▀▚▞        ▝                  ▚▟        │
│           ▄▖▄▄▚▛▘▘                               ▐         │ 100 $
│      ▄▄▚▛▀▝▝                                               │
│▄▄▄▄▄▀ ▘                                                    │
└────────────────────────────────────────────────────────────┘
                2016                2020               2024

This

2024-08-31T14:53:20,327039149+03:00

has long been waiting to be realised! Uncommenting it still does not work, but it's not far off :-)

I get the following error:

│    62 │   │   # In the end, the dimensions of xs and ys need to match                            │
│ ❱  63 │   │   assert len(self.xs) == len(self.ys)                                                │
│    64 │   │   assert [len(xs_row) for xs_row in self.xs] == [                                    │
│    65 │   │   │   len(ys_row) for ys_row in self.ys                                              │
│    66 │   │   ]                                                                                  │
│                                                                                                  │
│ ╭────────────────────────────────────── locals ───────────────────────────────────────╮          │
│ │ self = <uniplot.multi_series.MultiSeries object at 0x7a0a2567d250>                  │          │
│ │   xs = DatetimeIndex(['2020-01-01 00:00:00+00:00', '2020-01-01 01:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-01 02:00:00+00:00', '2020-01-01 03:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-01 04:00:00+00:00', '2020-01-01 05:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-01 06:00:00+00:00', '2020-01-01 07:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-01 08:00:00+00:00', '2020-01-01 09:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-01 10:00:00+00:00', '2020-01-01 11:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-01 12:00:00+00:00', '2020-01-01 13:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-01 14:00:00+00:00', '2020-01-01 15:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-01 16:00:00+00:00', '2020-01-01 17:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-01 18:00:00+00:00', '2020-01-01 19:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-01 20:00:00+00:00', '2020-01-01 21:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-01 22:00:00+00:00', '2020-01-01 23:00:00+00:00',     │          │
│ │        │   │   │      '2020-01-02 00:00:00+00:00'],                                 │          │
│ │        │   │   │     dtype='datetime64[ns, UTC]', freq='h')                         │          │
│ │   ys = [                                                                            │          │
│ │        │   <xarray.DataArray (time: 25)> Size: 100B                                 │          │
│ │        array([ 0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,     │          │
│ │        │   │   0.      ,  0.      , 55.747078, 43.124203, 32.00251 , 24.715824,     │          │
│ │        │      24.944908, 32.525303, 43.76436 , 56.4261  ,  0.      ,  0.      ,     │          │
│ │        │   │   0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,     │          │
│ │        │   │   0.      ], dtype=float32)                                            │          │
│ │        Coordinates:                                                                 │          │
│ │          * time     (time) object 200B 1577836800000000000 ... 1577923200000000000, │          │
│ │        │   <xarray.DataArray (time: 25)> Size: 100B                                 │          │
│ │        array([-66.35813  , -60.644665 , -51.892258 , -41.882614 , -31.464993 ,      │          │
│ │        │      -21.09161  , -11.080342 ,  -1.6082226,   6.6248665,  13.403036 ,      │          │
│ │        │   │   18.310144 ,  20.869581 ,  20.797829 ,  18.107506 ,  13.092004 ,      │          │
│ │        │   │    6.235771 ,  -2.101055 , -11.5761795, -21.608566 , -31.98757  ,      │          │
│ │        │      -42.39148  , -52.355972 , -60.999275 , -66.48172  , -66.3329   ],     │          │
│ │        │     dtype=float32)                                                         │          │
│ │        Coordinates:                                                                 │          │
│ │          * time     (time) object 200B 1577836800000000000 ... 1577923200000000000, │          │
│ │        │   <xarray.DataArray (time: 25)> Size: 100B                                 │          │
│ │        array([ 18.413588,  46.935772,  66.35138 ,  80.34414 ,  91.67556 ,           │          │
│ │        │      101.85798 , 111.785736, 122.226585, 133.25078 , 145.36688 ,           │          │
│ │        │      158.80125 , 173.51736 , 187.32233 , 201.94618 , 215.31096 ,           │          │
│ │        │      227.3541  , 238.38051 , 248.74635 , 258.671   , 268.88892 ,           │          │
│ │        │      280.31454 , 294.50522 , 314.29495 , 343.2883  ,  18.266356],          │          │
│ │        │     dtype=float32)                                                         │          │
│ │        Coordinates:                                                                 │          │
│ │          * time     (time) object 200B 1577836800000000000 ... 1577923200000000000  │          │
│ │        ]                                                                            │          │
│ ╰─────────────────────────────────────────────────────────────────────────────────────╯          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError

The reason for the failure is that ys is a list of length 3, each item being an array of 25 values, while the timestamps are a single DatetimeIndex of length 25. Wouldn't it make sense to handle this case without having to replicate the timestamps once per series in ys for the xs input? What do you think?
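
To make the mismatch concrete, here is a minimal stand-alone sketch that mirrors the two assertions visible in the traceback. The function name `check_dimensions` and the plain-list stand-ins are hypothetical; uniplot's real code does more processing before it reaches these checks.

```python
def check_dimensions(xs, ys):
    """Illustrative stand-in mirroring the two assertions shown in the traceback."""
    # One xs row is expected per ys row ...
    assert len(xs) == len(ys)
    # ... and each xs row must be as long as the matching ys row.
    assert [len(xs_row) for xs_row in xs] == [len(ys_row) for ys_row in ys]


timestamps = list(range(25))               # stands in for the 25-entry DatetimeIndex
y_series = [[0.0] * 25 for _ in range(3)]  # stands in for the three 25-value arrays

try:
    check_dimensions(timestamps, y_series)  # compares 25 with 3
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

# Passing one xs row per ys row satisfies both checks.
check_dimensions([timestamps] * len(y_series), y_series)
```

In this simplified setting, a flat xs of length 25 is compared against a ys list of length 3, which is why the first assertion fails.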

I tried to quickly fix this by adding more tests in uniplot/multi_series.py, but then it messes things up further in

def _safe_maxs(series: List) -> float:
    return max([_safe_max(row) for row in series if len(row) > 0])

Just for the sake of it, doing

timestamps_series = [timestamps] * len(y_series)  # Create a list of the same DatetimeIndex for each series in ys
plot(
    xs=timestamps_series,
    ys=y_series,
    legend_labels=legend_labels,
    lines=lines,
    title=title if title else supertitle,
    y_unit=" " + str(unit),
)

gives the expected plot

2024-08-31T15:29:13,417602762+03:00

However, wouldn't it be better not to replicate the timestamps, since they will always be the same?

Hi @NikosAlexandris, glad to hear the plotting with timestamps worked for you, at least once you manually duplicated the xs.

You are right that we could add such logic, but to be honest I would rather not do that, as that part of the code is already complicated enough in order to support what we have today, including adding data from pandas or polars.

In your case, [timestamps] * len(y_series) is probably the best solution, exactly as you proposed.

I understand. I still wish this were done for the user, also to save memory (?). I am working with some quite large time series. Cheers!
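
On the memory question, a small sketch in plain Python: list repetition stores references to the same object rather than copies, so building `[timestamps] * len(y_series)` does not by itself duplicate the timestamp data. The lists below are placeholders for the real data, and whether uniplot copies its inputs internally while plotting is not stated in this thread.

```python
timestamps = list(range(25))               # placeholder for a large DatetimeIndex
y_series = [[0.0] * 25 for _ in range(3)]  # placeholder for three data series

xs = [timestamps] * len(y_series)

# Every entry is the very same object, so the outer list only holds three references.
assert all(row is timestamps for row in xs)
shares_one_object = len({id(row) for row in xs}) == 1
print(shares_one_object)  # True
```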