DHI/mikeio

Read dfsu for future dates (beyond year 2262): Out of bounds nanosecond timestamp error

Closed this issue · 2 comments

Describe the bug
When reading a dfsu file (and probably other dfs* files, not tested) and converting the time axis to pandas datetimes, dates before 1677-09-22 or beyond 2262-04-11 raise an error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp

To Reproduce
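import pandas as pd

data = {
    "company": {"0": "Comp1", "2": "Comp2"},
    "date": {"0": "2020-01-01", "2": "2500-01-01"},
    "sales": {"0": "17", "2": "27"},
}

df = pd.DataFrame(data)

# Fails on the year-2500 entry, which lies outside the nanosecond range
pd.to_datetime(df["date"])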

Error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2500-01-01 00:00:00, at position 1.

This is a pandas limitation: pandas represents timestamps at nanosecond resolution by default,
so the time span that can be represented with a 64-bit integer is limited to approximately 584 years,
from '1677-09-22 00:12:43.145225' to '2262-04-11 23:47:16.854775807'.
See: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
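The 584-year figure follows directly from the size of a signed 64-bit integer counting nanoseconds from the Unix epoch. The sketch below reproduces the arithmetic with the standard library only; the variable names are illustrative, and the results are truncated to microseconds because `datetime` has no nanosecond field.

```python
from datetime import datetime, timedelta

# A signed 64-bit integer holds values from -2**63 to 2**63 - 1.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# Total span in years when each integer step is one nanosecond.
NS_PER_YEAR = 365.25 * 24 * 3600 * 1e9
span_years = (INT64_MAX - INT64_MIN) / NS_PER_YEAR

# datetime stores microseconds, so truncate the nanosecond counts.
epoch = datetime(1970, 1, 1)
latest = epoch + timedelta(microseconds=INT64_MAX // 1000)
earliest = epoch - timedelta(microseconds=2**63 // 1000)

print(f"span: about {span_years:.1f} years")
print("earliest:", earliest)
print("latest:  ", latest)
```

The upper bound matches the value quoted above. The raw arithmetic places the lower bound on 1677-09-21, one day earlier than the quoted figure.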

A workaround is needed to at least read the data, and ideally to reconstruct the dataframe with the correct time series.
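One possible workaround for the DataFrame in the reproduction, sketched here as an assumption rather than as anything mikeio does: store the dates as pandas periods instead of timestamps. Periods are counted in units of the chosen frequency (days here), so they are not bound by the nanosecond range.

```python
import pandas as pd

data = {
    "company": {"0": "Comp1", "2": "Comp2"},
    "date": {"0": "2020-01-01", "2": "2500-01-01"},
    "sales": {"0": "17", "2": "27"},
}
df = pd.DataFrame(data)

# Parse the strings as daily periods; this avoids the nanosecond Timestamp limits.
periods = pd.PeriodIndex(df["date"].tolist(), freq="D")

# Assigning an Index sets the column values by position.
df["date"] = periods

print(df.dtypes)
print(df["date"].dt.year.tolist())
```

The trade-off is that a period column is not a datetime column, so code that expects `datetime64` values would need to be adapted.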

Wow! It is not very common to simulate pre-historic times or way into the future...

Would you like to submit a PR to fix this?

You would need to modify this line https://github.com/DHI/mikeio/blob/149b20c3502def504d62760fe5944ace8760dad0/mikeio/dfsu/_dfsu.py#L783C1-L783C1

to something like this, where you pass in a time_unit that is more appropriate than ns.

# time_unit is e.g. "s"; the unit argument of date_range requires pandas >= 2.0
freq = pd.Timedelta(seconds=self.timestep)
time = pd.date_range(
    start=self.start_time, periods=self.n_timesteps, freq=freq, unit=time_unit
)

Or is there more to it 🤔?
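A standalone sketch of that suggestion, assuming pandas 2.0 or later (where `pd.date_range` accepts a `unit` argument). The values of `start_time`, `timestep` and `n_timesteps` are hypothetical stand-ins for what a dfsu header would provide; this is not the actual mikeio code.

```python
from datetime import datetime

import pandas as pd

# Hypothetical header values, chosen to fall outside the nanosecond range.
start_time = datetime(2500, 1, 1)
timestep = 3600.0  # seconds between time steps
n_timesteps = 4
time_unit = "s"  # second resolution instead of the default "ns"

freq = pd.Timedelta(seconds=timestep)

# With unit="s" the index is stored as datetime64[s], which covers year 2500.
time = pd.date_range(
    start=start_time, periods=n_timesteps, freq=freq, unit=time_unit
)

print(time.dtype)
print(time[0], "to", time[-1])
```

A coarser unit only helps if the time step is a whole multiple of it, so a sub-second time step would need "ms" or "us" instead of "s".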

I will close this for now; I am under the impression that this is a niche need. @fveeden if you have this need, please submit a PR and we can work this out together.