worldbank/REaLTabFormer

Bug in process_datetime_data() converting datetime to int

efstathios-chatzikyriakidis opened this issue · 2 comments

Hi @avsolatorio,

I hope you are well.

The fix in #72 is correct and makes it possible to use the latest pandas package. However, I am still blocked by this line:

https://github.com/worldbank/REaLTabFormer/blob/main/src/realtabformer/data_utils.py#L265

There are cases where that can fail. For example, I tested in a Windows conda env and it failed because the bare int was translated to int32. Don't ask me why! My conclusion is that it is related to how things are implemented on Windows, since the same code and data succeeded in Google Colab and in an Ubuntu Linux container running under WSL on the same 64-bit Windows host.

I think we can be more explicit and use int64, since datetimes are actually 64-bit values. This will also be consistent with the following line:

https://github.com/worldbank/REaLTabFormer/blob/main/src/realtabformer/data_utils.py#L271

So I suggest changing it from:

series = (series.astype(int) / 1e9)

to:

series = (series.astype('int64') / 1e9)
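A minimal sketch of the proposed conversion, to make the difference concrete. It assumes the series holds nanosecond-resolution datetimes (`datetime64[ns]`); the helper name `datetime_to_unix_seconds` is hypothetical and is not part of REaLTabFormer. A likely explanation for the platform difference, stated here as an assumption rather than a confirmed diagnosis, is that NumPy versions before 2.0 map the bare `int` to the C `long` type, which is 32 bits wide on Windows.

```python
import numpy as np
import pandas as pd


def datetime_to_unix_seconds(series: pd.Series) -> pd.Series:
    """Hypothetical helper: convert datetime64[ns] values to Unix seconds.

    Naming "int64" explicitly avoids relying on what the bare `int`
    resolves to on the current platform.
    """
    return series.astype("int64") / 1e9


# Build the datetimes with an explicit nanosecond resolution so the
# division by 1e9 is valid regardless of library defaults.
dates = pd.Series(
    np.array(
        ["1970-01-01T00:00:01", "2001-09-09T01:46:40"],
        dtype="datetime64[ns]",
    )
)

seconds = datetime_to_unix_seconds(dates)

# Inspect what the bare `int` maps to on this machine; the result
# depends on the platform and the NumPy version.
print(np.dtype(int))
print(seconds)
```

The explicit cast yields the same float64 result on every platform, whereas the outcome of `astype(int)` depends on the width of the platform's default integer.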

Can you help me with this? I will also need a new PyPI release (1.0.7).

Thank you!

Hello @efstathios-chatzikyriakidis, thanks for letting me know that the root cause is likely the Windows environment. The patch is already published.

I highly recommend that you create a PR if you find changes like this in the future! 😀

Hi @avsolatorio,

Yes, in the future, if I find a bug and it is easy to suggest a solution like this one, I will provide a PR.

Thank you so much!