NREL/buildstockbatch

glue crawler is casting time variables as bigints instead of timestamps

mdahlhausen opened this issue · 2 comments

Describe the bug
When the glue crawler is populating the athena tables, it is casting the timestamp as a bigint
new

It used to cast it as a 'timestamp', which is more useful as we don't have to convert back.
old

To Reproduce
Steps to reproduce the behavior:

  1. Pick a recent .yml file.
  2. Run it

Expected behavior
I'd like the glue crawler to case the time variables as timestamps, not bigints

Platform (please complete the following information):

  • Simulation platform: [e.g. Eagle, AWS, local docker; please label with this as well]
  • BuildStockBatch version, branch, or sha:
  • resstock or comstock repo version, branch, or sha:
  • Local Desktop OS: [e.g. Windows, Mac, Linux, especially important if running locally]

Additional context
Add any other context about the problem here.

Yeah, this is due to a change in Pandas 2.0. The raw timeseries data is now exported in Unix time, with nanosecond resolution. I the from_unixtime method in Presto to convert the data from Unix time to timestamp for the SightGlass. https://github.com/NREL/SightGlassDataProcessing/blob/main/telescope/glue.py#L331-L339

if col.name == 'timestamp':
    # Convert bigint to timestamp type if necessary
    if str(col.type) == 'BIGINT':
        # Pandas uses nanosecond resolution integer timestamps.
        # Presto expects second resolution values in from_unixtime.
        # Must divide values by 1e9 to go from nanoseconds to seconds.
        cols.append(sa.func.from_unixtime(col / 1e9).label('timestamp'))
    else:
        cols.append(col)

My guess is that to change this back to writing timestamps as strings, you'd have to change the code in this method: https://github.com/NREL/buildstockbatch/blob/develop/buildstockbatch/postprocessing.py#L259-L276

I wonder if it would work correctly if you converted the dtypes to Arrow.