glue crawler is casting time variables as bigints instead of timestamps
mdahlhausen opened this issue · 2 comments
Describe the bug
When the Glue crawler populates the Athena tables, it casts the timestamp column as a `bigint`.
It used to cast it as a `timestamp`, which is more useful since we don't have to convert it back.
To Reproduce
Steps to reproduce the behavior:
- Pick a recent .yml file.
- Run it
Expected behavior
I'd like the Glue crawler to cast the time variables as timestamps, not bigints.
Platform (please complete the following information):
- Simulation platform: [e.g. Eagle, AWS, local docker; please label with this as well]
- BuildStockBatch version, branch, or sha:
- resstock or comstock repo version, branch, or sha:
- Local Desktop OS: [e.g. Windows, Mac, Linux, especially important if running locally]
Yeah, this is due to a change in Pandas 2.0. The raw timeseries data is now exported in Unix time, with nanosecond resolution. I use the `from_unixtime`
method in Presto to convert the data from Unix time to a timestamp for SightGlass. https://github.com/NREL/SightGlassDataProcessing/blob/main/telescope/glue.py#L331-L339
```python
if col.name == 'timestamp':
    # Convert bigint to timestamp type if necessary
    if str(col.type) == 'BIGINT':
        # Pandas uses nanosecond resolution integer timestamps.
        # Presto expects second resolution values in from_unixtime.
        # Must divide values by 1e9 to go from nanoseconds to seconds.
        cols.append(sa.func.from_unixtime(col / 1e9).label('timestamp'))
    else:
        cols.append(col)
```
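To illustrate the nanoseconds-to-seconds conversion that snippet relies on, here's a minimal sketch in plain pandas (the specific timestamp value is just an example):

```python
import pandas as pd

# Pandas stores datetime64[ns] values internally as nanosecond-resolution
# integers since the Unix epoch.
ts = pd.Timestamp("2024-01-01 00:30:00")
ns = ts.value  # nanoseconds since 1970-01-01

# Presto's from_unixtime() expects seconds, so the bigint column must be
# divided by 1e9 before conversion.
seconds = ns / 1e9
recovered = pd.Timestamp(seconds, unit="s")
assert recovered == ts
```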
My guess is that to change this back to writing timestamps as strings, you'd have to change the code in this method: https://github.com/NREL/buildstockbatch/blob/develop/buildstockbatch/postprocessing.py#L259-L276
I wonder if it would work correctly if you converted the dtypes to Arrow.
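For reference, a minimal sketch of what that fix in postprocessing might look like, assuming the timeseries DataFrame ends up with nanosecond integers in a column named `time` (the column name and helper function are hypothetical, not from the linked code):

```python
import pandas as pd

def restore_timestamp_dtype(df: pd.DataFrame, col: str = "time") -> pd.DataFrame:
    """Convert a nanosecond-integer column back to datetime64[ns].

    Hypothetical helper: if Pandas 2.0 exported the column as integer
    nanoseconds, converting it back to datetime64[ns] before writing
    parquet should make the Glue crawler see a timestamp column again.
    """
    if pd.api.types.is_integer_dtype(df[col]):
        df[col] = pd.to_datetime(df[col], unit="ns")
    return df
```

For the Arrow route, pandas 2.0's `convert_dtypes(dtype_backend="pyarrow")` would keep the column as an Arrow timestamp type, though I haven't verified how the Glue crawler classifies those when written to parquet.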