EPW.import_data_by_field() wrongly imports the first date
Closed this issue · 6 comments
Hello,
While trying to build a date range in Python from an .epw file using the method ladybug.epw.EPW.import_data_by_field (see the documentation: https://www.ladybug.tools/ladybug/docs/ladybug.epw.html), I have noticed that the first date isn't properly imported.
The code I use is:
```python
from ladybug.epw import *
from datetime import *

weather = EPW(path_to_epw_file)  # ASSIGN PATH TO .EPW FILE

# Get the date_range for computing heating and cooling periods
date_range = [datetime(year, month, day, hour - 1) + timedelta(hours=1)
              for year, month, day, hour
              in zip(*[weather.import_data_by_field(i) for i in range(4)])]

# Show a sample of the dates in date_range
for date in date_range:
    print(date)
```
The correction with timedelta is needed because the hour values from EPW.import_data_by_field(3) range over [1, 24], while datetime accepts hours in the range [0, 23].
The problem only happens with the first row. Here are examples I have obtained with 2 epw files selected randomly from the EPW dataset:
- Link to epw file: https://energyplus.net/weather-location/africa_wmo_region_1/DZA//DZA_Algiers.603900_IWEC
The first row gives 2000-01-01 00:00:00, while the later rows continue in 1986.
- Link to epw file: https://energyplus.net/weather-location/north_and_central_america_wmo_region_4/USA/AK/USA_AK_Adak.NAS.704540_TMY
The first row gives 1968-01-01 00:00:00, while the following rows continue in 1960.
Thank you very much for your help :)
@AlexJew ,
Can you clarify what your question or request is here? If you are just asking whether the moving of the last datetime of the EPW to the start is intentional/desired, then I can say that this is all intentional. It was the best solution that we could think of if we wanted to use real datetime objects (eg. 0:00 instead of 24:00) that aligned with the time standard of the EPW format 😅 . We learned the hard way in our legacy plugin that trying to accommodate the EPW's standard creates a lot of confusion and issues with data analysis/manipulation. It's best to just be clear with everyone that, once you pass into the ladybug tools world, we use real datetimes and not the ones that the EPW uses. All conversion between these two happens in the import/export process from EPW.
You will see that, when you save the EPW back to a file, it puts the first datetime back where the EPW format expects it. Are you saying that you would like some other change to happen as part of the process of importing/exporting to/from EPW? Do you want us to change the year of that first datetime?
@chriswmackey
Hi, I am not talking about the use of real datetime objects. They can be easily accommodated. Rather, I was pointing out that the first year value obtained from EPW.import_data_by_field(0) (0 corresponds to the year field) is repeatedly wrong.
To refer back to the examples I gave above:
Ex. 1 : year 2000 is given instead of 1986
Ex. 2 : year 1968 is given instead of 1960
This is the only bug I wanted to point out haha
Hi @AlexJew,
As @chriswmackey mentioned, what you are seeing happens because Ladybug has to shift the data from the epw file. The first row in the data that you see is the last row in the epw file. If you check the last row of the file, you will see that the year matches. This is by design and is not a bug.
EPW starts from hour 1:00 and goes to hour 24:00. Ladybug (like almost all datetime libraries) starts from 0:00 and goes to 23:00. In other words, Dec 31, 24:00 in EPW is considered Jan 1, 0:00 in Ladybug. Hope this clarifies the reason behind the unexpected year in the first row.
I'm going to close this issue as it is not a bug. Feel free to re-open if you still think something unexpected is happening in the data.
I was also confused by the shift ladybug performs when reading an EPW file. As @mostaphaRoudsari pointed out, EPW does start from hour 1:00, but, according to the EnergyPlus documentation, the value at "Hour 1 is 00:01 to 01:00" (see https://bigladdersoftware.com/epx/docs/8-3/auxiliary-programs/energyplus-weather-file-epw-data-dictionary.html#field-hour). This would mean that a Python datetime at hour 00:00 should correspond to the first entry of the EPW file, not the last.
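Under that reading, the EPW hour label marks the right edge of its interval, and the conversion in this thread labels each value by that right edge; labeling by the left edge instead would anchor the first entry at midnight. A small sketch of the two conventions (the interval bounds come from the EnergyPlus data dictionary quoted above):

```python
from datetime import datetime, timedelta

# First EPW row of the year: "Hour 1", covering 00:01 to 01:00
# per the EnergyPlus data dictionary.
year, month, day, hour = 1986, 1, 1, 1

right_edge = datetime(year, month, day, hour - 1) + timedelta(hours=1)
left_edge = datetime(year, month, day, hour - 1)

print(left_edge)   # 1986-01-01 00:00:00  (label the interval start)
print(right_edge)  # 1986-01-01 01:00:00  (label the interval end)
```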
You are right, @samuelduchesne , but then what is the value for midnight on January 1? It still holds that the only datetime the EPW file provides for midnight on Jan 1 is at the end of the EPW and not the beginning.
But by assuming that the value for midnight on January 1 is the last value, are we not assuming that this last value will be the mean value for the whole first time step?
For example, with the EnergyPlus-9-2-0/WeatherData/USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw, the last dry-bulb temperature is 11.1 degC. If we assume that this is the value at 2018-01-01 00:00, then by the datetime convention it holds for the remainder of the time step, so the mean value of the first time step is 11.1 degC. But the mean value of the first time step of the EPW file (the first hour of the file) is 7.2 degC.
If we were to calculate the mean temperature for the whole day (first 24 values) of the dataset, we would get a different result for the EPW file vs ladybug.
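A toy illustration of that discrepancy, with made-up temperatures rather than the actual San Francisco data:

```python
# Dry-bulb temperatures for the first 24 EPW rows, plus the value in
# the file's last row (all values invented for illustration):
epw_first_day = [7.2] + [8.0] * 23  # rows 1-24 of the EPW file
epw_last_value = 11.1               # row 8760 (Dec 31, hour 24)

# Mean of the first day as read directly from the EPW file:
mean_raw = sum(epw_first_day) / 24

# After the shift, the file's last value becomes the first value of
# the year, pushing row 24 out of the first day's window:
shifted_first_day = [epw_last_value] + epw_first_day[:23]
mean_shifted = sum(shifted_first_day) / 24

print(mean_raw, mean_shifted)  # the two daily means differ
```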
I used the Minimal.idf example file with the San Francisco EPW and plotted the result of a simulation with 5 time steps per hour. I also read the epw file with pd.read_csv and forward-filled values assuming the first line of data is 2018-01-01 00:00. Finally, I parsed the epw with ladybug and also forward-filled the values with the 5 time step index. Here, I show the first 4 hours of the year.
As shown, ladybug uses 11.1 degC as the first-hour value. EnergyPlus, on the other hand, uses some odd backward-looking interpolation, which creates a different mean value for the first hour. (As an aside, E+ has been shown not to respect quantities, i.e. integrated values, when using sub-hourly time steps; see this paper I wrote... full disclosure!) Finally, the read_csv (raw_epw) shows the mean value over each time step (more or less the truth).
No method gives the right answer; this is what is confusing me 😄
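The forward-fill part of that comparison can be reproduced on made-up values (the real `pd.read_csv` of the epw is replaced here by an invented 4-hour series):

```python
import pandas as pd

# Invented hourly dry-bulb temperatures for the first 4 hours:
hourly = pd.Series(
    [11.1, 7.2, 7.2, 6.7],
    index=pd.date_range("2018-01-01", periods=4, freq="h"),
)

# Upsample to 5 steps per hour (12-minute steps) and forward-fill,
# holding each hourly value constant over its sub-hourly steps:
sub_hourly = hourly.resample("12min").ffill()

# Each hour's mean is then just the value that was forward-filled
# into it, so the choice of which value lands in hour 0 matters:
print(sub_hourly.resample("h").mean())
```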
At the end of the day, what is important is that integrated values (quantities) remain constant; for example, checking them with a pandas `groupby(...).sum()`.