Investigate business days / hours impact
antoinecarme opened this issue · 4 comments
antoinecarme commented
Investigate business days / hours impact.
Impact : better handling of irregular physical time stamps (business dates). Incremental. May have a workaround with non-physical time equivalent.
- PyAF models do not take into account the weekends/lunch time when computing future dates.
- This can have some impact on date-dependent time-series models
- no impact on signal transformation
- Impact on time-based trends (linear, polynomial, etc.). No impact on other/stochastic trends (lag1, etc.).
- Impact on seasonal values (for DayOfWeek, will t+3 be a Monday or a Thursday?).
- No impact on AR-like models (previous date can skip week-ends).
- Business hours should be investigated further (next business hour skips lunch ;).
- The time column will be smarter and will generate more business-friendly forecast dates and forecast values. Model explanation improves.
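To make the DayOfWeek point concrete, here is a minimal sketch (not PyAF code) comparing a horizon of 3 computed with a plain one-day delta against the same horizon counted in business days. The start date and the variable names are illustrative assumptions.

```python
import pandas as pd

# Last observed date: Thursday 2000-01-06 (chosen arbitrarily for illustration).
last_date = pd.Timestamp('2000-01-06')

# Horizon 3 with a plain one-day delta: weekend days are counted.
calendar_t3 = last_date + 3 * pd.Timedelta(days=1)

# Horizon 3 with business-day arithmetic: weekend days are skipped.
business_t3 = last_date + pd.offsets.BDay(3)

print(calendar_t3.date(), calendar_t3.day_name())
print(business_t3.date(), business_t3.day_name())
```

Starting from a Thursday, the plain delta lands on a Sunday while the business-day version lands on the following Tuesday, so a DayOfWeek seasonal would be read from a different slot.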
Implementation impacts :
- The delta of the dates is computed as the mean difference between two consecutive dates. The notion of "consecutive" will be impacted. Is the most frequent difference better than the average difference in this case?
- the next date value will skip some intermediate "non-business" values.
- For the tests, we can use pd.date_range and pd.bdate_range as time values
# 1000 consecutive business days
pd.bdate_range('2000-1-1', periods=1000)
# 1000 consecutive business hours
pd.bdate_range('2000-1-1', periods=1000, freq = 'BH')
- Impact on plots ?
- Activate by default ?
- Automatic detection based on HourOfWeek/DayOfWeek distribution ?
- Use pd.bdate_range implementation to compute the next date ?
- Not sure if this is not dependent on the locale/country/culture etc ...
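A rough sketch of two of the questions above, on synthetic data: how the mean and the most frequent difference behave on business-day time stamps, and what a naive weekday-based detection could look like. The detection rule and all names here are assumptions for illustration, not an existing PyAF option.

```python
import pandas as pd

# 1000 consecutive business days starting on a Monday (illustrative data).
dates = pd.Series(pd.bdate_range('2000-01-03', periods=1000))

# Differences between consecutive time stamps.
diffs = dates.diff().dropna()

mean_delta = diffs.mean()          # pulled above one day by the 3-day weekend gaps
mode_delta = diffs.mode().iloc[0]  # most frequent gap

# Naive detection heuristic (assumption): treat the series as business-day
# data when no time stamp falls on a Saturday or a Sunday.
looks_like_business_days = bool((dates.dt.dayofweek < 5).all())

print(mean_delta, mode_delta, looks_like_business_days)
```

On this data the most frequent difference is exactly one day and the mean is somewhat larger. Neither works as a fixed step: adding the mode repeatedly lands on weekends, and adding the mean lands on fractional days.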
antoinecarme commented
next business day
>>> import pandas as pd
>>> pd.bdate_range('2000-1-1', periods=2)
DatetimeIndex(['2000-01-03', '2000-01-04'], dtype='datetime64[ns]', freq='B')
>>> pd.bdate_range('2000-1-2', periods=2)
DatetimeIndex(['2000-01-03', '2000-01-04'], dtype='datetime64[ns]', freq='B')
>>> pd.bdate_range('2000-1-3', periods=2)
DatetimeIndex(['2000-01-03', '2000-01-04'], dtype='datetime64[ns]', freq='B')
>>> pd.bdate_range('2000-1-4', periods=2)
DatetimeIndex(['2000-01-04', '2000-01-05'], dtype='datetime64[ns]', freq='B')
>>> lTwoNextBusinessDays = pd.bdate_range('2000-1-1', periods=2)
>>> lTwoNextBusinessDays[0]
Timestamp('2000-01-03 00:00:00', freq='B')
antoinecarme commented
>>> import pandas as pd
>>> def next_business_day(x):
...     lNextTwoBusinessDays = pd.bdate_range(x, periods=2)
...     lDays = [d for d in lNextTwoBusinessDays if (d > pd.Timestamp(x))]
...     return lDays[0]
...
>>> next_business_day('2000-1-1')
Timestamp('2000-01-03 00:00:00', freq='B')
>>> next_business_day('2000-1-2')
Timestamp('2000-01-03 00:00:00', freq='B')
>>> next_business_day('2000-1-3')
Timestamp('2000-01-04 00:00:00', freq='B')
>>> next_business_day('2000-1-4')
Timestamp('2000-01-05 00:00:00', freq='B')
>>> next_business_day('2000-1-5')
Timestamp('2000-01-06 00:00:00', freq='B')
>>> next_business_day('2000-1-6')
Timestamp('2000-01-07 00:00:00', freq='B')
>>> next_business_day('2000-1-7')
Timestamp('2000-01-10 00:00:00', freq='B')
>>> next_business_day('2000-1-8')
Timestamp('2000-01-10 00:00:00', freq='B')
>>> next_business_day('2000-1-9')
Timestamp('2000-01-10 00:00:00', freq='B')
>>> next_business_day('2000-1-10')
Timestamp('2000-01-11 00:00:00', freq='B')
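An alternative sketch that avoids building a two-element range: pandas date offsets can be added to a time stamp directly. `next_business_day_offset` is a hypothetical name. It is expected to agree with `next_business_day` above on date-only inputs, and it only knows about weekends, not holidays.

```python
import pandas as pd

def next_business_day_offset(x):
    """Return the first business day after x (weekends only, no holidays)."""
    # Adding one business day to a weekend date rolls forward to Monday.
    return pd.Timestamp(x) + pd.offsets.BDay(1)

for day in ['2000-1-1', '2000-1-3', '2000-1-7', '2000-1-9']:
    print(day, '->', next_business_day_offset(day).date())
```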
antoinecarme commented
>>> import pandas as pd
>>>
>>> def next_business_hour(x):
...     lNextTwoBusinessHours = pd.date_range(x, periods=2, freq = 'BH')
...     lHours = [h for h in lNextTwoBusinessHours if (h > pd.Timestamp(x))]
...     print("next_business_hour" , (x , lHours[0]))
...     return lHours[0]
...
>>> next_business_hour('2000-1-10 08:00:00')
next_business_hour ('2000-1-10 08:00:00', Timestamp('2000-01-10 09:00:00', freq='BH'))
Timestamp('2000-01-10 09:00:00', freq='BH')
>>> next_business_hour('2000-1-10 09:00:00')
next_business_hour ('2000-1-10 09:00:00', Timestamp('2000-01-10 10:00:00', freq='BH'))
Timestamp('2000-01-10 10:00:00', freq='BH')
>>> next_business_hour('2000-1-10 10:00:00')
next_business_hour ('2000-1-10 10:00:00', Timestamp('2000-01-10 11:00:00', freq='BH'))
Timestamp('2000-01-10 11:00:00', freq='BH')
>>> next_business_hour('2000-1-10 11:00:00')
next_business_hour ('2000-1-10 11:00:00', Timestamp('2000-01-10 12:00:00', freq='BH'))
Timestamp('2000-01-10 12:00:00', freq='BH')
>>> next_business_hour('2000-1-10 12:00:00')
next_business_hour ('2000-1-10 12:00:00', Timestamp('2000-01-10 13:00:00', freq='BH'))
Timestamp('2000-01-10 13:00:00', freq='BH')
>>> next_business_hour('2000-1-10 12:03:00')
next_business_hour ('2000-1-10 12:03:00', Timestamp('2000-01-10 13:03:00', freq='BH'))
Timestamp('2000-01-10 13:03:00', freq='BH')
>>> next_business_hour('2000-1-10 13:00:00')
next_business_hour ('2000-1-10 13:00:00', Timestamp('2000-01-10 14:00:00', freq='BH'))
Timestamp('2000-01-10 14:00:00', freq='BH')
>>> next_business_hour('2000-1-10 22:00:00')
next_business_hour ('2000-1-10 22:00:00', Timestamp('2000-01-11 09:00:00', freq='BH'))
Timestamp('2000-01-11 09:00:00', freq='BH')
>>> next_business_hour('2000-1-10 23:00:00')
next_business_hour ('2000-1-10 23:00:00', Timestamp('2000-01-11 09:00:00', freq='BH'))
Timestamp('2000-01-11 09:00:00', freq='BH')
>>> next_business_hour('2000-1-10 00:00:00')
next_business_hour ('2000-1-10 00:00:00', Timestamp('2000-01-10 09:00:00', freq='BH'))
Timestamp('2000-01-10 09:00:00', freq='BH')
>>>
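A variant of the helper above without the print side effect, as a sketch. It passes a `pd.offsets.BusinessHour()` object instead of the 'BH' string, on the assumption that frequency alias spellings may differ between pandas versions. `next_business_hour_quiet` is a hypothetical name. The default offset covers 09:00 to 17:00 on weekdays and knows nothing about lunch breaks or holidays.

```python
import pandas as pd

def next_business_hour_quiet(x):
    """Return the first business-hour time stamp strictly after x."""
    ts = pd.Timestamp(x)
    # date_range rolls a start outside business hours forward to the next opening time.
    candidates = pd.date_range(ts, periods=2, freq=pd.offsets.BusinessHour())
    return next(h for h in candidates if h > ts)

for stamp in ['2000-1-10 08:00:00', '2000-1-10 12:03:00', '2000-1-10 22:00:00']:
    print(stamp, '->', next_business_hour_quiet(stamp))
```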
antoinecarme commented
Not sure if this feature will be implemented. User value ?
Delayed. Priority : low