mikekeith52/scalecast

VECM

michellebaugraczyk opened this issue · 3 comments

Hi Michael,

VECM has been raising a frequency error when there are gaps in the data. Would it be possible to fix that for all types of frequency, even when the data has gaps, the way other models in scalecast already handle these cases?

Best regards,

Michelle

If you have missing values in the data, it is most likely a native statsmodels issue: statsmodels/statsmodels#3534
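Assuming the error comes from frequency inference (statsmodels time-series models, VECM among them, take their frequency from the date index), this pandas-only sketch shows why gaps matter. A complete business-day index has an inferable frequency, but removing a single weekday leaves pandas unable to infer one. The dates are made up for illustration.

```python
import pandas as pd

# Two weeks of business days (made-up illustration dates)
full = pd.bdate_range("2022-09-05", "2022-09-16")
# The same index with one weekday (Wed 2022-09-07) missing
gappy = full.delete(2)

# A complete business-day index has an inferable frequency...
print(pd.infer_freq(full))   # B
# ...but a single gap leaves pandas unable to infer any frequency
print(pd.infer_freq(gappy))  # None
```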

Just in case, I will change how the freq argument is passed to the vecm model to see if that fixes the issue. The change will be included in 0.14.4, planned for release on 9/23/22.

Please test the model from 0.14.4 to see if you have the same issue. Thanks, as always, for raising the issue!

I recently ran an example where I was able to reproduce this error, and it most likely comes from using business-day data. Business days reported by some data sources don't line up with the business-day definition pandas uses. To fix that, use `df = df.asfreq('B', method='ffill')`, which reindexes the data to pandas' business-day calendar and fills any days it inserts. Replace 'ffill' with another fill method (such as 'bfill') if you want the nulls introduced by this reindexing filled differently. Make sure the dataframe's index is the datetime column. Here is an example where this works:
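To see what the `asfreq` call does, here is a small sketch on made-up closing prices where a holiday (Mon 2022-09-05) is absent from the source data. Before the fix pandas cannot infer a frequency; afterwards the index carries the 'B' frequency and the missing day is forward-filled from the previous business day.

```python
import pandas as pd

# Made-up business-day prices with Mon 2022-09-05 missing, as a market
# data source might report a holiday
dates = pd.bdate_range("2022-09-01", "2022-09-16").delete(2)
df = pd.DataFrame({"Close": [100.0 + i for i in range(len(dates))]}, index=dates)
df.index.name = "Date"

print(pd.infer_freq(df.index))  # None: the holiday leaves a gap

# Reindex to every pandas business day and forward-fill the inserted rows
fixed = df.asfreq("B", method="ffill")
print(fixed.index.freqstr)              # B
print(fixed.loc["2022-09-05", "Close"])  # 101.0, copied from 2022-09-02
```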

```python
import pandas_datareader as pdr
from scalecast.Forecaster import Forecaster
from scalecast.MVForecaster import MVForecaster

FANG = [
    'META',
    'AMZN',
    'NFLX',
    'GOOG',
]

fs = []
for sym in FANG:
    df = pdr.get_data_yahoo(sym)  # daily prices indexed by date
    # reindex to pandas' business-day calendar, forward-filling
    # the inserted days; this fixes the frequency error
    df = df.asfreq('B', method='ffill')
    f = Forecaster(
        y=df['Close'],
        current_dates=df.index,
        future_dates=65,
        end='2022-09-30',
    )
    fs.append(f)

mvf = MVForecaster(*fs, names=FANG)
```

I think this is something users will have to do in pandas before loading data into a scalecast object, as I don't see a way to implement it in the package itself.
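Since this step happens in pandas before anything reaches scalecast, it can be wrapped in a small preprocessing function. The sketch below is hypothetical (`regularize_business_days` is not part of scalecast or pandas). It also covers data whose dates sit in a column rather than the index, since `asfreq` needs a DatetimeIndex.

```python
import pandas as pd


def regularize_business_days(df, date_col=None, method="ffill"):
    """Hypothetical helper: put a DataFrame on pandas' business-day calendar.

    If the dates live in a column, pass its name as date_col so it becomes
    the DatetimeIndex that asfreq requires.
    """
    out = df.copy()
    if date_col is not None:
        out[date_col] = pd.to_datetime(out[date_col])
        out = out.set_index(date_col)
    # asfreq spans the first to the last date, so sort first
    out = out.sort_index()
    # Insert every missing business day and fill it with the chosen
    # method ('ffill' or 'bfill'); weekend rows, if any, are dropped
    return out.asfreq("B", method=method)


# Usage: dates in a column, with Mon 2022-09-05 missing
raw = pd.DataFrame(
    {"Date": ["2022-09-02", "2022-09-06", "2022-09-07"], "Close": [1.0, 2.0, 3.0]}
)
out = regularize_business_days(raw, date_col="Date")
print(out)
```

The result can then be passed to `Forecaster` the same way as in the example above, with `y=out['Close']` and `current_dates=out.index`.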