Nixtla/hierarchicalforecast

TopDown returns NaN

Closed this issue · 10 comments

@jmoralez thanks for the response. In my case the code was correct: I provided the in-sample predictions in Y_df, but the TopDown results are still NaN. It works for BottomUp and the other methods I tested, such as OptimalCombination and MinTrace, so it is strange that only TopDown returns NaN.

Any recommendations, please? I attached a snippet of my code as a screenshot (not preserved here).

Originally posted by @mjsandoval04 in #253 (comment)

Hey @mjsandoval04. Do you have zeros in your insample predictions?

As a matter of fact, yes: I do have some 0 values in my in-sample predictions, corresponding to periods with 0 sales. I attached a snippet of my data as a screenshot (not preserved here).
I tried replacing them with a small value (1) and with a large value (1000), but TopDown still returned NaN.
What should I do?

What about CES? I think there's a division by zero going on. Can you try adding a small value to both columns (y and CES)?
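The division-by-zero hypothesis can be illustrated with the two historical-proportion formulas usually given for top-down forecasting (Gross and Sohl, as presented in Hyndman and Athanasopoulos, Forecasting: Principles and Practice). The NumPy sketch below is not hierarchicalforecast's implementation; it assumes one total series and a matrix of bottom-level series, and the function names are hypothetical. It shows how a single zero total, or a single missing value, can turn a proportion into NaN.

```python
import numpy as np

def average_proportions(y_bottom, y_total):
    """p_j = mean over t of y_{j,t} / y_t (the average of the historical shares)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        shares = y_bottom / y_total  # y_total broadcasts across the rows
    return shares.mean(axis=1)

def proportion_averages(y_bottom, y_total):
    """p_j = mean(y_{j,t}) / mean(y_t) (the share of the historical averages)."""
    return y_bottom.mean(axis=1) / y_total.mean()

# Case 1: the total is 0 in one period, so 0/0 = NaN poisons average_proportions.
y_bottom = np.array([[1.0, 0.0, 2.0],
                     [3.0, 0.0, 2.0]])
y_total = y_bottom.sum(axis=0)                   # [4., 0., 4.]
avg_1 = average_proportions(y_bottom, y_total)   # [nan, nan]
pavg_1 = proportion_averages(y_bottom, y_total)  # [0.375, 0.625]

# Case 2: a missing observation (NaN) in one bottom series spreads to its proportion.
y_bottom_gap = np.array([[1.0, np.nan, 2.0],
                         [3.0, 1.0, 2.0]])
y_total_gap = np.array([4.0, 1.0, 4.0])
avg_2 = average_proportions(y_bottom_gap, y_total_gap)   # [nan, 0.75]
pavg_2 = proportion_averages(y_bottom_gap, y_total_gap)  # [nan, 0.6667]
```

In case 1, adding a small positive constant to the zero period removes the NaN, which is what the suggestion above aims at; case 2 shows a NaN that no such constant fixes, because the value is missing rather than zero.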

I have checked my data (Y_hat_df and Y_df) several times. The CES forecasts contain no zeros, as expected. In the in-sample DataFrame only y has zeros, meaning that for those periods the forecast is greater than 0 even though actual sales were 0.
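A programmatic version of this kind of check, extended to NaNs and to series of unequal length, might look like the sketch below. It assumes long-format data with unique_id and ds columns; check_inputs is a hypothetical helper, and the toy values (including the CES column) are made up for illustration.

```python
import pandas as pd

def check_inputs(df: pd.DataFrame, value_cols: list) -> pd.DataFrame:
    """Per-series date span, length, and counts of zeros and NaNs in value_cols."""
    g = df.groupby("unique_id")
    summary = g["ds"].agg(start="min", end="max", n_obs="count")
    for col in value_cols:
        summary[f"{col}_zeros"] = g[col].apply(lambda s: int((s == 0).sum()))
        summary[f"{col}_nans"] = g[col].apply(lambda s: int(s.isna().sum()))
    # A series with fewer observations than the longest one is missing dates
    summary["short"] = summary["n_obs"] < summary["n_obs"].max()
    return summary

# Hypothetical toy data: series "B" starts two months later than "A".
Y_df = pd.DataFrame({
    "unique_id": ["A"] * 4 + ["B"] * 2,
    "ds": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01",
                          "2023-03-01", "2023-04-01"]),
    "y": [1.0, 0.0, 3.0, 4.0, 5.0, 6.0],
    "CES": [1.2, 0.4, 2.9, 3.8, 5.1, 5.7],
})
report = check_inputs(Y_df, ["y", "CES"])
print(report)
```

Here the report flags the zero in A's y column and marks B as short (2 observations against 4).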

I have tried replacing the zeros in column y, for example setting y equal to CES wherever y is 0, and I still get NaN.
I also switched the forecasting method (e.g. to SES) and got the same result.

To my understanding, if CES (the forecast) is greater than 0, TopDown should return a value. Am I missing something?

Can you provide a reproducible example? The following works fine:

import numpy as np
import pandas as pd
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import TopDown
from hierarchicalforecast.utils import aggregate

# Australian tourism data, aggregated into Country / State / Region levels
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
df = df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
df.insert(0, 'Country', 'Australia')
spec = [
    ['Country'],
    ['Country', 'State'],
    ['Country', 'State', 'Region'],
]
y_df, s_df, tags = aggregate(df, spec)
y_df = y_df.reset_index()
# Set the top-level series to a small constant (0.1)
y_df.loc[y_df['unique_id'] == 'Australia', 'y'] = 0.1
# Random values stand in for a model's fitted values and forecasts
y_df['model'] = np.random.rand(y_df.shape[0])
# The last 12 quarters of each series act as the forecasts; the rest is the in-sample data
valid = y_df.groupby('unique_id').tail(12)
train = y_df.drop(valid.index)
hrec = HierarchicalReconciliation(reconcilers=[TopDown(method='average_proportions')])
hrec.reconcile(Y_hat_df=valid, Y_df=train, S=s_df, tags=tags)

Yes, the example described in the library documentation worked for me as well.
Here is my Jupyter notebook, and I've uploaded the data for your reference (link to the Excel files below).

data: https://drive.google.com/drive/folders/1Ix_noPRb70KUaMtMy9LYu-4xxcdHwq5O?usp=sharing
Jupyter notebook: TopDown returns NaN_test.zip

PS: apologies, I'm a newbie on GitHub and don't know how to paste code the way you did, so I just uploaded the files.

Hello @jmoralez, were you able to reproduce my example?

Yes. You have a series that is shorter than the others, which produces null values in data2. You can use the following to add the missing dates and fill them with zeros:

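# %pip install utilsforecast if necessary
import pandas as pd
from utilsforecast.preprocessing import fill_gaps

# data1, data2, rec_model, S_train and tags come from the user's notebook
data2['ds'] = pd.to_datetime(data2['ds'])
# Give every series the same global date range at monthly frequency; added rows get NaN values
data2_filled = fill_gaps(data2.reset_index(), start='global', end='global', freq='M')
# Treat the added dates as periods with zero sales
data2_filled = data2_filled.fillna(0)
p_rec = rec_model.reconcile(Y_hat_df=data1, Y_df=data2_filled, S=S_train, tags=tags)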
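If installing utilsforecast is not an option, the same "global start, global end, fill with zero" step can be approximated in plain pandas by reindexing onto the full product of series IDs and dates. This is a sketch, not utilsforecast's implementation: fill_gaps_global is a hypothetical helper, and it assumes long-format data with unique_id and ds columns whose dates already sit on the frequency's anchor (month start for "MS" here; the thread's data used freq='M', i.e. month end, so match your own data).

```python
import pandas as pd

def fill_gaps_global(df: pd.DataFrame, freq: str, fill_value: float = 0.0) -> pd.DataFrame:
    """Give every series the same global date range and fill added rows with fill_value."""
    dates = pd.date_range(df["ds"].min(), df["ds"].max(), freq=freq)
    full_index = pd.MultiIndex.from_product(
        [df["unique_id"].unique(), dates], names=["unique_id", "ds"]
    )
    return (
        df.set_index(["unique_id", "ds"])
        .reindex(full_index)       # missing (unique_id, ds) pairs become NaN rows
        .fillna(fill_value)        # fills every value column, like fillna(0) above
        .reset_index()
    )

# Hypothetical toy data: series "B" is missing January and February.
Y_df = pd.DataFrame({
    "unique_id": ["A"] * 4 + ["B"] * 2,
    "ds": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01",
                          "2023-03-01", "2023-04-01"]),
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
filled = fill_gaps_global(Y_df, freq="MS")
print(filled)
```

Filling with 0 treats the added dates as periods with no sales, as in the fix above; for other data a different fill value, or dropping the short series, may be more appropriate.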

@jmoralez thank you for the feedback, brother! It works flawlessly :)

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.