Nixtla/hierarchicalforecast

StatsForecast models producing NotImplementedError: tiny datasets in 0.4.0

Closed this issue · 5 comments

What happened + What you expected to happen

Carryover from: #234

I upgraded my installation to 0.4.0. Running my script with no code changes now raises the error below. It occurs when fitting StatsForecast AutoETS models; I also tried StatsForecast HoltWinters and received the same error.

This error was not raised with the same dataset and script on 0.3.0. Any ideas what changed the behavior between 0.3.0 and 0.4.0?


NotImplementedError Traceback (most recent call last)
Cell In[8], line 109
98 #valid_agg_reset = valid_agg.reset_index()
100 model = StatsForecast(models=[
102 AutoETS(season_length=12,model='AAA',alias='AutoETS_AAA')
(...)
107 ],
108 freq='MS', n_jobs=1, verbose=True)
--> 109 model.fit(train_agg)
111 p = model.forecast(h=h_months, fitted=True)
112 p_fitted = model.forecast_fitted_values()

File ~/lib/python3.10/site-packages/statsforecast/core.py:880, in StatsForecast.fit(self, df, sort_df, prediction_intervals)
878 self.prepare_fit(df, sort_df)
879 if self.n_jobs == 1:
--> 880 self.fitted_ = self.ga.fit(models=self.models)
881 else:
882 self.fitted_ = self._fit_parallel()

File ~/lib/python3.10/site-packages/statsforecast/core.py:77, in GroupedArray.fit(self, models)
75 for i_model, model in enumerate(models):
76 new_model = model.new()
---> 77 fm[i, i_model] = new_model.fit(y=y, X=X)
78 return fm

File ~/lib/python3.10/site-packages/statsforecast/models.py:650, in AutoETS.fit(self, y, X)
628 def fit(
629 self,
630 y: np.ndarray,
631 X: Optional[np.ndarray] = None,
632 ):
633 """Fit the Exponential Smoothing model.
634
635 Fit an Exponential Smoothing model to a time series (numpy array) y
(...)
648 Exponential Smoothing fitted model.
649 """
--> 650 self.model_ = ets_f(
651 y, m=self.season_length, model=self.model, damped=self.damped
652 )
653 self.model_["actual_residuals"] = y - self.model_["fitted"]
654 self._store_cs(y=y, X=X)

File ~/lib/python3.10/site-packages/statsforecast/ets.py:1241, in ets_f(y, m, model, damped, alpha, beta, gamma, phi, additive_only, blambda, biasadj, lower, upper, opt_crit, nmse, bounds, ic, restrict, allow_multiplicative_trend, use_initial_values, maxit)
1238 # ses for non-optimized tiny datasets
1239 if n <= npars + 4:
1240 # we need HoltWintersZZ function
-> 1241 raise NotImplementedError("tiny datasets")
1242 # fit model (assuming only one nonseasonal model)
1243 if errortype == "Z":

NotImplementedError: tiny datasets

Versions / Dependencies


dateutil 2.8.2
hierarchicalforecast 0.4.0
matplotlib 3.7.1
numpy 1.23.5
pandas 2.0.2
session_info 1.0.0
statsforecast 1.6.0


IPython 8.14.0
jupyter_client 8.2.0
jupyter_core 5.3.0
notebook 6.5.4

Python 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0]
Linux-4.18.0-372.16.1.0.1.el8_6.x86_64-x86_64-with-glibc2.35

Reproduction script

from statsforecast import StatsForecast
from statsforecast.models import AutoETS

model = StatsForecast(models=[AutoETS(season_length=12, model='AAA', alias='AutoETS_AAA')], freq='MS', n_jobs=1, verbose=True)
model.fit(train_agg)

The call to model.fit generates the NotImplementedError: tiny datasets from statsforecast/ets.py

The same code executes successfully when running version 0.3.0, instead of 0.4.0

Issue Severity

High: It blocks me from completing my task.

Hey. Without an example it's hard to tell. Are you using aggregate? #189 was fixed in 0.4.0, so in 0.3.0 you may have been getting leading zeros that gave your series some extra samples, which is no longer the case.

Hi @jmoralez , I inspected the train_agg dataframe (produced using the aggregate function) for 0.3.0 vs 0.4.0

I'm inspecting the result of this line:
train_agg, S_train, tags = aggregate(df_train, spec)

0.3.0
In 0.3.0, the aggregate function fills in 0 for 'y' at 'ds' periods where df_train has no value.
For example, if my 'ds' range runs from '2018-01-01' through '2018-12-01' but 'y' is missing for '2018-03-01' and '2018-04-01', the aggregate function still populates train_agg at those 'ds' values with 'y' = 0.

This allows the script to fit the StatsForecast AutoETS model and execute reconciliation for train_agg

0.4.0
In 0.4.0, the aggregate function no longer fills in 0 for 'y' at 'ds' periods where df_train has no value.
This seems to be what breaks the call to model.fit(train_agg), which ran without error in 0.3.0.

Should I aim to add back in the interpolated 'y'=0 values for the missing 'ds' values to replicate the 0.3.0 behavior for model.fit()? Just want to ensure this is the intended behavior for the aggregate function, before I implement a post-hoc fix

The problem with aggregate was leading zeros, e.g. if one of your series started at 2018-01-01 and another one at 2019-01-01, the aggregate function would add all of 2018 as 0 for the second one. The fact that you have gaps in your series is a different problem, and you should address it first (before running aggregate). You can use the fill_gaps function for that.
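To illustrate the difference between filling gaps from each series' own start and filling from a global start, here is a minimal, library-free sketch. It is not the fill_gaps implementation: the function names, the toy data, the fill value of 0, and the choice to extend every series to the latest date overall are all assumptions made for illustration.

```python
from datetime import date


def month_starts(start, end):
    """Return every first-of-month date from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def fill_monthly_gaps(series, start="per_serie", fill_value=0.0):
    """Fill missing months in a {series_id: {date: y}} mapping.

    start="per_serie": each series is filled from its own first date.
    start="global": every series is filled from the earliest date overall,
    which adds leading fill values to late-starting series.
    Every series is extended to the latest date overall (sketch assumption).
    """
    global_start = min(min(obs) for obs in series.values())
    global_end = max(max(obs) for obs in series.values())
    filled = {}
    for uid, obs in series.items():
        first = global_start if start == "global" else min(obs)
        filled[uid] = {
            d: obs.get(d, fill_value) for d in month_starts(first, global_end)
        }
    return filled


# Hypothetical toy data: "a" has a gap in March and April, "b" starts late.
series = {
    "a": {date(2018, 1, 1): 5.0, date(2018, 2, 1): 6.0, date(2018, 5, 1): 7.0},
    "b": {date(2018, 4, 1): 1.0, date(2018, 5, 1): 2.0},
}
per_serie = fill_monthly_gaps(series, start="per_serie")
global_fill = fill_monthly_gaps(series, start="global")
```

With the per-series start, "a" gets its internal gap filled while "b" keeps only its own two months. With the global start, "b" is also padded back to January, which is the leading-zeros effect described above.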

Thanks! The fill_gaps function resolved this issue, and the full script now runs successfully. However, I did have to set fill_gaps(df,freq='MS',start='global'), which reintroduces the leading zeros problem you're referencing for late-start series.

I tried leaving the start param at its default (start='per_serie'), but this still generated the NotImplementedError: tiny datasets.

Looking at statsforecast/ets.py, where this error is raised, I believe it may be a problem specific to my dataset:
https://github.com/Nixtla/statsforecast/blob/main/statsforecast/ets.py
n = len(y)
npars = 2  # alpha + l0
if trendtype in ["A", "M"]:
    npars += 2  # beta + b0
if seasontype in ["A", "M"]:
    npars += 2  # gamma + s
if damped is not None:
    npars += damped
# ses for non-optimized tiny datasets
if n <= npars + 4:
    # we need HoltWintersZZ function
    raise NotImplementedError("tiny datasets")

I have sub-series in the hierarchy with too few data points (without adding in leading zeros). Since I am trying to fit AutoETS(model='AAA') onto all series, n = len(y) for those sub-series is no greater than npars + 4, which raises the "tiny datasets" error.
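The length check in the ets.py excerpt quoted in this comment can be restated as a small standalone helper for spotting which series are too short before fitting. This is only a sketch that mirrors that excerpt: the function names and the example series lengths are hypothetical, and other statsforecast versions may apply a different rule.

```python
def ets_npars(trendtype, seasontype, damped=None):
    """Count parameters the same way as the quoted ets.py excerpt."""
    npars = 2  # alpha + l0
    if trendtype in ("A", "M"):
        npars += 2  # beta + b0
    if seasontype in ("A", "M"):
        npars += 2  # gamma + s
    if damped is not None:
        npars += int(damped)  # True adds 1, False adds 0
    return npars


def is_tiny(n, trendtype, seasontype, damped=None):
    """True when a series of length n would hit the 'tiny datasets' branch."""
    return n <= ets_npars(trendtype, seasontype, damped) + 4


def min_obs(trendtype, seasontype, damped=None):
    """Smallest series length that passes the check."""
    return ets_npars(trendtype, seasontype, damped) + 5


# Hypothetical series lengths, keyed by series id.
lengths = {"total": 60, "late_start": 8}

# Additive trend and additive seasonality, as in model='AAA', damped unset.
too_short = [uid for uid, n in lengths.items() if is_tiny(n, "A", "A")]
print(min_obs("A", "A"), too_short)
```

Under these assumptions, an additive-trend, additive-seasonal model with damped left unset has npars = 6, so any series with 10 or fewer observations triggers the error and at least 11 are needed.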

Therefore, I believe this issue can be closed, since it's specific to a modeling approach vs. a bug in the code. Thanks for your help!

Incidentally, are there any plans to implement a MinTraceSparse(nonnegative=True) method in the future? I can handle negative values post-reconciliation, just curious about the roadmap.

Thanks. Can you please open a new issue requesting the nonnegative sparse MinTrace?