StatsForecast models producing NotImplementedError: tiny datasets in 0.4.0
Closed this issue · 5 comments
What happened + What you expected to happen
Carryover from: #234
I upgraded my installation to 0.4.0. Running my script with no code changes now raises the error below when fitting StatsForecast AutoETS models. I also tried StatsForecast HoltWinters and received the same error.
This error was not raised with the same dataset and script on 0.3.0. Any ideas what might have changed the behavior between 0.3.0 and 0.4.0?
NotImplementedError Traceback (most recent call last)
Cell In[8], line 109
98 #valid_agg_reset = valid_agg.reset_index()
100 model = StatsForecast(models=[
102 AutoETS(season_length=12,model='AAA',alias='AutoETS_AAA')
(...)
107 ],
108 freq='MS', n_jobs=1, verbose=True)
--> 109 model.fit(train_agg)
111 p = model.forecast(h=h_months, fitted=True)
112 p_fitted = model.forecast_fitted_values()
File ~/lib/python3.10/site-packages/statsforecast/core.py:880, in StatsForecast.fit(self, df, sort_df, prediction_intervals)
878 self.prepare_fit(df, sort_df)
879 if self.n_jobs == 1:
--> 880 self.fitted = self.ga.fit(models=self.models)
881 else:
882 self.fitted = self._fit_parallel()
File ~/lib/python3.10/site-packages/statsforecast/core.py:77, in GroupedArray.fit(self, models)
75 for i_model, model in enumerate(models):
76 new_model = model.new()
---> 77 fm[i, i_model] = new_model.fit(y=y, X=X)
78 return fm
File ~/lib/python3.10/site-packages/statsforecast/models.py:650, in AutoETS.fit(self, y, X)
628 def fit(
629 self,
630 y: np.ndarray,
631 X: Optional[np.ndarray] = None,
632 ):
633 """Fit the Exponential Smoothing model.
634
635 Fit an Exponential Smoothing model to a time series (numpy array) y
(...)
648 Exponential Smoothing fitted model.
649 """
--> 650 self.model_ = ets_f(
651 y, m=self.season_length, model=self.model, damped=self.damped
652 )
653 self.model_["actual_residuals"] = y - self.model_["fitted"]
654 self._store_cs(y=y, X=X)
File ~/lib/python3.10/site-packages/statsforecast/ets.py:1241, in ets_f(y, m, model, damped, alpha, beta, gamma, phi, additive_only, blambda, biasadj, lower, upper, opt_crit, nmse, bounds, ic, restrict, allow_multiplicative_trend, use_initial_values, maxit)
1238 # ses for non-optimized tiny datasets
1239 if n <= npars + 4:
1240 # we need HoltWintersZZ function
-> 1241 raise NotImplementedError("tiny datasets")
1242 # fit model (assuming only one nonseasonal model)
1243 if errortype == "Z":
NotImplementedError: tiny datasets
Versions / Dependencies
dateutil 2.8.2
hierarchicalforecast 0.4.0
matplotlib 3.7.1
numpy 1.23.5
pandas 2.0.2
session_info 1.0.0
statsforecast 1.6.0
IPython 8.14.0
jupyter_client 8.2.0
jupyter_core 5.3.0
notebook 6.5.4
Python 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0]
Linux-4.18.0-372.16.1.0.1.el8_6.x86_64-x86_64-with-glibc2.35
Reproduction script
from statsforecast import StatsForecast
from statsforecast.models import AutoETS
model = StatsForecast(models=[AutoETS(season_length=12, model='AAA', alias='AutoETS_AAA')], freq='MS', n_jobs=1, verbose=True)
model.fit(train_agg)
The call to model.fit raises NotImplementedError: tiny datasets from statsforecast/ets.py.
The same code runs successfully on version 0.3.0.
Issue Severity
High: It blocks me from completing my task.
Hey. Without an example it's hard to tell. Are you using aggregate? #189 was fixed in 0.4.0, so in 0.3.0 you may have been getting leading zeros that gave your series some more samples, which is no longer the case.
Hi @jmoralez, I inspected the train_agg dataframe (produced using the aggregate function) in 0.3.0 vs 0.4.0.
I'm inspecting the result of this line:
train_agg, S_train, tags = aggregate(df_train, spec)
0.3.0
In 0.3.0, the aggregate function fills in 'y' = 0 for 'ds' periods where df_train has no value.
For example, if my 'ds' range runs from '2018-01-01' through '2018-12-01' but 'y' is missing for '2018-03-01' and '2018-04-01', the aggregate function still populates train_agg at those 'ds' values with 'y' = 0.
This allows the script to fit the StatsForecast AutoETS model and run reconciliation on train_agg.
0.4.0
In 0.4.0, the aggregate function no longer fills in 'y' = 0 for 'ds' periods where df_train has no value.
This seems to be what breaks the call to model.fit(train_agg), which ran without error in 0.3.0.
Should I add the 'y' = 0 values back in for the missing 'ds' periods to replicate the 0.3.0 behavior for model.fit()? I just want to confirm this is the intended behavior of the aggregate function before I implement a post-hoc fix.
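One way to confirm the situation described above is to list, per series, the months that are absent between the first and last observation before calling aggregate. The following is a minimal pandas sketch with made-up data; the helper name `missing_periods` and the `unique_id` column are assumptions for illustration and stand in for whatever columns identify a bottom-level series in df_train.

```python
import pandas as pd


def missing_periods(df, freq="MS", id_col="unique_id", time_col="ds"):
    """Return, per series, the timestamps absent between its first and last observation.

    Hypothetical helper for illustration; not part of any Nixtla library.
    """
    gaps = {}
    for uid, grp in df.groupby(id_col):
        # Every period the series would contain if it had no gaps.
        expected = pd.date_range(grp[time_col].min(), grp[time_col].max(), freq=freq)
        missing = expected.difference(pd.DatetimeIndex(grp[time_col]))
        if len(missing) > 0:
            gaps[uid] = list(missing)
    return gaps


# Made-up series: January to December 2018 with March and April absent.
months = pd.date_range("2018-01-01", "2018-12-01", freq="MS")
present = months[~months.month.isin([3, 4])]
df_train = pd.DataFrame({"unique_id": "A", "ds": present, "y": 1.0})

gaps = missing_periods(df_train)
print(gaps)
```

An empty result would suggest the shorter series come from late starts rather than from internal gaps.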
The problem with aggregate was leading zeros, e.g. if one of your series started at 2018-01-01 and another one at 2019-01-01, the aggregate function would add all of 2018 as 0 for the second one. The fact that you have gaps in your series is a different problem, and you should address it first (before running aggregate). You can use the fill_gaps function for that.
Thanks! The fill_gaps function resolved this issue, and the full script now runs successfully. However, I did have to set fill_gaps(df,freq='MS',start='global'), which reintroduces the leading zeros problem you're referencing for late-start series.
I tried leaving the start param at its default (start='per_serie'), but this still generated the NotImplementedError: tiny datasets.
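The difference between the two start options can be sketched in plain pandas. This is a stand-in written for illustration, not the library's fill_gaps: the function name `fill_monthly_gaps`, the zero fill value, and the choice to extend every series to the latest date in the frame are all assumptions, and the data is made up.

```python
import pandas as pd


def fill_monthly_gaps(df, start="per_serie", fill_value=0.0):
    """Pandas-only illustration of gap filling; NOT the library's fill_gaps."""
    global_start, global_end = df["ds"].min(), df["ds"].max()
    filled = []
    for uid, grp in df.groupby("unique_id"):
        # 'global' starts every series at the earliest date in the frame,
        # 'per_serie' starts each series at its own first observation.
        first = global_start if start == "global" else grp["ds"].min()
        full_index = pd.date_range(first, global_end, freq="MS")
        series = grp.set_index("ds")["y"].reindex(full_index, fill_value=fill_value)
        filled.append(
            pd.DataFrame({"unique_id": uid, "ds": full_index, "y": series.to_numpy()})
        )
    return pd.concat(filled, ignore_index=True)


# Made-up data: 'early' spans 2018-2019 with two months absent,
# 'late' only starts in July 2019.
early_ds = pd.date_range("2018-01-01", "2019-12-01", freq="MS").delete([2, 3])
late_ds = pd.date_range("2019-07-01", "2019-12-01", freq="MS")
df = pd.concat(
    [
        pd.DataFrame({"unique_id": "early", "ds": early_ds, "y": 1.0}),
        pd.DataFrame({"unique_id": "late", "ds": late_ds, "y": 1.0}),
    ],
    ignore_index=True,
)

per_series = fill_monthly_gaps(df, start="per_serie")
from_global = fill_monthly_gaps(df, start="global")
lengths_per_series = per_series.groupby("unique_id").size().to_dict()
lengths_from_global = from_global.groupby("unique_id").size().to_dict()
print(lengths_per_series, lengths_from_global)
```

With a per-series start the late series keeps only its own six months, so it stays short; with a global start it reaches the full length, but only by padding with leading zeros.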
Looking at statsforecast/ets.py, where this error originates, I believe it may be a problem specific to my dataset:
https://github.com/Nixtla/statsforecast/blob/main/statsforecast/ets.py
n = len(y)
npars = 2  # alpha + l0
if trendtype in ["A", "M"]:
    npars += 2  # beta + b0
if seasontype in ["A", "M"]:
    npars += 2  # gamma + s
if damped is not None:
    npars += damped
# ses for non-optimized tiny datasets
if n <= npars + 4:
    # we need HoltWintersZZ function
    raise NotImplementedError("tiny datasets")
I have sub-series in the hierarchy with too few data points (without adding in leading zeros). Since I am trying to fit AutoETS(model='AAA') onto all series, n = len(y) is at most npars + 4 for those sub-series, which raises the "tiny datasets" error.
Therefore, I believe this issue can be closed, since it's specific to my modeling approach rather than a bug in the code. Thanks for your help!
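The length check can be reproduced outside the library to find which series would trip it before fitting. The sketch below mirrors only the excerpt quoted above; the function names and the series lengths are hypothetical, and ets_f may apply other checks that are not covered here.

```python
def ets_param_count(trendtype, seasontype, damped=None):
    """Count parameters the same way as the quoted ets.py excerpt."""
    npars = 2  # alpha + l0
    if trendtype in ("A", "M"):
        npars += 2  # beta + b0
    if seasontype in ("A", "M"):
        npars += 2  # gamma + s
    if damped is not None:
        npars += int(damped)
    return npars


def is_tiny(n, trendtype, seasontype, damped=None):
    """True when a series of length n would hit the 'tiny datasets' branch."""
    return n <= ets_param_count(trendtype, seasontype, damped) + 4


# Hypothetical series lengths, checked against model='AAA' with damped=None.
series_lengths = {"total": 36, "region_a": 11, "region_b": 10, "region_c": 6}
too_short = [uid for uid, n in series_lengths.items() if is_tiny(n, "A", "A")]
print(too_short)
```

For 'AAA' without damping, npars is 6, so under this excerpt a series needs at least 11 observations to get past the check.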
Incidentally, are there any plans to implement a MinTraceSparse(nonnegative=True) method in the future? I can handle negative values post-reconciliation, just curious about the roadmap.
Thanks. Can you please open a new issue requesting the nonnegative sparse MinTrace?