georgia-tech-db/evadb

Better frequency inference and message for timeseries forecasting

xzdandy opened this issue · 4 comments

Search before asking

  • I have searched the EvaDB issues and found no similar feature requests.

Description

  1. Better error message when frequency can not be inferred. Instead of the following long error message:
10-12-2023 06:21:49 ERROR [plan_executor:plan_executor.py:execute_plan:0179] Can not infer the frequency for HomeSaleForecast. Please explicitly set it.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/plan_executor.py", line 175, in execute_plan
    yield from output
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/create_function_executor.py", line 526, in exec
    ) = self.handle_forecasting_function()
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/create_function_executor.py", line 254, in handle_forecasting_function
    raise RuntimeError(
RuntimeError: Can not infer the frequency for HomeSaleForecast. Please explicitly set it.
ERROR:evadb.utils.logging_manager:Can not infer the frequency for HomeSaleForecast. Please explicitly set it.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/plan_executor.py", line 175, in execute_plan
    yield from output
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/create_function_executor.py", line 526, in exec
    ) = self.handle_forecasting_function()
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/create_function_executor.py", line 254, in handle_forecasting_function
    raise RuntimeError(
RuntimeError: Can not infer the frequency for HomeSaleForecast. Please explicitly set it.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/evadb/executor/plan_executor.py in execute_plan(self, do_not_raise_exceptions, do_not_print_exceptions)
    174             if output is not None:
--> 175                 yield from output
    176         except Exception as e:

6 frames
RuntimeError: Can not infer the frequency for HomeSaleForecast. Please explicitly set it.

During handling of the above exception, another exception occurred:

ExecutorError                             Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/evadb/executor/plan_executor.py in execute_plan(self, do_not_raise_exceptions, do_not_print_exceptions)
    178                 if do_not_print_exceptions is False:
    179                     logger.exception(str(e))
--> 180                 raise ExecutorError(e)

ExecutorError: Can not infer the frequency for HomeSaleForecast. Please explicitly set it.

we should return a simple response such as: Can not infer the frequency for HomeSaleForecast. Please explicitly set it.
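A minimal sketch of how the shorter response could be produced, assuming the wrapper logs only the message and suppresses exception chaining. The function bodies below are hypothetical stand-ins written for illustration; they are not the actual EvaDB code, and only the names visible in the traceback are reused.

```python
import logging

logger = logging.getLogger("forecast_demo")  # hypothetical logger name


class ExecutorError(Exception):
    """Stand-in for the executor-level error shown in the traceback."""


def handle_forecasting_function(name, frequency=None):
    # Stand-in: fail the same way the issue describes when no frequency is known.
    if frequency is None:
        raise RuntimeError(
            f"Can not infer the frequency for {name}. Please explicitly set it."
        )
    return frequency


def execute_plan(name, frequency=None):
    try:
        return handle_forecasting_function(name, frequency)
    except Exception as e:
        # logger.error records the message only; logger.exception would
        # also append the full traceback to the log.
        logger.error(str(e))
        # `from None` suppresses the "During handling of the above
        # exception, another exception occurred" chain.
        raise ExecutorError(str(e)) from None


try:
    execute_plan("HomeSaleForecast")
except ExecutorError as err:
    print(err)
```

With this shape the user sees the one-line message once in the log and once in the raised error, rather than three copies of the traceback.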

  2. Frequency has a non-negligible effect on the forecasting result. For example, in the Colab notebook, frequency 'W' gives a flat prediction, while frequency 'M' does not. Given that automatic frequency inference is not working, how should a user choose the frequency?
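One way a user might check whether their data has a regular frequency at all is to look at the gaps between consecutive dates. The sketch below is a hand-rolled heuristic using only the standard library; `guess_frequency` and the sample dates are hypothetical, and this is not how EvaDB or pandas performs inference.

```python
from collections import Counter
from datetime import date


def gap_histogram(dates):
    """Count the day gaps between consecutive distinct dates."""
    ordered = sorted(set(dates))
    return Counter((b - a).days for a, b in zip(ordered, ordered[1:]))


def guess_frequency(dates):
    """Return 'D', 'W', or 'M' if every gap fits that spacing, else None."""
    gaps = set(gap_histogram(dates))
    if not gaps:
        return None
    if gaps == {1}:
        return "D"
    if gaps == {7}:
        return "W"
    if gaps <= {28, 29, 30, 31}:
        return "M"
    return None  # irregular spacing: no single frequency fits


# Hypothetical sample dates, not taken from the Colab data.
weekly = [date(2019, 1, 6), date(2019, 1, 13), date(2019, 1, 20)]
irregular = [date(2019, 1, 3), date(2019, 1, 4), date(2019, 1, 19), date(2019, 3, 2)]

print(guess_frequency(weekly))     # W
print(guess_frequency(irregular))  # None
```

Event-style data such as individual sale dates will usually fall into the irregular case, which suggests aggregating to a regular grid (for example, one value per month) before choosing a frequency.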

Use case

See the Colab notebook for the data and for steps to reproduce the error.

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Hi @americast, do you have any ideas on how to improve frequency inference for forecasting? Thanks!

Thanks @xzdandy for creating this issue. One quick solution could be to redirect users to use a neural model that does not need explicit frequency determination/declaration.

Thanks @americast, I am trying the following query:

CREATE OR REPLACE FUNCTION HomeSaleForecast FROM
    (
      SELECT propertytype, datesold, price
      FROM postgres_data.home_sales
      WHERE bedrooms = 3 AND postcode = 2607
    )
  TYPE Forecasting
  LIBRARY 'neuralforecast'
  PREDICT 'price'
  TIME 'datesold'
  ID 'propertytype'
  FREQUENCY 'M'
  HORIZON 3
  AUTO 'F'

When you say "a neural model that does not need explicit frequency determination/declaration", does that imply we should not provide FREQUENCY when using neuralforecast unless we are certain of the frequency of the data? And does FREQUENCY 'M' have an effect on neuralforecast in this case? Thanks.


Update 1:

The following query does not work. So it seems with neuralforecast, we still need to provide an explicit frequency.

CREATE OR REPLACE FUNCTION HomeSaleForecast FROM
    (
      SELECT propertytype, datesold, price
      FROM postgres_data.home_sales
      WHERE bedrooms = 3 AND postcode = 2607
    )
  TYPE Forecasting
  LIBRARY 'neuralforecast'
  PREDICT 'price'
  TIME 'datesold'
  ID 'propertytype'
  HORIZON 3
  AUTO 'F'

Update 2:

I am trying neuralforecast with FREQUENCY 'M' and FREQUENCY 'W'. The results are below:

FREQUENCY 'M' (month-end dates):

   propertytype  datesold    price
0  unit          2018-12-31  496118.0625
1  unit          2019-02-28  539504.7500
2  house         2019-09-30  582368.6875
3  house         2019-07-31  638825.6250
4  house         2019-08-31  659751.0000
5  unit          2019-01-31  692144.7500

FREQUENCY 'W' (weekly dates, Sundays):

   propertytype  datesold    price
0  unit          2018-12-23  496118.0625
1  unit          2019-01-06  539504.7500
2  house         2019-08-04  582368.6875
3  house         2019-07-21  638825.6250
4  house         2019-07-28  659751.0000
5  unit          2018-12-30  692144.7500

We can see that the forecast prices are identical, but the forecast datesold values differ because of the frequency. This seems to indicate that in neuralforecast the frequency affects the dates but not the prices, which makes choosing a frequency even more confusing.
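One reading consistent with this output is that the model predicts HORIZON steps by position, and the frequency is only used to label those steps with future dates. The sketch below illustrates that reading with the standard library. The last observed date (2018-12-20) is an assumption chosen for illustration, the helper functions are hypothetical, and the three prices are the 'unit' forecasts from the results above ordered by date.

```python
import calendar
from datetime import date, timedelta


def next_week_ends(last, horizon):
    """Next `horizon` Sundays strictly after `last`."""
    days_ahead = (6 - last.weekday()) % 7 or 7  # Monday=0 ... Sunday=6
    first = last + timedelta(days=days_ahead)
    return [first + timedelta(weeks=i) for i in range(horizon)]


def next_month_ends(last, horizon):
    """Next `horizon` month-end dates strictly after `last`."""
    out = []
    year, month = last.year, last.month
    while len(out) < horizon:
        end = date(year, month, calendar.monthrange(year, month)[1])
        if end > last:
            out.append(end)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out


last_observed = date(2018, 12, 20)  # assumed; not stated in the issue
step_prices = [496118.0625, 692144.75, 539504.75]  # 'unit' forecasts, steps 1 to 3

# Same values, different date labels depending on the frequency.
for label, dates in (
    ("M", next_month_ends(last_observed, 3)),
    ("W", next_week_ends(last_observed, 3)),
):
    for d, price in zip(dates, step_prices):
        print(label, d.isoformat(), price)
```

Under this reading, a frequency that does not match the real spacing of the rows does not change the predicted values; it only attaches misleading dates to them, so the frequency should describe how the training rows are actually spaced.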

Yes, frequency is required by neuralforecast by default; however, I'll check the source code to determine whether we can bypass that. I'll get back with more details soon.