Feature request - exogenous data in forecasting
ian-grover opened this issue · 4 comments
Hi!
I am looking at this package as a very useful tool for a project I plan to start. I have been reading the documentation and examples and want to understand something a bit further.
There is functionality for adding additional time-series data alongside the target/signal variable through the exogenous data argument (and I am happy to see this is also possible for hierarchical modelling), but it appears that this option is not available in the forecasting function.
I would like to be able to adjust a parameter and see how the forecast varies. From what I have read, I am not sure this is possible: the exogenous data can be provided at training time, but there does not seem to be a way to provide it along with the input for forecasting.
I am afraid my theoretical knowledge of these tools is a bit lacking, but since the signal data is provided to the forecast up to the point where the data stops, wouldn't it be practical to provide the exogenous data in the same way?
In the example notebook it says:
In addition to the training dataset, the user can provide an external table with exogenous variables data. This table is merged with the signal dataset when exogenous variables values are needed (either when training the model or when producing forecasts).
But it is not clear to me from the code (or the example) where this happens. If it does, could you advise whether there is a procedure for altering some of the exogenous data after training (say, on the final time step) to see how the forecast changes?
Many thanks!
Ian
Hi @ian-grover
Thank you for your interest in PyAF.
Exogenous data are used in PyAF through their past values (ARX models and the like). PyAF uses the same dataframe for training and for forecasting. The exogenous dataset can cover a much larger time frame than the training dataset; PyAF will use the exogenous data for the dates it needs.
If I understand correctly, you want to be able to modify this dataframe at forecast time and see the impact on the forecast.
Could you please confirm, and share a sample (anonymized) dataset with exogenous data, so that we have the same view of this issue? A Python script is welcome.
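A minimal pandas sketch of the idea above, assuming a date column named `Date` and made-up signal and exogenous values (an illustration of the concept, not PyAF's internal code): the exogenous table spans a wider date range than the signal, a left merge keeps only the dates the signal needs, and lagging the exogenous column before merging lets an ARX-style model use the previous step's value even for the first signal row.

```python
import pandas as pd

# Hypothetical signal data: the series to forecast (column names are illustrative).
signal = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=5, freq="D"),
    "Ozone": [3.1, 3.4, 3.3, 3.8, 4.0],
})

# Hypothetical exogenous table covering a wider time frame than the signal.
exog = pd.DataFrame({
    "Date": pd.date_range("2019-12-30", periods=10, freq="D"),
    "Exog2": [100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
})

# ARX-style models use *past* exogenous values: lag by one step before merging,
# so the first signal date can still see the exogenous value of the day before.
# shift(1) assumes the table is sorted by date with a regular frequency.
exog["Exog2_lag1"] = exog["Exog2"].shift(1)

# Left merge on the date column: only the dates present in the signal are kept.
merged = signal.merge(exog, on="Date", how="left")
print(merged)
```

The extra exogenous rows outside the signal's date range are simply ignored by the merge, which is one way to read "PyAF will use the exogenous data for the dates it needs".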
Thanks in advance.
I appreciate the quick response!
I don't have an example yet, as I haven't started using the tool (I am trying to understand whether what I want is possible first). Taking the example in the PyAF_Exogenous notebook in docs/, I would like to understand whether the following is possible.
1) Train the forecasting model with the exogenous data, which in my case would cover the same time range as my input dataset (this seems fine).
2a) Run a forecast in which I adjust the last time step of the exogenous data to test different values.
2b) Run a forecast in which I adjust the exogenous data over the full period I am forecasting.
In both cases, I would not want to retrain the model for a slight adjustment to the exogenous data.
In your example, let's say Exog2 represents population. I would like to know if I can use these to train a model on the historical Ozone and Population data, and then generate Ozone predictions with different population values over the period I am forecasting. That is to say, the exogenous history is fixed, but I am interested in how the forecast changes if the population were reduced by 1-20%, either on the very last time step of the history or as a fixed value out into the future where I am forecasting.
Thanks!
Ian
Short answer: if this is a feature request, as of 2022-06-30, with PyAF version 3.0, what you are asking for is not feasible.
Longer answer: PyAF uses all the available signal and exogenous data to train a model and get a forecast. If you change the data after training, you have to train a new model on the new data. We do everything possible to shorten the training time (code optimization, columnar design, CPU parallelization, etc.). Retraining a model should not be a time-consuming task. Please file an issue about that if it is a real problem (with a dataset, script, etc.).
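The retrain-per-scenario workflow described above can be sketched with a toy model. This is a hedged illustration, not PyAF's API: `fit_arx` and `forecast_next` are hypothetical helpers fitting an ARX(1,1) model with NumPy least squares as a stand-in for a PyAF model, and the noise-free `population`/`ozone` series are made up. Each scenario reduces the last observed population value (case 2a), retrains, and forecasts one step ahead.

```python
import numpy as np

def fit_arx(y, x):
    """Hypothetical helper: fit y[t] = a*y[t-1] + b*x[t-1] + c by least squares."""
    design = np.column_stack([y[:-1], x[:-1], np.ones(len(y) - 1)])
    coef, *_ = np.linalg.lstsq(design, y[1:], rcond=None)
    return coef

def forecast_next(coef, y, x):
    """Hypothetical helper: one-step-ahead forecast from the last observed values."""
    a, b, c = coef
    return a * y[-1] + b * x[-1] + c

# Made-up, noise-free data: 'population' plays the role of Exog2.
n = 50
population = 1000 + 20 * np.sin(np.arange(n))
ozone = np.empty(n)
ozone[0] = 3.0
for t in range(1, n):
    ozone[t] = 0.5 * ozone[t - 1] + 0.002 * population[t - 1] + 0.1

forecasts = {}
for reduction in (0.0, 0.01, 0.05, 0.10, 0.20):
    scenario = population.copy()
    scenario[-1] *= 1 - reduction      # reduce only the last observed exogenous value
    coef = fit_arx(ozone, scenario)    # retrain one model per scenario
    forecasts[reduction] = forecast_next(coef, ozone, scenario)

for reduction, value in forecasts.items():
    print(f"population -{reduction:.0%}: next ozone = {value:.4f}")
```

With this toy lag-1 model, the last exogenous value does not enter the fit itself, so only the forecast input changes between scenarios; a real PyAF model may use the data differently, which is why the answer above recommends retraining on each modified dataset.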
Closing.