Name: Orlando Vilar Date: 5/1/2023
In this project, I work with data collected from the New York Independent System Operator (NYISO). The dataset contains hourly load estimates for all of the New York zones along with daily weather data. I analyze these series using three modeling frameworks and provide recommendations for the stakeholder.
The modeling workflow proceeds as follows: time series analysis with basic and rolling statistics, seasonal decomposition, unit-root testing, SARIMAX modeling, Long Short-Term Memory (LSTM) networks, and Facebook's Prophet.
The stakeholder is the NYISO. A well-functioning system relies on accurate forecasts, which help avoid social and economic costs through fine-tuned operation.
Focusing specifically on the New York City zone (NYISO Zone J), I provide three modeling frameworks that aim to reduce forecasting deviations.
The dataset is a collection of hourly load usage across the NYISO. To simplify the analysis, I focus only on the NYC zone (NYISO Zone J). The data spans from 2018 through 2020 and has more than 300,000 hourly load observations. The variables are Load, Region, Holiday, Peak, and Off-peak. I also add weather statistics, such as a cumulative temperature and humidity index, CDD and HDD (cooling and heating degree days), engineered features, and a fully saturated dummy set.
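The CDD/HDD features can be derived directly from daily temperature. A minimal sketch, assuming the conventional 65°F base temperature and a hypothetical `temp_f` column name (the notebook's actual base and column names may differ):

```python
import pandas as pd

BASE_TEMP_F = 65.0  # conventional base; the project's choice may differ

def add_degree_day_features(df: pd.DataFrame, temp_col: str = "temp_f") -> pd.DataFrame:
    """Add Cooling/Heating Degree Day columns from a daily mean temperature column."""
    out = df.copy()
    # CDD counts degrees above the base (cooling demand), HDD degrees below it (heating demand)
    out["CDD"] = (out[temp_col] - BASE_TEMP_F).clip(lower=0.0)
    out["HDD"] = (BASE_TEMP_F - out[temp_col]).clip(lower=0.0)
    return out

daily = pd.DataFrame({"temp_f": [30.0, 65.0, 80.0]})
features = add_degree_day_features(daily)
# CDD: [0, 0, 15], HDD: [35, 0, 0]
```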
I start by testing for stationarity and performing a seasonal decomposition to select the best combination of Autoregressive Integrated Moving Average terms (p, d, q) and seasonal terms (P, D, Q, s). Given computational constraints, I limit the search to one round of runs. Autocorrelation and partial autocorrelation plots are then used to check the consistency of the seasonal decomposition findings. Lastly, I generate predictions on the test data. In general, the SARIMAX model is too computationally intensive to run repeatedly, which limits tuning alternatives.
Next, I provide two Long Short-Term Memory neural network models. Both the simple and the more structured (deeper) model achieve good Mean Absolute Percent Errors (MAPE) and allow for hyperparameter tuning. However, while producing promising results, both models fail to capture the troughs in the data. The simple LSTM has a MAPE close to 10.4%, whereas the deeper LSTM has a MAPE of 10.5%.
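Two pieces of the LSTM pipeline can be sketched without a deep learning framework: the sliding-window reshaping that turns the load series into supervised (X, y) pairs, and the MAPE metric used to compare models. The 24-hour lookback is an illustrative assumption:

```python
import numpy as np

def make_windows(series, lookback=24):
    """Split a 1-D series into (X, y) pairs: each lookback-hour window predicts the next hour."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

def mape(actual, predicted):
    """Mean Absolute Percent Error, the metric reported for both LSTM variants."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

series = np.arange(1, 49, dtype=float)  # toy stand-in for hourly load
X, y = make_windows(series, lookback=24)
# X.shape == (24, 24); y[0] is the hour right after the first window
```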
Lastly, I run Prophet in two configurations: a plain-vanilla model and one with exogenous features. Although the forecast's general shape resembles the observed load, the model fails to capture the data peaks. It does, however, capture the seasonality in the data accurately and may be a good alternative for modeling troughs. The overall MAPEs are around 30-40% for the full period, and the forecast interval widens as the horizon extends.
Using a time series approach, I provide three hourly load forecasting alternatives: SARIMAX, LSTM, and Prophet. With the aim of fine-tuning grid forecasting, the conclusions are as follows.
The stakeholder could follow three strategies:
- Using the LSTM to capture data peaks;
- Using the Prophet to capture the general shape and data troughs;
- Ensemble: generating a model that combines both alternatives listed above for the best load pattern matching and reduced MAPEs.
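The ensemble strategy above could be as simple as a weighted average of the two forecasts, letting the LSTM dominate near peaks and Prophet near troughs. A minimal sketch with a hypothetical fixed weight (the actual weighting scheme would need to be tuned):

```python
import numpy as np

def ensemble_forecast(lstm_pred, prophet_pred, weight=0.5):
    """Weighted average of the two model forecasts; weight is the LSTM share."""
    lstm_pred = np.asarray(lstm_pred, float)
    prophet_pred = np.asarray(prophet_pred, float)
    return weight * lstm_pred + (1.0 - weight) * prophet_pred

# LSTM tracks the peak better, Prophet the trough; the blend splits the difference
combined = ensemble_forecast([120.0, 80.0], [100.0, 100.0], weight=0.5)
# -> [110.0, 90.0]
```

A time-varying or load-level-dependent weight would be a natural refinement.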
The addition of granular data on generation/transmission and weather estimates would most likely improve the results. Furthermore, computational limitations impede a more thorough assessment of hyperparameter tuning.
Here you will find:
- Presentation file;
- Jupyter Notebook;
- README.md file;
- Image Folder;
- Data (.CSV) folders.