This repository provides code for an LSTM model for multivariate time series forecasting. The model predicts the pollution level at the current hour (t) from the pollution measurement and weather conditions at the prior time step (t-1).
Requirements:
- numpy
- pandas
- matplotlib
- scikit-learn
- keras
- The dataset reports the weather and the level of pollution every hour for five years. It includes the date-time, the pollution measured as PM2.5 concentration, and weather information: dew point, temperature, pressure, wind direction, wind speed, and the cumulative number of hours of snow and rain.
- Make sure the `raw.csv` file provided in the `Data` folder is saved in the main data directory, then run `Data.py`. This script preprocesses the data so that it is suitable for stable training of the LSTM model, and writes a new data file named `pollution.csv` to the same directory.
- To train and test the LSTM model with a window of a single previous time step, run `Train_On_Single_Lag_Timesteps.py`.
- To train and test the LSTM model on multiple previous time steps, run `Train_On_Multiple_Lag_Timesteps.py`.
- All hyperparameters that control training and testing of the model, in both the single and the multiple time step window settings, are defined in the respective `.py` files.
- The average training loss and the validation loss are printed after every epoch.
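The preprocessing step that produces `pollution.csv` can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of the repository's preprocessing script: it assumes a pandas DataFrame whose wind direction is a categorical column (called `wnd_dir` here, a hypothetical name), fills missing values with 0 (an assumption), integer-encodes the wind direction, and scales every feature to [0, 1] with scikit-learn's `MinMaxScaler`, a common way to keep LSTM training stable.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler


def preprocess(df: pd.DataFrame, wind_col: str = "wnd_dir") -> tuple[np.ndarray, MinMaxScaler]:
    """Hypothetical preprocessing sketch: fill gaps, encode wind direction,
    and scale all features to [0, 1]."""
    df = df.copy()
    # Assumption: missing readings are replaced with 0.
    df = df.fillna(0)
    # Integer-encode the categorical wind direction column.
    df[wind_col] = LabelEncoder().fit_transform(df[wind_col])
    values = df.to_numpy(dtype="float32")
    # Scale each column independently to the range [0, 1].
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled = scaler.fit_transform(values)
    return scaled, scaler


# Toy example with made-up values (not taken from raw.csv).
raw = pd.DataFrame({
    "pollution": [129.0, 148.0, np.nan, 181.0],
    "dew": [-16, -15, -11, -7],
    "temp": [-4.0, -4.0, -5.0, -5.0],
    "wnd_dir": ["SE", "SE", "cv", "NW"],
})
scaled, scaler = preprocess(raw)
print(scaled.shape)  # (4, 4)
```

Keeping the fitted `scaler` makes it possible to map predictions back to the original pollution units with `scaler.inverse_transform`.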
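Training on a single previous time step amounts to framing the series as a supervised learning problem: all features at hour t-1 become the inputs and the pollution at hour t becomes the target. A minimal sketch using pandas `shift`, assuming pollution is column 0 of the (already scaled) array; the function name `series_to_supervised` is hypothetical and not necessarily what the single-lag training script uses.

```python
import numpy as np
import pandas as pd


def series_to_supervised(values: np.ndarray, target_col: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Frame an (hours, features) array as supervised learning with one lag.

    Inputs are all features at t-1; the target is the pollution value at t.
    """
    df = pd.DataFrame(values)
    lagged = df.shift(1)        # row t now holds the features from t-1
    target = df[target_col]     # pollution at time t
    # The first row has no predecessor, so its lagged values are NaN and it is dropped.
    framed = pd.concat([lagged, target], axis=1).dropna()
    X = framed.iloc[:, :-1].to_numpy()
    y = framed.iloc[:, -1].to_numpy()
    # LSTM layers expect [samples, timesteps, features]; here timesteps = 1.
    return X.reshape((X.shape[0], 1, X.shape[1])), y


data = np.arange(12, dtype=float).reshape(6, 2)  # 6 hours, 2 features
X, y = series_to_supervised(data)
print(X.shape, y.shape)  # (5, 1, 2) (5,)
```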
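With multiple previous time steps, each sample becomes a window covering the last `n_lags` hours of all features. A minimal NumPy sketch, where `make_windows` and `n_lags` are hypothetical names and pollution is assumed to be column 0; in the repository the window length is one of the hyperparameters set in the multiple-lag training script.

```python
import numpy as np


def make_windows(values: np.ndarray, n_lags: int, target_col: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Build [samples, n_lags, features] inputs and next-hour pollution targets."""
    if n_lags < 1 or n_lags >= len(values):
        raise ValueError("n_lags must be between 1 and len(values) - 1")
    # Each window holds the n_lags hours immediately before the target hour.
    X = np.stack([values[i - n_lags:i] for i in range(n_lags, len(values))])
    # Target: pollution at the hour right after each window.
    y = values[n_lags:, target_col]
    return X, y


data = np.arange(12, dtype=float).reshape(6, 2)  # 6 hours, 2 features
X, y = make_windows(data, n_lags=3)
print(X.shape, y)  # (3, 3, 2) [ 6.  8. 10.]
```

With `n_lags=1` this produces the same inputs and targets as the single-lag framing.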
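A minimal Keras sketch of the kind of model being trained, assuming one LSTM layer of 50 units followed by a single-unit `Dense` output, mean absolute error loss, and the Adam optimizer. These are illustrative placeholders, not the hyperparameters defined in the repository's scripts, and the 8 input features are an assumption (the pollution value plus the seven weather variables listed above). Passing `validation_data` to `fit` makes Keras report, after every epoch, the training loss averaged over that epoch's batches and the validation loss.

```python
import numpy as np
import keras
from keras import layers


def build_model(n_timesteps: int, n_features: int, units: int = 50) -> keras.Model:
    """Hypothetical LSTM regressor: one LSTM layer and a single output unit."""
    model = keras.Sequential([
        keras.Input(shape=(n_timesteps, n_features)),
        layers.LSTM(units),
        layers.Dense(1),  # predicted (scaled) pollution at hour t
    ])
    model.compile(loss="mae", optimizer="adam")
    return model


# Random toy data in the [samples, timesteps, features] layout; use real windows instead.
rng = np.random.default_rng(0)
X_train = rng.random((64, 1, 8)).astype("float32")
y_train = rng.random(64).astype("float32")
X_val = rng.random((16, 1, 8)).astype("float32")
y_val = rng.random(16).astype("float32")

model = build_model(n_timesteps=1, n_features=8)
history = model.fit(
    X_train, y_train,
    epochs=2, batch_size=16,
    validation_data=(X_val, y_val),
    verbose=2,  # one summary line per epoch with loss and val_loss
)
```

For the multiple-lag setting, only `n_timesteps` changes to match the window length.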
Figures: line plots of the air pollution time series; performance of the LSTM model on the single lag timesteps example; performance of the LSTM model on the multiple lag timesteps example.