time-series-forecasting-with-python

A use-case-focused tutorial for time series forecasting with Python


⏳ time-series-forecasting-wiki

This repository contains a series of analyses, transforms and forecasting models frequently used when dealing with time series. Its aim is to show how to model a time series from scratch. For this we use a real use-case dataset (the Beijing air pollution dataset) to avoid the overly clean, unrealistic examples that are common in this type of tutorial. If you want to rerun the notebooks, make sure you install all necessary dependencies (see the Guide).

A more detailed table of contents is available in the main notebook.

📂 Dataset

The dataset used is the public Beijing air quality dataset. It contains pollution data from 2014 to 2019, sampled every 10 minutes, along with extra weather features such as pressure and temperature. We resampled the dataset to a daily frequency, both for easier data handling and to stay close to a real use-case scenario (no one would build a model to predict pollution 10 minutes ahead; 1 day ahead is more realistic). In this case the series is already stationary, with some small seasonalities that change every year.
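A minimal sketch of the daily resampling step with pandas, assuming the raw readings sit in a DataFrame with a DatetimeIndex. The column names `pm25` and `temperature` and the random values are placeholders, not the real dataset schema, and averaging is an assumption: the README does not say which aggregation the notebooks use.

```python
import numpy as np
import pandas as pd

# Placeholder raw data: three days of readings every 10 minutes.
# Column names and values are made up; the real dataset has its own schema.
index = pd.date_range("2014-01-01", periods=6 * 24 * 3, freq="10min")
rng = np.random.default_rng(0)
raw = pd.DataFrame(
    {
        "pm25": rng.uniform(20, 120, len(index)),
        "temperature": rng.normal(5, 3, len(index)),
    },
    index=index,
)

# Aggregate all 10-minute readings of each calendar day into their daily mean.
daily = raw.resample("D").mean()
print(daily)  # 3 rows, one per day, same columns as raw
```

Swapping `.mean()` for `.max()` or `.sum()` changes the aggregation without touching the rest of the pipeline.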

To obtain an exact copy of the dataset used in this tutorial, run the script datasets/download_datasets.py, which automatically downloads and preprocesses the dataset for you.

📚 Analysis and transforms

  • Time series decomposition

    • Level
    • Trend
    • Seasonality
    • Noise
  • Stationarity

    • AC and PAC plots
    • Rolling mean and std
    • Dickey-Fuller test
  • Making our time series stationary

    • Difference transform
    • Log scale
    • Smoothing
    • Moving average
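The analysis and transform steps above can be sketched with pandas and NumPy alone. The example runs on a synthetic daily series (level plus linear trend, weekly seasonality and noise), not on the Beijing data, and uses a hand-rolled classical additive decomposition as an illustration rather than the notebooks' own code. In practice `statsmodels` provides `seasonal_decompose`, `plot_acf`/`plot_pacf` and `adfuller` (the augmented Dickey-Fuller test) for these jobs; they are left out here to keep the sketch dependency-light.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the pollution data (an assumption):
# level 50, linear trend, weekly seasonality and Gaussian noise, 52 full weeks.
days = pd.date_range("2014-01-01", periods=364, freq="D")
t = np.arange(len(days))
rng = np.random.default_rng(42)
series = pd.Series(
    50 + 0.1 * t + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 3, len(t)),
    index=days,
)

# --- Decomposition (classical additive: series = trend + seasonal + noise) ---
# Trend: centered 7-day moving average, which averages out one full weekly cycle.
trend = series.rolling(window=7, center=True).mean()
detrended = series - trend
# Seasonality: mean detrended value per position in the week, centered on zero.
seasonal_means = detrended.groupby(t % 7).mean()
seasonal_means -= seasonal_means.mean()
seasonal = pd.Series(seasonal_means.to_numpy()[t % 7], index=days)
noise = series - trend - seasonal

# --- Stationarity checks ---
# Rolling statistics: for a stationary series both stay roughly flat over time.
rolling_mean = series.rolling(window=30).mean()
rolling_std = series.rolling(window=30).std()
# Sample autocorrelation for lags 1..14 (the values an ACF plot would show).
acf = [series.autocorr(lag=k) for k in range(1, 15)]

# --- Transforms towards stationarity ---
log_series = np.log(series)                 # log scale, needs strictly positive values
diffed = series.diff().dropna()             # first difference removes a linear trend
seasonal_diffed = series.diff(7).dropna()   # lag-7 difference removes the weekly cycle
```

On this series the 30-day rolling mean drifts upward (the trend), while `seasonal_diffed` fluctuates around a constant of about 0.7, i.e. the weekly trend increment.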

📐 Models tested

  • Autoregression (AR)

  • Moving Average (MA)

  • Autoregressive Moving Average (ARMA)

  • Autoregressive integrated moving average (ARIMA)

  • Seasonal autoregressive integrated moving average (SARIMA)

  • Bayesian regression

  • Lasso

  • SVM

  • Random forest

  • Nearest neighbors

  • XGBoost

  • LightGBM

  • Prophet

  • Long short-term memory (LSTM) with TensorFlow

  • DeepAR
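The regression-style models in this list (Lasso, SVM, random forest, gradient boosting and so on) need the series reframed as a supervised problem. A common approach, assumed here because this README does not describe the notebooks' feature set, is to use the previous p values as features. The sketch below builds those lag features with a hypothetical helper `make_lag_features` and fits an AR(2) model to them by ordinary least squares with NumPy; any scikit-learn regressor could replace the least-squares step.

```python
import numpy as np
import pandas as pd


def make_lag_features(series: pd.Series, n_lags: int) -> pd.DataFrame:
    """Hypothetical helper: target column y plus columns lag_1..lag_n."""
    frame = pd.DataFrame({"y": series})
    for k in range(1, n_lags + 1):
        frame[f"lag_{k}"] = series.shift(k)
    # The first n_lags rows have incomplete history, so drop them.
    return frame.dropna()


# Simulated AR(2) process with assumed coefficients 0.6 and -0.2.
rng = np.random.default_rng(7)
values = np.zeros(2000)
for t in range(2, len(values)):
    values[t] = 0.6 * values[t - 1] - 0.2 * values[t - 2] + rng.normal()
series = pd.Series(values)

frame = make_lag_features(series, n_lags=2)
# Design matrix: intercept column plus the two lag columns.
X = np.column_stack([np.ones(len(frame)), frame[["lag_1", "lag_2"]].to_numpy()])
y = frame["y"].to_numpy()

# Ordinary least squares estimates of [intercept, phi_1, phi_2].
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step-ahead forecast from the last two observed values.
next_value = coef[0] + coef[1] * series.iloc[-1] + coef[2] * series.iloc[-2]
```

With 2000 samples the estimated coefficients land close to the simulated 0.6 and -0.2. For multivariate runs, exogenous columns such as temperature or pressure would simply be added to `X` next to the lags.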

🔍 Forecasting results

We split the results according to whether the model used the extra feature columns, such as temperature or pressure, since using them makes a large difference in the metrics and represents two different scenarios. The metrics used were:

Evaluation Metrics

  • Mean Absolute Error (MAE)
  • Mean Absolute Percentage Error (MAPE), reported as a fraction (0.44 means 44%)
  • Root Mean Squared Error (RMSE)
  • Coefficient of determination (R2)
Model                                      MAE       RMSE      MAPE         R2
EnsembleXG+TF                        27.960503  39.686824  0.438655   0.762551
EnsembleLIGHT+TF                     27.909784  39.750590  0.435101   0.761787
EnsembleXG+LIGHT+TF                  27.763722  39.346366  0.451024   0.766607
EnsembleXG+LIGHT                     29.614893  41.660260  0.520180   0.738349
Random forest (tuned)                40.740963  53.151506  0.903182   0.574099
SVM RBF (grid search)                38.565602  50.340027  0.776779   0.617963
DeepAR                               72.528734 105.391313  0.957377  -0.674509
TensorFlow simple LSTM               31.406890  44.007715  0.466331   0.708032
Prophet multivariate                 38.346791  50.502186  0.744734   0.615498
KNeighbors                           57.048847  80.387336  1.082936   0.025789
SVM RBF                              40.808894  56.032800  0.794224   0.526672
LightGBM                             30.173660  42.745285  0.522338   0.724543
XGBoost                              31.043099  43.195546  0.542145   0.718709
Random forest                        45.837942  59.448943  1.029276   0.467198
Lasso                                39.236966  54.583998  0.709031   0.550832
BayesianRidge                        39.243001  54.634477  0.707874   0.550001
Prophet univariate                   61.533802  83.646732  1.268213  -0.054814
AutoSARIMAX (1, 0, 1),(0, 0, 0, 6)   51.291983  71.486838  0.912563   0.229575
SARIMAX                              51.250482  71.328643  0.905278   0.232981
AutoARIMA (1, 0, 1)                  47.096859  64.861693  1.005644   0.365759
ARIMA                                48.249243  66.387526  1.061672   0.335567
ARMA                                 47.096859  64.861693  1.005644   0.365759
MA                                   49.043875  66.201671  1.052869   0.339282
AR                                   47.238049  65.321718  1.015593   0.356730
HWES                                 52.960293  74.671752  1.112627   0.159398
SES                                  52.960293  74.671752  1.112627   0.159398
Yesterday's value                    52.674951  74.522764  1.044050   0.162749
Naive mean                           59.320940  81.444360  1.321357   0.000000
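The four evaluation metrics and the two simplest baselines in the table (yesterday's value and a constant mean) can be written in a few lines of NumPy. This is a sketch, not the notebooks' code: the toy values are made up, MAPE is returned as a fraction (as scikit-learn's `mean_absolute_percentage_error` also does), and the mean baseline here is taken over the evaluated days, which makes its R2 exactly 0 by construction.

```python
import numpy as np


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    # Fraction, not percent (0.44 means 44%); undefined if y_true contains zeros.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy daily values, made up for illustration.
y = np.array([80.0, 95.0, 60.0, 110.0, 70.0, 90.0])

# "Yesterday's value" (persistence): forecast each day with the previous observation.
target = y[1:]
persistence = y[:-1]

# Constant-mean baseline evaluated on the same days.
mean_forecast = np.full_like(target, target.mean())

for name, pred in [("Yesterday's value", persistence), ("Mean", mean_forecast)]:
    print(name, mae(target, pred), rmse(target, pred), mape(target, pred), r2(target, pred))
```

A negative R2, as for DeepAR or univariate Prophet in the table, means the model's squared error is larger than that of simply predicting the mean of the evaluated values.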

Additional resources and literature

Models not tested that are gaining popularity

There are several models we have not tried in this tutorial, since they come from the academic world and their implementations are not 100% reliable, but they are worth mentioning:

  • Neural basis expansion analysis for interpretable time series forecasting (N-BEATS)
  • ESRNN (exponential smoothing recurrent neural network)

  • Adhikari, R., & Agrawal, R. K. (2013). An introductory study on time series modeling and forecasting.
  • Introduction to Time Series Forecasting With Python.
  • Deep Learning for Time Series Forecasting.
  • The Complete Guide to Time Series Analysis and Forecasting.
  • How to Decompose Time Series Data into Trend and Seasonality.

Contributing

Want to see another model tested? Do you have anything to add or fix? I'll be happy to talk about it! Open an issue/PR :)