Time-Series-Forecasting-SARIMAX

Using SARIMAX to forecast seasonal time series data that is influenced by exogenous variables

Primary language: Jupyter Notebook

Prepared for the Sangam 2019 ML Hackathon by IITMAA

Data provided: hourly traffic data (see train.csv for details)

Data description

| Column | Description |
|---|---|
| date_time | Date and hour of the observation, in local time (IST) |
| is_holiday | Categorical: Indian national holidays combined with regional holidays |
| air_pollution_index | Air quality index (10-300) |
| humidity | Numeric humidity |
| wind_speed | Numeric wind speed in miles per hour |
| wind_direction | Wind direction in degrees (0-360) |
| visibility_in_miles | Visibility distance in miles |
| dew_point | Numeric dew point in Celsius |
| temperature | Numeric average temperature in Kelvin |
| rain_p_h | Numeric amount of rain that fell in the hour, in mm |
| snow_p_h | Numeric amount of snow that fell in the hour, in mm |
| clouds_all | Numeric percentage of cloud cover |
| weather_type | Categorical: short text description of the current weather |
| weather_description | Categorical: longer text description of the current weather |
| traffic_volume | Numeric hourly traffic volume in one direction of travel (the forecast target) |
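A minimal loading sketch, assuming train.csv uses the column names listed above and that pandas is available. It parses `date_time`, puts the rows in chronological order, and one-hot encodes the categorical columns so they can serve as numeric regressors. The helper name `load_traffic` and the three sample rows are hypothetical, made up only to exercise the function.

```python
import io

import pandas as pd

# Categorical columns as named in the data description (assumed to match train.csv).
CATEGORICAL = ["is_holiday", "weather_type", "weather_description"]


def load_traffic(source) -> pd.DataFrame:
    """Read the traffic CSV, index it by timestamp, and one-hot encode categoricals.

    `source` can be a path such as "train.csv" or any file-like object.
    """
    df = pd.read_csv(source, parse_dates=["date_time"])
    # Time series models expect observations in chronological order.
    df = df.sort_values("date_time").set_index("date_time")
    # Replace each text category with 0/1 indicator columns (e.g. weather_type_Clear).
    present = [c for c in CATEGORICAL if c in df.columns]
    return pd.get_dummies(df, columns=present, dtype=float)


# Made-up rows (not real data), deliberately out of order.
sample = io.StringIO(
    "date_time,temperature,weather_type,traffic_volume\n"
    "2019-01-01 01:00:00,290.0,Clouds,200\n"
    "2019-01-01 00:00:00,289.0,Clear,100\n"
    "2019-01-01 02:00:00,291.0,Clear,300\n"
)
frame = load_traffic(sample)
print(frame)
```

On the real data, `is_holiday` and `weather_description` are expanded the same way; passing `drop_first=True` to `pd.get_dummies` is an option if the full set of indicators turns out to be collinear with the model's intercept.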

The task is to forecast traffic_volume from the provided time series, taking the exogenous variables into account.
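Because traffic_volume is forecast from its own past plus the exogenous columns, the prepared table has to be split into the target series and a regressor matrix, and any evaluation holdout has to come from the end of the series rather than from random sampling. The sketch below assumes a time-indexed DataFrame with a `traffic_volume` column and numeric regressors; `split_endog_exog` is a hypothetical helper name and the toy frame is synthetic.

```python
import numpy as np
import pandas as pd

TARGET = "traffic_volume"


def split_endog_exog(frame: pd.DataFrame, holdout: int):
    """Split a time-indexed frame into (y, X) for training and for a final holdout.

    The holdout is the last `holdout` rows, so evaluation only ever looks
    at observations that come after everything used for fitting.
    """
    y = frame[TARGET].astype(float)
    X = frame.drop(columns=[TARGET]).astype(float)
    train = (y.iloc[:-holdout], X.iloc[:-holdout])
    test = (y.iloc[-holdout:], X.iloc[-holdout:])
    return train, test


# Synthetic hourly frame standing in for the prepared data.
index = pd.date_range("2019-01-01", periods=10, freq=pd.offsets.Hour())
toy = pd.DataFrame(
    {TARGET: np.arange(10), "temperature": np.linspace(280.0, 289.0, 10)},
    index=index,
)
(y_train, X_train), (y_test, X_test) = split_endog_exog(toy, holdout=3)
print(len(y_train), len(y_test))
```

At forecast time SARIMAX needs the exogenous values for every future step, so `X_test` (or real future weather and holiday values) must be available when the forecast is made.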

Approach used: SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous variables)
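For reference, statsmodels documents its SARIMAX model as a linear regression on the exogenous variables whose errors follow a seasonal ARIMA process. With $y_t$ the target (here traffic_volume), $x_t$ the vector of exogenous variables at time $t$, and $s$ the seasonal period:

$$
\begin{aligned}
y_t &= \beta^\top x_t + u_t \\
\phi_p(L)\,\tilde{\phi}_P(L^s)\,\Delta^d \Delta_s^D\, u_t &= A(t) + \theta_q(L)\,\tilde{\theta}_Q(L^s)\,\varepsilon_t
\end{aligned}
$$

Here $L$ is the lag operator ($L u_t = u_{t-1}$), $\Delta = 1 - L$ and $\Delta_s = 1 - L^s$ are the ordinary and seasonal differences, $\phi_p$ and $\theta_q$ are the non-seasonal AR and MA polynomials of orders $p$ and $q$, $\tilde{\phi}_P$ and $\tilde{\theta}_Q$ are their seasonal counterparts in $L^s$ of orders $P$ and $Q$, $A(t)$ is an optional trend, and $\varepsilon_t$ is white noise. In statsmodels, `order=(p, d, q)` and `seasonal_order=(P, D, Q, s)` set these orders.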

Reason: the data is a seasonal time series, and the target is influenced by several exogenous variables. SARIMAX combines seasonal ARIMA terms, which capture autocorrelation and seasonality, with a regression on the exogenous variables, which makes it a natural statistical model for this task.

Main modules used:
  • statsmodels package in Python