In this lesson, you'll fit an ARMA model using statsmodels
to a real-world dataset.
In this lab you will:
- Decide the optimal parameters for an ARMA model by plotting ACF and PACF and interpreting them
- Fit an ARMA model using StatsModels
Run the cell below to import the dataset containing the historical running times for the men's 400m in the Olympic games.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import warnings
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter("ignore", ConvergenceWarning)
data = pd.read_csv("winning_400m.csv")
data["year"] = pd.to_datetime(data["year"].astype(str))
data.set_index("year", inplace=True)
data.index = data.index.to_period("Y")
# Preview the dataset
data
Plot this time series data.
# Plot the time series
If you plotted the time series correctly, you should notice that it is not stationary. So, difference the data to get a stationary time series. Make sure to remove the missing values.
# Difference the time series
data_diff = None
data_diff
Use statsmodels
to plot the ACF and PACF of this differenced time series.
# Plot the ACF
# Plot the PACF
Based on the ACF and PACF, fit an ARMA model with the right orders for AR and MA. Feel free to try different models and compare AIC and BIC values, as well as significance values for the parameter estimates.
# Your comments here
Well done. In addition to manipulating and visualizing time series data, you now know how to create a stationary time series and fit ARMA models.