In this project, I consider a time series of the monthly revenue of Jim Beam Brands, a liquor vendor in the state of Iowa, USA. First, I explore the main characteristics of this time series: its decompositions (both multiplicative and additive), stationarity, and seasonal plots. Next, I make forecasts with two models, SARIMA and Prophet. Trend change detection, cross-validation and, more importantly, hyperparameter tuning are also included.
The dataset used in this project is a subset of a large dataset from Iowa's state-hosted open data portal, which has around 24 million rows and 24 columns.
For the purpose of this project, I filtered the vendor named Jim Beam Brands, resulting in 2,180,745 rows. The following columns are considered:
- Date: the date that bottles were sold to retailers.
- Item Description: the names of the sold bottles.
- Bottle Volume: the volume of each bottle.
- State Bottle Cost: the cost the State of Iowa pays to Jim Beam Brands for sold bottles.
- State Bottle Retail: the money the State receives from selling bottles to retailers.
- Bottle Sold: the number of bottles sold to retailers.
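As an illustration of how these columns can be turned into a monthly series, here is a minimal pandas sketch. It assumes that the revenue of a row is State Bottle Cost multiplied by Bottle Sold; that definition, the helper name `monthly_revenue`, and the sample rows are assumptions made for the example, not taken from the project.

```python
import pandas as pd


def monthly_revenue(sales: pd.DataFrame) -> pd.Series:
    """Aggregate invoice rows into a monthly revenue series (hypothetical helper)."""
    df = sales.copy()
    df["Date"] = pd.to_datetime(df["Date"])
    # Assumption: revenue per row = what the State pays the vendor per bottle
    # times the number of bottles sold.
    df["Revenue"] = df["State Bottle Cost"] * df["Bottle Sold"]
    # "MS" labels each bin by the first day of its month.
    return df.set_index("Date")["Revenue"].resample("MS").sum()


# Tiny made-up sample, only to show the shape of the input.
sample = pd.DataFrame(
    {
        "Date": ["2020-01-05", "2020-01-20", "2020-02-10"],
        "State Bottle Cost": [10.0, 12.5, 8.0],
        "Bottle Sold": [6, 4, 12],
    }
)
revenue = monthly_revenue(sample)
print(revenue)
```

Resampling on a datetime index keeps months with no sales in the series (as zeros), which matters later for models that expect a regular monthly frequency.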
- After cleaning the data, I created a time series of the monthly revenue of Jim Beam Brands and made some visualizations. A first glimpse of the seasonality and of the distribution of revenue is also explored.
- For a better understanding, I check stationarity and seasonality, and decompose the series in both the multiplicative and additive cases.
- Finally, the forecasts are carried out based on two models: SARIMA and Prophet.
In SARIMA, I use auto_arima to search for the best possible parameters p, d, q and test the result using a training set. Because this dataset has unexpected trend changes and outliers, two training sets are used to validate the accuracy.
With the Prophet model, cross-validation is implemented and later used in hyperparameter tuning, covering changepoint_prior_scale and seasonality_prior_scale. The tuned parameters are then used in the forecast to effectively reduce the RMSE.
SARIMA forecast on a training set containing 80% of the dataset:
The errors in this case are:
On another training set that includes the outliers, the result is better, with lower errors:
The first Prophet forecast is made without specifying any hyperparameters:
After hyperparameter tuning, the RMSE is reduced:
The RMSE drops from 178302.105 to 150059.697, a reduction of about 15.8%.