Forcous : Estimating Unit Sales of Walmart Retail Goods


Background of the competition

The Makridakis Competitions (also known as the M Competitions) are a series of open competitions organized by teams led by forecasting researcher Spyros Makridakis, intended to evaluate and compare the accuracy of different forecasting methods. The first competition, the M-Competition, was held in 1982 with 1001 time series; the scale of the data and the complexity of the models have grown with each successive iteration.

Objectives

The objective of the M5 forecasting competition is to advance the theory and practice of forecasting by identifying the method(s) that provide the most accurate point forecasts, 28 days ahead, for each of the 42,840 time series of the competition.
The task is to forecast daily unit sales for the next 28 days, i.e. up to 22 May 2016.

Dataset

Files :

  • calendar.csv - contains the dates on which the products are sold, along with special events and their types. Also includes SNAP events.
  • sales_train_validation.csv - contains the historical daily unit sales per product and store [d_1 to d_1913].
  • sell_prices.csv - contains the price of each product per store and date.
  • sales_train_evaluation.csv - contains the sales for [d_1 to d_1941], i.e. 28 more days than sales_train_validation.csv.
  • sample_submission.csv - the expected submission format for the 28-days-ahead forecast.
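Since the sales files hold one column per day (d_1, d_2, ...), a common first step is to reshape them into one row per series and day and attach the calendar date. The following is a minimal sketch of that reshaping using only the standard library. The inline CSV snippets are made-up stand-ins for the real files, and the column names `id`, `d`, `date`, and `units` are assumptions for illustration, not confirmed by this README.

```python
import csv
import io

# Tiny made-up stand-ins for sales_train_validation.csv and calendar.csv.
# Column names and values are assumptions for illustration only.
SALES_CSV = """id,item_id,store_id,d_1,d_2,d_3
FOODS_1_001_CA_1,FOODS_1_001,CA_1,0,2,1
HOBBIES_1_001_TX_1,HOBBIES_1_001,TX_1,3,0,0
"""
CALENDAR_CSV = """d,date
d_1,2011-01-29
d_2,2011-01-30
d_3,2011-01-31
"""


def wide_to_long(sales_text, calendar_text):
    """Turn one row per series into one row per (series, day)."""
    calendar = csv.DictReader(io.StringIO(calendar_text))
    day_to_date = {row["d"]: row["date"] for row in calendar}
    long_rows = []
    for row in csv.DictReader(io.StringIO(sales_text)):
        for column, value in row.items():
            # Only the day columns (d_1, d_2, ...) carry unit sales.
            if column.startswith("d_"):
                long_rows.append({
                    "id": row["id"],
                    "d": column,
                    "date": day_to_date[column],
                    "units": int(value),
                })
    return long_rows


long_rows = wide_to_long(SALES_CSV, CALENDAR_CSV)
print(len(long_rows))
print(long_rows[1])
```

With the real files the long table becomes very large (tens of thousands of series times roughly 1,900 days), so a dataframe library and compact dtypes would likely be preferable in practice.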

The dataset is made available by Walmart, one of the biggest retail corporations in the world.
This dataset includes the unit sales of various products sold in the USA, organized in the form of grouped time series.

Detailed Breakdown

  • The dataset covers the unit sales of 3,049 products, classified into 3 product categories [Hobbies, Foods, Household] and 7 product departments into which those categories are disaggregated.
  • The products are sold across 10 stores located in 3 states [CA, TX, WI].

Hierarchical Item ordering

Number of Series per aggregation level

| Level ID | Aggregation Level | Number of Series |
|---|---|---|
| 1 | Unit sales of all products, aggregated for all stores/states | 1 |
| 2 | Unit sales of all products, aggregated for each State | 3 |
| 3 | Unit sales of all products, aggregated for each store | 10 |
| 4 | Unit sales of all products, aggregated for each category | 3 |
| 5 | Unit sales of all products, aggregated for each department | 7 |
| 6 | Unit sales of all products, aggregated for each State and category | 9 |
| 7 | Unit sales of all products, aggregated for each State and department | 21 |
| 8 | Unit sales of all products, aggregated for each store and category | 30 |
| 9 | Unit sales of all products, aggregated for each store and department | 70 |
| 10 | Unit sales of product x, aggregated for all stores/states | 3,049 |
| 11 | Unit sales of product x, aggregated for each State | 9,147 |
| 12 | Unit sales of product x, aggregated for each store | 30,490 |
| | Total | 42,840 |

              6: ["state_id", "cat_id"], 7: ["state_id", "dept_id"], 8: ["store_id", "cat_id"], 9: ["store_id", "dept_id"],
              10: ["item_id"], 11: ["item_id", "state_id"]}
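The series counts in the table follow directly from the cardinalities given in the breakdown (3 states, 10 stores, 3 categories, 7 departments, 3,049 products). The sketch below recomputes them from a level-to-columns mapping. The entries for levels 6 to 11 mirror the fragment above; the entries for levels 1 to 5 and 12, and the names `CARDINALITY`, `LEVEL_GROUPS`, and `series_count`, are assumptions inferred from the table rather than the project's actual code.

```python
# Cardinalities taken from the dataset breakdown.
CARDINALITY = {"state_id": 3, "store_id": 10, "cat_id": 3, "dept_id": 7, "item_id": 3049}

# Hypothetical mapping from level ID to grouping columns.
# Levels 6-11 follow the fragment in the README; levels 1-5 and 12
# are inferred from the aggregation table (an assumption).
LEVEL_GROUPS = {
    1: [],
    2: ["state_id"],
    3: ["store_id"],
    4: ["cat_id"],
    5: ["dept_id"],
    6: ["state_id", "cat_id"],
    7: ["state_id", "dept_id"],
    8: ["store_id", "cat_id"],
    9: ["store_id", "dept_id"],
    10: ["item_id"],
    11: ["item_id", "state_id"],
    12: ["item_id", "store_id"],
}


def series_count(columns):
    """Number of groups, assuming the listed columns are fully crossed."""
    count = 1
    for column in columns:
        count *= CARDINALITY[column]
    return count


counts = {level: series_count(cols) for level, cols in LEVEL_GROUPS.items()}
total = sum(counts.values())
print(counts[11], counts[12], total)
```

Multiplying cardinalities only works because each pair of columns used here is fully crossed in this dataset (for example, every product appears in every store), which the counts in the table confirm.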

The data span 2011-01-29 to 2016-06-19 (1,969 days, about 5.4 years). Excluding the final 28-day test horizon (h=28), the products have a maximum selling history of 1,941 days, ending on 2016-05-22.
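The day indices used in the file names (d_1913, d_1941) can be tied back to calendar dates with simple date arithmetic. This is a small sketch, assuming d_1 corresponds to the first date of the range, 2011-01-29; the helper name `day_to_date` is hypothetical.

```python
from datetime import date, timedelta

# Assumption: d_1 is the first day of the stated range.
FIRST_DAY = date(2011, 1, 29)


def day_to_date(day_index):
    """Map a 1-based day index (d_1, d_2, ...) to its calendar date."""
    return FIRST_DAY + timedelta(days=day_index - 1)


print(day_to_date(1913))  # 2016-04-24, last day of sales_train_validation.csv
print(day_to_date(1941))  # 2016-05-22, last day of sales_train_evaluation.csv
print(day_to_date(1969))  # 2016-06-19, end of the final 28-day horizon
```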

Our Approaches

We build models from four families of approaches:

  • Probabilistic Naive Approach
  • Time Series Modelling
  • Machine Learning
  • Deep Learning
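To make the naive family concrete, here is a minimal sketch of three baselines of the kind listed in the results: repeating the last 28 days, averaging a recent window, and averaging the history after the first non-zero sale. These functions are illustrative assumptions about how such baselines are typically written, not the project's actual implementation, and the sample history is made up.

```python
def repeat_last_window(history, horizon=28):
    """Forecast by repeating the most recent `horizon` observations."""
    return list(history[-horizon:])


def recent_mean(history, window=30, horizon=28):
    """Forecast a flat line at the mean of the last `window` observations."""
    recent = history[-window:]
    return [sum(recent) / len(recent)] * horizon


def mean_after_first_sale(history, horizon=28):
    """Forecast a flat line at the mean taken from the first non-zero day on."""
    first = next((i for i, units in enumerate(history) if units != 0), None)
    if first is None:
        # The product never sold, so forecast zero demand.
        return [0.0] * horizon
    active = history[first:]
    return [sum(active) / len(active)] * horizon


# Made-up daily unit sales for one series.
history = [0, 0, 4, 2, 0, 6, 3]
print(repeat_last_window(history, horizon=3))       # [0, 6, 3]
print(recent_mean(history, window=2, horizon=3))    # [4.5, 4.5, 4.5]
print(mean_after_first_sale(history, horizon=3))    # [3.0, 3.0, 3.0]
```

Skipping the leading zeros matters for retail data because many products enter the assortment late, and averaging over the days before launch would bias the forecast downward.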

Our Results

| S.No | Approach | WRMSSE |
|---|---|---|
| 1 | LightGBM | 0.49608 |
| 2 | Facebook Prophet | 0.63419 |
| 3 | Stacked LSTM | 0.74118 |
| 4 | ARIMA | 0.78013 |
| 5 | Same as last 28 days | 0.85582 |
| 6 | Average of same day in historical years | 0.971497 |
| 7 | Bidirectional LSTM | 1.05585 |
| 8 | 30 Days Average | 1.07118 |
| 9 | Historical mean | 1.65414 |
| 10 | Historical mean after first non-zero | 1.24674 |
| 11 | Mean of recent 30 days | 1.13489 |
| 12 | Mean of recent 40 days | 1.13808 |
| 13 | LSTM | 2.19568 |
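The scores above are reported as WRMSSE (weighted root mean squared scaled error), where lower is better. As a rough sketch of how this metric works: each series' forecast error is scaled by the mean squared one-step change of its training history (counted from the first non-zero sale), and the per-series scores are combined with weights. The code below takes the weights as given inputs; the competition's own weighting scheme is not reproduced here, and the function names and sample numbers are assumptions for illustration.

```python
import math


def rmsse(train, actual, forecast):
    """Root mean squared scaled error for a single series."""
    # Start the scale at the first non-zero sale.
    start = next((i for i, units in enumerate(train) if units != 0), None)
    if start is None or len(train) - start < 2:
        raise ValueError("need at least two observations after the first sale")
    active = train[start:]
    # Mean squared one-step change of the training history.
    scale = sum((b - a) ** 2 for a, b in zip(active, active[1:])) / (len(active) - 1)
    if scale == 0:
        raise ValueError("training history is constant; scale is zero")
    mse = sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse / scale)


def wrmsse(series, weights):
    """Weighted average of per-series RMSSE; weights are normalised to sum to 1."""
    total_weight = sum(weights)
    return sum(
        (weight / total_weight) * rmsse(train, actual, forecast)
        for (train, actual, forecast), weight in zip(series, weights)
    )


# Made-up example: two series as (train, actual, forecast) triples.
series = [
    ([0, 2, 4, 2, 4], [3, 5], [1, 3]),
    ([1, 2, 1, 2], [2, 2], [2, 2]),
]
print(rmsse(*series[0]))             # 1.0
print(wrmsse(series, weights=[3, 1]))  # 0.75
```

An RMSSE of 1.0 means the forecast errors are about as large as the typical day-to-day change in the training data, which gives a rough reading of the table: scores above 1 are relatively weak by that yardstick.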