Store-Sales-Forecasting

This project uses Facebook Prophet to forecast sales from datasets containing sales data for 1115 stores. The predictive model attempts to forecast future sales from historical data while taking into account seasonality effects, demand, holidays, promotions, and competition.

The dataset used in this project is stored on Google Drive and is available at this link: https://drive.google.com/drive/u/0/folders/1yWxgxkqNPTcVkJBbHNefgyAjzYAv3sTP

1. Understand the Problem Statement and Business Case

To stay competitive and grow, companies need to leverage AI/ML to build predictive models that forecast future sales. Such models attempt to forecast future sales from historical data while taking into account seasonality effects, demand, holidays, promotions, and competition.

In this project, we try to predict future daily sales based on the features of 1115 stores, using Facebook Prophet as the predictive model. Prophet is open source software released by Facebook's Core Data Science team. It is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. Prophet works best with time series that have strong seasonal effects and several seasons of historical data.
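For reference, the additive model mentioned above is usually written as the decomposition below. The symbols follow the Prophet paper rather than this project's notebook: g(t) is the trend, s(t) the seasonal component, h(t) the holiday effect, and ε_t the error term.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```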

2. Import Libraries and Datasets

We used two CSV files for our dataset: the first contains the sales records of the 1115 stores, and the second contains information about each of the 1115 stores.
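A minimal sketch of loading the two files with pandas. To stay self-contained, it reads the sample rows shown in this README from in-memory text; in the notebook these would be file paths, and the names `train.csv` and `store.csv` in the comments are assumptions, not names taken from the project.

```python
import io

import pandas as pd

# In-memory stand-ins for the two CSV files (rows copied from this README).
# In practice you would pass file paths instead, e.g. "train.csv" and
# "store.csv" (hypothetical file names).
sales_csv = io.StringIO(
    "Store,DayofWeek,Date,Sales,Customers,Open,Promo,StateHoliday,SchoolHolidays\n"
    "1,5,2015-07-31,5263,555,1,1,0,1\n"
    "2,5,2015-07-31,6064,625,1,1,0,1\n"
)
store_csv = io.StringIO(
    "Store,StoreType,Assortment,CompetitionDistance,CompetitionOpenSinceMonth,"
    "Promo2,Promo2SinceWeek,Promo2SinceYear,PromoInterval\n"
    "1114,a,c,870.0,,0,,,\n"
    '1112,d,c,5350.0,,1,22.0,2012.0,"Mar,Jun,Sept,Dec"\n'
)

# StateHoliday mixes the codes 0, a, b, c, so it is read as text;
# Date is parsed into a datetime column.
sales_df = pd.read_csv(sales_csv, parse_dates=["Date"], dtype={"StateHoliday": str})
store_df = pd.read_csv(store_csv)

print(sales_df.shape, store_df.shape)
```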

Sales Datasets

The first two rows of the dataset are shown below:

Store  DayofWeek  Date        Sales  Customers  Open  Promo  StateHoliday  SchoolHolidays
1      5          2015-07-31  5263   555        1     1      0             1
2      5          2015-07-31  6064   625        1     1      0             1
  • Id: transaction ID (combination of Store and date)
  • Store: unique store Id
  • Sales: sales/day, this is the target variable
  • Customers: number of customers on a given day
  • Open: Boolean to say whether a store is open or closed (0 = closed, 1 = open)
  • Promo: describes if store is running a promo on that day or not
  • StateHoliday: indicates which state holiday applies (a = public holiday, b = Easter holiday, c = Christmas, 0 = None)
  • SchoolHoliday: indicates if the (Store, Date) was affected by the closure of public schools

Stores Information Datasets

The first two rows of the dataset are shown below:

Store  StoreType  Assortment  CompetitionDistance  CompetitionOpenSinceMonth  Promo2  Promo2SinceWeek  Promo2SinceYear  PromoInterval
1114   a          c           870.0                NaN                        0       NaN              NaN              NaN
1112   d          c           5350.0               NaN                        1       22.0             2012.0           Mar,Jun,Sept,Dec
  • StoreType: categorical variable to indicate type of store (a, b, c, d)
  • Assortment: describes an assortment level: a = basic, b = extra, c = extended
  • CompetitionDistance (meters): distance to closest competitor store
  • CompetitionOpenSince [Month/Year]: provides an estimate of the date when competition was open
  • Promo2: Promo2 is a continuing and consecutive promotion for some stores (0 = store is not participating, 1 = store is participating)
  • Promo2Since [Year/Week]: date when the store started participating in Promo2
  • PromoInterval: describes the consecutive intervals Promo2 is started, naming the months the promotion is started anew. E.g. "Feb,May,Aug,Nov" means each round starts in February, May, August, November of any given year for that store

3. Explore Dataset

Explore Sales Training Data

Checking Missing Values

Fortunately, the sales data has no missing values, so we can proceed with data visualization.
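A check like this is commonly done by counting nulls per column. The sketch below runs on the two sample sales rows from this README; the variable names are illustrative and not taken from the notebook.

```python
import pandas as pd

# Two sample rows from the sales dataset shown in this README.
sales_df = pd.DataFrame({
    "Store": [1, 2],
    "Date": pd.to_datetime(["2015-07-31", "2015-07-31"]),
    "Sales": [5263, 6064],
    "Customers": [555, 625],
    "Open": [1, 1],
})

# Count missing values in each column; all zeros means nothing to impute.
missing_per_column = sales_df.isnull().sum()
print(missing_per_column)
```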

Data Visualization

Data Vis 1 Data Vis 2

  • The average is about 600 customers per day and the visible maximum is about 4500 (the outlier at 7388 does not show up in the plot)
  • Data is evenly distributed across the days of the week (~150,000 observations x 7 days = ~1.1 million observations)
  • Stores are open ~80% of the time
  • Data is evenly distributed among all stores (no bias)
  • Promo #1 was running ~40% of the time
  • Average sales are around 5000-6000 Euros
  • School holidays are in effect ~18% of the time

Now let's see how many stores are open and how many are closed.

Stores Open and Closed

We keep only the rows for open stores and remove the rows for closed stores. The Open column then carries no information, so we drop it.
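One way to express this step in pandas is sketched below. The third row is a made-up closed-store record added only so the filter has something to remove.

```python
import pandas as pd

sales_df = pd.DataFrame({
    "Store": [1, 2, 3],
    "Sales": [5263, 6064, 0],   # third row is an illustrative closed store
    "Open": [1, 1, 0],
})

# Keep only rows where the store was open, then drop the now-constant column.
open_sales_df = sales_df[sales_df["Open"] == 1].drop(columns=["Open"])
print(open_sales_df)
```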

Explore Stores Information Datasets

Checking Missing Values

The following columns have missing values, which we handle as described:

  • CompetitionDistance with 3 missing values (filled with the mean of the CompetitionDistance column)
  • CompetitionOpenSinceMonth with 354 missing values (filled with zero)
  • CompetitionOpenSinceYear with 354 missing values (filled with zero)
  • Promo2SinceWeek with 544 missing values (filled with zero)
  • Promo2SinceYear with 544 missing values (filled with zero)
  • PromoInterval with 544 missing values (filled with zero)

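We fill these columns with zero because of the Promo2 column: when Promo2 is zero, the Promo2SinceWeek, Promo2SinceYear, and PromoInterval values appear to be missing, so we set them to zero. We made the same assumption for the competition columns: if there is no promo, we assumed there is no competition information either, and filled CompetitionOpenSinceMonth and CompetitionOpenSinceYear with zero.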
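The imputation described above could be sketched as follows. The three-row frame is a toy example assembled from values that appear in this README, and the column list and variable names are assumptions about how the notebook is organized.

```python
import numpy as np
import pandas as pd

# Toy store table with gaps in the same columns the README lists.
store_df = pd.DataFrame({
    "Store": [1, 2, 3],
    "CompetitionDistance": [870.0, np.nan, 5350.0],
    "CompetitionOpenSinceMonth": [np.nan, 9.0, np.nan],
    "CompetitionOpenSinceYear": [np.nan, 2008.0, np.nan],
    "Promo2": [0, 0, 1],
    "Promo2SinceWeek": [np.nan, np.nan, 22.0],
    "Promo2SinceYear": [np.nan, np.nan, 2012.0],
    # Kept as object dtype so a numeric 0 can sit next to the month strings.
    "PromoInterval": pd.Series([np.nan, np.nan, "Mar,Jun,Sept,Dec"], dtype=object),
})

zero_fill_cols = [
    "CompetitionOpenSinceMonth", "CompetitionOpenSinceYear",
    "Promo2SinceWeek", "Promo2SinceYear", "PromoInterval",
]
# Zero marks "no promotion / no recorded competition opening date".
for col in zero_fill_cols:
    store_df[col] = store_df[col].fillna(0)

# CompetitionDistance has only a few gaps, so the column mean is used.
mean_distance = store_df["CompetitionDistance"].mean()
store_df["CompetitionDistance"] = store_df["CompetitionDistance"].fillna(mean_distance)
```

Note that a zero in a year or week column is a placeholder rather than a real date, so any later feature built from those columns needs to treat zero as "not applicable".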

Data Visualization

Data Vis 3

Data Vis 4

  • About half of the stores participate in Promo2
  • About half of the stores have their nearest competitor within 0-3000 m (up to 3 km away)

Explore Merged Dataset

Merge the Datasets

With both datasets cleaned, we merge them into one dataset. The first two rows of the merged dataset are shown below:

Store  DayofWeek  Date        Sales  Customers  Promo  StateHoliday  SchoolHolidays  StoreType  Assortment  CompetitionDistance  CompetitionOpenSinceMonth  CompetitionOpenSinceYear  Promo2  Promo2SinceWeek  Promo2SinceYear  PromoInterval
1      5          2015-07-31  5263   555        1      0             1               c          a           1270.0               9.0                        2008.0                    0       0.0              0.0
1      4          2015-07-30  5020   546        1      0             1               c          a           1270.0               9.0                        2008.0                    0       0.0              0.0
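A merge like this is typically an inner join on the shared Store column. The sketch below uses the store 1 values from the table above with a reduced set of columns; the variable names are illustrative.

```python
import pandas as pd

sales_df = pd.DataFrame({
    "Store": [1, 1],
    "Date": pd.to_datetime(["2015-07-31", "2015-07-30"]),
    "Sales": [5263, 5020],
    "Customers": [555, 546],
})
store_df = pd.DataFrame({
    "Store": [1],
    "StoreType": ["c"],
    "Assortment": ["a"],
    "CompetitionDistance": [1270.0],
})

# Every sales row picks up the attributes of its store.
merged_df = pd.merge(sales_df, store_df, how="inner", on="Store")
print(merged_df)
```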

Data Visualization

Corr Plot

  • Customers/Promo2 and sales are strongly correlated

Before further visualization, we split the date into separate month, day, and year columns.
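Assuming the Date column has been parsed as a datetime, the split can be sketched with the pandas `.dt` accessor. The new column names Year, Month, and Day are assumptions.

```python
import pandas as pd

merged_df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-07-31", "2015-07-30"]),
    "Sales": [5263, 5020],
})

# Derive calendar parts so sales can be grouped by month, day, or year.
merged_df["Year"] = merged_df["Date"].dt.year
merged_df["Month"] = merged_df["Date"].dt.month
merged_df["Day"] = merged_df["Date"].dt.day
print(merged_df)
```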

Data Vis 5

It looks like sales and the number of customers peak around Christmas.

Data Vis 6

  • The number of customers is generally lowest around the 24th of the month
  • Customers and sales are highest around the 30th and the 1st of the month

Data Vis 7

It looks like sales and number of customers peak around Saturday and Sunday

Data Vis 8

  • Store type b has the highest average sales
  • Store type a has the lowest average sales
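Comparisons like the ones above usually come from a group-by average. The sketch below shows the pattern on toy numbers that are purely illustrative and are not taken from the dataset.

```python
import pandas as pd

# Toy values chosen only to demonstrate the aggregation.
merged_df = pd.DataFrame({
    "StoreType": ["a", "a", "b", "b"],
    "Sales": [100, 200, 300, 500],
    "Customers": [10, 20, 30, 50],
})

# Average sales and customers per store type.
avg_by_type = merged_df.groupby("StoreType")[["Sales", "Customers"]].mean()
print(avg_by_type)
```

The same pattern applies to grouping by Month, Day, or DayofWeek for the other plots.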

Data Vis 9

Promotions increase the number of sales and customers.

4. Train the Model Part A

We used Facebook Prophet for our predictive model and trained it on the historical sales data of each store. The following is our 60-day sales forecast for store number 10:

Fb Prophet Model A-1

Fb Prophet Model A-2
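A minimal sketch of how a per-store forecast like the one above might be produced. Prophet expects a frame with a `ds` date column and a `y` value column. The helper names `make_prophet_frame` and `forecast_store` are hypothetical, and the import path assumes the current `prophet` package (older releases used `fbprophet`).

```python
import pandas as pd


def make_prophet_frame(sales_df, store_id):
    """Return one store's history in the (ds, y) layout Prophet expects."""
    store_sales = sales_df[sales_df["Store"] == store_id]
    frame = store_sales[["Date", "Sales"]].rename(columns={"Date": "ds", "Sales": "y"})
    return frame.sort_values("ds").reset_index(drop=True)


def forecast_store(sales_df, store_id, periods):
    """Fit Prophet on one store and predict `periods` days ahead."""
    # Imported here so the data preparation above works without Prophet installed.
    from prophet import Prophet  # older releases: from fbprophet import Prophet

    model = Prophet()
    model.fit(make_prophet_frame(sales_df, store_id))
    future = model.make_future_dataframe(periods=periods)
    return model, model.predict(future)


# Sample rows from this README, used only to show the frame layout.
sample = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": pd.to_datetime(["2015-07-31", "2015-07-30", "2015-07-31"]),
    "Sales": [5263, 5020, 6064],
})
frame = make_prophet_frame(sample, 1)
print(frame)
```

With the full dataset loaded, `forecast_store(sales_df, 10, 60)` would correspond to the store 10, 60-day setting described above, and `model.plot(forecast)` and `model.plot_components(forecast)` draw the forecast and its trend and seasonality components.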

5. Train the Model Part B

In this part, we incorporated holiday information into our model. The following is our 90-day sales forecast for store number 6:

Fb Prophet Model B-1

Fb Prophet Model B-2

Fb Prophet Model B-3
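Prophet accepts holiday information as a frame with `ds` and `holiday` columns passed to its `holidays` argument. The sketch below shows one possible way to build that frame from the StateHoliday and SchoolHoliday columns. The helper name, the holiday labels, and the 2015-01-01 row are assumptions for illustration, and pooling school holidays across stores is a simplification.

```python
import pandas as pd


def make_holidays_frame(sales_df):
    """Collect state and school holiday dates in Prophet's (ds, holiday) layout."""
    is_state = sales_df["StateHoliday"].astype(str) != "0"
    is_school = sales_df["SchoolHoliday"] == 1
    state = pd.DataFrame({
        "ds": sales_df.loc[is_state, "Date"].drop_duplicates().values,
        "holiday": "state_holiday",
    })
    school = pd.DataFrame({
        "ds": sales_df.loc[is_school, "Date"].drop_duplicates().values,
        "holiday": "school_holiday",
    })
    return pd.concat([state, school], ignore_index=True)


sample = pd.DataFrame({
    "Store": [1, 2, 1],
    # The last row is an illustrative public-holiday record.
    "Date": pd.to_datetime(["2015-07-31", "2015-07-31", "2015-01-01"]),
    "StateHoliday": ["0", "0", "a"],
    "SchoolHoliday": [1, 1, 1],
})
holidays = make_holidays_frame(sample)
print(holidays)
```

The resulting frame would then be passed as `Prophet(holidays=holidays)` before fitting, with the rest of the workflow unchanged from Part A.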