postsuraj/Supply-Chain-Analytics-Project

Demand forecasting of items using a three-step machine learning model: Clustering - Classification - Prediction

Supply-Chain-Analytics-Project

Goal

Demand Forecasting of products

Demand forecasting is the art and science of predicting customers' future demand for products.

DATASET

Source: https://archive.ics.uci.edu/ml/datasets/online+retail
Multivariate, Sequential, Time-Series dataset
Contains all transactions that occurred between 2010 and 2011 for a UK-based, registered, non-store online retailer.
The data comes from the online sales of a gift shop.
Attributes are InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country.

Correlation

Created new features from the text column "Description"

Extracted the nouns from each row with a POS tagger to obtain the product name, stored in a new column named Product_type
Extracted the colour of the product from each row, stored in a new column named Colour_type
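
The colour extraction step could look roughly like the sketch below. It covers only the Colour_type feature; the noun extraction depends on an external POS tagger and is not shown. The colour vocabulary and the function name `extract_colour` are assumptions made for illustration, not the project's actual list or code.

```python
import re

# Hypothetical colour vocabulary; the project's actual list is not given.
COLOURS = ["white", "red", "blue", "green", "pink", "black", "cream", "ivory"]

# Word boundaries ensure "RED" matches but "HUNDRED" does not.
_COLOUR_RE = re.compile(r"\b(" + "|".join(COLOURS) + r")\b", re.IGNORECASE)


def extract_colour(description):
    """Return the first colour word found in a description, or None."""
    if not isinstance(description, str):
        return None  # missing descriptions yield no colour
    match = _COLOUR_RE.search(description)
    return match.group(1).lower() if match else None
```

With the data in a pandas DataFrame `df`, the new column could then be filled with `df["Description"].apply(extract_colour)`.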

Created a new column, Revenue

Revenue = UnitPrice * Quantity
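
As a minimal sketch of this formula, the snippet below adds a Revenue field to each record. The rows and the helper name `add_revenue` are made up for illustration and are not taken from the dataset or the notebook.

```python
# Illustrative rows only; the values are invented, not taken from the dataset.
rows = [
    {"StockCode": "A1", "UnitPrice": 2.5, "Quantity": 6},
    {"StockCode": "B2", "UnitPrice": 1.25, "Quantity": -2},  # negative quantity, e.g. a return
]


def add_revenue(records):
    """Return copies of the records with Revenue = UnitPrice * Quantity added."""
    return [{**r, "Revenue": r["UnitPrice"] * r["Quantity"]} for r in records]


with_revenue = add_revenue(rows)
```

In pandas the same column could be computed in one step as `df["UnitPrice"] * df["Quantity"]`.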

A Machine-Learning Approach (3-Step Model)

The data is first clustered (Clustering)
The clustering output is then used as training data, labelled with the cluster no., for classification (Classification)
The number of sales is then predicted with a regression model that uses ‘cluster no.’ as one of its features (Prediction)

CLUSTERING:

Challenge:

Mixed attributes: numerical + categorical
The categorical attributes were also important and could not be removed
Converting them to numerical values would compromise the significance of the categorical attributes

Solution:

An algorithm that handles mixed attributes: k-prototypes

CLUSTERING: k-prototypes

Based on the k-means paradigm
Works well with mixed data while preserving the efficiency of k-means
Maximises the intra-cluster similarity of objects
Object similarity measure is derived from both numeric and categorical attributes
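
The combined measure can be sketched as follows, assuming the usual k-prototypes formulation: squared Euclidean distance on the numeric attributes plus a weight `gamma` times the number of mismatching categorical attributes. The function names and the default `gamma` are illustrative choices, not the project's code.

```python
def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Squared Euclidean distance on numeric parts plus gamma * categorical mismatches."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(num_a, num_b))
    categorical_part = sum(a != b for a, b in zip(cat_a, cat_b))
    return numeric_part + gamma * categorical_part


def nearest_prototype(num, cat, prototypes, gamma=1.0):
    """Return the index of the closest prototype; each prototype is a (numeric, categorical) pair."""
    distances = [
        mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return distances.index(min(distances))
```

This shows only the assignment step. A full run also updates each prototype (mean of the numeric attributes, mode of the categorical ones) and repeats until the assignments stop changing; the `kmodes` package offers a `KPrototypes` class that does this.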

Clustering output:

The cluster labels produced for the training set were used as the target variable of the classification training set

CLASSIFICATION:

A linear SVM performed very well as the model for cluster classification.
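
A minimal sketch of this step with scikit-learn's `LinearSVC` is shown below. The data is synthetic and the scaling step is an assumption; the project's actual features and preprocessing are not described here.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the clustered training set: two well-separated groups
# whose labels play the role of the 'cluster no.' column.
X_train = np.vstack([
    rng.normal(0.0, 0.5, (50, 2)),
    rng.normal(5.0, 0.5, (50, 2)),
])
cluster_no = np.array([0] * 50 + [1] * 50)

# Scale the features, then fit a linear SVM on the cluster labels.
clf = make_pipeline(StandardScaler(), LinearSVC())
clf.fit(X_train, cluster_no)

# Assign unseen rows to a cluster.
X_new = np.array([[0.1, -0.2], [4.8, 5.3]])
predicted_clusters = clf.predict(X_new)
```

The trained classifier lets new rows receive a cluster no. without re-running the clustering.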

Prediction of demands

Collected all features together and marked the "Quantity" column as the target variable.
Split the whole dataset into train and test sets and predicted the quantity of the items.
For forecasting the Quantity column, the random forest algorithm performed very well.
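
The prediction step could be sketched as below with scikit-learn's `RandomForestRegressor`, using the cluster no. as one of the features. The data is synthetic, and the 80/20 split and hyperparameters are assumptions rather than the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Synthetic features: a unit price and a cluster no. in {0, 1, 2}.
unit_price = rng.uniform(1.0, 10.0, n)
cluster_no = rng.integers(0, 3, n)

# Synthetic target standing in for "Quantity": depends on cluster and price, plus noise.
quantity = 10 * (cluster_no + 1) - unit_price + rng.normal(0.0, 0.5, n)

X = np.column_stack([unit_price, cluster_no])
X_train, X_test, y_train, y_test = train_test_split(
    X, quantity, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# R^2 on the held-out rows.
r2 = model.score(X_test, y_test)
```

Scoring on the held-out test set gives a check on how well the model generalises to unseen rows.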

Conclusion

The data pre-processing phase facilitates the formation of the inputs to the models.
The feature engineering process helps create new variables that bring additional value to demand interpretation.
The three-step model of clustering, classification and prediction further enables the company to visualize the relationships between predictor variables and to customize its forecasting approaches accordingly.