Supply-Chain-Analytics-Project

Demand forecasting of items using a three-step machine learning model: clustering, classification, prediction.

Goal

Demand forecasting of products

Demand forecasting is the art and science of predicting customers’ future demand for products.

DATASET

  • Source: https://archive.ics.uci.edu/ml/datasets/online+retail
  • Multivariate, Sequential, Time-Series dataset
  • Contains all transactions that occurred between 2010 and 2011 for a UK-based, registered non-store online retailer.
  • The data comes from the online sales of a gift shop.
  • Attributes are InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, and Country.

Correlation

Created new features from the text column "Description"

  • Picked out the nouns in each row with a POS tagger to get the product name, and stored them in a new column named Product_type
  • Picked out the colour of the product in each row and stored it in a new column named Colour_type
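
The colour step above could be sketched roughly as follows. This is only an illustration, not the project's code: the `COLOURS` vocabulary, the `extract_colour` helper, and the sample descriptions are assumptions, and the noun extraction with a POS tagger is not reproduced here.

```python
import re

# Hypothetical colour vocabulary; the project's actual list is not given.
COLOURS = {"red", "blue", "green", "pink", "white", "black", "yellow", "ivory"}


def extract_colour(description):
    """Return the first known colour word in a description, or 'none'."""
    for token in re.findall(r"[a-z]+", description.lower()):
        if token in COLOURS:
            return token
    return "none"


# Illustrative Description values.
descriptions = [
    "WHITE HANGING HEART T-LIGHT HOLDER",
    "RED WOOLLY HOTTIE WHITE HEART.",
    "SET 7 BABUSHKA NESTING BOXES",
]
colour_type = [extract_colour(d) for d in descriptions]
print(colour_type)  # ['white', 'red', 'none']
```

Taking the first matching word is a design choice of this sketch; a description that names two colours keeps only the first one.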

Created a new column, Revenue

  • Revenue = UnitPrice * Quantity
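
As a small worked example of the formula above, the sketch below computes Revenue per row. The rows and their values are made up for illustration.

```python
# Illustrative rows only; the values are not taken from the dataset.
rows = [
    {"StockCode": "A1", "UnitPrice": 2.5, "Quantity": 6},
    {"StockCode": "B2", "UnitPrice": 3.0, "Quantity": 8},
]

# Revenue = UnitPrice * Quantity, computed row by row.
for row in rows:
    row["Revenue"] = row["UnitPrice"] * row["Quantity"]

revenues = [row["Revenue"] for row in rows]
print(revenues)  # [15.0, 24.0]
```

With the data held in a pandas DataFrame named `df` (an assumed name), the same step would be the single assignment `df["Revenue"] = df["UnitPrice"] * df["Quantity"]`.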

A Machine-Learning Approach (3-Step Model)

  • The data is first clustered. (Clustering)
  • The clustering output is then used as labelled training data (labelled with the cluster number) for classification. (Classification)
  • The number of sales is then predicted with a regression model that uses the cluster number as one of its features. (Prediction)

CLUSTERING:

Challenge:

  • Mixed attributes: numerical + categorical
  • The categorical attributes were also important and could not be removed.
  • Converting them to numerical values would compromise the significance of the categorical attributes.

Solution:

  • An algorithm that handles mixed attributes: K-Prototypes

CLUSTERING: k-prototypes

  • Based on the k-means paradigm
  • Works well with mixed data while preserving the efficiency of k-means.
  • Maximises the intra-cluster similarity of objects
  • The object similarity measure is derived from both numeric and categorical attributes
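
To make the mixed similarity measure concrete, here is a minimal sketch of the dissimilarity commonly used in k-prototypes: squared Euclidean distance on the numeric attributes plus a weight `gamma` times the number of mismatching categorical attributes. The function names, the sample prototypes, and the value of `gamma` are assumptions for illustration, and the sketch shows only the assignment step, not the full algorithm.

```python
def mixed_dissimilarity(obj, prototype, numeric_idx, categorical_idx, gamma=1.0):
    """Squared Euclidean distance on numeric attributes plus
    gamma times the count of mismatching categorical attributes."""
    numeric_part = sum((obj[i] - prototype[i]) ** 2 for i in numeric_idx)
    categorical_part = sum(obj[i] != prototype[i] for i in categorical_idx)
    return numeric_part + gamma * categorical_part


def assign(obj, prototypes, numeric_idx, categorical_idx, gamma=1.0):
    """Return the index of the closest prototype (the cluster number)."""
    distances = [
        mixed_dissimilarity(obj, p, numeric_idx, categorical_idx, gamma)
        for p in prototypes
    ]
    return distances.index(min(distances))


# Made-up rows: (scaled UnitPrice, scaled Quantity, Colour_type, Country)
prototypes = [
    (0.2, 0.1, "white", "United Kingdom"),
    (0.9, 0.8, "red", "France"),
]
item = (0.3, 0.2, "red", "United Kingdom")

numeric_idx, categorical_idx = [0, 1], [2, 3]
d0 = mixed_dissimilarity(item, prototypes[0], numeric_idx, categorical_idx, gamma=0.5)
cluster_no = assign(item, prototypes, numeric_idx, categorical_idx, gamma=0.5)
print(round(d0, 2), cluster_no)  # 0.52 0
```

Minimising this dissimilarity within each cluster is the same as maximising intra-cluster similarity. For real use, the open-source `kmodes` package provides a `KPrototypes` class; whether the project used it is not stated.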

Clustering output:

The cluster column produced by clustering the training data was used as the target variable for the classification training set.

CLASSIFICATION:

A linear SVM performed very well as the model for classifying items into clusters.
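
A minimal sketch of this step with scikit-learn's `LinearSVC` is shown below. It assumes scikit-learn and NumPy are installed and uses synthetic data: the two features and the cluster numbers are made up, whereas in the project the labels would come from the clustering step. The scaling step and all parameter values are choices of this sketch, not the project's settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for clustered items: two numeric features per item
# and a cluster number that would normally come from k-prototypes.
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(100, 2)),
])
cluster_no = np.array([0] * 100 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, cluster_no, test_size=0.25, random_state=0, stratify=cluster_no
)

# Scale the features, then fit a linear SVM with the cluster number as target.
clf = make_pipeline(StandardScaler(), LinearSVC())
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The fitted classifier can then assign a cluster number to items that were not part of the clustering run, which is what makes the cluster number usable as a feature later on.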

Prediction of demands

  • Collected all features together and marked the "Quantity" column as the target variable.
  • Split the whole dataset into train and test sets and predicted the quantity of the items.
  • For forecasting the Quantity column, the random forest algorithm performed very well.
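
The prediction steps above might look roughly like the following sketch, which assumes scikit-learn and NumPy are installed. The data is synthetic: the feature values and the relationship between features and quantity are invented for the example, and the split ratio and forest size are arbitrary choices rather than the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in features, including the cluster number from the earlier steps.
unit_price = rng.uniform(0.5, 10.0, size=n)
cluster_no = rng.integers(0, 3, size=n)

# Made-up target: quantity depends on cluster and price, plus noise.
quantity = 5 + 10 * cluster_no - 0.5 * unit_price + rng.normal(0, 1.0, size=n)

X = np.column_stack([unit_price, cluster_no])
X_train, X_test, y_train, y_test = train_test_split(
    X, quantity, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
predicted = model.predict(X_test)

# Compare against always predicting the mean quantity of the training set.
mae = mean_absolute_error(y_test, predicted)
baseline_mae = mean_absolute_error(y_test, np.full_like(y_test, y_train.mean()))
print(f"random forest MAE: {mae:.2f}, mean baseline MAE: {baseline_mae:.2f}")
```

Comparing against a simple baseline such as the training mean gives a reference point for judging how well the model performs on held-out data.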

Conclusion

  • The data pre-processing phase facilitates the formation of the inputs to the models.
  • The feature engineering process helps create new variables that bring additional value to demand interpretation.
  • The three-step model of clustering, classification and prediction further enables the company to visualize the relationships between predictor variables and to customize its forecasting approaches accordingly.