Unsupervised learning project to identify customer segments. Data provided by Arvato Bertelsmann.

Primary language: Jupyter Notebook

Customer Segmentation Project

This repo contains the Jupyter notebook for the Customer Segmentation Project, which is part of Udacity's Data Scientist Nanodegree. Unsupervised learning methods are applied to customer data from Arvato Bertelsmann to identify and characterize customer segments. The data is proprietary, so it cannot be shared here.

Contents:

Part 1: Data and Preprocessing

  • Load and explore demographics data
  • Identify missing and unknown data
  • Assess missing data and outliers
  • Feature Engineering
  • Data cleaning
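
The missing-data steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes pandas, and the column names, the codes treated as "unknown", and the 50% threshold are all hypothetical, since the real data cannot be shown.

```python
import pandas as pd

# Hypothetical toy data standing in for the proprietary demographics table.
demographics = pd.DataFrame({
    "AGE_GROUP": [1, 2, -1, 3, 0],
    "INCOME_CLASS": [9, 4, 5, 9, 2],
    "MOSTLY_EMPTY": [-1, -1, 7, -1, -1],
})

# Hypothetical mapping: column -> codes that encode "missing or unknown".
unknown_codes = {
    "AGE_GROUP": [-1, 0],
    "INCOME_CLASS": [9],
    "MOSTLY_EMPTY": [-1],
}


def mark_unknown_as_nan(df, codes):
    """Return a float copy of df in which the listed codes become NaN."""
    out = df.astype(float)
    for column, values in codes.items():
        out[column] = out[column].mask(out[column].isin(values))
    return out


def drop_sparse_columns(df, max_missing_share=0.5):
    """Drop columns whose share of missing values exceeds the threshold."""
    missing_share = df.isna().mean()
    return df.loc[:, missing_share <= max_missing_share]


marked = mark_unknown_as_nan(demographics, unknown_codes)
cleaned = drop_sparse_columns(marked)

print(marked.isna().mean())   # share of missing values per column
print(list(cleaned.columns))  # columns that survive the threshold
```

Converting the coded values to NaN first means that every later step (assessment, imputation) can rely on one uniform representation of missing data.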

Part 2: Feature Transformation and Principal Component Analysis

  • Data Imputation and Feature Scaling
  • Dimensionality Reduction using PCA
  • Interpretation of principal components
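
A rough sketch of how these three steps might fit together, assuming scikit-learn (the README does not name the library used). The data is synthetic, and the feature names, the median imputation strategy, and the number of components are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned demographics data (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)  # correlated pair
X[rng.random(X.shape) < 0.05] = np.nan                      # ~5% missing
feature_names = [f"f{i}" for i in range(X.shape[1])]

# Fit each object once; the fitted objects can be reused on other data later.
imputer = SimpleImputer(strategy="median")  # assumed strategy
scaler = StandardScaler()
pca = PCA(n_components=3)                   # assumed number of components

X_imputed = imputer.fit_transform(X)
X_scaled = scaler.fit_transform(X_imputed)
X_reduced = pca.fit_transform(X_scaled)

# Cumulative share of variance retained by the first components.
explained = pca.explained_variance_ratio_.cumsum()

# Interpretation: rank features by absolute weight in the first component.
weights = pca.components_[0]
ranked = sorted(zip(feature_names, weights),
                key=lambda pair: abs(pair[1]), reverse=True)

print(explained)
print(ranked[:2])
```

Scaling before PCA matters because PCA is driven by variance: without it, features measured on larger scales would dominate the components regardless of how informative they are.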

Part 3: Clustering

  • Determine appropriate number of clusters for the data
  • Fit model using k-means algorithm
  • Reapply all previous steps to customer data
  • Apply the objects fitted on the demographics data to the customer data for comparison
  • Compare customer data to demographic data
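
The clustering and comparison steps can be sketched like this, again assuming scikit-learn. The points are synthetic 2-D stand-ins for PCA-transformed data, and the variable names and the choice of three clusters are hypothetical. In the full workflow, the customer data would first pass through the objects already fitted on the demographics data (using `transform`, not `fit`) so that both datasets live in the same feature space.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-ins for PCA-transformed data (hypothetical).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
population = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])
# Customers deliberately over-represent the group around (6, 0).
customers = np.vstack([
    centers[1] + rng.normal(size=(80, 2)),
    centers[0] + rng.normal(size=(20, 2)),
])

# Elbow method: record within-cluster sum of squares for several k.
inertia = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(population)
    inertia[k] = model.inertia_

# Fit on the general population only, then assign customers to those clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(population)
pop_labels = kmeans.labels_
cust_labels = kmeans.predict(customers)

# Compare the share of each cluster in both datasets.
pop_share = np.bincount(pop_labels, minlength=3) / len(pop_labels)
cust_share = np.bincount(cust_labels, minlength=3) / len(cust_labels)
ratio = cust_share / pop_share  # > 1 means over-represented among customers

print(inertia)
print(ratio)
```

Clusters with a ratio well above 1 describe parts of the population that are over-represented among customers, while ratios well below 1 point to under-represented segments.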