Customer Segmentation Project

This repo contains the Jupyter notebook for the Customer Segmentation Project, a component of the Data Scientist Nanodegree by Udacity. Unsupervised learning methods are applied to customer data from Arvato Bertelsmann to identify and characterize customer segments. The data is proprietary, so I am not allowed to share it.

Contents:

Part 1: Data and Preprocessing

Load and explore demographics data
Identify missing and unknown data
Assessment of missing data and outliers
Feature engineering
Data cleaning
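The Part 1 steps of turning "unknown" codes into proper missing values and then dropping sparse columns and rows can be sketched as below. This is a minimal illustration, not the notebook's code. The column names, the unknown-value codes and the 50% thresholds are all hypothetical, and the toy DataFrame stands in for the proprietary demographics data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the proprietary demographics data.
df = pd.DataFrame({
    "feature_a": [1, -1, 3, 2, 4],
    "feature_b": [0, 0, 0, 9, 0],
    "feature_c": [5, -1, 6, 3, 2],
})

# Hypothetical mapping: column -> codes that mean "missing or unknown".
unknown_codes = {"feature_a": [-1, 0], "feature_b": [0], "feature_c": [-1]}


def replace_unknown_with_nan(data, codes):
    """Return a copy of `data` in which every unknown-value code is NaN."""
    out = data.copy()
    for col, vals in codes.items():
        # keep values that are NOT unknown codes, set the rest to NaN
        out[col] = out[col].where(~out[col].isin(vals))
    return out


def drop_sparse(data, col_thresh=0.5, row_thresh=0.5):
    """Drop columns, then rows, whose share of missing values exceeds a threshold."""
    col_missing = data.isna().mean()              # fraction missing per column
    kept = data.loc[:, col_missing <= col_thresh]
    row_missing = kept.isna().mean(axis=1)        # fraction missing per row
    return kept.loc[row_missing <= row_thresh]


clean = drop_sparse(replace_unknown_with_nan(df, unknown_codes))
print(clean.shape)  # (4, 2): feature_b and row 1 are dropped
```

Dropping columns before rows matters here: once a mostly-empty column is removed, fewer rows cross the row threshold.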

Part 2: Feature Transformation and Principal Component Analysis

Data imputation and feature scaling
Dimensionality reduction using PCA
Interpretation of principal components
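A minimal sketch of the Part 2 pipeline follows, assuming scikit-learn's `SimpleImputer`, `StandardScaler` and `PCA`. The median imputation strategy, the 90% explained-variance cut-off and the helper `top_weights` are illustrative assumptions, not necessarily what the notebook uses. The random data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the cleaned demographics data (hypothetical columns).
X = pd.DataFrame(rng.normal(size=(200, 6)), columns=[f"feat_{i}" for i in range(6)])
X.iloc[::10, 2] = np.nan  # inject some missing values

# Impute missing values, then standardize each feature to zero mean / unit variance.
imputer = SimpleImputer(strategy="median")
scaler = StandardScaler()
X_scaled = scaler.fit_transform(imputer.fit_transform(X))

# Fit a full PCA first and keep enough components for ~90% of the variance.
pca_full = PCA().fit(X_scaled)
cum_var = np.cumsum(pca_full.explained_variance_ratio_)
n_components = min(int(np.searchsorted(cum_var, 0.90)) + 1, X.shape[1])

pca = PCA(n_components=n_components)
X_pca = pca.fit_transform(X_scaled)


def top_weights(pca, feature_names, component, k=3):
    """Hypothetical helper: features with the largest positive and negative
    weights on one principal component, used to interpret that component."""
    weights = pd.Series(pca.components_[component], index=feature_names).sort_values()
    return weights.tail(k).iloc[::-1], weights.head(k)


positive, negative = top_weights(pca, X.columns, component=0)
print(n_components, X_pca.shape)
```

Scaling before PCA matters because PCA maximizes variance, so unscaled features with large numeric ranges would otherwise dominate the components.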

Part 3: Clustering

Determine the appropriate number of clusters for the data
Fit a model using the k-means algorithm
Reapply all previous steps to the customer data
Apply the objects fitted on the demographics data to the customer data for comparison
Compare the customer data to the demographics data
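The Part 3 workflow can be sketched as follows, assuming scikit-learn. The steps are: inspect k-means inertia over a range of k (the "elbow" heuristic), fit k-means on the general population, then *reuse* the already-fitted preprocessing and clustering objects on the customer data (`transform`/`predict`, never refit) and compare cluster shares. The three Gaussian blobs, the choice k = 3 and the mixture weights are synthetic assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
centers = np.array([[0, 0, 0], [5, 5, 0], [0, 5, 5]])  # hypothetical segment centres


def sample(weights, n):
    """Draw n points from three Gaussian blobs with the given mixture weights."""
    labels = rng.choice(len(centers), size=n, p=weights)
    return centers[labels] + rng.normal(scale=0.5, size=(n, 3))


general = sample([1 / 3, 1 / 3, 1 / 3], 900)    # stand-in for demographics data
customers = sample([0.7, 0.15, 0.15], 300)      # stand-in for customer data

# Preprocessing fitted on the general population only.
prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), PCA(n_components=2))
general_pca = prep.fit_transform(general)

# Elbow heuristic: within-cluster sum of squares for a range of k.
inertia = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(general_pca).inertia_
    for k in range(1, 7)
}

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(general_pca)

# Reuse the objects fitted on the demographics data: transform/predict, no refitting.
customer_labels = kmeans.predict(prep.transform(customers))

share = pd.DataFrame({
    "general": pd.Series(kmeans.labels_).value_counts(normalize=True),
    "customers": pd.Series(customer_labels).value_counts(normalize=True),
}).fillna(0.0)
# ratio > 1: segment over-represented among customers; < 1: under-represented
share["ratio"] = share["customers"] / share["general"]
print(share.sort_values("ratio", ascending=False))
```

Refitting the scaler, PCA or k-means on the customer data would place the two populations in different coordinate systems, so their cluster shares would no longer be comparable.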

tm1611/Customer-Segmentation-Project
