bertelsmann_arvato_project

Capstone project for the Udacity Data Science Nanodegree

Create a Customer Segmentation Report for Arvato Financial Solutions

This project is part of the Udacity Data Science Nanodegree. It works with real-life data provided by Bertelsmann AZ Direct and Arvato Financial Solutions. The data concerns a company that performs mail-order sales in Germany. The company's main question is which facets of the population are most likely to purchase its products in response to a mailout campaign. The task was to use unsupervised learning techniques to organize the general population into clusters, then use those clusters to see which of them make up the company's main customer base.
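The comparison step described above can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes that some clustering model (for example k-means) has already assigned a cluster label to every person in both the general population and the customer data. The function names and the label values are hypothetical.

```python
from collections import Counter


def cluster_shares(labels):
    """Return the fraction of records assigned to each cluster label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


def representation_ratio(general_labels, customer_labels):
    """Compare the customer share to the general-population share per cluster.

    A ratio above 1 means the cluster is over-represented among customers;
    a ratio below 1 means it is under-represented.
    """
    general = cluster_shares(general_labels)
    customers = cluster_shares(customer_labels)
    return {
        cluster: customers.get(cluster, 0.0) / share
        for cluster, share in sorted(general.items())
    }


# Hypothetical cluster assignments (illustrative values, not project results).
general_labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
customer_labels = [0, 1, 1, 1, 1]

ratios = representation_ratio(general_labels, customer_labels)
for cluster, ratio in ratios.items():
    print(f"cluster {cluster}: customer share / population share = {ratio:.2f}")
```

Clusters with a ratio well above 1 are candidates for the company's main customer base, while clusters with a ratio near 0 describe people who rarely become customers.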

The data provided by Arvato Financial Solutions cannot be published because of its terms and conditions. The notebook contains the code and the analysis that I performed on the datasets.

Dataset

There are four data files associated with this project provided by Arvato:

  • Udacity_AZDIAS_052018.csv: Demographics data for the general population of Germany; 891 211 persons (rows) x 366 features (columns).
  • Udacity_CUSTOMERS_052018.csv: Demographics data for customers of a mail-order company; 191 652 persons (rows) x 369 features (columns).
  • Udacity_MAILOUT_052018_TRAIN.csv: Demographics data for individuals who were targets of a marketing campaign; 42 982 persons (rows) x 367 features (columns).
  • Udacity_MAILOUT_052018_TEST.csv: Demographics data for individuals who were targets of a marketing campaign; 42 833 persons (rows) x 366 features (columns).
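Since the files themselves are not in the repository, a quick sanity check after obtaining them is to compare each table's shape with the sizes listed above. The sketch below records those sizes and checks a `(rows, columns)` tuple against them; the helper name `check_shape` is hypothetical, and with pandas the tuple would typically come from `DataFrame.shape`.

```python
# Expected (rows, columns) per file, taken from the list above.
EXPECTED_SHAPES = {
    "Udacity_AZDIAS_052018.csv": (891211, 366),
    "Udacity_CUSTOMERS_052018.csv": (191652, 369),
    "Udacity_MAILOUT_052018_TRAIN.csv": (42982, 367),
    "Udacity_MAILOUT_052018_TEST.csv": (42833, 366),
}


def check_shape(filename, shape):
    """Return True if `shape` matches the documented size of `filename`.

    `shape` is a (rows, columns) tuple, e.g. `df.shape` for a pandas DataFrame.
    """
    return tuple(shape) == EXPECTED_SHAPES[filename]


# Extra columns relative to the general population table (AZDIAS).
base_columns = EXPECTED_SHAPES["Udacity_AZDIAS_052018.csv"][1]
extra_columns = {
    name: cols - base_columns for name, (_, cols) in EXPECTED_SHAPES.items()
}
```

The `extra_columns` mapping makes the differences explicit: the customers file has 3 more columns than the general population file, and the training mailout file has 1 more.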

Pre-Installation

This project requires Python 3.x with the following libraries installed:

  • NumPy
  • Pandas
  • SciPy
  • matplotlib
  • seaborn
  • scikit-learn
  • xgboost
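One way to confirm that the libraries above are available before opening the notebook is a small import check. This is a sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the module names are the usual import names for these packages (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the libraries listed above.
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "scipy",
    "matplotlib",
    "seaborn",
    "sklearn",  # scikit-learn
    "xgboost",
]


def missing_packages(modules):
    """Return the module names from `modules` that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_packages(REQUIRED_MODULES)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Any names reported as missing can then be installed with the package manager of your choice.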

Instruction

Because of the publication terms for the data, Arvato Project Workbook.ipynb is provided for exploration only. A blog post about this project is published on Medium.

Licence

The code in this repository is free for everyone to use. The data from AZ Direct GmbH is solely for use in the Unsupervised Learning and Bertelsmann Capstone projects and is governed by additional terms and conditions.