Create a Customer Segmentation Report for Arvato Financial Solutions

Table of Contents

  1. Introduction
  2. Installation
  3. File Descriptions
  4. Results
  5. Terms & Conditions

Introduction :

The project has three major steps: the customer segmentation report, the supervised learning model, and the Kaggle Competition.

  1. Customer Segmentation Report: This section will be similar to the corresponding project in Term 1 of the program, but the datasets now include more features that you can potentially use. You'll begin the project by using unsupervised learning methods to analyze attributes of established customers and the general population in order to create customer segments.

  2. Supervised Learning Model: You'll have access to a third dataset with attributes from targets of a mail order campaign. You'll use the previous analysis to build a machine learning model that predicts whether or not each individual will respond to the campaign.

  3. Kaggle Competition: Once you've chosen a model, you'll use it to make predictions on the campaign data as part of a Kaggle Competition. You'll rank the individuals by how likely they are to convert to being a customer, and see how your modeling skills measure up against your fellow students.
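Steps 2 and 3 come down to estimating each individual's probability of responding and sorting by it. The sketch below is a minimal illustration under stated assumptions: it uses synthetic, imbalanced data from `make_classification` instead of the campaign data, scikit-learn's `LogisticRegression` as a stand-in for whichever model you choose, and assumes the ranking is judged by ROC AUC, which depends only on the ordering of the scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the mail-order campaign data: about 5% responders
# (an assumed ratio, chosen only to mimic a rare positive class).
X, y = make_classification(n_samples=5000, n_features=20, n_informative=5,
                           n_clusters_per_class=1, weights=[0.95, 0.05],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Stand-in model; any classifier exposing predict_proba works the same way.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Probability of the positive class (column 1) for each individual.
scores = model.predict_proba(X_test)[:, 1]

# Indices of individuals ordered from most to least likely to respond.
ranking = np.argsort(-scores)

# ROC AUC measures how well the ordering separates responders from the rest.
auc = roc_auc_score(y_test, scores)
print(f"ROC AUC: {auc:.3f}")
print("Top 5 individuals:", ranking[:5])
```

Because only the ordering matters for this kind of ranking, a submission can contain the raw predicted probabilities; they do not need to be thresholded into 0/1 labels.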

Installation :

The following libraries were used:

  • Python 3.6.8
  • numpy 1.16.4
  • pandas 0.24.2
  • matplotlib 3.0.2
  • seaborn 0.9.0
  • scikit-learn 0.21.2
  • xgboost 0.90
  • ipython 6.1.0
  • ipython-genutils 0.2.0
  • jupyter-client 5.1.0
  • jupyter-console 5.2.0
  • jupyter-core 4.3.0
  • jupyterlab 0.35.6
  • jupyterlab-launcher 0.4.0
  • jupyterlab-server 0.2.0

Alternatively, run the following commands to set up the environment:

    conda create --name arvato-project python=3.6
    source activate arvato-project
    pip install -r requirements/requirements.txt

File Descriptions :

- terms_and_conditions
|- terms_completed.md
|- terms.md
|- terms.pdf
- Arvato Project Workbook.ipynb
- Arvato Project Workbook.html
- Project Rubric.pdf
- requirements.txt
- README.md

Results :

  • Data Preprocessing

    • missing value distribution

    • missing value distribution after converting unknown values to NA

    • categorical feature

    • quantitative feature

    • quantitative feature after log transform and outlier capping

    • drop

  • Customer Segmentation Report

    • PCA

    • KMeans

    • cluster

    • feature difference

  • Supervised Learning Model

    • data distribution

    • model selection

    • XGBoost

The main findings of this analysis are discussed in the post available here.
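The Data Preprocessing steps listed under Results (converting unknown codes to NA, inspecting the missing-value distribution, dropping sparse columns, and log-transforming and capping quantitative features) can be sketched with pandas. This is a minimal sketch on a toy DataFrame: the column names, the per-column "unknown" codes in `UNKNOWN_CODES`, the 30% missing-value threshold, and the 99th-percentile cap are all hypothetical choices for illustration, not the values used in the workbook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real attribute names and "unknown" codes come
# from the attribute description files shipped with the project data.
df = pd.DataFrame({
    "AGE_GROUP": [1, 2, -1, 3, 4, 2],          # -1 stands for "unknown" here
    "INCOME": [1200.0, 50.0, 3.0, 98000.0, 410.0, np.nan],
    "MOSTLY_EMPTY": [np.nan, np.nan, np.nan, np.nan, 5.0, np.nan],
})

# Hypothetical mapping: column -> codes that mean "unknown".
UNKNOWN_CODES = {"AGE_GROUP": [-1, 0]}

# 1. Convert unknown codes to NaN so they are counted as missing.
for col, codes in UNKNOWN_CODES.items():
    df[col] = df[col].replace(codes, np.nan)

# 2. Missing-value distribution: share of NaN values per column.
missing_share = df.isnull().mean()

# 3. Drop columns whose missing share exceeds the assumed threshold.
THRESHOLD = 0.3
df = df.drop(columns=missing_share[missing_share > THRESHOLD].index)

# 4. Log-transform a skewed quantitative feature (log1p handles zeros),
#    then cap outliers at the 99th percentile. NaN values are left as-is.
df["INCOME"] = np.log1p(df["INCOME"])
cap = df["INCOME"].quantile(0.99)
df["INCOME"] = df["INCOME"].clip(upper=cap)

print(missing_share)
print(df)
```

Applying the log transform before capping keeps the cap from being dominated by a few extreme raw values; the remaining NaN values would still need imputation before PCA or clustering.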
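The Customer Segmentation Report steps (PCA, KMeans, then comparing cluster membership between customers and the general population) can be sketched with scikit-learn. This sketch uses random synthetic data in place of the demographic datasets. Keeping components that explain 90% of the variance, `n_clusters=4`, and fitting every transform on the general population before applying it to customers are assumptions for illustration, not necessarily the workbook's exact settings. Clusters where customers are over-represented relative to the population point to likely customer segments.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Synthetic stand-ins: customers are shifted so they concentrate in some clusters.
population = rng.normal(0.0, 1.0, size=(2000, 10))
customers = rng.normal(0.8, 1.0, size=(500, 10))

# Fit scaling, PCA and KMeans on the general population only.
scaler = StandardScaler().fit(population)
pca = PCA(n_components=0.9, svd_solver="full")  # keep 90% of the variance
pop_pcs = pca.fit_transform(scaler.transform(population))
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pop_pcs)

# Apply the same fitted transforms to the customers.
pop_labels = kmeans.labels_
cust_labels = kmeans.predict(pca.transform(scaler.transform(customers)))

# Share of each group that falls into each cluster.
pop_share = pd.Series(pop_labels).value_counts(normalize=True).sort_index()
cust_share = pd.Series(cust_labels).value_counts(normalize=True).sort_index()
comparison = pd.DataFrame(
    {"population": pop_share, "customers": cust_share}).fillna(0.0)

# Positive difference: the cluster is over-represented among customers.
comparison["difference"] = comparison["customers"] - comparison["population"]
print(comparison)
```

In practice the number of clusters is usually chosen by plotting the KMeans inertia (`kmeans.inertia_`) over a range of `n_clusters` values and looking for an elbow.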
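For the Supervised Learning Model step, assuming the campaign responses are heavily imbalanced (few individuals respond), accuracy is a poor guide, and candidate models are commonly compared by cross-validated ROC AUC with stratified folds. The sketch below uses synthetic data, and the candidate models and their settings are illustrative assumptions. `xgboost.XGBClassifier` follows the scikit-learn estimator interface, so if xgboost is installed it can be added to the `candidates` dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for the campaign training data (~3% positives).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           n_clusters_per_class=1, weights=[0.97, 0.03],
                           random_state=42)

# Candidate models; the choices and settings here are illustrative only.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Stratified folds keep the rare positive class present in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    results[name] = scores.mean()
    print(f"{name}: mean ROC AUC = {scores.mean():.3f} (+/- {scores.std():.3f})")

best = max(results, key=results.get)
print("Best candidate:", best)
```

The best candidate would then typically be tuned further (for example with `GridSearchCV` using the same `cv` and `scoring="roc_auc"`) before predicting on the test campaign data.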

Terms & Conditions :

In addition to Udacity's Terms of Use and other policies, your downloading and use of the AZ Direct GmbH data solely for use in the Unsupervised Learning and Bertelsmann Capstone projects are governed by the following additional terms and conditions. The big takeaways:

  1. You agree to AZ Direct GmbH's General Terms provided below and that you only have the right to download and use the AZ Direct GmbH data solely to complete the data mining task which is part of the Unsupervised Learning and Bertelsmann Capstone projects for the Udacity Data Science Nanodegree program.

  2. You are prohibited from using the AZ Direct GmbH data in any other context.

  3. You are also required and hereby represent and warrant that you will delete any and all data you downloaded within 2 weeks after your completion of the Unsupervised Learning and Bertelsmann Capstone projects and the program.

  4. If you do not agree to these additional terms, you will not be allowed to access the data for this project. The full terms are provided in the workspace below. You will then be asked in the next workspace to agree to these terms before gaining access to the project, which you may also choose to download if you would like to read in full the terms.