
Customer identification using unsupervised learning models, and potential-customer detection using machine learning models including Random Forest and AdaBoost Classifier.

Primary Language: Jupyter Notebook

Capstone Project

Udacity - Machine Learning Engineer Nanodegree Program

Project Overview

This project helps a mail-order sales company acquire new German clients for its mail-out campaign by analyzing and comparing the attributes of the general population with those of the potential target German population. The end goal is to identify and predict the target audience for the campaign that could bring the highest return for the company. The study points out a general direction the company can take to achieve a higher return on investment. The project uses the following techniques:


  • EDA
  • PCA
  • K-means, Mini-Batch K-Means
  • Logistic Regression
  • AdaBoost Classifier
  • Random Forest
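
As a rough illustration of how the techniques listed above can fit together, the sketch below runs PCA followed by Mini-Batch K-Means to compare cluster shares between two groups, then compares the three classifiers by cross-validated ROC AUC. It is not the notebook's actual code: the data is synthetic, and the number of components, the number of clusters, the ROC AUC metric, and all variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

N_CLUSTERS = 4  # assumed value, for illustration only

# --- Unsupervised part: scale, reduce with PCA, cluster with Mini-Batch K-Means ---
rng = np.random.default_rng(0)
population = rng.normal(size=(500, 20))           # synthetic stand-in for general population
customers = rng.normal(loc=0.5, size=(100, 20))   # synthetic stand-in for customers

segmenter = make_pipeline(
    StandardScaler(),
    PCA(n_components=5, random_state=0),
    MiniBatchKMeans(n_clusters=N_CLUSTERS, random_state=0, n_init=3),
)
segmenter.fit(population)
population_clusters = segmenter.predict(population)
customer_clusters = segmenter.predict(customers)

# Share of each group falling into each cluster; clusters where the customer
# share exceeds the population share are over-represented among customers.
population_share = np.bincount(population_clusters, minlength=N_CLUSTERS) / len(population_clusters)
customer_share = np.bincount(customer_clusters, minlength=N_CLUSTERS) / len(customer_clusters)

# --- Supervised part: compare classifiers on an imbalanced synthetic target ---
X, y = make_classification(
    n_samples=600, n_features=20, weights=[0.9, 0.1], random_state=0
)
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "adaboost": AdaBoostClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
auc_by_model = {
    name: cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, auc in auc_by_model.items():
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

Fitting the clustering pipeline on the population and only predicting for the customers keeps the cluster definitions anchored to the general population, so the two share vectors are directly comparable.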

Software and Libraries

This project uses Python 3 and is designed to be run in Jupyter Notebook. Using the Anaconda distribution to install Python is highly recommended, since it includes all the necessary Python libraries as well as Jupyter Notebook. The project uses the following libraries:

  • NumPy
  • pandas
  • pickle (part of the Python standard library)
  • scikit-learn (imported as sklearn)
  • Matplotlib (for data visualization)
  • Seaborn (for data visualization)

How the project is organized

File Descriptions:

proposal.pdf: Summarizes the intent and initial blueprint of the project

report.pdf: Summarizes the findings and analysis of the project

Arvato_capstone.ipynb: Jupyter Notebook that runs in the following three segments:

  1. Data Cleaning and Transformation
  2. Unsupervised Learning
  3. Supervised Learning

README.md: This file, describing the contents of the project

DIAS Attributes - Values 2017.xlsx: Explains the features named in the data headers

The notebook expects the following files to be present in the data folder:

  • Udacity_AZDIAS_052018.csv
  • Udacity_CUSTOMERS_052018.csv
  • Udacity_MAILOUT_052018_TEST.csv
  • Udacity_MAILOUT_052018_TRAIN.csv
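
Before running the notebook, it can help to confirm that the files listed above are in place. The following is a minimal sketch: the helper name `missing_data_files` is hypothetical, and the folder is assumed to be a directory named `data` relative to the working directory.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "Udacity_AZDIAS_052018.csv",
    "Udacity_CUSTOMERS_052018.csv",
    "Udacity_MAILOUT_052018_TEST.csv",
    "Udacity_MAILOUT_052018_TRAIN.csv",
]


def missing_data_files(data_dir="data"):
    """Return the expected file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


missing = missing_data_files("data")
if missing:
    print("Missing data files:", ", ".join(missing))
else:
    print("All expected data files are present.")
```

The helper only checks that the files exist; it does not validate their contents or column layout.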