Capstone Project

Udacity - Machine Learning Engineer Nanodegree Program

Project Overview

This project helps a mail-order sales company acquire new German clients for its mail-out campaign by analyzing and comparing the attributes of the general population with those of the potential target German population. The end goal is to identify and predict the target audience for the campaign that could bring the highest return for the company. The study points to a general direction in which the company can move forward with a higher return on investment.

Methodologies

EDA
PCA
K-means, Mini-Batch K-Means
Logistic Regression
AdaBoost Classifier
Random Forest
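
As a rough illustration of how the methods listed above can fit together, the sketch below runs PCA and Mini-Batch K-Means on synthetic data to compare cluster shares between two groups, then compares the three classifiers with cross-validation. It is not the notebook's code: the synthetic arrays, the number of components and clusters, and the ROC AUC metric are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption) for the general population and customer tables.
population = rng.normal(size=(1000, 20))
customers = rng.normal(loc=0.5, size=(300, 20))

# Unsupervised part: scale, reduce with PCA, then cluster with Mini-Batch K-Means.
n_clusters = 4  # illustrative value, not taken from the project
segmenter = make_pipeline(
    StandardScaler(),
    PCA(n_components=5, random_state=0),
    MiniBatchKMeans(n_clusters=n_clusters, n_init=3, random_state=0),
)
segmenter.fit(population)

# Share of each group that falls in each cluster; a ratio above 1 marks
# clusters where customers are over-represented relative to the population.
pop_share = np.bincount(segmenter.predict(population), minlength=n_clusters) / len(population)
cust_share = np.bincount(segmenter.predict(customers), minlength=n_clusters) / len(customers)
ratio = cust_share / np.maximum(pop_share, 1e-12)

# Supervised part: compare the three classifiers on an imbalanced synthetic
# response problem, scored with ROC AUC (an assumed metric).
X, y = make_classification(
    n_samples=600, n_features=20, weights=[0.9, 0.1], random_state=0
)
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "adaboost": AdaBoostClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean ROC AUC = {score:.3f}")
```

With the real data, the cluster ratio would suggest which population segments resemble existing customers, and the cross-validated scores would guide the choice of classifier for predicting campaign responders.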

Software and Libraries

This project uses Python 3 and is designed to be completed in Jupyter Notebook. It is highly recommended that you use the Anaconda distribution to install Python, since it includes all the necessary Python libraries as well as Jupyter Notebook. The following libraries are expected to be used in this project:

NumPy
pandas
pickle
scikit-learn (sklearn)
Matplotlib (for data visualization)
Seaborn (for data visualization)
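
One way to confirm that the libraries listed above are available before opening the notebook is a small import check. This is a hypothetical helper, not part of the project; the name `check_libraries` is an assumption made for the example.

```python
import importlib

# Import names for the libraries listed above (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "pandas", "pickle", "sklearn", "matplotlib", "seaborn"]


def check_libraries(names=REQUIRED):
    """Return a dict mapping each import name to its version, or None if missing."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = None
        else:
            # Standard-library modules such as pickle have no __version__ attribute.
            status[name] = getattr(module, "__version__", "standard library")
    return status


for name, version in check_libraries().items():
    print(f"{name}: {version if version else 'MISSING'}")
```

Any library reported as MISSING can be installed through Anaconda or pip before running the notebook.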

How the project is organized

File Descriptions:

proposal.pdf: Summarizes the intent and initial blueprint of the project

report.pdf: Summarizes the findings and analysis of the project

Arvato_capstone.ipynb: Jupyter Notebook that runs in the following three segments:

Data Cleaning and Transformation
Unsupervised Learning
Supervised Learning

README.md: This file, describing the contents of the project

DIAS Attributes - Values 2017.xlsx: Explains the features named in the data headers

The notebook expects the following files to be present in the data folder: