Udacity - Machine Learning Engineer Nanodegree Program
This project helps a mail-order sales company acquire new German clients for its mail-out campaign by analyzing and comparing the attributes of the general population with those of the potential target German population. The end goal is to identify and predict the target audience for the campaign that could bring the company the highest return. The study is intended to point the company toward a higher return on investment.
- EDA
- PCA
- K-means, Mini-Batch K-Means
- Logistic Regression
- AdaBoost Classifier
- Random Forest
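The unsupervised techniques listed above can be combined roughly as follows. This is a minimal sketch on synthetic data, not the notebook's actual code: the array sizes, the number of principal components (5), and the number of clusters (4) are illustrative assumptions. It scales the features, reduces them with PCA, fits Mini-Batch K-Means on the "general population", and then compares how the two groups are distributed across clusters.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the real data (assumption: purely illustrative).
rng = np.random.default_rng(0)
general = rng.normal(size=(500, 20))               # "general population"
customers = rng.normal(loc=0.5, size=(200, 20))    # "customers"

# Fit scaling and PCA on the general population only.
scaler = StandardScaler().fit(general)
pca = PCA(n_components=5, random_state=0).fit(scaler.transform(general))
general_pcs = pca.transform(scaler.transform(general))
customer_pcs = pca.transform(scaler.transform(customers))

# Cluster the general population, then assign customers to the same clusters.
n_clusters = 4
kmeans = MiniBatchKMeans(n_clusters=n_clusters, n_init=3, random_state=0)
general_labels = kmeans.fit_predict(general_pcs)
customer_labels = kmeans.predict(customer_pcs)

# Share of each group falling into each cluster.
general_share = np.bincount(general_labels, minlength=n_clusters) / len(general_labels)
customer_share = np.bincount(customer_labels, minlength=n_clusters) / len(customer_labels)

for k in range(n_clusters):
    print(f"cluster {k}: general={general_share[k]:.2f} customers={customer_share[k]:.2f}")
```

Clusters where the customer share clearly exceeds the general share would be candidate target segments; in practice the component and cluster counts would be chosen from explained variance and an elbow plot rather than fixed in advance.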
This project uses Python 3 and is designed to be completed in Jupyter Notebook. Using the Anaconda distribution to install Python is highly recommended, since it includes the necessary Python libraries as well as Jupyter Notebook. The project uses the following libraries:
- NumPy
- pandas
- pickle (part of the Python standard library)
- scikit-learn (imported as sklearn)
- Matplotlib (for data visualization)
- Seaborn (for data visualization)
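A quick way to confirm that the environment has these libraries is sketched below. The helper name `missing_modules` is hypothetical and not part of the project; it only checks that each module can be found and does not verify versions.

```python
from importlib.util import find_spec

# Import names for the libraries listed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["numpy", "pandas", "pickle", "sklearn", "matplotlib", "seaborn"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
print("Missing libraries:", ", ".join(missing) if missing else "none")
```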
proposal.pdf: Summarizes the intent and initial blueprint of the project
report.pdf: Summarizes the findings and analysis of the project
Arvato_capstone.ipynb: Jupyter Notebook that runs in the following three segments:
- Data Cleaning and Transformation
- Unsupervised Learning
- Supervised Learning
README.md: This file, describing the contents of the project
DIAS Attributes - Values 2017.xlsx: Explains the features named in the data headers
The notebook expects the following files to be present in the data folder:
Udacity_AZDIAS_052018.csv
Udacity_CUSTOMERS_052018.csv
Udacity_MAILOUT_052018_TEST.csv
Udacity_MAILOUT_052018_TRAIN.csv
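Before running the notebook, it can help to check that all four files are in place. The sketch below makes several assumptions: the folder is named `data` and sits next to the notebook, the CSVs are semicolon-separated, and the helper names `missing_data_files` and `load_datasets` are hypothetical rather than taken from the project. Adjust the path and separator to match your copy of the data.

```python
from pathlib import Path

# File names as listed above.
EXPECTED_FILES = [
    "Udacity_AZDIAS_052018.csv",
    "Udacity_CUSTOMERS_052018.csv",
    "Udacity_MAILOUT_052018_TEST.csv",
    "Udacity_MAILOUT_052018_TRAIN.csv",
]


def missing_data_files(data_dir="data"):
    """Return the expected file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


def load_datasets(data_dir="data", sep=";"):
    """Load every expected CSV into a dict of DataFrames keyed by file name.

    The separator is an assumption; change it if your files differ.
    """
    import pandas as pd  # imported lazily so the file check works without pandas

    missing = missing_data_files(data_dir)
    if missing:
        raise FileNotFoundError(f"Missing from {data_dir}: {', '.join(missing)}")
    return {name: pd.read_csv(Path(data_dir) / name, sep=sep) for name in EXPECTED_FILES}


print("Missing data files:", missing_data_files() or "none")
```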