Finding-Donors-For-CharityML

Machine Learning Nanodegree Project Udacity

Introduction

This repo contains all my work for Project 1 of Udacity's Machine Learning Basic Nanodegree Program. In this project, I applied supervised learning techniques and an analytical mind on data collected for the U.S. census to help CharityML (a fictitious charity organization) identify people most likely to donate to their cause. I first explored the data to learn how the census data is recorded. Next, I applied a series of transformations and preprocessing techniques to manipulate the data into a workable format. Then I evaluated several supervised learners of my choice on the data, and considered which is best suited for the solution. Afterwards, I optimized the model I had selected and presented it as my solution to CharityML. Finally, I explored the chosen model and its predictions under the hood, to see just how well it’s performing when considering the data it’s given. predicted selling price to the statistics.

Disclaimer:

As a CS minor student of IIT Kharagpur and a long-time self-taught learner, I have completed many CS related MOOCs on Coursera, Udacity, Udemy, and Edx. I do understand the hard time you spend on understanding new concepts and debugging your program. Here I released these solutions, which are only for your reference purpose. It may help you to save some time. And I hope you don't copy any part of the code (the programming assignments are fairly easy if you read the instructions carefully), see the solutions before you start your own adventure. This Project is almost one of the simplest Machine Learning Project I have ever taken, but the simplicity is based on the fabulous course content and structure. It's a treasure given by Udacity team.

Project Overview

In this project, you will apply supervised learning techniques and an analytical mind on data collected for the U.S. census to help CharityML (a fictitious charity organization) identify people most likely to donate to their cause. You will first explore the data to learn how the census data is recorded. Next, you will apply a series of transformations and preprocessing techniques to manipulate the data into a workable format. You will then evaluate several supervised learners of your choice on the data, and consider which is best suited for the solution. Afterwards, you will optimize the model you've selected and present it as your solution to CharityML. Finally, you will explore the chosen model and its predictions under the hood, to see just how well it's performing when considering the data it's given.

Project Highlights

This project is designed to get you acquainted with the many supervised learning algorithms available in sklearn, and to also provide for a method of evaluating just how each model works and performs on a certain type of data. It is important in machine learning to understand exactly when and where a certain algorithm should be used, and when one should be avoided.

Things you will learn by completing this project:

How to identify when preprocessing is needed, and how to apply it.
How to establish a benchmark for a solution to the problem.
What each of several supervised learning algorithms accomplishes given a specific dataset.
How to investigate whether a candidate solution model is adequate for the problem.

Helpful Links For The Project:

Supervised learning Material Udacity [https://classroom.udacity.com/nanodegrees/nd009-InMB1/parts/fa53d27c-8e26-4a81-ac5f-a6781f5e0953]
Scikit Learn Supervised Learning Algorithms [http://scikit-learn.org/stable/supervised_learning.html]
Tuning GBM [https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/]
Skewness [https://becominghuman.ai/how-to-deal-with-skewed-dataset-in-machine-learning-afd2928011cc]
Data Transformation Statistics [https://en.wikipedia.org/wiki/Data_transformation_(statistics)]