The dataset is downloaded from http://archive.ics.uci.edu/ml/datasets/Adult

The selected features are:
- Age
- Workclass
- Education
- Country
- Marital Status
- Occupation
- Relationship
- Race
- Gender
- Capital
- Hours per week working

I have also grouped some of the feature values, e.g. occupation into blue and green collar (higher and lower class). You can read through the code to see the exact groupings.
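
As a rough illustration of this kind of grouping, here is a minimal sketch in plain Python. The `OCCUPATION_GROUPS` mapping, the `group_occupation` helper, and the `"other"` fallback are hypothetical names chosen for this example, and which occupation lands in which group is an assumption; the project code defines the real assignment.

```python
# Illustrative mapping only: which occupation goes into which group is an
# assumption made for this sketch, not the project's actual assignment.
OCCUPATION_GROUPS = {
    "Craft-repair": "blue-collar",
    "Machine-op-inspct": "blue-collar",
    "Transport-moving": "blue-collar",
    "Farming-fishing": "green-collar",
}


def group_occupation(value, default="other"):
    """Map a raw occupation string to a coarser group label."""
    cleaned = value.strip()  # raw values may carry leading spaces
    if cleaned == "?":  # the Adult data marks missing entries with "?"
        return None
    return OCCUPATION_GROUPS.get(cleaned, default)


raw = [" Craft-repair", " Farming-fishing", " Sales", " ?"]
grouped = [group_occupation(v) for v in raw]
```

Returning `None` for `"?"` keeps missing entries separate from the fallback group, so they can be handled in a later step.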
Two features have a lot of missing values:
- Workclass
- Country

So we used classifiers to predict those values from the features that don't have any missing values: age, education, race, gender, capital, and hours per week working. The missing values are predicted using GradientBoostingClassifier.
- GradientBoostingClassifier yielded an accuracy of 86%.
- Neural Network yielded an accuracy of 88%.
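
The imputation idea described above (train a classifier on rows where the value is known, then predict it for rows where it is missing) might look roughly like the sketch below. It assumes scikit-learn and NumPy are installed, that the complete features are already numerically encoded, and that missing entries are marked with `"?"`. The `impute_with_classifier` helper and the toy data are hypothetical and are not taken from the project code.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def impute_with_classifier(X, y, missing="?", random_state=0):
    """Fill missing labels in y using a classifier trained on the known rows.

    X: numeric features with no missing values (assumed already encoded).
    y: categorical column in which `missing` marks the unknown entries.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=object)
    is_missing = y == missing
    filled = y.copy()
    # Only fit when there is something to predict and something to learn from.
    if is_missing.any() and (~is_missing).any():
        model = GradientBoostingClassifier(random_state=random_state)
        model.fit(X[~is_missing], y[~is_missing].astype(str))
        filled[is_missing] = model.predict(X[is_missing])
    return filled


# Toy data for demonstration only; it is not drawn from the Adult dataset.
rng = np.random.default_rng(0)
age = rng.integers(20, 60, size=40)
hours = rng.integers(20, 60, size=40)
workclass = np.where(hours > 40, "Self-emp", "Private").astype(object)
workclass[:5] = "?"  # pretend the first five rows are missing

filled = impute_with_classifier(np.column_stack([age, hours]), workclass)
```

Rows that already had a value are left untouched; only the `"?"` entries are replaced with predictions.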