sklearn-census-earnings

Using sklearn to predict if an individual earns above or below $50k from census information

This notebook uses US census information to attempt to predict whether an individual earns more or less than $50k.

The final model achieves an accuracy of 86.7% on the test data set.

The data used can be found here: https://www.kaggle.com/uciml/adult-census-income

Overview

A rough outline of the steps taken is as follows:

  1. Visualize the data to identify trends and artifacts that may affect later processes
  2. Prepare the data (train/test split, scaling, one-hot encoding, etc.)
  3. Run a grid search over several models to identify which are most likely to be successful
  4. Visualize the performance of the models to better understand their shortcomings and areas for improvement
  5. Continue to tune the most promising models
  6. Evaluate the final model on the test set
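
As a rough illustration of steps 2, 3, and 6, the preparation, grid search, and test-set evaluation could be wired together along the following lines. This is a minimal sketch and not the notebook's actual code: the synthetic data, the column names (`age`, `hours_per_week`, `workclass`), the label rule, the choice of logistic regression, and the parameter grid are all assumptions made so the example runs without downloading the Kaggle file.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the census table (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "hours_per_week": rng.integers(10, 60, size=n),
    "workclass": rng.choice(["Private", "Self-emp", "Gov"], size=n),
})
# Hypothetical label rule, used only so the models have something to learn.
noise = rng.normal(0, 10, size=n)
y = ((df["age"] + df["hours_per_week"] + noise) > 80).astype(int)

# Step 2: hold out a test set before fitting any preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=0
)

# Step 2: scale numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "hours_per_week"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["workclass"]),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step 3: cross-validated grid search over an assumed parameter grid.
search = GridSearchCV(
    pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# Step 6: score the best model once on the untouched test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Putting the preprocessing inside the `Pipeline` means the scaler and encoder are refit on each cross-validation fold, so no statistics from the validation folds or the test set leak into training.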