This notebook uses US census information to predict whether an individual earns more than $50k per year.
The final model achieves an accuracy of 86.7% on the test data set.
The data used can be found here: https://www.kaggle.com/uciml/adult-census-income
Overview
A rough outline of the steps taken is as follows:
- Visualize the data to identify trends and artifacts that may affect later processes
- Prepare the data (train/test split, scaling, one-hot encoding, etc.)
- Run a grid search over several models to identify the most promising candidates
- Visualize the performance of the models to better understand their shortcomings and areas for improvement
- Continue to tune the most promising models
- Evaluate the final model on the test set
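The preparation, grid-search and final-evaluation steps can be sketched with scikit-learn. This is a minimal sketch assuming scikit-learn and pandas. It does not reproduce the notebook's code: the synthetic DataFrame, the column subset, the two candidate models and their parameter grids are all hypothetical placeholders. In practice the DataFrame would come from the Kaggle CSV.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic stand-in for the census data; column names are
# assumed to resemble the Kaggle file. Swap in pd.read_csv(...) for real use.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "hours.per.week": rng.integers(10, 60, n),
    "workclass": rng.choice(["Private", "Self-emp", "Gov"], n),
    "sex": rng.choice(["Male", "Female"], n),
})
# Toy label loosely tied to age and hours so the models have a signal to learn.
score = (df["age"] - 40) / 10 + (df["hours.per.week"] - 35) / 10 + rng.normal(0, 1, n)
df["income"] = np.where(score > 0, ">50K", "<=50K")

X = df.drop(columns="income")
y = (df["income"] == ">50K").astype(int)

# Hold out a test set that is only used in the final step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale the numeric columns and one-hot encode the categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "hours.per.week"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["workclass", "sex"]),
])

# Hypothetical candidate models, each with a small parameter grid.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "tree": (DecisionTreeClassifier(random_state=0), {"clf__max_depth": [3, 5, None]}),
}

# Cross-validated grid search per model, on the training data only.
results = {}
for name, (model, grid) in candidates.items():
    pipe = Pipeline([("prep", preprocess), ("clf", model)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = search

# Pick the model with the best mean CV accuracy, then score it once on the test set.
best_name = max(results, key=lambda k: results[k].best_score_)
test_accuracy = results[best_name].score(X_test, y_test)
print(best_name, results[best_name].best_params_, round(test_accuracy, 3))
```

Keeping the scaler and encoder inside the pipeline means each cross-validation fold fits them on its own training split only. As a result, no information from the validation folds or the held-out test set leaks into preprocessing.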