This is my submission for the MSA 2019 AI and Machine Learning assignment.
Here I create a RandomForest model to predict whether a person from the United States has an income over 50K.
Azure Notebook Link: https://notebooks.azure.com/PathToLife/projects/usaincomepredict
Tasks:
- create an Azure Notebook
- find data to analyse
- wrangle data to prepare it for machine learning
- analyse it to get a gist of what's going on
- run a machine learning model on the data and use the model to make predictions
- Complete the "intro-to-ml-with-python" exercises on the Microsoft Learn Platform https://docs.microsoft.com/en-gb/learn/paths/intro-to-ml-with-python/
- Create an Azure notebook (here: https://notebooks.azure.com/)
- Find another dataset (not the one used in this video) [There are some good ones here: https://archive.ics.uci.edu/ml/datasets.php]
- Clean it (remove bad data)
- Analyse it (plot histograms and scatter plots to see what the correlations are like)
- Run a machine learning model on the data (e.g. Linear Regression, Random Forest, etc.)
- Add comments explaining why you performed each step and add comments about any notable observation you make.
Resources:
- The MS Learn Machine Learning video
- General Data Analysis information
- This Pandas cheat sheet
- Example notebook from video
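
As a rough illustration of the clean, analyse and model steps listed above, here is a minimal sketch in Python using pandas and scikit-learn. It is not the code from the notebook: the data is synthetic, the column names (`age`, `education_num`, `hours_per_week`, `workclass`, `income_over_50k`) are hypothetical, and treating `?` as the marker for a missing value is an assumption made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def make_toy_data(n_rows=400, seed=0):
    """Build a small synthetic table (hypothetical columns, not real census data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(18, 70, size=n_rows),
        "education_num": rng.integers(1, 17, size=n_rows),
        "hours_per_week": rng.integers(10, 70, size=n_rows),
        # "?" stands in for a missing value (an assumption for this sketch).
        "workclass": rng.choice(["Private", "Self-emp", "Gov", "?"], size=n_rows),
    })
    # Synthetic label: more education and longer hours push income over 50K.
    score = (0.25 * df["education_num"]
             + 0.05 * df["hours_per_week"]
             + rng.normal(0, 1, size=n_rows))
    df["income_over_50k"] = (score > 4.5).astype(int)
    return df


def clean(df):
    """Drop rows holding the '?' placeholder, then one-hot encode the category."""
    df = df[df["workclass"] != "?"]
    return pd.get_dummies(df, columns=["workclass"])


def train_and_score(df, seed=0):
    """Fit a random forest and report accuracy on a held-out quarter of the rows."""
    X = df.drop(columns=["income_over_50k"])
    y = df["income_over_50k"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


raw = make_toy_data()
tidy = clean(raw)

# Analyse: correlations between the numeric columns and the target.
numeric_cols = ["age", "education_num", "hours_per_week", "income_over_50k"]
print(tidy[numeric_cols].corr())

model, accuracy = train_and_score(tidy)
print(f"Held-out accuracy: {accuracy:.2f}")
```

With a real dataset, `make_toy_data` would be replaced by something like `pd.read_csv` on the downloaded file, and the cleaning step would need to match whatever that dataset uses to mark bad or missing values.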