This project evaluates the performance of several ML regression algorithms (linear regression, gradient boosting, random forest, and KNN) at predicting the price (target column) in the well-known Boston Housing dataset, loaded from the sklearn library.
- Numpy
- Pandas
- Matplotlib
- Seaborn
- Sklearn
The dataset consists of 506 rows and 13 columns (feature variables), plus the target column, which is the price of each house (one example per row). Before any data visualization or modelling code is run, all numeric variables are normalized so that results are not skewed towards the column with the largest scale.
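As a minimal sketch of the normalization step, assuming min-max scaling (this README does not say which scaler is used) and a small synthetic DataFrame whose column names are hypothetical stand-ins for the real features:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): two columns on very different scales,
# with 506 rows to match the size of the dataset described above.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "small_scale": rng.uniform(0, 1, size=506),     # hypothetical feature
    "large_scale": rng.uniform(100, 700, size=506), # hypothetical feature
})

# Rescale every numeric column to the [0, 1] range so that no column
# dominates purely because of its units.
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(scaled.describe().loc[["min", "max"]])
```

Distance-based models such as KNN are the most sensitive to feature scale, which is why scaling matters most for that part of the comparison.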
- import necessary libraries
- load dataset directly from sklearn (no need to download it separately)
- normalize data
- visualize data
- split data into training and testing sets
- apply linear regression
- apply random forest for regression
- apply gradient boosting for regression
- apply KNN for regression
- evaluate the four models using sklearn metrics
- visualize some actual and predicted results
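
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `load_boston` was removed from scikit-learn in version 1.2, so the sketch uses synthetic data from `make_regression` with the same shape (506 rows, 13 features). The 80/20 split, the default hyperparameters, the `random_state` values, and the choice of RMSE and R² as metrics are also assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the dataset (assumption): 506 rows, 13 features.
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

# Normalize the features, then split, following the order of the steps listed above.
X = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The four regressors compared in this project, with default settings.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "knn": KNeighborsRegressor(),
}

# Fit each model and score it on the held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }

for name, scores in results.items():
    print(f"{name}: RMSE={scores['rmse']:.2f}, R2={scores['r2']:.3f}")
```

Because the stand-in data is generated from a linear model, the scores it produces say nothing about how the four algorithms rank on the real housing data.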
Implementing this project with whichever regression algorithm you prefer can be a stepping stone into machine learning, especially its regression side. The simplicity of the dataset also makes it a good starting point before moving on to harder problems.