Linear regression is one of the simplest yet powerful tools for finding regularities in data and using them for prediction. It is widely applied both in science and practice. In this project, you will learn how to apply scikit-learn library to fit linear models, use them for prediction, compare the models, and select the best one. You will also learn how to carry out testing for certain issues with data.
Practice fitting linear models with scikit-learn to predict values on the unknown data. Apply polynomial feature engineering, test your data for multi-collinearity, and evaluate models with the MAPE score.
Stage 1 : Fit a simple model with one predictor and evaluate it.
Stage 2 : Use the linear model to handle the polynomial relationship between independent and dependent variables.
Stage 3 : Fit a linear model with many independent variables and compare it with the previous models.
Stage 4 : Check whether the variables have a high correlation and try to improve the model's performance by removing them.
Stage 5 : Get rid of negative predictions and see whether the model performance improves.
To learn more about this project, please visit HyperSkill Website - Salary Prediction.
This project's difficulty has been labelled as Challenging where this is how HyperSkill describes each of its four available difficulty levels:
- Easy Projects - if you're just starting
- Medium Projects - to build upon the basics
- Hard Projects - to practice all the basic concepts and learn new ones
- Challenging Projects - to perfect your knowledge with challenging tasks
This Repository contains one .py file and one folder:
code.py - Contains the code used to complete the data analysis requirements
Data repository - Contains the data.csv files that contain the data
Project was built using python version 3.11.3
It contains the following 9 columns:
-
Numerical features
rating
draft_round
age
experience
bmi
-
Non-numerical features
team
position
country
-
Target prediction
salary
Download the files to your local repository and open the project in your choice IDE and run the project. Different models were used to fit the data, evaluated with the Mean Absolute Percentage Error (MAPE) metric, according to each Stage's docstrings. Please read each Stage's docstring to know the requirements.