This repository contains the complete FIFA 18 player dataset from Kaggle.com.
Goal
Use multiple regression to predict player wage from player value, potential, stamina, speed and free kick accuracy.
Environment
Jupyter Notebook, pandas, matplotlib, seaborn, statsmodels, scikit-learn
Steps
- Split the data into training and test sets.
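A minimal sketch of the split, assuming scikit-learn's `train_test_split` and an 80/20 ratio. The synthetic data frame stands in for the Kaggle file, and the column names and the ratio are illustrative assumptions, not taken from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle file; column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
players = pd.DataFrame({
    "value": rng.lognormal(mean=14, sigma=1, size=n),
    "potential": rng.integers(50, 95, size=n),
    "stamina": rng.integers(20, 95, size=n),
    "speed": rng.integers(20, 95, size=n),
    "free_kick_accuracy": rng.integers(5, 90, size=n),
})
players["wage"] = 0.004 * players["value"] * rng.lognormal(0, 0.3, size=n)

# Hold out 20% of the rows (an assumed ratio); a fixed seed keeps the split reproducible.
train, test = train_test_split(players, test_size=0.2, random_state=42)
print(train.shape, test.shape)
```

Splitting before any exploration or scaling keeps the test set untouched until the final comparison.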
In the training dataset:
- Create histograms and scatterplots for all variables. Report unusual or missing values.
- Transform variables where the relationship is not linear.
- Standardise the variables (rescale each to zero mean and unit variance) before regression.
- Build a multiple regression model predicting wage from player value, potential, stamina, speed and free kick accuracy.
- Check the model's fit by plotting the residuals.
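The training-set steps above could be sketched as follows. This is one possible approach under stated assumptions: the data is synthetic, the column names are hypothetical, the log transform of `value` and `wage` is an assumed choice for skewed monetary variables, and scikit-learn's `LinearRegression` is used here although statsmodels' OLS would serve equally well.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training split; column names are hypothetical.
rng = np.random.default_rng(1)
n = 400
train = pd.DataFrame({
    "value": rng.lognormal(mean=14, sigma=1, size=n),
    "potential": rng.integers(50, 95, size=n),
    "stamina": rng.integers(20, 95, size=n),
    "speed": rng.integers(20, 95, size=n),
    "free_kick_accuracy": rng.integers(5, 90, size=n),
})
train["wage"] = 0.004 * train["value"] * rng.lognormal(0, 0.3, size=n)
predictors = ["value", "potential", "stamina", "speed", "free_kick_accuracy"]

# Histograms for every variable, scatterplots against wage, missing-value counts.
train.hist(bins=30, figsize=(10, 6))
fig, axes = plt.subplots(1, len(predictors), figsize=(18, 3), sharey=True)
for ax, col in zip(axes, predictors):
    ax.scatter(train[col], train["wage"], s=5)
    ax.set_xlabel(col)
axes[0].set_ylabel("wage")
missing_counts = train.isna().sum()
print(missing_counts)

# Assumed transform: take logs of the two skewed monetary variables.
transformed = train.copy()
transformed["value"] = np.log1p(transformed["value"])
transformed["wage"] = np.log1p(transformed["wage"])

# Standardise the predictors to zero mean and unit variance.
scaler = StandardScaler()
X_train = scaler.fit_transform(transformed[predictors])
y_train = transformed["wage"].to_numpy()

# Fit the multiple regression and compute residuals on the training data.
model = LinearRegression().fit(X_train, y_train)
fitted = model.predict(X_train)
residuals = y_train - fitted
train_r2 = model.score(X_train, y_train)
print(f"training R^2 = {train_r2:.3f}")

# Residuals against fitted values; a patternless band around zero suggests a reasonable fit.
fig, ax = plt.subplots()
ax.scatter(fitted, residuals, s=5)
ax.axhline(0, color="black", linewidth=1)
ax.set_xlabel("fitted log wage")
ax.set_ylabel("residual")
```

Keeping the fitted `scaler` around matters later: the test set should be rescaled with the training means and standard deviations, not its own.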
Once you are satisfied with your model, apply the same model to the test set and compare its accuracy on the training and test sets.
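A sketch of that comparison, assuming R² and RMSE as the accuracy measures (the brief does not name a metric). The data is synthetic, the column names are hypothetical, and the scaler and regression are wrapped in a scikit-learn pipeline so the test set is rescaled with parameters learned from the training set only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Kaggle file; column names are hypothetical.
rng = np.random.default_rng(2)
n = 500
players = pd.DataFrame({
    "value": rng.lognormal(mean=14, sigma=1, size=n),
    "potential": rng.integers(50, 95, size=n),
    "stamina": rng.integers(20, 95, size=n),
    "speed": rng.integers(20, 95, size=n),
    "free_kick_accuracy": rng.integers(5, 90, size=n),
})
players["wage"] = 0.004 * players["value"] * rng.lognormal(0, 0.3, size=n)

# Assumed log transform; it is element-wise, so applying it before the split leaks nothing.
players["log_value"] = np.log1p(players["value"])
players["log_wage"] = np.log1p(players["wage"])
predictors = ["log_value", "potential", "stamina", "speed", "free_kick_accuracy"]

train, test = train_test_split(players, test_size=0.2, random_state=42)

# The pipeline fits the scaler and the regression on the training rows only.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(train[predictors], train["log_wage"])

def report(name, frame):
    """Print and return R^2 and RMSE of the fitted model on one data frame."""
    pred = model.predict(frame[predictors])
    r2 = r2_score(frame["log_wage"], pred)
    rmse = float(np.sqrt(mean_squared_error(frame["log_wage"], pred)))
    print(f"{name}: R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
    return r2, rmse

train_r2, train_rmse = report("train", train)
test_r2, test_rmse = report("test", test)
```

Similar scores on both sets suggest the model generalises; a much lower test score would point to overfitting.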