✔️ Complete
The objective of this project is to create a model to predict prices of diamonds, practicing linear regression.
To acess complete objective informations click here.
Predicted price must be below 900 RMSE
Datasets was provided by IronHack.
00-diamonds.csv
00-rick_diamonds.csv
- Import the dataframe;
- Create a first baseline predicting the price by the mean;
- Start to Explore and Clean the Data:
- Check null values;
- Search for outliers - comparing mean and median;
- Calculate values to correct x,y and z;
- Check correlations.
- Apply the linear regression to predict the prices of diamonds;
- Improve the model until RMSE (root mean squared error) < 900.
After a lot of attempts, we obtained:
R² | RMSE |
---|---|
95.8% | 660 |
And comparing to Rick's dataset we got:
- Numpy
- Pandas
- MatplotLib and Seaborn
- Linear Regression
- Apply the model for two non-linear variables;
- Decrease the RMSE for the amount requested.
- Use target encoder on categorical variables;
Lucas Angulski