DIAMOND

📈 Diamond Price Prediction | Linear Regression

Project Status

✔️ Complete

Objective

The objective of this project is to build a model that predicts diamond prices, as practice in linear regression.
To access the complete objective description, click here.

Problem Statement

The RMSE (root mean squared error) of the predicted prices must be below 900.
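
As a reminder of what the target measures, here is a minimal sketch of an RMSE calculation with NumPy. The function name `rmse` and the sample prices are made up for illustration and are not taken from the project notebook.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up prices, for illustration only (not from the dataset).
actual = [1000.0, 2000.0, 3000.0]
predicted = [1100.0, 1900.0, 3200.0]

error = rmse(actual, predicted)
print(f"RMSE: {error:.1f} | below 900: {error < 900}")
```

Because the errors are squared before averaging, a few large misses weigh more heavily on the RMSE than many small ones.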

Resources

The datasets were provided by IronHack:

  • 00-diamonds.csv
  • 00-rick_diamonds.csv

Process

  • Import the dataframe;
  • Create a first baseline that predicts every price as the mean price;
  • Explore and clean the data:
    • Check for null values;
    • Search for outliers by comparing the mean and the median;
    • Calculate values to correct x, y and z;
    • Check correlations.
  • Apply linear regression to predict the prices of diamonds;
  • Improve the model until the RMSE (root mean squared error) is below 900.
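
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code: it uses a small synthetic dataframe instead of the real CSV files, the column names `carat`, `depth` and `price` are assumed, and `np.linalg.lstsq` stands in for whatever linear regression implementation the project used. The x, y and z correction step is left out because the README does not describe how it was done.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the diamonds data (hypothetical values, not the real file).
n = 500
carat = rng.uniform(0.2, 2.0, n)
df = pd.DataFrame({
    "carat": carat,
    "depth": rng.normal(61.7, 1.4, n),
    "price": 4000.0 * carat + rng.normal(0.0, 300.0, n),
})

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hold out the last 20% of rows for evaluation.
split = int(0.8 * n)
train, test = df.iloc[:split], df.iloc[split:]

# Baseline: predict the mean training price for every diamond.
baseline_pred = np.full(len(test), train["price"].mean())
baseline_rmse = rmse(test["price"], baseline_pred)

# Exploration: null counts, mean vs. median per column, correlation with price.
null_counts = df.isnull().sum()
mean_vs_median = df.agg(["mean", "median"])
correlations = df.corr()["price"].sort_values(ascending=False)

# Linear regression via ordinary least squares, with an intercept column of ones.
features = ["carat", "depth"]
X_train = np.column_stack([np.ones(len(train)), train[features].to_numpy()])
X_test = np.column_stack([np.ones(len(test)), test[features].to_numpy()])
coef, *_ = np.linalg.lstsq(X_train, train["price"].to_numpy(), rcond=None)
model_rmse = rmse(test["price"], X_test @ coef)

print(f"baseline RMSE: {baseline_rmse:.0f} | linear regression RMSE: {model_rmse:.0f}")
```

The mean baseline gives a reference point: a model is only worth keeping if its RMSE is clearly lower than the baseline's.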

Results

After many attempts, we obtained:

Image summary:

RMSE
95.8% 660

Comparing against Rick's dataset, we got:
pawn

Learning Process

Theory Applied

  • NumPy
  • Pandas
  • Matplotlib and Seaborn
  • Linear Regression

Challenges

  • Applying the model to two variables with a non-linear relationship;
  • Decreasing the RMSE to the requested threshold.
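
The README does not say which two variables were non-linear or how they were handled. One common way to fit a straight line to a curved relationship is to log-transform both variables first. The sketch below shows that idea on synthetic data only; the power-law relationship between `carat` and `price` is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (hypothetical): price grows as a power of carat,
# with about 5% multiplicative noise.
n = 400
carat = rng.uniform(0.2, 3.0, n)
price = 3000.0 * carat ** 1.7 * np.exp(rng.normal(0.0, 0.05, n))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

train, test = slice(0, 320), slice(320, n)

# Straight line fitted on the raw variables.
slope, intercept = fit_line(carat[train], price[train])
raw_rmse = rmse(price[test], slope * carat[test] + intercept)

# Straight line fitted on log(carat) vs. log(price), mapped back to price units.
log_slope, log_intercept = fit_line(np.log(carat[train]), np.log(price[train]))
log_pred = np.exp(log_slope * np.log(carat[test]) + log_intercept)
log_rmse = rmse(price[test], log_pred)

print(f"raw fit RMSE: {raw_rmse:.0f} | log-log fit RMSE: {log_rmse:.0f}")
```

Note that the RMSE is computed on prices in their original units in both cases, so the two fits are compared on the same scale.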

Improvements

  • Use a target encoder on the categorical variables.
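
For reference, a minimal sketch of what target encoding could look like with pandas. The helper `target_encode`, the `smoothing` parameter, the column name `cut` and the sample rows are all hypothetical; this is one simple smoothed-mean variant, not a specific library's encoder.

```python
import pandas as pd

def target_encode(train, column, target, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    Categories with few rows are pulled toward the global mean.
    Returns the category-to-value mapping and the global mean.
    """
    global_mean = train[target].mean()
    stats = train.groupby(column)[target].agg(["mean", "count"])
    smoothed = (stats["count"] * stats["mean"] + smoothing * global_mean) / (
        stats["count"] + smoothing
    )
    return smoothed, global_mean

# Made-up training rows, for illustration only.
train = pd.DataFrame({
    "cut": ["Ideal", "Ideal", "Good", "Good", "Fair"],
    "price": [5000.0, 7000.0, 3000.0, 5000.0, 2000.0],
})

mapping, fallback = target_encode(train, "cut", "price", smoothing=2.0)
train["cut_encoded"] = train["cut"].map(mapping)

# Categories never seen in training fall back to the global mean.
new_rows = pd.DataFrame({"cut": ["Ideal", "Premium"]})
new_rows["cut_encoded"] = new_rows["cut"].map(mapping).fillna(fallback)

print(train)
print(new_rows)
```

The mapping should be computed on the training rows only and then applied to validation or test rows. Computing it on all rows would leak the target into the features.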

Authors

Lucas Angulski