Used Linear Regression to accurately predict the price of diamonds.

Diamond_Predictions

INTRODUCTION:

I picked a dataset of about 54,000 diamonds and used it to predict price, a continuous target variable.

OBJECTIVE:

To examine how independent variables such as carat weight, color, cut, and clarity influence the target variable, price.

THE DATASET:

• Kaggle

SKILLS REQUIRED TO COMPLETE:

Completing this project required using Python and Pandas to clean the dataset thoroughly and to create visualizations. It also required understanding how to interpret different regression models built through feature engineering and feature selection.
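A minimal sketch of the cleaning and feature-engineering step, assuming the Kaggle CSV has the columns `carat`, `cut`, `color`, `clarity`, `x`, `y`, `z`, and `price`, and assuming the usual worst-to-best grade orderings listed below. The zero-dimension filter and the `volume` feature are illustrative assumptions, not necessarily what the original notebook did. It uses plain Python on rows as read by `csv.DictReader`; in a notebook the same steps would more naturally be written against a Pandas DataFrame.

```python
# Hypothetical sketch: clean raw diamond rows and encode the graded
# categorical features as ordered integers so a linear model can use them.

# Assumed grade orderings, worst to best.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

# Assumed column names.
REQUIRED = ("carat", "cut", "color", "clarity", "x", "y", "z", "price")


def encode_grade(value, order):
    """Map a grade label to its position in the worst-to-best ordering."""
    return order.index(value)


def clean_rows(rows):
    """Drop incomplete rows and rows with non-positive dimensions, then encode grades.

    `rows` is a list of dicts with string values, e.g. from csv.DictReader.
    """
    cleaned = []
    for row in rows:
        if any(row.get(k) in (None, "") for k in REQUIRED):
            continue  # skip records with a missing value
        x, y, z = float(row["x"]), float(row["y"]), float(row["z"])
        if min(x, y, z) <= 0:
            continue  # a real diamond cannot have a zero or negative dimension
        cleaned.append({
            "carat": float(row["carat"]),
            "cut": encode_grade(row["cut"], CUT_ORDER),
            "color": encode_grade(row["color"], COLOR_ORDER),
            "clarity": encode_grade(row["clarity"], CLARITY_ORDER),
            "volume": x * y * z,  # hypothetical engineered feature (mm^3)
            "price": float(row["price"]),
        })
    return cleaned
```

Ordinal encoding keeps the grades' natural order in a single numeric column; one-hot encoding is the usual alternative when that ordering should not be assumed to be linear in price.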

WHAT WAS POSTED ON GITHUB:

I posted four separate notebooks on GitHub, including one for data collection and cleaning (with visualizations/EDA), one for the different models I used to make the best predictions, and a ReadMe notebook that lays out how my project was presented.

QUESTIONS I POSED:

• Is there any correlation between price and carat weight?
• Is there any correlation between price and the cut of the diamond?
• Is there any correlation between price and the color grade of the diamond?
• Is there any correlation between price and the clarity of the diamond?
• How can I use feature engineering to improve my model's predictions?
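The correlation questions above can be answered with a Pearson coefficient per feature. This is a minimal sketch, assuming each row is a dict with numeric `carat`, `cut`, `color`, `clarity`, and `price` values, with the grades already encoded as ordered integers (the encoding is an assumption, not something the project specifies). Pearson's r measures only linear association; for ordinal grades, a rank-based measure such as Spearman's rho may be a better fit.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (n >= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlations_with_price(rows, features=("carat", "cut", "color", "clarity")):
    """Return {feature: r} for each numerically encoded feature against price."""
    prices = [r["price"] for r in rows]
    return {f: pearson([r[f] for r in rows], prices) for f in features}
```

Values near +1 or -1 indicate a strong linear relationship with price; values near 0 indicate little linear relationship.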

HOW I PUT MY DATA TOGETHER:

First, I gathered a dataset of 54,000 different diamonds. After cleaning the data, I selected the features I expected to correlate most strongly with the final price of a diamond. Next, I performed EDA (exploratory data analysis) to decide which features to include in my models. I then split my data into training and test sets and analyzed the R^2 and RMSE (root mean squared error) of each model. Finally, I compared the models to see which one predicted diamond prices most accurately.
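The split-fit-evaluate loop described above can be sketched for a single feature (price against carat) in plain Python. This is an illustration under assumptions: the 80/20 split, the seed, the helper names, and the synthetic data in the usage lines are all hypothetical, not values from the project. The metrics are R^2 = 1 - SS_res / SS_tot and RMSE = sqrt(SS_res / n), both computed on the held-out test set. In a notebook, scikit-learn's `train_test_split`, `LinearRegression`, `r2_score`, and `mean_squared_error` cover the same steps.

```python
import math
import random


def split_train_test(xs, ys, test_fraction=0.2, seed=42):
    """Shuffle indices reproducibly, then split into train and test portions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train_idx], [ys[i] for i in train_idx],
            [xs[i] for i in test_idx], [ys[i] for i in test_idx])


def fit_simple_ols(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination (R^2) and root mean squared error."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))


# Usage on made-up data (illustration only): price rises with carat, plus noise.
rng = random.Random(0)
carats = [round(rng.uniform(0.2, 3.0), 2) for _ in range(200)]
prices = [4000 * c - 1500 + rng.gauss(0, 500) for c in carats]
x_tr, y_tr, x_te, y_te = split_train_test(carats, prices)
slope, intercept = fit_simple_ols(x_tr, y_tr)
r2, rmse = r2_and_rmse(y_te, [intercept + slope * x for x in x_te])
print(f"slope={slope:.0f}, intercept={intercept:.0f}, R^2={r2:.3f}, RMSE={rmse:.0f}")
```

Scoring on the test set rather than the training set is what makes the comparison between models meaningful: it estimates how well each model generalizes to diamonds it has not seen.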

FUTURE STEPS I WOULD HAVE TAKEN:

Given more time, I would have added a Ridge regression model for my dataset. Another goal would have been to find a second dataset with even more diamond features, merge the two, and apply further feature engineering and selection from there.
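Ridge regression, the proposed next step, adds a penalty alpha * ||b||^2 to the least-squares loss, which shrinks coefficients toward zero and stabilizes them when features are correlated (carat and a diamond's dimensions, for example). Below is a minimal closed-form sketch solving (X^T X + alpha I) b = X^T y on centered data so that the intercept is not penalized; the helper names are hypothetical, and setting `alpha=0` recovers ordinary least squares. In practice `sklearn.linear_model.Ridge` does this, and features are usually standardized first because the penalty depends on each feature's scale.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy of [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X, y, alpha):
    """Minimize ||y - b0 - X b||^2 + alpha * ||b||^2; return (b0, b).

    X is a list of feature rows. Centering X and y leaves the intercept b0
    unpenalized; alpha = 0 gives ordinary least squares.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Regularized normal equations: (Xc^T Xc + alpha * I) b = Xc^T yc
    xtx = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    coefs = solve(xtx, xty)
    intercept = y_mean - sum(c * mu for c, mu in zip(coefs, means))
    return intercept, coefs
```

Comparing test-set R^2 and RMSE across a few alpha values would show whether the penalty helps on this dataset or whether plain OLS is already sufficient.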

RECOMMENDATIONS BASED ON ANALYSIS:

Based on my analysis, carat weight is the most statistically significant feature in determining the price of a diamond. Other features can also substantially change a diamond's total price, but carat is the most influential. In conclusion, of the models I compared, the OLS model fit to this dataset is the best at predicting the price of an average diamond.
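Statistical significance of a regression coefficient is usually judged by its t-statistic, t = b1 / SE(b1), where for a simple regression SE(b1) = sqrt(SSE / (n - 2) / Sxx). The sketch below computes this for one feature at a time, which is an assumption made for simplicity: in a multiple-regression OLS the t-statistics are computed jointly, and libraries such as statsmodels expose them on fitted OLS results as `tvalues` and `pvalues`. The function name is hypothetical.

```python
import math


def slope_t_statistic(xs, ys):
    """t-statistic for the slope b1 of the simple regression y = b0 + b1 * x.

    A larger |t| means the slope is large relative to its standard error.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points (n - 2 degrees of freedom)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    if se_b1 == 0:
        return math.copysign(math.inf, b1)  # perfect fit: no residual error
    return b1 / se_b1
```

Fitting price against each feature separately and comparing |t| gives a rough ranking of how strongly each feature, on its own, relates to price.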

PRESENTATION LINK:

https://docs.google.com/presentation/d/1J5C9aVBEaC5PkE2vk-u2zqNhYORiG2ELBIQOCTACJTM/edit?usp=sharing