I picked a data set of 54,000 diamonds to predict price, a continuous variable.
I looked at features such as carat weight, color, cut, and clarity to see how these independent variables influence the target variable, price.
• Kaggle
The skills used in this project included working with Python, using Pandas to make visualizations and to clean the data set, and knowing how to interpret various regression models built on feature engineering and selection.
On GitHub I posted four separate notebooks: one for the data collection and cleaning (including visualizations/EDA), another for the different models I used to find the best predictions, and finally the ReadMe notebook, which lays out how my project was presented.
• Is there any correlation between price and carat weight?
• Is there any correlation between price and the cut of the diamond?
• Is there any correlation between price and the color grade of the diamond?
• Is there any correlation between price and the clarity of the diamond?
• How can I use feature engineering to improve my models' predictions?
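As a rough illustration of how these questions could be checked, here is a minimal sketch that encodes cut as an ordered number and computes a Pearson correlation against price. The rows are toy values invented for the example, not real data, and the cut ordering (Fair to Ideal) and the helper names `pearson` and `encode_cut` are my assumptions, not code from the notebooks.

```python
from math import sqrt

# Toy rows invented for illustration; NOT taken from the real data set.
rows = [
    {"carat": 0.3, "cut": "Ideal", "price": 500},
    {"carat": 0.5, "cut": "Good", "price": 1500},
    {"carat": 0.7, "cut": "Premium", "price": 2500},
    {"carat": 1.0, "cut": "Very Good", "price": 5000},
    {"carat": 1.5, "cut": "Fair", "price": 10000},
]

# Assumed ordering of cut grades, from worst to best.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]


def encode_cut(cut):
    """Map a cut grade to its rank (0 = worst) so it can be used numerically."""
    return CUT_ORDER.index(cut)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


carats = [r["carat"] for r in rows]
cut_codes = [encode_cut(r["cut"]) for r in rows]
prices = [r["price"] for r in rows]

carat_price_r = pearson(carats, prices)
cut_price_r = pearson(cut_codes, prices)
print(f"carat vs price: {carat_price_r:.3f}")
print(f"cut   vs price: {cut_price_r:.3f}")
```

The same ordinal encoding idea could be applied to color and clarity. With Pandas, `DataFrame.corr()` computes the same coefficient for every pair of numeric columns at once.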
First, I gathered a data set of 54,000 diamonds. After cleaning the data, I selected the features that I thought would correlate most strongly with the final price of a diamond. Next, I did some EDA and decided which features to include in my models. Following that, I split my data into training and testing sets and analyzed the R^2 and RMSE (Root Mean Squared Error) values for each model. Finally, I compared the models to see which one best predicted diamond prices.
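The split-fit-evaluate loop described above can be sketched in a few lines. This is a simplified stand-in, not the code from my notebooks: it uses only the standard library, a single feature (carat), and synthetic data generated from a made-up linear rule. The helper names `train_test_split`, `fit_ols`, `rmse`, and `r_squared` are defined here for the example.

```python
import random
from math import sqrt


def train_test_split(pairs, test_fraction=0.25, seed=42):
    """Shuffle (x, y) pairs and split them into train and test lists."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_ols(pairs):
    """Closed-form ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Share of the target's variance explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Synthetic (carat, price) pairs from an invented rule plus noise; NOT real data.
rng = random.Random(0)
data = []
for _ in range(200):
    carat = rng.uniform(0.2, 2.0)
    data.append((carat, 4000 * carat - 500 + rng.gauss(0, 300)))

train, test = train_test_split(data)
slope, intercept = fit_ols(train)

test_actual = [y for _, y in test]
test_pred = [slope * x + intercept for x, _ in test]
test_rmse = rmse(test_actual, test_pred)
test_r2 = r_squared(test_actual, test_pred)
print(f"RMSE: {test_rmse:.1f}  R^2: {test_r2:.3f}")
```

Scoring on the held-out test set rather than the training set is what makes the R^2 and RMSE values comparable across models. A lower RMSE and a higher R^2 both indicate better predictions.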
As future steps, I would include a Ridge regression model for my data set. Another goal would be to find a second data set with even more diamond features, merge the two, and apply more feature engineering and selection from there.
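For reference, here is a minimal sketch of what Ridge regression changes relative to OLS, reduced to a single feature so the closed form is easy to read. The function name `fit_ridge_1d`, the `alpha` values, and the toy numbers are all assumptions made for illustration. Because the penalty depends on the scale of the features, they are normally standardized first.

```python
def fit_ridge_1d(xs, ys, alpha):
    """One-feature ridge fit with an unpenalized intercept.

    Minimizes sum((y - slope * x - intercept) ** 2) + alpha * slope ** 2.
    With alpha = 0 this reduces to ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # the penalty shrinks the slope toward zero
    return slope, mean_y - slope * mean_x


# Toy values invented for illustration; NOT taken from the real data set.
carats = [0.3, 0.5, 0.7, 1.0, 1.5]
prices = [500, 1500, 2500, 5000, 10000]

ols_slope, ols_intercept = fit_ridge_1d(carats, prices, alpha=0.0)
ridge_slope, ridge_intercept = fit_ridge_1d(carats, prices, alpha=1.0)
print(f"OLS slope:   {ols_slope:.1f}")
print(f"Ridge slope: {ridge_slope:.1f}")
```

The shrinkage matters most when many correlated features are in play, which is the situation a merged, wider data set would create.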
Based on the results of my analysis, I can suggest that carat weight is the most statistically significant feature in determining the price of a diamond. Other important features can heavily change a diamond's price; however, carat is the most influential. In conclusion, of the models I compared, the OLS model best predicts the price of an average diamond.
https://docs.google.com/presentation/d/1J5C9aVBEaC5PkE2vk-u2zqNhYORiG2ELBIQOCTACJTM/edit?usp=sharing