Diamond-Price

In this project, I work with a dataset on the prices and attributes of approximately 54,000 round-cut diamonds. I go through the steps of an exploratory data visualization, starting systematically with univariate visualizations, moving through bivariate visualizations, and ending with multivariate visualizations. Finally, I polish selected plots from the analysis so that their main points can be clearly conveyed to others.

Information on the Diamond Dataset

  • price: Price in dollars. Data were collected in 2008.

  • carat: Diamond weight. 1 carat is equal to 0.2 grams.

  • cut: Quality of the diamond's cut, which affects its shine. Increasing grades go from (low) Fair, Good, Very Good, Premium, Ideal (best).

  • color: Measure of diamond coloration. Increasing grades go from (some color) J, I, H, G, F, E, D (colorless).

  • clarity: Measure of diamond inclusions. Increasing grades go from (inclusions) I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF (internally flawless).

  • x, y, z: Diamond length, width, and depth, respectively, in mm.

  • table: Ratio of the width of the diamond's top face to its overall width, as a percentage.

  • depth: Proportional depth of the diamond, as a percentage. This is computed as 100 * 2 * z / (x + y), that is, the ratio of the depth to the average of length and width, expressed as a percentage.

In this project, I concentrate only on the variables in the first five bullet points: price and the four 'C's of diamond grade (carat, cut, color, and clarity).

My focus is on answering one question: how much does each of these quality measures matter to the price of a diamond?
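As a small illustration of the variable descriptions above, here is a minimal Python sketch of how the ordered grades and the two stated conversions could be encoded. The names `CUT_ORDER`, `COLOR_ORDER`, `CLARITY_ORDER`, `grade_rank`, `carat_to_grams`, and `depth_percent` are hypothetical helpers introduced only for this example, not code taken from the notebook, and the example diamond's values are invented. Multiplying by 100 in `depth_percent` is my reading of "as a percentage".

```python
# Grade orderings copied from the variable descriptions, listed worst to best.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

GRAMS_PER_CARAT = 0.2  # 1 carat is equal to 0.2 grams


def grade_rank(grade, order):
    """Return the 0-based position of a grade in its worst-to-best ordering."""
    return order.index(grade)


def carat_to_grams(carat):
    """Convert a weight in carats to grams."""
    return carat * GRAMS_PER_CARAT


def depth_percent(x, y, z):
    """Depth relative to the mean of length and width, as a percentage.

    Assumption: the ratio 2 * z / (x + y) is scaled by 100.
    """
    return 100 * 2 * z / (x + y)


# Hypothetical diamond; these values are made up for illustration only.
example = {
    "carat": 0.5, "cut": "Premium", "color": "G", "clarity": "VS2",
    "x": 5.0, "y": 5.0, "z": 3.1,
}

cut_rank = grade_rank(example["cut"], CUT_ORDER)
color_rank = grade_rank(example["color"], COLOR_ORDER)
clarity_rank = grade_rank(example["clarity"], CLARITY_ORDER)
weight_g = carat_to_grams(example["carat"])
depth = depth_percent(example["x"], example["y"], example["z"])
```

Treating cut, color, and clarity as ordered ranks rather than unordered labels keeps plots and summaries sorted from worst to best grade, which makes it easier to see how price moves with each quality measure.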