-Python-Analysis_of_wine_quality

Analysis of the Wine Quality Data Set from the UCI Machine Learning Repository. This project has the same structure as the Distribution of craters on Mars project.

Data

The two data sets, which contain physicochemical and sensory characteristics of the red and white variants of the Portuguese "Vinho Verde" wine, were taken from the UCI Machine Learning Repository. The data sets are courtesy of Paulo Cortez.

There are 1599 samples of red wine and 4898 samples of white wine in the data sets. Each wine sample (row) has the following characteristics (columns):

  1. Fixed acidity
  2. Volatile acidity
  3. Citric acid
  4. Residual sugar
  5. Chlorides
  6. Free sulfur dioxide
  7. Total sulfur dioxide
  8. Density
  9. pH
  10. Sulphates
  11. Alcohol
  12. Quality (score between 0 and 10)
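
As a minimal sketch of how rows with the twelve columns above could be read, the snippet below parses a semicolon-delimited file using only the standard library. The semicolon delimiter, the lower-case header spelling, the function name `read_wine_rows`, and the added `type` field are assumptions made for illustration, not the project's actual code. The two sample rows are made-up values, not records from the data sets.

```python
import csv
import io

# Column names following the list above; the exact header spelling
# in the real files is an assumption.
COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]


def read_wine_rows(handle, wine_type):
    """Parse a semicolon-delimited wine file into a list of dicts.

    `handle` is any open text file object; `wine_type` is a label such
    as "red" or "white" stored on each row (a hypothetical extra field).
    """
    reader = csv.DictReader(handle, delimiter=";")
    rows = []
    for record in reader:
        # Every measured characteristic is numeric.
        row = {name: float(value) for name, value in record.items()}
        # The quality score is an integer.
        row["quality"] = int(row["quality"])
        row["type"] = wine_type
        rows.append(row)
    return rows


# Made-up sample in the assumed layout (NOT real records from the data sets).
HEADER = ";".join('"{}"'.format(name) for name in COLUMNS)
SAMPLE = "\n".join([
    HEADER,
    "7.0;0.30;0.20;2.0;0.05;15;40;0.995;3.3;0.6;10.0;6",
    "8.1;0.55;0.05;1.8;0.08;10;30;0.997;3.4;0.5;9.5;5",
])

sample_rows = read_wine_rows(io.StringIO(SAMPLE), "red")
print(len(sample_rows), sample_rows[0]["quality"], sample_rows[0]["type"])
```

With real files, the same function could be called once per file (for example with `open(path, newline="")`) and the two resulting lists concatenated, so that red and white samples can be analyzed together or separately.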

Goals and workflow

By means of data management, visualization, analysis, regression modeling, and machine learning, I explore the relationships and correlations between the characteristics of a wine and its quality score. The main focus of this work is to try different predictive algorithms on the data and examine the results.

The work flows through the following sections:

  1. Data management and visualization
  2. Data analysis
  3. Regression modeling
  4. Machine learning
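
To make the idea of correlating a characteristic with the quality score concrete, here is a small sketch of the Pearson correlation coefficient written in plain Python. It is one possible way to quantify such a relationship and is not necessarily the method used in the project. The function names `pearson` and `correlations_with_quality` are hypothetical, and the demo rows are made-up values.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlations_with_quality(rows, features):
    """Correlate each named feature with the 'quality' field of the rows."""
    quality = [row["quality"] for row in rows]
    return {
        name: pearson([row[name] for row in rows], quality)
        for name in features
    }


# Made-up demo rows (NOT real records from the data sets).
demo_rows = [
    {"alcohol": 9.0, "volatile acidity": 0.7, "quality": 5},
    {"alcohol": 10.0, "volatile acidity": 0.6, "quality": 5},
    {"alcohol": 11.0, "volatile acidity": 0.4, "quality": 6},
    {"alcohol": 12.0, "volatile acidity": 0.3, "quality": 7},
]

demo_corr = correlations_with_quality(demo_rows, ["alcohol", "volatile acidity"])
for name, value in demo_corr.items():
    print(name, round(value, 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship with the quality score, while a value near 0 indicates little linear relationship. The signs in the demo output only reflect the made-up rows and say nothing about the real data.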

Resources

  • P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.