Analysis of the Wine Quality Data Set from the UCI Machine Learning Repository. This project has the same structure as the Distribution of craters on Mars project.
The two data sets, which contain physicochemical and sensory characteristics of red and white variants of the Portuguese "Vinho Verde" wine, were taken from the UCI Machine Learning Repository. These data sets are courtesy of Paulo Cortez.
There are 1599 samples of red wine and 4898 samples of white wine in the data sets. Each wine sample (row) has the following characteristics (columns):
- Fixed acidity
- Volatile acidity
- Citric acid
- Residual sugar
- Chlorides
- Free sulfur dioxide
- Total sulfur dioxide
- Density
- pH
- Sulphates
- Alcohol
- Quality (score between 0 and 10)
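The column layout above can be sketched as a small loader. This is only a minimal illustration, not the project's own code: it assumes the files are semicolon-delimited CSVs with lowercase header names, and the `load_wine` helper, the `color` field, and the sample row are hypothetical.

```python
import csv
import io

# Header names assumed for the CSV files; adjust if the actual files differ.
COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]


def load_wine(handle, color):
    """Read one semicolon-delimited wine file into a list of dicts.

    `handle` is any open text file object; `color` is a label ("red" or
    "white") attached to every row so the two sets can be combined later.
    """
    reader = csv.DictReader(handle, delimiter=";")
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for record in reader:
        row = {name: float(record[name]) for name in COLUMNS}
        row["quality"] = int(row["quality"])  # quality is an integer score
        row["color"] = color
        rows.append(row)
    return rows


# Tiny in-memory stand-in for a real file (illustrative values only).
sample = io.StringIO(
    ";".join(f'"{c}"' for c in COLUMNS) + "\n"
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
)
wines = load_wine(sample, "red")
```

With real files, the same helper could be called on an opened file, for example `open("winequality-red.csv", newline="")`, where the file name is an assumption about how the download was saved.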
By means of data management, visualization, analysis, regression modeling, and machine learning, I explore the relationships and correlations between the wine characteristics and the quality score. The main focus of this work is to try different predictive algorithms on the data and examine the results.
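As one illustration of the kind of relationship explored here, the Pearson correlation between a single characteristic and the quality score can be computed directly. The sketch below is not taken from the project: the `pearson` helper is hypothetical and the values are toy numbers, not rows from the data sets.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only, not taken from the data sets.
alcohol = [9.4, 9.8, 10.5, 11.2, 12.0]
quality = [5, 5, 6, 6, 7]
r = pearson(alcohol, quality)
```

A value of `r` near 1 or -1 would indicate a strong linear relationship, and a value near 0 a weak one. Because quality is an ordinal score, a rank-based measure may be a reasonable alternative.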
The work flows through the following sections:
- P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.