Wine_Quality_Prediction 🍷

The aim of this project is to explore and compare different machine learning algorithms on the popular Wine Quality dataset from Kaggle.

Libraries Used

  • Numpy

    Importing Numpy Library

     import numpy as np

    About Numpy

    Numpy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
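
    Example

    A minimal sketch of NumPy in use; the array values below are illustrative only:

     a = np.array([[7.4, 0.70], [7.8, 0.88]])  # 2-D array of example feature values
     print(a.shape)                            # (2, 2)
     print(a.mean(axis=0))                     # column-wise means -> [7.6  0.79]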

  • Pandas

    Importing Pandas Library

    import pandas as pd

    About Pandas

    Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
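
    Example

    A minimal sketch of loading the dataset with pandas; the file name winequality-red.csv is an assumption about how the Kaggle data is stored locally:

     # Load the Kaggle wine quality data (file name is an assumption)
     df = pd.read_csv('winequality-red.csv')
     print(df.shape)                       # (number of rows, number of columns)
     print(df['quality'].value_counts())   # distribution of the target column
     print(df.describe())                  # summary statistics of the numeric features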

  • Seaborn

    Importing Seaborn

    import seaborn as sns

    About Seaborn

    Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures. It helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots.
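
    Example

    A minimal sketch of a seaborn plot, assuming the dataframe df loaded in the pandas example above:

     sns.countplot(x='quality', data=df)   # number of samples per quality score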

  • Matplotlib

    Importing Matplotlib

    import matplotlib.pyplot as plt

    About Matplotlib

    Matplotlib is an easy-to-use and powerful visualization library in Python. It is built on NumPy arrays, is designed to work with the broader SciPy stack, and provides several plot types such as line, bar, scatter, and histogram plots.
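
    Example

    A minimal sketch of a plain matplotlib plot, assuming the dataframe df loaded earlier and an 'alcohol' column as in the Kaggle dataset:

     plt.figure(figsize=(8, 5))         # create a figure of a chosen size
     plt.hist(df['alcohol'], bins=20)   # histogram of the alcohol feature
     plt.xlabel('alcohol')
     plt.ylabel('count')
     plt.show()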

  • Sklearn

    Importing Sklearn

    import sklearn

    About Sklearn

    Scikit-learn (Sklearn) is one of the most useful and robust libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, via a consistent interface in Python. This library, which is largely written in Python, is built upon NumPy, SciPy and Matplotlib.
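
    Example

    A minimal sketch of a typical scikit-learn workflow on this dataset; the binary relabelling of quality (good if quality >= 7) is an assumption for illustration, not necessarily the project's exact preprocessing. The variables created here are reused in the algorithm sketches below.

     from sklearn.model_selection import train_test_split
     from sklearn.metrics import accuracy_score

     # Features and an assumed binary target: "good" wine if quality >= 7
     X = df.drop('quality', axis=1)
     y = (df['quality'] >= 7).astype(int)

     # Hold out a test set for evaluating every model below
     X_train, X_test, y_train, y_test = train_test_split(
         X, y, test_size=0.2, random_state=42)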

Algorithms Used

  • Logistic Regression

    Importing Logistic Regression Classifier

    from sklearn.linear_model import LogisticRegression

    About

    Logistic Regression is an easily interpretable classification technique that gives the probability of an event occurring, not just the predicted classification. It also provides a measure of the significance of the effect of each individual input variable, together with a measure of certainty of the variable's effect.
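
    Example

    A minimal sketch, assuming the train/test split from the scikit-learn example above; the settings shown are illustrative, not the project's tuned values:

     lr = LogisticRegression(max_iter=1000)   # a higher max_iter helps convergence on unscaled features
     lr.fit(X_train, y_train)
     print(accuracy_score(y_test, lr.predict(X_test)))   # test-set accuracy
     print(lr.predict_proba(X_test[:5]))                 # class probabilities, as described above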
  • Decision Tree Classifier

    Importing Decision Tree Classifier

    from sklearn.tree import DecisionTreeClassifier

    About

    Decision tree is a non-parametric supervised learning algorithm, which is utilized for both classification and regression tasks. It has a hierarchical, tree structure, which consists of a root node, branches, internal nodes and leaf nodes.
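
    Example

    A minimal sketch, assuming the train/test split from the scikit-learn example above; max_depth is an illustrative choice:

     dt = DecisionTreeClassifier(max_depth=5, random_state=42)   # limit depth to reduce overfitting
     dt.fit(X_train, y_train)
     print(accuracy_score(y_test, dt.predict(X_test)))           # test-set accuracy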
  • Random Forest Classifier

    Importing Random Forest Classifier

    from sklearn.ensemble import RandomForestClassifier

    About

    Random forests or random decision forests is an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean or average prediction of the individual trees is returned. Random decision forests correct for decision trees' habit of overfitting to their training set.
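
    Example

    A minimal sketch, assuming the train/test split from the scikit-learn example above; n_estimators is an illustrative choice:

     rf = RandomForestClassifier(n_estimators=100, random_state=42)   # 100 trees, majority vote for classification
     rf.fit(X_train, y_train)
     print(accuracy_score(y_test, rf.predict(X_test)))                # test-set accuracy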
  • Support Vector Machine

    Importing Support Vector Machine Classifier

    from sklearn import svm

    About

    Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms and is used for both classification and regression problems. However, it is primarily used for classification problems in machine learning. The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put a new data point in the correct category in the future. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
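
    Example

    A minimal sketch, assuming the train/test split from the scikit-learn example above; the RBF kernel is scikit-learn's default choice, not necessarily the project's:

     clf = svm.SVC(kernel='rbf')    # support vector classifier with an RBF kernel
     clf.fit(X_train, y_train)
     print(accuracy_score(y_test, clf.predict(X_test)))   # test-set accuracy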
  • KNeighbors Classifier

    Importing KNeighbors Classifier

    from sklearn.neighbors import KNeighborsClassifier

    About

    The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier which uses proximity to make classifications or predictions about the grouping of an individual data point. While it can be used for either regression or classification problems, it is typically used as a classification algorithm, working off the assumption that similar points can be found near one another.
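
    Example

    A minimal sketch, assuming the train/test split from the scikit-learn example above; n_neighbors=5 is an illustrative choice:

     knn = KNeighborsClassifier(n_neighbors=5)   # classify by majority vote of the 5 nearest points
     knn.fit(X_train, y_train)
     print(accuracy_score(y_test, knn.predict(X_test)))   # test-set accuracy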
  • Gradient Boosting Classifier

    Importing Gradient Boosting Classifier

    from sklearn.ensemble import GradientBoostingClassifier

    About

    The gradient boosting algorithm is one of the most powerful algorithms in the field of machine learning. The errors in machine learning algorithms are broadly classified into two categories, i.e. bias error and variance error. As gradient boosting is one of the boosting algorithms, it is mainly used to minimize the bias error of the model.
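
    Example

    A minimal sketch, assuming the train/test split from the scikit-learn example above; the hyperparameters are illustrative defaults:

     gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
     gb.fit(X_train, y_train)
     print(accuracy_score(y_test, gb.predict(X_test)))   # test-set accuracy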

Dataset Analysis

  • Quality_Count Analysis

    • Count plot showing the number of wine samples for each quality score (Count v/s Quality plot).
  • Alcohol v/s Quality Plot

    • Bar plot visualizing the change in wine quality on the basis of the amount of alcohol present in it (Alcohol v/s Quality plot).
  • HeatMap Analysis of Features

    • Heatmap determining the correlation of the different features with each other.
  • Features Pairplot Analysis

    • Pairplot brings the ability to visualize all features against each other at the same time (a sketch of these plots follows this list).
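
    A minimal sketch of how these plots might be produced with seaborn and matplotlib, assuming the dataframe df loaded in the pandas example above:

     # Count v/s Quality
     sns.countplot(x='quality', data=df)
     plt.show()

     # Alcohol v/s Quality bar plot
     sns.barplot(x='quality', y='alcohol', data=df)
     plt.show()

     # Heatmap of the correlation between features
     plt.figure(figsize=(10, 8))
     sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
     plt.show()

     # Pairplot of all features against each other
     sns.pairplot(df, hue='quality')
     plt.show()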

Model Analysis

  • Plotting the accuracy of the different models used (Model_Accuracy plot); a sketch of this comparison is shown below.
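
    A minimal sketch of such a comparison plot; the models referenced here are the ones fitted in the illustrative sketches above, not the project's actual results:

     models = {'Logistic Regression': lr, 'Decision Tree': dt, 'Random Forest': rf,
               'SVM': clf, 'KNN': knn, 'Gradient Boosting': gb}
     scores = [accuracy_score(y_test, m.predict(X_test)) for m in models.values()]

     plt.figure(figsize=(10, 5))
     sns.barplot(x=list(models.keys()), y=scores)   # one bar per model, height = test accuracy
     plt.ylabel('Accuracy')
     plt.title('Model Accuracy')
     plt.show()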

Enjoy your wine