Udacity_Project_Exploratory_Data_Analysis

An exploration of Udacity's curated Vinho Verde white wine data set, using R.

Introduction

In this project, you will use R to apply exploratory data analysis (EDA) techniques to a selected dataset, discovering relationships among multiple variables and creating explanatory visualizations that illuminate distributions, outliers, and anomalies.

EDA can lead to insights, which may raise further questions and eventually lead to predictive models. It is also an important “line of defense” against bad data and an opportunity to notice when your assumptions or intuitions about a data set are violated.

As John Tukey stated, "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." We want you to ask interesting questions about data and give you a chance to explore.

Sources

  1. P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

  2. UCI Machine Learning Repository Wine Quality Data Set

  3. Log Transformations For Skewed Data - R-Statistics.com

  4. Statistical Tools for High-Throughput Data Analysis - ggplot2 Axis Scales and Transformations

  5. Statistical Tools for High-Throughput Data Analysis - GGally Correlation Matrix Guide

  6. Adventures in Statistics blog post