

EDA in RStudio

In this project, I use R and exploratory data analysis techniques to explore relationships among variables, from a single variable to multiple variables, and to examine a selected data set for distributions, outliers, and anomalies.

RStudio Installation

Download and install R from the Comprehensive R Archive Network (CRAN). After installing R, download and install RStudio, choosing the appropriate installer for your operating system. You also need to install a few packages: open RStudio and run the following commands in the console.

# dependencies = TRUE also installs the packages that each of these depends on
install.packages("ggplot2", dependencies = TRUE)
install.packages("knitr", dependencies = TRUE)
install.packages("dplyr", dependencies = TRUE)
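
To confirm that the installation worked, a quick check along the following lines may help. This is a minimal sketch, not part of the original project; the variable names `pkgs` and `installed` are chosen here for illustration.

```r
# Packages this project relies on (taken from the install commands above)
pkgs <- c("ggplot2", "knitr", "dplyr")

# requireNamespace() returns TRUE or FALSE without attaching the package,
# so this reports which packages are available on this machine
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Attach only the packages that are actually available
for (p in pkgs[installed]) {
  library(p, character.only = TRUE)
}
```

Any package reported as `FALSE` can be installed again with the corresponding `install.packages()` call.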

Why this Project?

Exploratory Data Analysis (EDA) is the numerical and graphical examination of data characteristics and relationships before formal, rigorous statistical analyses are applied. EDA can lead to insights, which may raise further questions and eventually inform predictive models. It is also an important “line of defense” against bad data and an opportunity to notice when your assumptions or intuitions about a data set are violated.

After completing the project, you will:

Understand the distribution of a variable and check for anomalies and outliers.

Learn how to quantify and visualize individual variables within a data set by using appropriate plots such as scatter plots, histograms, bar charts, and box plots.

Explore variables to identify the most important variables and relationships within a data set before building predictive models; calculate correlations and investigate conditional means.

Learn powerful methods and visualizations for examining relationships among multiple variables, such as reshaping data frames and using aesthetics like color and shape to uncover more information.
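
The goals above can be sketched in a few lines of R. The example below is an illustration only: it uses the built-in `mtcars` data set as a stand-in, which is an assumption and not the data set analyzed in this project. The numeric steps use base R, and the final plot is drawn only if ggplot2 is installed.

```r
# mtcars ships with R; it stands in for the project's data set here
data(mtcars)

# 1. One variable: summarize the distribution and draw a histogram
print(summary(mtcars$mpg))
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# 2. Outliers: flag values more than 1.5 * IQR beyond the quartiles
q <- quantile(mtcars$hp, c(0.25, 0.75))
fence <- 1.5 * IQR(mtcars$hp)
hp_outliers <- mtcars$hp[mtcars$hp < q[1] - fence | mtcars$hp > q[2] + fence]
print(hp_outliers)

# 3. Two variables: correlation between weight and fuel economy
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
print(wt_mpg_cor)

# 4. Conditional means: average mpg within each cylinder group
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# 5. Multiple variables: map a third variable to color (needs ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
    geom_point() +
    labs(x = "Weight", y = "Miles per gallon", color = "Cylinders")
  print(p)
}
```

The 1.5 * IQR rule used in step 2 is the same convention box plots use to mark outliers; it is one common choice among several, not the only way to define an outlier.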

For the R code, see EDR_project.rmd

For the report and visualizations, copy the link into GitHub & BitBucket HTML Preview