Exploratory analysis in R

Primary Language: R

-------Abas-H---------

Data-Visualisation-

Exploratory data analysis is an approach for summarizing and visualizing the important characteristics of a data set. Promoted by John Tukey, exploratory data analysis focuses on exploring data to understand the data’s underlying structure and variables, to develop intuition about the data set, to consider how that data set came into existence, and to decide how it can be investigated with more formal statistical methods.

“The greatest value of a picture is when it forces us to notice what we never expected to see.” (John W. Tukey, Exploratory Data Analysis)

Why do we use exploratory graphs in data analysis?

  • Find patterns in data
  • Understand how data is collected
  • Suggest modeling strategies

Summaries of Data

One-dimensional data – Univariate EDA for a quantitative variable is a way to make preliminary assessments about the population distribution of the variable using the data from the observed sample.

Boxplots – A boxplot consists of a rectangular box bounded above and below by “hinges” that represent the quartiles Q3 and Q1 respectively, with a horizontal “median” line through it. Upper and lower “whiskers” extend from the box toward the extremes of the data, and any point lying beyond a whisker is marked as a potential “outlier”.
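As a minimal sketch of the parts described above, the following draws a boxplot and extracts the numbers behind it. The built-in `mtcars` data set and its `hp` column are assumptions chosen only for illustration; they are not part of the original analysis.

```r
# Assumption: mtcars$hp (horsepower) stands in for the variable being explored.
hp <- mtcars$hp

# boxplot() draws the plot and invisibly returns the statistics it used.
bp <- boxplot(hp, ylab = "Horsepower", main = "Boxplot of hp")

# bp$stats holds, in order: lower whisker end, lower hinge (Q1),
# median, upper hinge (Q3), upper whisker end.
box_stats <- as.vector(bp$stats)
names(box_stats) <- c("lower_whisker", "Q1", "median", "Q3", "upper_whisker")

# bp$out lists the points drawn beyond the whiskers (potential outliers).
outliers <- bp$out

print(box_stats)
print(outliers)
```

By default the whiskers reach the most extreme observations within 1.5 times the box height from each hinge; the `range` argument of `boxplot()` changes that multiplier.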

Histograms – The most basic graph is the histogram, which is a bar plot in which each bar represents the frequency (count) or proportion (count/total count) of cases for a range of values. Typically the bars run vertically, with the count (or proportion) axis running vertically. To manually construct a histogram, define the range of data for each bar (called a bin), count how many cases fall in each bin, and draw the bars high enough to indicate the count.
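The manual construction described above can be sketched in R as follows. The data (`mtcars$mpg`) and the bin width of 5 are assumptions made for illustration only.

```r
# Assumption: mtcars$mpg (miles per gallon) is the sample being explored.
mpg <- mtcars$mpg

# Step 1: define the range of data for each bar (the bin edges).
breaks <- seq(10, 35, by = 5)

# Step 2: count how many cases fall in each bin.
bins <- cut(mpg, breaks = breaks, right = TRUE, include.lowest = TRUE)
counts <- table(bins)

# A proportion is the bin count divided by the total count.
proportions <- counts / sum(counts)

# Step 3: draw the bars. hist() repeats the binning and draws in one call;
# it also returns the counts it computed.
h <- hist(mpg, breaks = breaks,
          main = "Histogram of mpg", xlab = "Miles per gallon")

print(counts)
print(round(proportions, 3))
```

Passing `freq = FALSE` to `hist()` plots densities instead of counts, so the bar areas sum to 1.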

Scatter Plot

The five-number summary, which a boxplot displays, essentially provides information about the minimum value, 1st quartile, median, 3rd quartile and the maximum.
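The five values listed above can be computed directly in base R. This is a small sketch that again assumes the built-in `mtcars$hp` column as example data.

```r
# Assumption: mtcars$hp is used purely as example data.
hp <- mtcars$hp

# fivenum() returns Tukey's five-number summary:
# minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(hp)
names(five) <- c("min", "Q1", "median", "Q3", "max")

# summary() reports the same kind of information plus the mean.
s <- summary(hp)

print(five)
print(s)
```

The hinges from `fivenum()` and the quartiles from `summary()` are computed by slightly different rules, so the two can differ a little on small samples.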