
An introduction to data visualization: explaining a dataset through various kinds of plots

Primary Language: Jupyter Notebook

Exploratory-Data-Analysis


Overview

Using the Penguins dataset, we developed numerous ways to visualize the data.

Statistical Insights

Penguins Dataset Statistical Insights
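Summary statistics like these can be produced with pandas. The following is a minimal sketch, not the notebook's actual code. The function name statistical_insights is hypothetical, and the sketch assumes the data has already been loaded into a DataFrame, e.g. with pd.read_csv("penguins_size.csv").

```python
import pandas as pd


def statistical_insights(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Summarize the numerical columns and count missing values per column.

    Hypothetical helper for illustration; not taken from the notebook.
    """
    # describe() reports count, mean, std, min, 25%, 50%, 75% and max,
    # and by default includes only the numerical columns
    summary = df.describe()
    # Number of missing values in each column, worth checking before plotting
    missing = df.isna().sum()
    return summary, missing
```

The "count" row of the summary and the missing-value counts together show how complete each column is before any plot is drawn.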

Boxplot

Boxplots are a good way to check for outliers, or to understand the relationship between a categorical and a continuous feature, by showing the distribution of the data.
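A box plot of body mass per species might look like the sketch below. It rests on several assumptions. It uses seaborn and matplotlib, although the README does not name the notebook's plotting library. It uses the column names of the Kaggle penguins_size.csv (species, body_mass_g). The names plot_body_mass_boxplot and iqr_outliers are hypothetical. iqr_outliers applies the 1.5 × IQR rule that seaborn uses for its whiskers by default, so the values it returns should be the points drawn as outliers.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_body_mass_boxplot(df: pd.DataFrame) -> plt.Axes:
    """Box plot of a continuous feature (body mass) per category (species)."""
    # Column names assume the Kaggle penguins_size.csv layout
    fig, ax = plt.subplots(figsize=(8, 5))
    # Each box spans the 25th-75th percentiles with a line at the median;
    # points beyond 1.5 x IQR from the box (seaborn's default whis) are outliers
    sns.boxplot(data=df, x="species", y="body_mass_g", ax=ax)
    ax.set_title("Body mass by species")
    return ax


def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the values lying more than whis * IQR outside the quartiles."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values[(values < q1 - whis * iqr) | (values > q3 + whis * iqr)]
```

For example, iqr_outliers(df["body_mass_g"]) lists the outlying body masses numerically, which is handy when a point in the plot needs to be traced back to its row.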

Histogram

Histograms depict the frequency distribution of a variable. They can only be used with numerical data.

Kdeplot

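A kdeplot (kernel density estimate plot) is another way to visualize the distribution of the data: instead of counting values in bins like a histogram, it draws a smooth density curve.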

Bar Plot

In a bar plot, the x-axis represents a categorical variable and the y-axis a numerical variable, so the plot depicts the relationship between the two. For instance, when plotting the body mass of the penguins on each island, the categorical data on the x-axis is the island and the numerical data on the y-axis is the penguins' body mass.
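The island and body mass example could be sketched as follows. It assumes seaborn and the Kaggle penguins_size.csv column names (island, body_mass_g). Both function names are hypothetical. The groupby helper computes the same per-island means as a table, which makes the bar heights easy to check.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_body_mass_by_island(df: pd.DataFrame) -> plt.Axes:
    """Bar plot: categorical island on the x-axis, numerical body mass on the y-axis."""
    # Column names assume the Kaggle penguins_size.csv layout
    fig, ax = plt.subplots(figsize=(8, 5))
    # Bar height is seaborn's default estimator, the mean body mass per island;
    # the error bar on each bar is a confidence interval around that mean
    sns.barplot(data=df, x="island", y="body_mass_g", ax=ax)
    ax.set_title("Mean body mass by island")
    return ax


def mean_body_mass_by_island(df: pd.DataFrame) -> pd.Series:
    """The same per-island means as a table, via groupby."""
    return df.groupby("island")["body_mass_g"].mean()
```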

Cross Tab

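Crosstab (pd.crosstab) is a pandas function that builds a frequency table of two or more variables, showing how often each combination of their values occurs so that the relationship between them can be analyzed.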
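A cross tabulation of island against species might look like this sketch. It assumes the Kaggle penguins_size.csv column names, and the function name is hypothetical. pd.crosstab and its margins parameter are standard pandas.

```python
import pandas as pd


def species_by_island(df: pd.DataFrame) -> pd.DataFrame:
    """Frequency table: rows are islands, columns are species."""
    # Each cell counts the penguins with that island/species combination;
    # margins=True appends row and column totals labelled "All"
    return pd.crosstab(df["island"], df["species"], margins=True)
```

Passing normalize="index" instead of margins=True turns each row into proportions. The table then shows what share of each island's penguins belongs to each species.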

Count Plot

A count plot is similar to a bar plot but is used for categorical data only. It plots the count of observations in each category as bars. This differs from a bar plot, which shows the mean of a numerical feature for each category.
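A count plot needs only one categorical column, as in this sketch. It assumes seaborn and the Kaggle penguins_size.csv column name species, and the function name is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_species_counts(df: pd.DataFrame) -> plt.Axes:
    """Count plot: one bar per species, height = number of observations."""
    # Only a categorical column is needed; there is no numerical y variable
    fig, ax = plt.subplots(figsize=(8, 5))
    sns.countplot(data=df, x="species", ax=ax)
    ax.set_title("Number of penguins per species")
    return ax
```

The bar heights match df["species"].value_counts(), which gives the same counts as a table.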

Violin Plot

Violin plots share properties with box plots and are used to observe the distribution of numerical data across different categories. They give more insight than a box plot: two categories might have the same mean without being the same, because their distributions can still differ, and in that case a violin plot, which draws the shape of each distribution, is more useful.
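A violin plot of flipper length per species could be sketched as below. It assumes seaborn and the Kaggle penguins_size.csv column names, and the function name is hypothetical. The inner box shows the box plot summary, and the outline shows the full distribution shape.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_flipper_violin(df: pd.DataFrame) -> plt.Axes:
    """Violin plot of flipper length per species."""
    # Column names assume the Kaggle penguins_size.csv layout
    fig, ax = plt.subplots(figsize=(8, 5))
    # Each violin outline is a mirrored kernel density estimate; inner="box"
    # draws a miniature box plot (quartiles and median) inside it
    sns.violinplot(data=df, x="species", y="flipper_length_mm", inner="box", ax=ax)
    ax.set_title("Flipper length by species")
    return ax
```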

Correlation Matrix

A correlation matrix summarizes the dataset by showing the pairwise correlations between its numerical features. In this dataset, flipper length and body mass have a correlation of 0.87, which is quite high.
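The matrix and a heatmap of it might be produced as in this sketch. It assumes pandas' default Pearson correlation and seaborn's heatmap, and both function names are hypothetical. Text columns are dropped first because a correlation is only defined for numbers.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between every pair of numerical columns."""
    # Drop text columns such as species and island before correlating
    return df.select_dtypes(include="number").corr()


def plot_correlation_heatmap(df: pd.DataFrame) -> plt.Axes:
    """Heatmap of the correlation matrix with each coefficient annotated."""
    corr = correlation_matrix(df)
    fig, ax = plt.subplots(figsize=(7, 6))
    # annot=True writes each coefficient in its cell; vmin/vmax fix the
    # colour scale to the full [-1, 1] range of a correlation
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation matrix")
    return ax
```

Assuming the Kaggle column names, correlation_matrix(df).loc["flipper_length_mm", "body_mass_g"] is the entry that the flipper length and body mass figure above refers to.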

Summary

This dataset is quite small, yet very useful for practicing data analysis. The many different visualizations play a key role in helping us understand the broader picture of a dataset.

Resources

Files

  • penguins_size.csv

  • Exploratory Data Visualization.ipynb