Exploratory Data Analysis on Iris Dataset

Primary language: Jupyter Notebook

Exploratory Data Analysis (EDA)

It is important to understand the data well before using it to build a model. At the outset we know nothing about the data, so we need to explore it thoroughly. Exploratory Data Analysis (EDA) is the task of analyzing data with a range of tools, statistical methods, visualization plots, and other techniques such as linear regression and rule-based methods, in order to gain a thorough understanding of the data. It is the first step of any data modeling or model-building effort.

Pre-Analysis part:

  1. Get the number of features (columns) and the number of records (rows).
  2. Get the feature (independent variable) names.
  3. Get the unique class (dependent variable) names.
  4. Get the number of data points belonging to each class.
  5. Check whether the dataset is balanced or not.
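
The pre-analysis steps above can be sketched with Pandas roughly as follows. The small DataFrame is a made-up stand-in with an Iris-like layout (it is not the real dataset), and the column name `species` for the class label is an assumption; adjust both to match the actual data.

```python
import pandas as pd

# Tiny made-up sample with an Iris-like layout (NOT the real measurements).
# The label column name "species" is an assumption for illustration.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# 1. Number of records (rows) and columns (features plus the label column).
n_rows, n_cols = df.shape

# 2. Feature (independent variable) names: every column except the label.
feature_names = [c for c in df.columns if c != "species"]

# 3. Unique class (dependent variable) names.
class_names = sorted(df["species"].unique())

# 4. Number of data points belonging to each class.
class_counts = df["species"].value_counts()

# 5. Balanced here means every class has the same number of data points.
is_balanced = class_counts.nunique() == 1

print(n_rows, n_cols, feature_names, class_names, is_balanced)
print(class_counts)
```

Strict equality of class counts is a deliberately simple balance check; on real data a tolerance on the class ratio may be more practical.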

Statistical Methods:

  1. Mean
  2. Standard Deviation
  3. Median
  4. Quantile & Percentile
  5. Median Absolute Deviation (MAD)
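
As a rough illustration, the statistics listed above can be computed with NumPy as below. The `values` array is a made-up example with one deliberate outlier, and MAD is computed here in its unscaled form (the median of absolute deviations from the median); some references multiply it by a constant, which this sketch does not do.

```python
import numpy as np

# Made-up values with one deliberate outlier (100.0) for illustration.
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

mean = values.mean()       # sensitive to the outlier
std = values.std()         # population standard deviation (ddof=0)
median = np.median(values) # robust to the outlier

# 25th and 75th percentiles (the first and third quartiles).
q25, q75 = np.percentile(values, [25, 75])

# Unscaled median absolute deviation: median of |x - median(x)|.
mad = np.median(np.abs(values - median))

print(f"mean={mean}, std={std:.2f}, median={median}")
print(f"q25={q25}, q75={q75}, mad={mad}")
```

The outlier pulls the mean and standard deviation far from the bulk of the data, while the median and MAD barely move, which is why the robust statistics are worth checking alongside the classical ones.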

Analysis between variables:

  1. Univariate Analysis
  2. Bivariate Analysis
  3. Multivariate Analysis
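
One possible way to tell the three kinds of analysis apart is by how many variables are examined at once. The sketch below uses synthetic columns `a`, `b`, and `c` (hypothetical names, randomly generated data) and is only one of many ways to carry out each kind of analysis.

```python
import numpy as np
import pandas as pd

# Synthetic data: "b" is built to depend on "a"; "c" is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
df = pd.DataFrame({
    "a": x,
    "b": 2 * x + rng.normal(scale=0.1, size=50),
    "c": rng.normal(size=50),
})

# Univariate: summarize one variable on its own.
univariate = df["a"].describe()

# Bivariate: relationship between two variables (Pearson correlation).
bivariate = df["a"].corr(df["b"])

# Multivariate: all variables together (pairwise correlation matrix).
multivariate = df.corr()

print(univariate)
print(f"corr(a, b) = {bivariate:.3f}")
print(multivariate)
```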

Multiple plots:

  1. Histogram Plot
  2. Probability Density Function (PDF) and Cumulative Distribution Function (CDF)
  3. Scatter Plot
  4. Pair Plot
  5. Box Plot
  6. Violin Plot
  7. Contour Plot
  8. Dist Plot
  9. Joint Plot
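
As one example from the list above, an empirical PDF and CDF can be approximated from a histogram with NumPy and drawn with Matplotlib. This is a minimal sketch on synthetic data: the `petal_length` values are randomly generated rather than taken from the Iris dataset, and the "PDF" here is the fraction of points per bin, which approximates a density rather than being one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for one feature (NOT real Iris measurements).
rng = np.random.default_rng(42)
petal_length = rng.normal(loc=4.0, scale=0.5, size=200)

# Histogram counts per bin, normalized to fractions that sum to 1.
counts, bin_edges = np.histogram(petal_length, bins=10)
pdf = counts / counts.sum()

# The CDF is the running total of the per-bin fractions.
cdf = np.cumsum(pdf)

# Plot both curves against the right edge of each bin.
fig, ax = plt.subplots()
ax.plot(bin_edges[1:], pdf, label="PDF (fraction per bin)")
ax.plot(bin_edges[1:], cdf, label="CDF")
ax.set_xlabel("petal_length (synthetic)")
ax.set_ylabel("fraction of points")
ax.legend()
```

In a notebook, `plt.show()` or `fig.savefig(...)` would display or store the figure. Seaborn provides higher-level functions for most of the other plots in the list, such as `histplot`, `scatterplot`, `pairplot`, `boxplot`, `violinplot`, and `jointplot`.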

Python packages used for EDA:

  1. NumPy
  2. Seaborn
  3. Matplotlib
  4. Pandas