Advanced Statistics and Linear Algebra for Data Science

[Figure: normal distribution curve (image source: W3 School point)]

Empirical Rule (68% - 95% - 99.7% Rule)

For normally distributed data:

  • within 1 standard deviation of the mean, about 68% of the data will fall
  • within 2 standard deviations of the mean, about 95% of the data will fall
  • within 3 standard deviations of the mean, about 99.7% of the data will fall
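
The three percentages can be checked numerically. The sketch below is only an illustration, assuming a perfectly normal distribution; the helper name coverage_within is made up for this example. It uses the identity P(|Z| < k) = erf(k / sqrt(2)) from the Python standard library.

```python
import math

def coverage_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    # For a standard normal variable Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 68.27%, 95.45% and 99.73%.
    print(f"within {k} standard deviation(s): {coverage_within(k):.2%}")
```

Real datasets are rarely exactly normal, so the observed fractions will usually differ somewhat from these values.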

Methods to find Outliers:

  1. calculating the Z-score
  2. calculating the Interquartile Range (IQR)

Z-score

Z = (x - µ) / σ

Z = standard score
x = observed value
µ = mean of the sample
σ = standard deviation of the sample
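
A minimal sketch of Z-score outlier detection, computing Z = (x - µ) / σ for every value. The cutoff of 3 is an assumption borrowed from the 99.7% rule, not something the notes fix, and the function names and sample data are invented for illustration.

```python
import statistics

def z_scores(values):
    """Return the Z-score of every value: (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(x - mean) / sd for x in values]

def z_score_outliers(values, threshold=3.0):
    """Return values whose |Z| exceeds the threshold (3.0 is an assumed cutoff)."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Made-up sample: 19 values between 10 and 13, plus one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
        12, 11, 10, 13, 12, 11, 12, 10, 13, 95]
print(z_score_outliers(data))  # [95]
```

In very small samples a single extreme value inflates the standard deviation so much that its own Z-score may never reach 3, so this method tends to work better on larger datasets.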

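To calculate Interquartile Range (IQR):

  • Sort the dataset
  • calculate 1st Quartile (Q1) and 3rd Quartile (Q3)
  • find IQR = Q3 - Q1
  • find Lower fence = Q1 - 1.5(IQR)
  • find Upper fence = Q3 + 1.5(IQR)
  • values below the lower fence or above the upper fence are treated as outliers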
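
The IQR steps can be sketched as follows, with the upper fence taken as Q3 + 1.5(IQR). This is an illustration under assumptions: the function names and data are invented, and quartiles come from statistics.quantiles with its default "exclusive" method. Other quartile conventions give slightly different fences.

```python
import statistics

def iqr_fences(values):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(sorted(values), n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def iqr_outliers(values):
    """Return values lying below the lower fence or above the upper fence."""
    lower, upper = iqr_fences(values)
    return [x for x in values if x < lower or x > upper]

# Made-up sample with one obviously extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_fences(data))
print(iqr_outliers(data))  # [100]
```

Because quartiles are based on ranks rather than on the mean, the fences are barely affected by the extreme value itself, which makes this method usable on small or skewed samples.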