
An outlier is a data point that is noticeably different from the rest of the data. Outliers can come from measurement errors or faulty data collection, or they can reflect variables that were not considered when the data was collected.

Primary language: Jupyter Notebook

Outlier-Detection-and-Removal

Types of outlier handling techniques

Z-score technique

Credits: bit.ly/393HjIP (what an outlier is, and more)

Credits: bit.ly/3KTs9Dp (normally distributed data)

Credits: bit.ly/3KVyDll (skewed data)

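This technique tells you whether a data point is an outlier by computing its z-score, z = (x - μ) / σ, where μ is the mean and σ is the standard deviation of the data. The z-score measures how many standard deviations a point lies from the mean; a common rule treats points with |z| > 3 as outliers.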

***Use this technique only when the data is (approximately) normally distributed.***
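A minimal sketch of z-score detection in plain Python, using only the standard library. This is not the notebook's own code: the function name `zscore_outliers` is hypothetical, and the default threshold of 3 is the common convention mentioned above, not a fixed part of the method.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`.

    Assumes the data is roughly normally distributed.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (sigma)
    if std == 0:
        return []  # every value is identical, so nothing stands out
    return [x for x in values if abs((x - mean) / std) > threshold]


# Hypothetical example: twenty readings around 10 plus one extreme reading.
data = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11] * 2 + [100]
print(zscore_outliers(data))
```

On very small samples the largest possible |z| is limited, so a threshold of 3 may never be reached; lowering the threshold (for example to 2.5) is a judgment call that depends on the data.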

IQR (interquartile range) technique

The interquartile range is calculated in much the same way as the range: subtract the first quartile from the third quartile, IQR = Q3 - Q1. Values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are then treated as outliers (1.5 is the conventional multiplier).

***Use this technique when the data has a skewed distribution.***
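A minimal sketch of the IQR technique using the standard library's `statistics.quantiles` (Python 3.8+). The function names `iqr_bounds` and `iqr_outliers` are hypothetical, and the `method="inclusive"` quartile convention is an assumption; other quartile conventions give slightly different fences on small samples.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [x for x in values if x < lower or x > upper]


# Hypothetical right-skewed example.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_bounds(data))    # fences computed from Q1 = 3.25 and Q3 = 7.75
print(iqr_outliers(data))
```

Quartiles are not pulled around by a single extreme value the way the mean and standard deviation are, which is why this approach suits skewed data.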

[Figure: skewed distribution]

After detecting the outliers, there are two common ways to handle them: capping and trimming.

Capping

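Capping sets two limits, a minimum and a maximum (for example, the boundaries found by the z-score or IQR technique). Values are not allowed to go beyond these limits: any value below the minimum is replaced with the minimum, any value above the maximum is replaced with the maximum, and values in between are left unchanged. The number of rows stays the same.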
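A minimal sketch of capping in plain Python. The function name `cap` is hypothetical, and the limits in the example are assumed to come from one of the detection techniques above (here, the IQR fences -3.5 and 14.5 for the sample data).

```python
def cap(values, lower, upper):
    """Clamp every value into the closed range [lower, upper]."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    # max() raises values below `lower`; min() lowers values above `upper`
    return [min(max(x, lower), upper) for x in values]


# Hypothetical example using IQR fences computed for this data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(cap(data, -3.5, 14.5))
```

In a notebook working with NumPy or pandas, `numpy.clip` and `pandas.Series.clip` perform the same clamping on whole arrays or columns.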

Trimming

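Trimming removes the outliers entirely: using the same minimum and maximum endpoints, every row whose value falls outside them is dropped from the dataset, so the dataset becomes smaller.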
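A minimal sketch of trimming in plain Python. The function name `trim`, its `key` parameter, and the example records are hypothetical; `key` lets the same function filter plain numbers or whole rows by one of their fields.

```python
def trim(rows, lower, upper, key=lambda row: row):
    """Keep only the rows whose value (extracted by `key`) lies in [lower, upper]."""
    return [row for row in rows if lower <= key(row) <= upper]


# Hypothetical example: plain values, using IQR fences for this data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(trim(data, -3.5, 14.5))

# Hypothetical example: records filtered on one field.
people = [{"height_cm": 170}, {"height_cm": 400}, {"height_cm": 182}]
print(trim(people, 50, 250, key=lambda r: r["height_cm"]))
```

Unlike capping, trimming discards information, so it is usually preferred only when the outliers are clearly errors rather than genuine extreme values.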