Anomaly detection is a process for identifying unexpected data, event or behavior that require some examination. It is a well-established field within data science and there is a large number of algorithms to detect anomalies in a dataset depending on data type and business context.
What is Z-score?
Simply speaking, Z-score is a statistical measure that tells you how far is a data point from the rest of the dataset. In a more technical term, Z-score tells how many standard deviations away a given observation is from the mean. For example, a Z score of 2.5 means that the data point is 2.5 standard deviation far from the mean. And since it is far from the center, it’s flagged as an outlier/anomaly.
- Step 0: You can understand about the data from the offical page of KDD Cup 1999
- Step 1: Download the data from KDD Cup 1999 Data HERE!!!
- Step 2: Add the columns to the dataset, as there are two different files for that available on the same page.
- Step 3: Save the dataset as a CSV file to use it.
Note: This repository is under updation, more methods will be added soon.