This repo contains class-based implementations of three outlier detection algorithms.
The implementations were used for a university project.
Uses agglomerative clustering to detect outliers. The hierarchy is cut at a specified number of clusters. All samples in clusters containing fewer samples than a threshold are considered outliers.
The original paper can be found here.
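A minimal sketch of the idea, not the class from this repo: it uses SciPy's `linkage` and `fcluster` to cut the hierarchy, and the function name `agglomerative_outliers`, its parameters, and the Ward linkage default are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def agglomerative_outliers(X, n_clusters, min_cluster_size, method="ward"):
    """Return a boolean mask that is True for samples flagged as outliers."""
    # Build the full merge hierarchy, then cut it into at most n_clusters groups.
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")  # labels start at 1
    # Count the members of each cluster and look up that size for every sample.
    sizes = np.bincount(labels)
    # Samples in clusters smaller than the threshold are flagged.
    return sizes[labels] < min_cluster_size


# Toy data (illustrative only): two dense blobs and one far-away point.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(5.0, 0.1, size=(20, 2)),
    [[50.0, 50.0]],
])
mask = agglomerative_outliers(X, n_clusters=3, min_cluster_size=5)
```

Here the far-away point ends up in a cluster of its own, which is smaller than `min_cluster_size`, so it is the only sample flagged in `mask`.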
Uses a modified k-means algorithm whose centroid updates are robust to outliers. The points with the largest point-to-centroid distances are considered outliers.
The original paper can be found here.
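The following is a rough sketch of one way such a robust update can work, not necessarily the exact procedure of the paper or of this repo: in every iteration the `n_outliers` points farthest from their nearest centroid are left out of the mean update. The function name, the `n_outliers` parameter, and the caller-supplied `init_centroids` are assumptions.

```python
import numpy as np


def kmeans_with_outliers(X, init_centroids, n_outliers, n_iter=100):
    """Lloyd-style k-means that skips the farthest points when updating centroids."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Distance of every point to every centroid, shape (n_samples, n_clusters).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        nearest = dists[np.arange(len(X)), labels]
        # The n_outliers points with the largest distances are excluded this round.
        outliers = np.zeros(len(X), dtype=bool)
        outliers[np.argsort(nearest)[len(X) - n_outliers:]] = True
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = (labels == k) & ~outliers
            if members.any():  # keep the old centroid if a cluster runs empty
                new_centroids[k] = X[members].mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged: labels and outliers match the final centroids
        centroids = new_centroids
    return centroids, labels, outliers


# Toy data (illustrative only): two dense blobs and one far-away point.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(5.0, 0.1, size=(20, 2)),
    [[50.0, 50.0]],
])
# Start from one point of each blob so the example is deterministic.
centroids, labels, outliers = kmeans_with_outliers(X, X[[0, 20]], n_outliers=1)
```

Because the far-away point never contributes to a mean, the centroids stay close to the two blobs instead of being dragged towards it. If the loop stops at `n_iter` without converging, the returned labels and mask belong to the second-to-last centroids.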
Outlier detection algorithm based on k-means# [2017, Olukanmi & Twala]. Uses a k-medians-based robust hierarchical initialization [2007, Arai & Barakbah]. Performs k-medians and flags the points that lie more than z standard deviations from their centroid as outliers. Additionally, clusters that are both distant from all other clusters and contain few observations are treated as clusters of outliers.
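The point-level rule can be sketched as follows. This is only one possible reading of "z standard deviations": the spread of a cluster is taken here as the root-mean-square distance of its members to the median centroid, which is an assumption and may differ from the paper and from the code in this repo. The function name `zscore_outliers` is hypothetical, and the small-distant-cluster rule is not covered.

```python
import numpy as np


def zscore_outliers(X, labels, z=3.0):
    """Flag points farther than z * sigma_k from their cluster's median centroid."""
    outliers = np.zeros(len(X), dtype=bool)
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        centroid = np.median(X[idx], axis=0)  # coordinate-wise median (k-medians style)
        dists = np.linalg.norm(X[idx] - centroid, axis=1)
        sigma = np.sqrt(np.mean(dists ** 2))  # assumed spread: RMS distance to centroid
        if sigma > 0:
            outliers[idx] = dists > z * sigma
    return outliers


# Deterministic toy data: two rings of radius 1, the first with one distant point.
theta = np.linspace(0.0, 2.0 * np.pi, 24, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([ring, [[10.0, 0.0]], ring + 20.0])
labels = np.array([0] * 25 + [1] * 24)
outliers = zscore_outliers(X, labels, z=3.0)
```

Using the median as the centroid keeps the distant point from shifting the cluster center, so its distance stays large relative to the spread and it is the only point flagged.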
Experiments.py contains a number of sample experiments on toy datasets.
Python 3.x
- numpy
- pandas
- SciPy
- scikit-learn
- matplotlib
It is recommended to install the requirements through the Anaconda Python distribution. IMPORTANT: the scikit-learn version needs to be >0.20.0, otherwise the function pairwise_distances_argmin_min is buggy.
- Timo Klein
- Oscar for being a cool cat and lending his name to the project