pyODPlus is an extension of PyOD, a Python toolkit for detecting outlying objects in multivariate data. PyOD includes more than 30 detection algorithms, ranging from the classical LOF (SIGMOD 2000) to the more recent COPOD (ICDM 2020).
A requirements.txt file is provided in the main directory and lists all packages that must be installed. To install them, run the following command:
pip install -r requirements.txt
Implementations of the following algorithms can be found in the outlier_detection directory. Demonstrations of how to load and use these algorithms are also available in the demo directory.
ROCF is an outlier detection method proposed by Huang et al. in "A novel outlier cluster detection algorithm without top-n parameter", published in Elsevier Knowledge-Based Systems 121 (2017), pp. 32-40. The ROCF algorithm aims to eliminate the need to specify the number or percentage of outliers in the dataset, i.e. the top-n parameter, or the contamination parameter in PyOD.
rocf = ROCF() # create instance of model
rocf.fit(X) # fit model on X
rocf.get_outliers() # retrieve outliers
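To make concrete what the contamination (or top-n) parameter asks of the user, the following is a minimal sketch in plain Python. It is not the ROCF algorithm and not PyOD's internal thresholding code; the function name `label_top_fraction` and the score values are hypothetical, chosen only for illustration. It flags the highest-scoring fraction of points as outliers, which shows how a wrong guess for that fraction leads to wrong labels.

```python
def label_top_fraction(scores, contamination):
    """Flag the highest-scoring fraction of points as outliers (1), the rest as inliers (0).

    Hypothetical helper for illustration only; not part of pyODPlus or PyOD.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    # Number of points to flag, derived from the user-supplied fraction.
    n_outliers = max(1, round(contamination * len(scores)))
    # Point indices ordered from highest to lowest outlier score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [1 if i in flagged else 0 for i in range(len(scores))]


# Made-up outlier scores: two points (indices 3 and 6) clearly stand out.
scores = [0.1, 0.2, 0.15, 3.5, 0.12, 0.18, 2.9, 0.11, 0.13, 0.16]

print(label_top_fraction(scores, 0.1))  # too low: only index 3 is flagged
print(label_top_fraction(scores, 0.2))  # matches the data: indices 3 and 6
print(label_top_fraction(scores, 0.3))  # too high: index 1 is flagged as well
```

With these made-up scores, only a contamination of 0.2 recovers exactly the two points that stand out. In practice that fraction is rarely known in advance, which is the motivation the ROCF paper gives for dropping the parameter.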
CBOF is an outlier detection method proposed by Duan et al. in "Cluster-Based Outlier Detection", published in Ann. Oper. Res. 168 (1) (2009), pp. 151–168. The paper introduces a clustering-based approach that detects not just single-point outliers (noise), but also small clusters of outliers.
cbof = CBOF() # create instance of model
cbof.fit(X) # fit model on X
cbof.get_outliers() # retrieve outliers
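The distinction between single-point outliers and small outlier clusters can be illustrated with a toy sketch. This is not the CBOF scoring from the paper and not the implementation in this repository: it clusters one-dimensional values by gaps and treats any cluster below a size threshold as outlying. The helper names and the `max_gap` and `min_size` parameters are assumptions made for this example.

```python
def gap_clusters(values, max_gap):
    """Group 1-D values into clusters; a gap larger than max_gap starts a new cluster."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)  # close enough to the previous value: same cluster
        else:
            clusters.append([v])    # gap too large: start a new cluster
    return clusters


def small_cluster_outliers(values, max_gap, min_size):
    """Return the clusters with fewer than min_size members (toy outlier criterion)."""
    return [c for c in gap_clusters(values, max_gap) if len(c) < min_size]


# One dense group, one small group of two points, and one isolated point.
values = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 9.0, 9.1, 20.0]
outlying = small_cluster_outliers(values, max_gap=0.5, min_size=3)
print(outlying)  # the two-point group and the isolated point
```

A purely point-wise detector might consider 9.0 and 9.1 normal because each has a close neighbour. Treating the cluster as the unit of analysis flags both the pair and the isolated value.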
The evaluation directory contains Jupyter notebooks with the code used to evaluate the performance of the ROCF model on several datasets. We tested the ROCF algorithm on three types of datasets:
- Resampled IRIS (replicated from paper)
- Synthetic D1, D2 and D3 (replicated from paper)
- Bank Card Fraud Transactions (https://www.kaggle.com/ninads/kernel3b5cdd2865/data)
The utils directory includes code to reproduce the same synthetic datasets used for our evaluation.
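For readers who want a quick input to experiment with before opening the utils code, here is a minimal sketch of a seeded synthetic dataset built with the standard library only. It does not reproduce D1, D2 or D3. The function name `make_toy_dataset`, the cluster centres, and the sizes are all assumptions chosen for illustration.

```python
import random


def make_toy_dataset(n_inliers=200, n_cluster_outliers=8, seed=0):
    """Return (X, y): 2-D points and ground-truth labels (0 = inlier, 1 = outlier).

    Hypothetical generator for illustration; not the D1-D3 datasets from the paper.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    X, y = [], []
    # Large, dense cluster of normal points around the origin.
    for _ in range(n_inliers):
        X.append([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(0)
    # Small, tight cluster far from the origin, playing the role of an outlier cluster.
    for _ in range(n_cluster_outliers):
        X.append([rng.gauss(8.0, 0.2), rng.gauss(8.0, 0.2)])
        y.append(1)
    return X, y


X, y = make_toy_dataset()
print(len(X), sum(y))  # total number of points, number of labelled outliers
```

Fixing the seed makes runs repeatable, so results obtained on the generated points can be compared across experiments.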