Experimental Code for KBS: Scalable KDE-based Top-n Local Outlier Detection over Large-Scale DataStreams

Overview

The detection of local outliers over high-volume data streams is critical for diverse real-time applications in the real world, where the distributions in different subsets of the data tend to be skewed. However, existing methods are not scalable to large-scale high-volume data streams owing to the high complexity of the re-detection of data updates. In this work, we propose a top-n local outlier detection method based on Kernel Density Estimation (KDE) over large-scale high-volume data streams.

Main Methods

The proposed method consists two versions: UKOF and LUKOF method.

Main Class for UKOF method: cellpruning.lof.pruning.ComputeTopNKDE

Main Class for LUKOF method: cellpruning.lof.pruning.ComputeTopNKDE_LazyUpdate

Environment

Eclipse
Build and Use the Software Artifact

1.Open Eclipse

2.Import the code named "TopNKOF"

3.Set parameters in "util.SQConfig", such as the number of nearest neighbors k, top outliers n, window size w and slide size s.

4.Run the corresponding main methods for UKOF and LUKOF, namely "cellpruning.lof.pruning.ComputeTopNKDE" and "cellpruning.lof.pruning.ComputeTopNKDE_LazyUpdate"

Dataset

extensive experiments are conducted on ten real-world and synthetic datasets. The real-world datasets are extracted from UCI Machine Learning Repository.

synthetic dataset: Interchanging RBF, Moving Squares, Mixture RBF

real-word datasets: Vowels, KDDCup, Subhttp, Smtp, ForestCover, Mobike, GeoLife

xtWang-for-github/TopNKOF

Experimental Code for KBS: Scalable KDE-based Top-n Local Outlier Detection over Large-Scale DataStreams

Overview

Main Methods

Environment

Dataset