Experimental Code for KBS: Scalable KDE-based Top-n Local Outlier Detection over Large-Scale DataStreams
Overview
The detection of local outliers over high-volume data streams is critical for diverse real-time applications in the real world, where the distributions in different subsets of the data tend to be skewed. However, existing methods are not scalable to large-scale high-volume data streams owing to the high complexity of the re-detection of data updates. In this work, we propose a top-n local outlier detection method based on Kernel Density Estimation (KDE) over large-scale high-volume data streams.
Main Methods
The proposed method consists two versions: UKOF and LUKOF method.
Main Class for UKOF method: cellpruning.lof.pruning.ComputeTopNKDE
Main Class for LUKOF method: cellpruning.lof.pruning.ComputeTopNKDE_LazyUpdate
Environment
-
Eclipse
-
Build and Use the Software Artifact
1.Open Eclipse
2.Import the code named "TopNKOF"
3.Set parameters in "util.SQConfig", such as the number of nearest neighbors k, top outliers n, window size w and slide size s.
4.Run the corresponding main methods for UKOF and LUKOF, namely "cellpruning.lof.pruning.ComputeTopNKDE" and "cellpruning.lof.pruning.ComputeTopNKDE_LazyUpdate"
Dataset
extensive experiments are conducted on ten real-world and synthetic datasets. The real-world datasets are extracted from UCI Machine Learning Repository.
synthetic dataset: Interchanging RBF, Moving Squares, Mixture RBF
real-word datasets: Vowels, KDDCup, Subhttp, Smtp, ForestCover, Mobike, GeoLife