C++ implementation of
- Real-time Streaming Anomaly Detection in Dynamic Graphs. Siddharth Bhatia, Rui Liu, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos. (Under Review)
- MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams. Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos. AAAI 2020.
The old implementation is in another branch, OldImplementation; it should be considered archived and is unlikely to receive feature updates.
- Finds Anomalies in Dynamic/Time-Evolving Graph: (Intrusion Detection, Fake Ratings, Financial Fraud)
- Detects Microcluster Anomalies (suddenly arriving groups of suspiciously similar edges e.g. DoS attack)
- Theoretical Guarantees on False Positive Probability
- Constant Memory (independent of graph size)
- Constant Update Time (real-time anomaly detection to minimize harm)
- Up to 55% more accurate and 929 times faster than state-of-the-art approaches
- Some experiments are performed on the following datasets:
If you use Windows:
- Open a Visual Studio developer command prompt (we need its toolchain)
- cd to the project root MIDAS/
- cmake -DCMAKE_BUILD_TYPE=Release -G "NMake Makefiles" -S . -B build/release
- cmake --build build/release --target Demo
- cd to MIDAS/build/release/src
- .\Demo.exe
If you use Linux/macOS systems:
- Open a terminal
- cd to the project root MIDAS/
- cmake -DCMAKE_BUILD_TYPE=Release -S . -B build/release
- cmake --build build/release --target Demo
- cd to MIDAS/build/release/src
- ./Demo
The demo runs on MIDAS/data/DARPA/darpa_processed.csv, which has 4.5M records, with the filtering core.
The scores will be exported to MIDAS/temp/Score.txt; higher means more anomalous.
All file paths are absolute and "hardcoded" by CMake, but launching the executable by double-clicking it is NOT recommended.
Cores are instantiated at MIDAS/example/Demo.cpp:64-66; uncomment the one you want to use.
You need to prepare three files:
- Meta file
  - Only includes an integer N, the number of records in the dataset
  - Use its path for pathMeta
- Data file
  - A header-less csv format file of shape [N,3]
  - Columns are sources, destinations, timestamps
  - Use its path for pathData
- Label file
  - A header-less csv format file of shape [N,1]
  - The corresponding label for data records
    - 0 means normal record
    - 1 means anomalous record
  - Use its path for pathGroundTruth
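The three files described above can be loaded and sanity-checked with a sketch like the one below. It is an illustration only, not the repository's loader. It assumes that sources, destinations, and timestamps are all integers. The Edge struct and the readMeta, readData, and readLabels functions are hypothetical names. They take a std::istream so they work with both files and in-memory strings.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type: one row of the data file.
// Assumption: all three columns are integers.
struct Edge {
    int source;
    int destination;
    int timestamp;
};

// Reads the meta file: a single non-negative integer N.
std::size_t readMeta(std::istream& in) {
    long long n = 0;
    if (!(in >> n) || n < 0) {
        throw std::runtime_error("meta file must contain one non-negative integer");
    }
    return static_cast<std::size_t>(n);
}

// Reads N header-less rows of the form "source,destination,timestamp".
std::vector<Edge> readData(std::istream& in, std::size_t n) {
    std::vector<Edge> edges;
    edges.reserve(n);
    std::string line;
    while (edges.size() < n && std::getline(in, line)) {
        if (line.empty()) {
            continue;  // tolerate blank lines
        }
        std::istringstream row(line);
        Edge e{};
        char comma1 = 0;
        char comma2 = 0;
        if (!(row >> e.source >> comma1 >> e.destination >> comma2 >> e.timestamp) ||
            comma1 != ',' || comma2 != ',') {
            throw std::runtime_error("malformed data row: " + line);
        }
        edges.push_back(e);
    }
    if (edges.size() != n) {
        throw std::runtime_error("data file has fewer rows than the meta file declares");
    }
    return edges;
}

// Reads N labels, each 0 (normal) or 1 (anomalous).
std::vector<int> readLabels(std::istream& in, std::size_t n) {
    std::vector<int> labels;
    labels.reserve(n);
    int label = 0;
    while (labels.size() < n && (in >> label)) {
        if (label != 0 && label != 1) {
            throw std::runtime_error("labels must be 0 or 1");
        }
        labels.push_back(label);
    }
    if (labels.size() != n) {
        throw std::runtime_error("label file has fewer rows than the meta file declares");
    }
    return labels;
}
```

In practice each function would be given a std::ifstream opened on the corresponding path. Checking the row counts against N early catches a mismatched meta file before any scoring is run.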
- Include the header MIDAS/CPU/NormalCore.hpp, MIDAS/CPU/RelationalCore.hpp or MIDAS/CPU/FilteringCore.hpp
- Instantiate cores with required parameters
- Call operator() on individual data records; it returns the anomaly score for the input record.
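The calling pattern in the steps above can be sketched as follows. To keep the example self-contained, it does not include the MIDAS headers. ToyCore is a hypothetical stand-in that only counts how often an edge repeats within the current timestamp, and it does NOT implement MIDAS scoring. The Record struct, the scoreStream function, and the (source, destination, timestamp) argument order are assumptions made for illustration. Check the chosen core header and Demo.cpp for the actual constructor parameters and call signature.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical record type mirroring the three data columns.
struct Record {
    int source;
    int destination;
    int timestamp;
};

// Hypothetical stand-in for a core, NOT the MIDAS algorithm.
// It returns how many times the (source, destination) edge has been
// seen so far within the current timestamp.
class ToyCore {
public:
    float operator()(int source, int destination, int timestamp) {
        if (timestamp != currentTimestamp_) {
            // A new time tick starts: forget the per-tick counts.
            currentTimestamp_ = timestamp;
            countsThisTick_.clear();
        }
        return static_cast<float>(++countsThisTick_[{source, destination}]);
    }

private:
    int currentTimestamp_ = -1;
    std::map<std::pair<int, int>, int> countsThisTick_;
};

// Feeds records to any core-like callable one at a time, in stream
// order, and collects the score returned for each record.
template <typename Core>
std::vector<float> scoreStream(Core& core, const std::vector<Record>& records) {
    std::vector<float> scores;
    scores.reserve(records.size());
    for (const Record& r : records) {
        scores.push_back(core(r.source, r.destination, r.timestamp));
    }
    return scores;
}
```

Because scoreStream is a template over the callable, the same loop should work with a real core in place of ToyCore, provided its operator() accepts the same three arguments. Records are scored one at a time, in arrival order.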
- KDnuggets: Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs
- Towards Data Science: Controlling Fake News using Graphs and Statistics
- Towards Data Science: Anomaly detection in dynamic graphs using MIDAS
- Towards AI: Anomaly Detection with MIDAS
- AIhub Interview
- Golang by Steve Tan
- Ruby by Andrew Kane
- Rust by Scott Steele
- R by Tobias Heidler
- Python by Ritesh Kumar
If you use this code for your research, please consider citing our paper.
@inproceedings{bhatia2020midas,
title="MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams",
author="Siddharth {Bhatia} and Bryan {Hooi} and Minji {Yoon} and Kijung {Shin} and Christos {Faloutsos}",
booktitle="AAAI 2020 : The Thirty-Fourth AAAI Conference on Artificial Intelligence",
year="2020"
}