/midas

Go implementation of MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams

Primary LanguageC++Apache License 2.0Apache-2.0

MIDAS

C++ implementation of

The old implementation is in another branch OldImplementation, it should be considered as being archived and will hardly receive feature updates.

Table of Contents

Features

  • Finds Anomalies in Dynamic/Time-Evolving Graph: (Intrusion Detection, Fake Ratings, Financial Fraud)
  • Detects Microcluster Anomalies (suddenly arriving groups of suspiciously similar edges e.g. DoS attack)
  • Theoretical Guarantees on False Positive Probability
  • Constant Memory (independent of graph size)
  • Constant Update Time (real-time anomaly detection to minimize harm)
  • Up to 55% more accurate and 929 times faster than the state of the art approaches
  • Some experiments are performed on the following datasets:

Demo

If you use Windows:

  1. Open a Visual Studio developer command prompt, we want their toolchain
  2. cd to the project root MIDAS/
  3. cmake -DCMAKE_BUILD_TYPE=Release -G "NMake Makefiles" -S . -B build/release
  4. cmake --build build/release --target Demo
  5. cd to MIDAS/build/release/src
  6. .\Demo.exe

If you use Linux/macOS systems:

  1. Open a terminal
  2. cd to the project root MIDAS/
  3. cmake -DCMAKE_BUILD_TYPE=Release -S . -B build/release
  4. cmake --build build/release --target Demo
  5. cd to MIDAS/build/release/src
  6. ./Demo

The demo runs on MIDAS/data/DARPA/darpa_processed.csv, which has 4.5M records, with the filtering core.

The scores will be exported to MIDAS/temp/Score.txt, higher means more anomalous.

All file paths are absolute and "hardcoded" by CMake, but it's suggested NOT to run by double-click on the executable file.

Customization

Switch Cores

Cores are instantiated at MIDAS/example/Demo.cpp:64-66, uncomment the chosen one.

Custom Dataset + Demo.cpp

You need to prepare three files:

  • Meta file
    • Only includes an integer N, the number of records in the dataset
    • Use its path for pathMeta
  • Data file
    • A header-less csv format file of shape [N,3]
    • Columns are sources, destinations, timestamps
    • Use its path for pathData
  • Label file
    • A header-less csv format file of shape [N,1]
    • The corresponding label for data records
      • 0 means normal record
      • 1 means anomalous record
    • Use its path for pathGroundTruth

Custom Dataset + Custom Runner

  1. Include the header MIDAS/CPU/NormalCore.hpp, MIDAS/CPU/RelationalCore.hpp or MIDAS/CPU/FilteringCore.hpp
  2. Instantiate cores with required parameters
  3. Call operator() on individual data records, it returns the anomaly score for the input record.

Online Articles

  1. KDnuggets: Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs
  2. Towards Data Science: Controlling Fake News using Graphs and Statistics
  3. Towards Data Science: Anomaly detection in dynamic graphs using MIDAS
  4. Towards AI: Anomaly Detection with MIDAS
  5. AIhub Interview

MIDAS in Other Languages

  1. Golang by Steve Tan
  2. Ruby by Andrew Kane
  3. Rust by Scott Steele
  4. R by Tobias Heidler
  5. Python by Ritesh Kumar

Citation

If you use this code for your research, please consider citing our paper.

@inproceedings{bhatia2020midas,
    title="MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams",
    author="Siddharth {Bhatia} and Bryan {Hooi} and Minji {Yoon} and Kijung {Shin} and Christos {Faloutsos}",
    booktitle="AAAI 2020 : The Thirty-Fourth AAAI Conference on Artificial Intelligence",
    year="2020"
}