Deep learning-based outlier/anomaly detection

Primary language: Python. License: BSD 2-Clause "Simplified" (BSD-2-Clause).

Python Deep Outlier/Anomaly Detection (DeepOD)

DeepOD is an open-source Python library for deep learning-based outlier detection and anomaly detection. It supports both tabular and time-series anomaly detection.

DeepOD includes 27 deep outlier detection / anomaly detection algorithms, covering the unsupervised and weakly-supervised paradigms. More baseline algorithms will be included later.

DeepOD features:

  • Unified APIs across various algorithms.
  • SOTA models, including the latest reconstruction-based, representation-learning-based, and self-supervised deep learning methods.
  • A comprehensive testbed that can be used to directly test different models on benchmark datasets (highly recommended for academic research).
  • Versatility across data types, including tabular and time-series data (DeepOD will support other data types such as images, graphs, logs, and traces in the future; PRs are welcome 🔭).
  • Diverse network structures that can be plugged into detection models. LSTM, GRU, TCN, Conv, and Transformer are currently supported for time-series data (PRs are welcome here as well ✨).

If you find this project interesting, stars and forks are very welcome 👍 🍻.

Installation

The DeepOD framework can be installed via:

pip install deepod

To install the development version (strongly recommended):

git clone https://github.com/xuhongzuo/DeepOD.git
cd DeepOD
pip install .

Usage

Directly use detection models in DeepOD:

DeepOD can be used in a few lines of code. The API style is the same as that of scikit-learn and PyOD.

For tabular anomaly detection:

# unsupervised methods
from deepod.models.tabular import DeepSVDD
clf = DeepSVDD()
clf.fit(X_train, y=None)
scores = clf.decision_function(X_test)

# weakly-supervised methods
from deepod.models.tabular import DevNet
clf = DevNet()
clf.fit(X_train, y=semi_y) # semi_y uses 1 for known anomalies, and 0 for unlabeled data
scores = clf.decision_function(X_test)

# evaluation of tabular anomaly detection
from deepod.metrics import tabular_metrics
auc, ap, f1 = tabular_metrics(y_test, scores)
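
The weakly-supervised snippet above expects `semi_y` to hold 1 for known anomalies and 0 for unlabeled data. As a minimal sketch of how such labels might be derived from a fully labeled training set, assuming only a handful of anomalies are revealed to the model (the helper `make_semi_labels` and its parameters are hypothetical and not part of DeepOD):

```python
import random


def make_semi_labels(y_true, n_known, seed=0):
    """Return weak labels: 1 for a few known anomalies, 0 for everything else.

    y_true  : sequence of 0/1 ground-truth labels (1 = anomaly)
    n_known : number of anomalies whose label is revealed (hypothetical parameter)
    """
    anomaly_idx = [i for i, y in enumerate(y_true) if y == 1]
    rng = random.Random(seed)
    # Reveal at most n_known anomalies; the rest stay unlabeled (0).
    known = rng.sample(anomaly_idx, min(n_known, len(anomaly_idx)))
    semi_y = [0] * len(y_true)
    for i in known:
        semi_y[i] = 1
    return semi_y


y_train = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]  # toy labels for illustration
semi_y = make_semi_labels(y_train, n_known=2)
print(semi_y)
```

Unrevealed anomalies remain labeled 0, so the unlabeled portion may still contain anomalies. In practice the list would typically be converted to an array before being passed to `fit`.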

For time-series anomaly detection:

# time series anomaly detection methods
from deepod.models.time_series import TimesNet
clf = TimesNet()
clf.fit(X_train)
scores = clf.decision_function(X_test)

# evaluation of time series anomaly detection
from deepod.metrics import ts_metrics
from deepod.metrics import point_adjustment # point adjustment for time-series anomaly detection
eval_metrics = ts_metrics(labels, scores)
adj_eval_metrics = ts_metrics(labels, point_adjustment(labels, scores))
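
Point adjustment is a common evaluation protocol for time-series anomaly detection: every point inside a ground-truth anomaly segment is treated as detected if any point in that segment is detected. The sketch below illustrates one way to express this on scores, by raising each score within a labeled segment to the segment maximum. It is a plain-Python illustration of the idea, not DeepOD's implementation, and the name `point_adjust` is hypothetical:

```python
def point_adjust(labels, scores):
    """Raise every score inside a ground-truth anomaly segment to the segment maximum."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    adjusted = list(scores)
    n = len(labels)
    i = 0
    while i < n:
        if labels[i] == 1:
            # Find the end of the contiguous anomaly segment starting at i.
            j = i
            while j < n and labels[j] == 1:
                j += 1
            seg_max = max(scores[i:j])
            for k in range(i, j):
                adjusted[k] = seg_max
            i = j
        else:
            i += 1
    return adjusted


labels = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.3, 0.9, 0.2, 0.1, 0.4, 0.5, 0.6, 0.2]
print(point_adjust(labels, scores))
# → [0.1, 0.2, 0.9, 0.9, 0.9, 0.1, 0.4, 0.6, 0.6, 0.2]
```

Scores outside labeled segments are left unchanged. Because adjusted metrics are usually higher than unadjusted ones, it helps to state which variant is being reported.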

Testbed usage:

The testbed covers the whole process of testing an anomaly detection model, including data loading, preprocessing, anomaly detection, and evaluation.

Please refer to the testbed/ directory:

  • testbed/testbed_unsupervised_ad.py is for testing unsupervised tabular anomaly detection models.
  • testbed/testbed_unsupervised_tsad.py is for testing unsupervised time-series anomaly detection models.

Key arguments:

  • --input_dir: name of the folder that contains the datasets (.csv, .npy)
  • --dataset: "FULL" tests all files in the folder; alternatively, pass a comma-separated list of dataset names (e.g., "10_cover*,20_letter*")
  • --model: name of the anomaly detection model
  • --runs: number of times the detection model is run; the average performance and its standard deviation are reported
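
To illustrate how a `--dataset` value could be interpreted, the sketch below resolves either "FULL" or a comma-separated list of names against the files in a folder. How the testbed actually matches names is not shown here; glob-style matching via `fnmatch` is an assumption, and `select_datasets` is a hypothetical helper:

```python
from fnmatch import fnmatch


def select_datasets(available, dataset_arg):
    """Resolve a --dataset value against the file names found in the input folder."""
    if dataset_arg == "FULL":
        return sorted(available)
    # Split on commas and drop empty entries such as a trailing comma.
    patterns = [p.strip() for p in dataset_arg.split(",") if p.strip()]
    return sorted(name for name in available
                  if any(fnmatch(name, p) for p in patterns))


files = ["10_cover.csv", "20_letter.csv", "2_annthyroid.csv"]  # toy file names
print(select_datasets(files, "10_cover*,20_letter*"))
# → ['10_cover.csv', '20_letter.csv']
print(len(select_datasets(files, "FULL")))
# → 3
```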

Example:

  1. Download the ADBench datasets.
  2. Set the dataset_root variable to the directory of the dataset.
  3. input_dir is the name of a sub-folder of dataset_root, e.g., Classical or NLP_by_BERT.
  4. Run the following commands in bash:
cd DeepOD
pip install .
cd testbed
python testbed_unsupervised_ad.py --model DeepIsolationForest --runs 5 --input_dir ADBench
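
With `--runs 5`, the model is run five times and the average performance is reported together with its standard deviation. A minimal sketch of that aggregation is shown below. The metric values are made up for illustration, `summarize_runs` is a hypothetical helper, and whether the testbed uses the sample or the population standard deviation is not stated here (this sketch uses the sample version):

```python
from statistics import mean, stdev


def summarize_runs(values):
    """Return (mean, sample standard deviation) of one metric over repeated runs."""
    avg = mean(values)
    # stdev needs at least two values; a single run has no spread to report.
    std = stdev(values) if len(values) > 1 else 0.0
    return avg, std


auc_per_run = [0.90, 0.92, 0.88, 0.91, 0.89]  # made-up values, not real results
avg, std = summarize_runs(auc_per_run)
print(f"AUC-ROC: {avg:.4f} +/- {std:.4f}")
```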

Implemented Models

Tabular Anomaly Detection models:

| Model | Venue | Year | Type | Title |
|---|---|---|---|---|
| Deep SVDD | ICML | 2018 | unsupervised | Deep One-Class Classification [1] |
| REPEN | KDD | 2018 | unsupervised | Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection [2] |
| RDP | IJCAI | 2020 | unsupervised | Unsupervised Representation Learning by Predicting Random Distances [3] |
| RCA | IJCAI | 2021 | unsupervised | RCA: A Deep Collaborative Autoencoder Approach for Anomaly Detection [4] |
| GOAD | ICLR | 2020 | unsupervised | Classification-Based Anomaly Detection for General Data [5] |
| NeuTraL | ICML | 2021 | unsupervised | Neural Transformation Learning for Deep Anomaly Detection Beyond Images [6] |
| ICL | ICLR | 2022 | unsupervised | Anomaly Detection for Tabular Data with Internal Contrastive Learning [7] |
| DIF | TKDE | 2023 | unsupervised | Deep Isolation Forest for Anomaly Detection [19] |
| SLAD | ICML | 2023 | unsupervised | Fascinating Supervisory Signals and Where to Find Them: Deep Anomaly Detection with Scale Learning [20] |
| DevNet | KDD | 2019 | weakly-supervised | Deep Anomaly Detection with Deviation Networks [8] |
| PReNet | KDD | 2023 | weakly-supervised | Deep Weakly-supervised Anomaly Detection [9] |
| Deep SAD | ICLR | 2020 | weakly-supervised | Deep Semi-Supervised Anomaly Detection [10] |
| FeaWAD | TNNLS | 2021 | weakly-supervised | Feature Encoding with AutoEncoders for Weakly-supervised Anomaly Detection [11] |
| RoSAS | IP&M | 2023 | weakly-supervised | RoSAS: Deep semi-supervised anomaly detection with contamination-resilient continuous supervision [21] |

Time-series Anomaly Detection models:

| Model | Venue | Year | Type | Title |
|---|---|---|---|---|
| DCdetector | KDD | 2023 | unsupervised | DCdetector: Dual Attention Contrastive Representation Learning for Time Series Anomaly Detection [14] |
| TimesNet | ICLR | 2023 | unsupervised | TIMESNET: Temporal 2D-Variation Modeling for General Time Series Analysis [13] |
| AnomalyTransformer | ICLR | 2022 | unsupervised | Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy [12] |
| NCAD | IJCAI | 2022 | unsupervised | Neural Contextual Anomaly Detection for Time Series [16] |
| TranAD | VLDB | 2022 | unsupervised | TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data [15] |
| COUTA | arXiv | 2022 | unsupervised | Calibrated One-class Classification for Unsupervised Time Series Anomaly Detection [18] |
| USAD | KDD | 2020 | unsupervised | USAD: UnSupervised Anomaly Detection on Multivariate Time Series |
| DIF | TKDE | 2023 | unsupervised | Deep Isolation Forest for Anomaly Detection [19] |
| TcnED | TNNLS | 2021 | unsupervised | An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time Series [17] |
| Deep SVDD (TS) | ICML | 2018 | unsupervised | Deep One-Class Classification [1] |
| DevNet (TS) | KDD | 2019 | weakly-supervised | Deep Anomaly Detection with Deviation Networks [8] |
| PReNet (TS) | KDD | 2023 | weakly-supervised | Deep Weakly-supervised Anomaly Detection [9] |
| Deep SAD (TS) | ICLR | 2020 | weakly-supervised | Deep Semi-Supervised Anomaly Detection [10] |

NOTE:

  • For Deep SVDD, DevNet, PReNet, and Deep SAD, we employ network structures that can handle time-series data. The classes of these models have a parameter named network; changing it selects a different network structure.
  • We currently support 'TCN', 'GRU', 'LSTM', 'Transformer', 'ConvSeq', and 'DilatedConv'.

Citation

If you use this library in your work, please cite this paper:

Hongzuo Xu, Guansong Pang, Yijie Wang and Yongjun Wang, "Deep Isolation Forest for Anomaly Detection," in IEEE Transactions on Knowledge and Data Engineering, doi: 10.1109/TKDE.2023.3270293.

You can also use the BibTeX entry below for citation.

@ARTICLE{xu2023deep,
   author={Xu, Hongzuo and Pang, Guansong and Wang, Yijie and Wang, Yongjun},
   journal={IEEE Transactions on Knowledge and Data Engineering},
   title={Deep Isolation Forest for Anomaly Detection},
   year={2023},
   volume={},
   number={},
   pages={1-14},
   doi={10.1109/TKDE.2023.3270293}
}

Reference

[1] Ruff, Lukas, et al. "Deep One-Class Classification". ICML. 2018.
[2] Pang, Guansong, et al. "Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection". KDD (pp. 2041-2050). 2018.
[3] Wang, Hu, et al. "Unsupervised Representation Learning by Predicting Random Distances". IJCAI (pp. 2950-2956). 2020.
[4] Liu, Boyang, et al. "RCA: A Deep Collaborative Autoencoder Approach for Anomaly Detection". IJCAI (pp. 1505-1511). 2021.
[5] Bergman, Liron, and Yedid Hoshen. "Classification-Based Anomaly Detection for General Data". ICLR. 2020.
[6] Qiu, Chen, et al. "Neural Transformation Learning for Deep Anomaly Detection Beyond Images". ICML. 2021.
[7] Shenkar, Tom, et al. "Anomaly Detection for Tabular Data with Internal Contrastive Learning". ICLR. 2022.
[8] Pang, Guansong, et al. "Deep Anomaly Detection with Deviation Networks". KDD. 2019.
[9] Pang, Guansong, et al. "Deep Weakly-supervised Anomaly Detection". KDD. 2023.
[10] Ruff, Lukas, et al. "Deep Semi-Supervised Anomaly Detection". ICLR. 2020.
[11] Zhou, Yingjie, et al. "Feature Encoding with AutoEncoders for Weakly-supervised Anomaly Detection". TNNLS. 2021.
[12] Xu, Jiehui, et al. "Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy". ICLR. 2022.
[13] Wu, Haixu, et al. "TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis". ICLR. 2023.
[14] Yang, Yiyuan, et al. "DCdetector: Dual Attention Contrastive Representation Learning for Time Series Anomaly Detection". KDD. 2023.
[15] Tuli, Shreshth, et al. "TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data". VLDB. 2022.
[16] Carmona, Chris U., et al. "Neural Contextual Anomaly Detection for Time Series". IJCAI. 2022.
[17] Garg, Astha, et al. "An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time Series". TNNLS. 2021.
[18] Xu, Hongzuo, et al. "Calibrated One-class Classification for Unsupervised Time Series Anomaly Detection". arXiv:2207.12201. 2022.
[19] Xu, Hongzuo, et al. "Deep Isolation Forest for Anomaly Detection". TKDE. 2023.
[20] Xu, Hongzuo, et al. "Fascinating Supervisory Signals and Where to Find Them: Deep Anomaly Detection with Scale Learning". ICML. 2023.
[21] Xu, Hongzuo, et al. "RoSAS: Deep Semi-supervised Anomaly Detection with Contamination-resilient Continuous Supervision". IP&M. 2023.