DeepOD is an open-source Python library for Deep Learning-based Outlier Detection
and Anomaly Detection. DeepOD supports both tabular and time-series anomaly detection.
DeepOD includes 27 deep outlier detection / anomaly detection algorithms (in the unsupervised and weakly-supervised paradigms).
More baseline algorithms will be included later.
DeepOD features:
Unified APIs across various algorithms.
SOTA models, including recent reconstruction-based, representation-learning-based, and self-supervised deep learning methods.
Comprehensive testbed that can be used to directly test different models on benchmark datasets (highly recommended for academic research).
Versatility across data types, including tabular and time-series data (DeepOD will support other data types such as images, graphs, logs, and traces in the future; PRs are welcome 🔭).
Diverse network structures that can be plugged into detection models. We currently support LSTM, GRU, TCN, Conv, and Transformer for time-series data (PRs are welcome as well ✨).
If you are interested in our project, we would be pleased to receive your stars and forks 👍 🍻.
Installation
The DeepOD framework can be installed via:
pip install deepod
Alternatively, install the development version (strongly recommended):
git clone https://github.com/xuhongzuo/DeepOD.git
cd DeepOD
pip install .
Usages
Directly use detection models in DeepOD:
DeepOD can be used in a few lines of code. The API style is the same as that of scikit-learn and PyOD.
for tabular anomaly detection:
# unsupervised methods
from deepod.models.tabular import DeepSVDD
clf = DeepSVDD()
clf.fit(X_train, y=None)
scores = clf.decision_function(X_test)

# weakly-supervised methods
from deepod.models.tabular import DevNet
clf = DevNet()
clf.fit(X_train, y=semi_y)  # semi_y uses 1 for known anomalies, and 0 for unlabeled data
scores = clf.decision_function(X_test)

# evaluation of tabular anomaly detection
from deepod.metrics import tabular_metrics
auc, ap, f1 = tabular_metrics(y_test, scores)
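The weakly-supervised call above expects semi_y to mark a few known anomalies with 1 and everything else with 0. The following is a minimal sketch of building such a label vector from fully labelled ground truth. The helper name make_semi_labels and its parameters are hypothetical and not part of DeepOD; the sketch uses plain Python lists, so convert the result to an array if your model requires one.

```python
import random


def make_semi_labels(y_true, n_known, seed=0):
    """Return weak labels: 1 for `n_known` sampled anomalies, 0 for all other samples.

    Hypothetical helper for illustration only; not a DeepOD function.
    """
    anomaly_idx = [i for i, y in enumerate(y_true) if y == 1]
    if n_known > len(anomaly_idx):
        raise ValueError("n_known exceeds the number of anomalies in y_true")
    rng = random.Random(seed)  # fixed seed keeps the sampled subset reproducible
    known = set(rng.sample(anomaly_idx, n_known))
    # Unsampled anomalies are deliberately left as 0, i.e. treated as unlabeled.
    return [1 if i in known else 0 for i in range(len(y_true))]


# Toy ground truth with three anomalies, of which only two are revealed.
y_train = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
semi_y = make_semi_labels(y_train, n_known=2)
```

Anomalies that are not sampled stay at 0, which matches the convention that 0 means "unlabeled" rather than "normal".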
for time series anomaly detection:
# time series anomaly detection methods
from deepod.models.time_series import TimesNet
clf = TimesNet()
clf.fit(X_train)
scores = clf.decision_function(X_test)

# evaluation of time series anomaly detection
from deepod.metrics import ts_metrics
from deepod.metrics import point_adjustment  # execute point adjustment for time series ad
eval_metrics = ts_metrics(labels, scores)
adj_eval_metrics = ts_metrics(labels, point_adjustment(labels, scores))
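For readers unfamiliar with point adjustment, here is a pure-Python sketch of the commonly used protocol: within every contiguous ground-truth anomaly segment, each score is replaced by the segment's maximum score. This is an illustration written under that assumption, and the function name point_adjust is hypothetical; DeepOD's own point_adjustment may differ in detail.

```python
def point_adjust(labels, scores):
    """Replace the scores inside each true anomaly segment with the segment maximum.

    Illustrative re-implementation of the common protocol; not DeepOD's code.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    adjusted = list(scores)
    i, n = 0, len(labels)
    while i < n:
        if labels[i] == 1:
            # Find the end (exclusive) of the current anomaly segment.
            j = i
            while j < n and labels[j] == 1:
                j += 1
            seg_max = max(scores[i:j])
            adjusted[i:j] = [seg_max] * (j - i)
            i = j
        else:
            i += 1
    return adjusted


labels = [0, 1, 1, 1, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.9, 0.3, 0.4, 0.1, 0.5, 0.6]
adjusted = point_adjust(labels, scores)
# adjusted == [0.1, 0.9, 0.9, 0.9, 0.4, 0.1, 0.6, 0.6]
```

Scores outside the labelled segments are left unchanged, so the adjustment can only raise scores on true anomaly points. This is why adjusted metrics are usually reported alongside the unadjusted ones rather than instead of them.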
Testbed usage:
Testbed contains the whole process of testing an anomaly detection model, including data loading, preprocessing, anomaly detection, and evaluation.
Please refer to testbed/
testbed/testbed_unsupervised_ad.py is for testing unsupervised tabular anomaly detection models.
testbed/testbed_unsupervised_tsad.py is for testing unsupervised time-series anomaly detection models.
Key arguments:
--input_dir: name of the folder that contains datasets (.csv, .npy)
--dataset: "FULL" tests all the files within the folder; alternatively, give a comma-separated list of dataset names (e.g., "10_cover*,20_letter*")
--model: anomaly detection model name
--runs: number of times to run the detection model; the average performance and its standard deviation are reported
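The arguments above can be sketched as a small command-line parser. This is a hypothetical illustration and not the actual testbed script. The default values, the helper names, the wildcard interpretation of names such as "10_cover*", and the use of the sample standard deviation are all assumptions.

```python
import argparse
import fnmatch
import statistics


def parse_args(argv=None):
    """Parse the four key arguments; all defaults here are assumed, not DeepOD's."""
    parser = argparse.ArgumentParser(description="Hypothetical testbed-style CLI")
    parser.add_argument("--input_dir", type=str, default="dataset/")
    parser.add_argument("--dataset", type=str, default="FULL")
    parser.add_argument("--model", type=str, default="DeepSVDD")
    parser.add_argument("--runs", type=int, default=5)
    return parser.parse_args(argv)


def select_datasets(spec, available):
    """Return every file for "FULL", else files matching any comma-separated pattern."""
    if spec == "FULL":
        return sorted(available)
    patterns = [p.strip() for p in spec.split(",") if p.strip()]
    return sorted(
        name for name in available
        if any(fnmatch.fnmatch(name, p) for p in patterns)
    )


def summarize(run_scores):
    """Average a metric over repeated runs and report its sample standard deviation."""
    mean = statistics.mean(run_scores)
    std = statistics.stdev(run_scores) if len(run_scores) > 1 else 0.0
    return mean, std


args = parse_args(["--dataset", "10_cover*,20_letter*", "--runs", "3"])
files = ["10_cover.csv", "20_letter.csv", "38_thyroid.csv"]  # made-up folder listing
chosen = select_datasets(args.dataset, files)
mean_auc, std_auc = summarize([0.80, 0.90, 1.00])  # made-up per-run AUC values
```

Here chosen contains only the two files that match the patterns, and summarize shows how a result over several runs reduces to a mean and a standard deviation.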
For Deep SVDD, DevNet, PReNet, and DeepSAD, we employ network structures that can handle time-series data. The classes of these models have a parameter named network; changing it selects a different network structure.
We currently support 'TCN', 'GRU', 'LSTM', 'Transformer', 'ConvSeq', and 'DilatedConv'.
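As a rough sketch of how the network parameter might be used, the snippet below validates a network name against the list above before passing it to a model class. The helper build_detector and the stand-in class DummyDetector are hypothetical. In practice you would pass one of DeepOD's time-series model classes instead; their import paths are not given in this section, so none is assumed here.

```python
# Network names taken from the list above.
SUPPORTED_NETWORKS = ("TCN", "GRU", "LSTM", "Transformer", "ConvSeq", "DilatedConv")


def build_detector(model_cls, network="TCN", **kwargs):
    """Instantiate `model_cls` with a validated `network` name (hypothetical helper)."""
    if network not in SUPPORTED_NETWORKS:
        raise ValueError(
            f"Unsupported network {network!r}; choose one of {SUPPORTED_NETWORKS}"
        )
    return model_cls(network=network, **kwargs)


class DummyDetector:
    """Stand-in for a DeepOD model class so the sketch runs without DeepOD installed."""

    def __init__(self, network="TCN", **kwargs):
        self.network = network
        self.extra = kwargs


clf = build_detector(DummyDetector, network="GRU")
```

Validating the name up front gives a clear error message for a typo such as "Gru" instead of a failure deep inside model construction.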
Citation
If you use this library in your work, please cite this paper:
Hongzuo Xu, Guansong Pang, Yijie Wang and Yongjun Wang, "Deep Isolation Forest for Anomaly Detection," in IEEE Transactions on Knowledge and Data Engineering, doi: 10.1109/TKDE.2023.3270293.
You can also use the BibTeX entry below for citation.
@ARTICLE{xu2023deep,
author={Xu, Hongzuo and Pang, Guansong and Wang, Yijie and Wang, Yongjun},
journal={IEEE Transactions on Knowledge and Data Engineering},
title={Deep Isolation Forest for Anomaly Detection},
year={2023},
volume={},
number={},
pages={1-14},
doi={10.1109/TKDE.2023.3270293}
}
Pang, Guansong, et al. "Learning representations of ultrahigh-dimensional data for random distance-based outlier detection". KDD (pp. 2041-2050). 2018.