KDD 2021: Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding
InterFusion is an unsupervised method for anomaly detection and interpretation in multivariate time series (MTS). Its core idea is to model the normal patterns of MTS using an HVAE with jointly trained hierarchical stochastic latent variables, each of which explicitly learns low-dimensional inter-metric or temporal embeddings. You may refer to our paper for more details.
Clone the repo
git clone https://github.com/zhhlee/InterFusion.git && cd InterFusion
Get data
The datasets used in this paper are in the data folder. You may refer to data/Dataset Description for more details.
Install dependencies (with Python 3.6+)
(virtualenv is recommended)
pip install -r requirements.txt
The code is tested under the following basic environments:
OS: Ubuntu 18.04
GPU: GTX 1080 Ti
CUDA: 9.0.176
Python: 3.6.6
Run the code
Please set the root directory of the project as your Python path.
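One way to do this without changing your shell configuration is to launch the scripts from a small Python wrapper. The sketch below is only an illustration: the helper name env_with_project_root is hypothetical and is not part of this repository. It builds an environment in which the project root is prepended to PYTHONPATH.

```python
import os
from pathlib import Path


def env_with_project_root(project_root="."):
    """Return a copy of the current environment with project_root first on PYTHONPATH.

    Hypothetical helper for illustration; it is not part of the InterFusion code.
    """
    root = str(Path(project_root).resolve())
    env = dict(os.environ)
    existing = env.get("PYTHONPATH", "")
    # Keep any existing entries, but make the project root take precedence.
    env["PYTHONPATH"] = root + (os.pathsep + existing if existing else "")
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when invoking the commands in this section from the repository root.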
For datasets ASD and SMD:
python algorithm/stack_train.py --dataset=omi-1 # training
python algorithm/stack_predict.py --load_model_dir=./results/stack_train/ # evaluation
For datasets SWaT and WADI (Note: you need to acquire these datasets first, following data/Dataset Description and explib/raw_data_converter):
SWaT:
python algorithm/stack_train.py --dataset=SWaT --train.train_start=21600 --train.valid_portion=0.1 --model.window_length=30 '--model.output_shape=[15, 15, 30]' --model.z2_dim=8 # training
python algorithm/stack_predict.py --load_model_dir=./results/stack_train/ --mcmc_track=False # evaluation
WADI:
python algorithm/stack_train.py --dataset=WADI --train.train_start=259200 --train.max_train_size=789371 --train.valid_portion=0.1 --model.window_length=30 '--model.output_shape=[15, 15, 30]' --model.z2_dim=8 # training
python algorithm/stack_predict.py --load_model_dir=./results/stack_train/ --mcmc_track=False # evaluation
The default model configurations are in algorithm/InterFusion.py, the training configs in algorithm/stack_train.py, and the evaluation configs in algorithm/stack_predict.py. You may override the configs using command-line arguments. For example:
python algorithm/stack_train.py --dataset=omi-1 --model.z_dim=5 --train.batch_size=128
python algorithm/stack_predict.py --load_model_dir=./results/stack_train/ --test_batch_size=100
Run on your own dataset
- Put your train/test/label files under the data/processed folder, e.g., ds_train.pkl, ds_test.pkl, and ds_test_label.pkl, with shapes (train_length, feature_dim), (test_length, feature_dim), and (test_length,), respectively.
- Put the interpretation files (optional) under the data/interpretation_label folder.
- Edit get_data_dim in algorithm/utils.py to add your dataset info.
- Run the code following the instructions above.
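The first step can be sketched as follows. This is a minimal example that assumes the .pkl files are plain pickled NumPy arrays; the file names and shapes follow the description above, but please check the loading code in the repository for the exact format it expects. The function save_dataset is a hypothetical helper, not part of this repository.

```python
import pickle
from pathlib import Path

import numpy as np


def save_dataset(name, train, test, test_label, out_dir="data/processed"):
    """Check shapes and write <name>_train.pkl, <name>_test.pkl, <name>_test_label.pkl.

    Assumption: the files are pickled NumPy arrays (hypothetical helper).
    """
    train = np.asarray(train)
    test = np.asarray(test)
    test_label = np.asarray(test_label)

    # train: (train_length, feature_dim), test: (test_length, feature_dim)
    if train.ndim != 2 or test.ndim != 2:
        raise ValueError("train and test must be 2-D arrays")
    if train.shape[1] != test.shape[1]:
        raise ValueError("train and test must share feature_dim")
    # test_label: (test_length,), one label per test time step
    if test_label.shape != (test.shape[0],):
        raise ValueError("test_label must have shape (test_length,)")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    arrays = {"train": train, "test": test, "test_label": test_label}
    for suffix, array in arrays.items():
        with open(out / f"{name}_{suffix}.pkl", "wb") as f:
            pickle.dump(array, f)
    return out
```

Calling save_dataset("ds", train, test, test_label) would produce the three files named in the first step. You still need to register the dataset in get_data_dim before training.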
Results
After running the algorithm, the results are in the results folder. The main results are:
Model: results/stack_train/result_params/
Training config: results/stack_train/config.json
Testing config: results/stack_predict/config.json
Testing statistics: results/stack_predict/result.json
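To inspect these files programmatically, something like the sketch below may be convenient. It relies only on the paths listed above and makes no assumption about which keys the JSON files contain; summarize_run is a hypothetical helper, not part of this repository.

```python
import json
from pathlib import Path


def load_json(path):
    """Read one JSON file and return its parsed content."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize_run(results_dir="results"):
    """Collect the JSON outputs of a run, skipping any file that does not exist.

    Hypothetical helper; the paths follow the list above.
    """
    root = Path(results_dir)
    files = {
        "train_config": root / "stack_train" / "config.json",
        "test_config": root / "stack_predict" / "config.json",
        "test_stats": root / "stack_predict" / "result.json",
    }
    return {key: load_json(path) for key, path in files.items() if path.exists()}
```

For example, summarize_run().get("test_stats") returns the parsed testing statistics if evaluation has been run, and None otherwise.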
If you find this code useful for your research, please cite our paper:
@inproceedings{li2021multivariate,
title={Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding},
author={Li, Zhihan and Zhao, Youjian and Han, Jiaqi and Su, Ya and Jiao, Rui and Wen, Xidao and Pei, Dan},
booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
pages={3220--3230},
year={2021}
}