PUMAD

Project: Semi-Supervised Anomaly Detection in Multivariate Time Series

PU Learning for Multivariate Time Series Anomaly Detection

Authors: Wooheon Hong and Minsoo Kim, with Samsung Electronics

Date: 2021.05.10 ~ 2022.04.29

This is the PyTorch implementation of PUMAD.

[Figure: PUMAD structure (puamd_structure.PNG)]

Requirements

  • python 3.8
  • see requirements.txt
conda create -n env_name python=3.8
conda activate env_name
pip install --upgrade pip
pip install -r requirements.txt

Datasets

X: input data [ndarray] *shape: (data, # of features, window length)
y: input labels [ndarray] *shape: (data)
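
For reference, here is a minimal sketch of how a raw (timesteps, features) series could be windowed into this shape. The helper name, stride of 1, and labeling rule (a window is anomalous if any of its timesteps is) are illustrative assumptions, not part of this repository:

import numpy as np

def make_windows(series, labels, window=10):
    # Slice a (T, F) series into X of shape (N, F, window) and per-window labels y.
    # Illustrative helper only; not the repository's preprocessing code.
    X, y = [], []
    for start in range(len(series) - window + 1):
        seg = series[start:start + window]                 # (window, F)
        X.append(seg.T)                                    # (F, window)
        y.append(int(labels[start:start + window].any()))
    return np.asarray(X), np.asarray(y)

raw = np.random.randn(1000, 8)                 # 1000 timesteps, 8 features
lab = np.zeros(1000, dtype=int)
X, y = make_windows(raw, lab, window=10)
print(X.shape, y.shape)                        # (991, 8, 10) (991,)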

Train & Evaluation

The AUC is measured at each epoch:

python train.py --device <gpu number> --dataset <name of .npy file> --data_path <location of files> --window <length of sliding window> 
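
For example (the device, dataset name, and data path below are illustrative):

python train.py --device cuda:0 --dataset unsw-nb15 --data_path ./data --window 10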

The detailed descriptions of the parameters are as follows:

Parameter name   Description of parameter
device           device the code runs on, 'cpu' or 'cuda:#'
data_name        file name of the input .npy
data_path        path to the train and test data
window           length of the sliding window, default 10
positive_ratio   ratio of labeled anomalies, default 1
n_bits           # of bits in each hash table, default 5
n_hash_tables    # of hash tables, default 5
model            embedding model, default "wavenet"
n_epochs         # of training epochs, default 10
n_batch          batch size, default 512
lr               learning rate, default 1e-4
hidden_size      # of hidden-layer channels, default 128
margin           margin of the triplet loss, default 2
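
The margin parameter above is the margin of a standard triplet loss, which pulls each anchor embedding toward a positive and pushes it away from a negative. A minimal sketch with PyTorch's built-in loss (the batch and embedding sizes are illustrative):

import torch
import torch.nn as nn

criterion = nn.TripletMarginLoss(margin=2.0)   # the default margin above
anchor   = torch.randn(512, 128)               # (n_batch, embedding dim)
positive = torch.randn(512, 128)
negative = torch.randn(512, 128)
loss = criterion(anchor, positive, negative)
print(loss.item())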

Evaluation

The evaluation metrics are precision, recall, and F1-score.
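
A minimal sketch of computing these metrics with scikit-learn (the label arrays are illustrative):

from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 0, 1, 1, 0, 1]    # 1 = anomaly (illustrative values)
y_pred = [0, 1, 1, 1, 0, 0]
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='binary')
print(precision, recall, f1)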

Saliency map

python test.py save/[CHECKPOINT].pth
python heatmap.py --save_dir save --dataset unsw-nb15 --window_size 10 --plotting

Run test.py first, then run heatmap.py.
The parameters of test.py are the same as those of train.py, except for the checkpoint path.
The parameters of heatmap.py are:
heatmap.py [-h] [--save_dir SAVE_DIR] [--dataset DATASET] [--window_size WINDOW_SIZE] [--_class {,_normal,_anomaly}] [--plotting]

Code Structure

├── heatmap.py
├── model.py
├── puamd_structure.PNG
├── README.md
├── requirements.txt
├── save
├── test.py
├── train.py
└── utils
    ├── utils_dataset.py
    ├── utils_dhf.py
    └── utils.py

model.py

Embedding model: WaveNet
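
The exact architecture lives in model.py; as a rough illustration, a WaveNet-style embedder stacks dilated 1-D convolutions over the (features, window) input and pools to a fixed-size embedding. A hypothetical sketch (the layer count and sizes are assumptions, not the repository's configuration):

import torch
import torch.nn as nn

class TinyWaveNetEmbed(nn.Module):
    # Hypothetical WaveNet-style embedder: dilated 1-D convs + global pooling.
    def __init__(self, n_features=8, hidden_size=128):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv1d(n_features, hidden_size, kernel_size=2, dilation=1, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden_size, hidden_size, kernel_size=2, dilation=2, padding=2),
            nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)

    def forward(self, x):                           # x: (batch, n_features, window)
        return self.pool(self.convs(x)).squeeze(-1) # (batch, hidden_size)

emb = TinyWaveNetEmbed()(torch.randn(4, 8, 10))
print(emb.shape)                                    # torch.Size([4, 128])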

train.py

Contains the main function, which handles argument parsing and training.

utils_dataset.py

class TripletDataset

Builds the triplet dataset for PUMAD (see the usage sketch below).

  • Parameters

    • X_data : [ndarray] *shape: (data, # of features, window length)
    • y_data : [ndarray] *shape: (data)
    • n_htables : # of the hash tables in DHF
    • n_bits : # of bits in hash
    • transform : embedding model
  • Outputs

    • (anchor, pos, neg) : tuple of anchor tensor, positive tensor, negative tensor
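
A hypothetical usage sketch; the import path follows the code structure below, the constructor signature is assumed from the parameter list above, and the input arrays and stand-in embedding model are illustrative:

import numpy as np
import torch.nn as nn
from torch.utils.data import DataLoader
from utils.utils_dataset import TripletDataset

X_data = np.random.randn(100, 8, 10).astype('float32')  # (data, # of features, window length)
y_data = np.zeros(100, dtype=int)
y_data[:5] = 1                                           # a few observed anomalies
model = nn.Flatten()                                     # stand-in embedding model

dataset = TripletDataset(X_data, y_data, n_htables=5, n_bits=5, transform=model)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
anchor, pos, neg = next(iter(loader))                    # one triplet batch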

class TestDataset

PyTorch Dataset class for the test set

function get_preprocessed_data

Creates data splits for diverse experiment settings (see the PU-labeling sketch below).

  • Parameters

    • X_train: input train data [ndarray]
    • y_train: input train labels [ndarray]
    • X_test: input test data [ndarray]
    • y_test: input test labels [ndarray]
    • n_labels: # of partially observed anomalies [int]
  • Outputs

    • preprocessed_data : [list] *[X_train, y_train, y_train_pu, X_test, y_test]
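
The PU setting keeps only n_labels anomalies labeled (y_train_pu) and treats everything else in the training set as unlabeled. A minimal sketch of that masking step (the function name and random selection are illustrative, not the repository's code):

import numpy as np

def make_pu_labels(y_train, n_labels, seed=0):
    # Keep n_labels anomalies labeled (1); mark all other samples unlabeled (0).
    rng = np.random.default_rng(seed)
    y_pu = np.zeros_like(y_train)
    anom_idx = np.flatnonzero(y_train == 1)
    kept = rng.choice(anom_idx, size=min(n_labels, len(anom_idx)), replace=False)
    y_pu[kept] = 1
    return y_pu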

utils_dhf.py

function dhf

Finds reliable normal and potential abnormal samples from the embedded data through Distance Hashing based Filtering (DHF); see the sketch below.

  • Parameters

    • embedded_anom : embedded anomaly data
    • embedded_noise : embedded unlabeled data
    • n_htables : # of the hash tables in DHF
    • n_bits : # of bits in hash
  • Outputs

    • reliable normal index
    • potential abnormal index
    • avg_n_buckets : average # of buckets per hash table
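
As a rough illustration of the idea: random-hyperplane hashing assigns each embedded point a bucket code per hash table; unlabeled points that never share a bucket with a labeled anomaly are kept as reliable normals, and the rest are flagged as potential anomalies. A simplified, hypothetical sketch (not the repository's implementation, which may aggregate tables differently):

import numpy as np

def dhf_sketch(embedded_anom, embedded_noise, n_htables=5, n_bits=5, seed=0):
    # Simplified distance-hashing-based filter over embedded data (illustrative).
    rng = np.random.default_rng(seed)
    dim = embedded_anom.shape[1]
    hits = np.zeros(len(embedded_noise), dtype=int)
    for _ in range(n_htables):
        planes = rng.standard_normal((dim, n_bits))       # random hyperplanes
        code = lambda X: ((X @ planes) > 0) @ (1 << np.arange(n_bits))
        anom_buckets = np.unique(code(embedded_anom))     # buckets holding anomalies
        hits += np.isin(code(embedded_noise), anom_buckets)
    potential_abnormal = np.flatnonzero(hits > 0)         # collided with an anomaly
    reliable_normal = np.flatnonzero(hits == 0)           # never collided
    return reliable_normal, potential_abnormal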