The dataset in train_water.py

Question

Closed this issue 2 years ago · 1 comments

Hi, train_water.py trains GANF on SWaT_Dataset_Attack_v0.csv: when the script runs, that file is split into train/val/test dataloaders. I don't understand why the model is trained on SWaT_Dataset_Attack_v0.csv. It seems more reasonable to train on SWaT_Dataset_Normal_v1.csv, which contains no attacks, and to test on SWaT_Dataset_Attack_v0.csv. I think training this way would make the attacked points more likely to fall in regions of low probability density. Thank you very much!
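To make the two options concrete, here is a minimal sketch of both split strategies, expressed as row-index ranges only. The function names and the 60/20/20 and 80/20 fractions are assumptions for illustration; they are not taken from train_water.py.

```python
def chronological_split(n_rows, train_frac=0.6, val_frac=0.2):
    """Split one time-ordered file into contiguous train/val/test index ranges.

    Hypothetical helper: the fractions are illustrative, not the repo's values.
    """
    if not (0 < train_frac < 1 and 0 <= val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must leave a non-empty test portion")
    n_train = int(n_rows * train_frac)
    n_val = int(n_rows * val_frac)
    return (
        range(0, n_train),
        range(n_train, n_train + n_val),
        range(n_train + n_val, n_rows),
    )


def normal_train_attack_test(n_normal_rows, n_attack_rows, val_frac=0.2):
    """Train/validate on the attack-free file, test on the whole attack file.

    Hypothetical helper: each entry names the source file and the rows used.
    """
    n_val = int(n_normal_rows * val_frac)
    n_train = n_normal_rows - n_val
    return {
        "train": ("normal", range(0, n_train)),
        "val": ("normal", range(n_train, n_normal_rows)),
        "test": ("attack", range(0, n_attack_rows)),
    }


# Strategy A: everything comes from the attack file (what the question describes).
train_a, val_a, test_a = chronological_split(1000)

# Strategy B: the alternative proposed in the question.
split_b = normal_train_attack_test(n_normal_rows=1000, n_attack_rows=500)
```

Under strategy A the training rows may already contain attacks; under strategy B they cannot, because training and validation rows come only from the normal file.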

Answer 1 · 2022-08-30T19:43:19.000Z

Hi,
In real-world applications, anomalies are generally mixed in with the normal points. It is therefore more realistic to train on a dataset that contains both normal points and a small fraction of anomalies.
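
As a toy illustration of why training on contaminated data can still work, the sketch below fits a density model to data that contains a small fraction of anomalies and compares the average log-density of the two groups. It is only a sketch under stated assumptions: a one-dimensional Gaussian stands in for the normalizing flow, and the data, the 1% contamination rate, and the anomaly value are all synthetic. It is not GANF and uses none of the SWaT data.

```python
import math
import random
import statistics


def gaussian_log_density(x, mean, std):
    """Log-density of a 1-D Gaussian; a stand-in for a learned density model."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


rng = random.Random(0)

# Synthetic training set: 990 normal points plus 10 anomalies (about 1%).
normal_points = [rng.gauss(0.0, 1.0) for _ in range(990)]
anomalies = [8.0] * 10
train = normal_points + anomalies

# Fit the density model on the contaminated data, with no labels used.
mean = statistics.fmean(train)
std = statistics.pstdev(train)

# Average log-density per group under the fitted model.
normal_score = statistics.fmean(
    [gaussian_log_density(x, mean, std) for x in normal_points]
)
anomaly_score = statistics.fmean(
    [gaussian_log_density(x, mean, std) for x in anomalies]
)

print(f"mean log-density, normal points: {normal_score:.2f}")
print(f"mean log-density, anomalies:     {anomaly_score:.2f}")
```

Because the anomalies are rare, they barely shift the fitted density, so they still land in a low-density region. This holds in the toy setting only as long as the contamination fraction stays small.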