happy-little-zhang/Bit-Scanner

dataset

cailv opened this issue · 3 comments

cailv commented

Could you share the datasets for the different types of attacks? I will cite your paper.

Hello, friend! Thank you for your interest in my work. I have also run into the difficulties of simulating attacks, and I hope the following explanation helps.

We have not released the synthetic dataset used in our tests, for two reasons. First, the synthetic abnormal data used in the evaluation is not fixed: we deliberately introduce randomness to simulate different types of attacks, so each researcher who replicates our work will obtain a different set of anomalies. Second, the synthetic abnormal data is very large, reaching hundreds of gigabytes, which makes it impractical to include in our test platform.
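To illustrate why every replication ends up with a different anomaly set, here is a minimal C++ sketch of randomized attack injection. It is not the generator in our repository: "AttackType", "Sample", and "inject_attack" are hypothetical names, and the sketch assumes each record is a fixed-length byte payload. Because every random choice comes from the generator passed in, seeding that generator with a fixed value would make a run reproducible, while a different seed produces a different dataset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical attack types for illustration; the project's own set differs.
enum class AttackType { RandomPayload, BitFlip };

// One record of synthetic data together with its label.
struct Sample {
    std::vector<std::uint8_t> payload;  // raw bytes of the record
    bool is_attack = false;             // label stored next to the data
};

// Return an attacked copy of `clean`. Every random choice is drawn from `rng`,
// so the same seed reproduces the same anomaly and a new seed gives a new one.
Sample inject_attack(const std::vector<std::uint8_t>& clean, AttackType type,
                     std::mt19937& rng) {
    Sample s;
    s.payload = clean;
    s.is_attack = true;
    if (s.payload.empty()) return s;

    // uniform_int_distribution is not defined for char-sized types,
    // so draw an int and narrow it to a byte.
    std::uniform_int_distribution<int> byte_dist(0, 255);
    std::uniform_int_distribution<std::size_t> pos_dist(0, s.payload.size() - 1);
    std::uniform_int_distribution<int> bit_dist(0, 7);

    switch (type) {
    case AttackType::RandomPayload:  // overwrite every byte with noise
        for (auto& b : s.payload) b = static_cast<std::uint8_t>(byte_dist(rng));
        break;
    case AttackType::BitFlip: {      // flip exactly one randomly chosen bit
        std::size_t pos = pos_dist(rng);
        s.payload[pos] ^= static_cast<std::uint8_t>(1u << bit_dist(rng));
        break;
    }
    }
    return s;
}
```

Two generators seeded with the same value yield identical attacked payloads; seeding from std::random_device instead gives a new dataset on every run, which mirrors the behavior described above.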

Our source code includes functions that generate the synthetic abnormal data. To keep their output, set the option "DELETE_TEMP_FILE" to "0" in the file "common/common.h", then run the main function; the program will save the synthetic abnormal data, its labels, and the ground truth.
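A minimal C++ sketch of how a switch like "DELETE_TEMP_FILE" typically gates cleanup. It assumes the option is a preprocessor macro, and "release_temp_file" and "file_exists" are hypothetical helpers rather than functions from our code; with the value 0 the generated files stay on disk.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Assumed shape of the switch in common/common.h:
// 0 keeps generated files (data, labels, ground truth), 1 deletes them.
#ifndef DELETE_TEMP_FILE
#define DELETE_TEMP_FILE 0
#endif

// Hypothetical helper, called once a temporary file is no longer needed.
void release_temp_file(const std::string& path) {
#if DELETE_TEMP_FILE
    std::remove(path.c_str());  // free disk space after use
#else
    (void)path;                 // keep the file for later inspection
#endif
}

// True if the file can be opened for reading.
bool file_exists(const std::string& path) {
    std::ifstream f(path);
    return f.good();
}
```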

Please note that with the current parameter configuration, a full run can take a long time and requires significant storage. For a quick demonstration, set the related parameters in "common/common.h" to their minimum values, for example "MAX_FILE_NUM" to "1" and "ATTACK_NUM" to "1".
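Assuming these options are plain preprocessor macros (the exact form in "common/common.h" may differ, so check the file itself), a quick-demo configuration that also keeps the generated files could look like this excerpt:

```cpp
// common/common.h (excerpt, assumed form; quick-demo values)
#define DELETE_TEMP_FILE 0  // keep synthetic data, labels, and ground truth
#define MAX_FILE_NUM 1      // process a single raw file
#define ATTACK_NUM 1        // generate a single attack setting
```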

If you are only interested in the synthetic abnormal data, you can set the option "MAX_SLIDING_WINDOW_SIZE" to "1".
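A rough C++ cost model shows why this helps. It assumes, without confirming against our code, that one synthetic dataset is produced per (file, attack) pair and that detection is then repeated for every window size from 1 to "MAX_SLIDING_WINDOW_SIZE". "RunPlan" and "plan_runs" are hypothetical and only count loop iterations.

```cpp
#include <cassert>
#include <cstddef>

// Counts of work items under the assumed loop structure.
struct RunPlan {
    std::size_t datasets_generated;
    std::size_t detection_runs;
};

RunPlan plan_runs(int max_file_num, int attack_num, int max_sliding_window_size) {
    RunPlan plan{0, 0};
    for (int f = 0; f < max_file_num; ++f) {
        for (int a = 0; a < attack_num; ++a) {
            ++plan.datasets_generated;      // synthetic data, labels, ground truth
            for (int w = 1; w <= max_sliding_window_size; ++w)
                ++plan.detection_runs;      // one detection pass per window size
        }
    }
    return plan;
}
```

For example, plan_runs(2, 3, 4) counts 6 generated datasets and 24 detection runs; with a window size of 1, the detection runs drop to 6 while the generated data stays the same.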

To run the program, follow these steps:

Step 1: Place the raw data files (001.csv to 035.csv) in the folder /dataset/raw.

Step 2: Uncomment the following functions in the main.cpp file:

split_data_to_train_and_test();
global_model_train_and_model_detect_attack_free();
global_model_detect_under_various_attack();

Step 3: Run the program.

After these steps, the program will run and produce the results for the synthetic abnormal data.
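Steps 1 to 3 can be sketched in C++ as a pre-flight check followed by the three calls in order. This is only an illustration: "expected_raw_files", "missing_files", and "run_pipeline" are hypothetical helpers, the relative path "dataset/raw" is an assumption about where the folder lives, and the functions in namespace "demo" are placeholders that merely record the call order (in the repository, the real ones are those uncommented in main.cpp). The check refuses to start when any of 001.csv to 035.csv is missing, so a missing file is caught before a long run begins.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Expected raw file paths: dir/001.csv ... dir/035.csv.
std::vector<std::string> expected_raw_files(const std::string& dir, int count = 35) {
    std::vector<std::string> paths;
    char name[16];
    for (int i = 1; i <= count; ++i) {
        std::snprintf(name, sizeof(name), "%03d.csv", i);
        paths.push_back(dir + "/" + name);
    }
    return paths;
}

// Paths that cannot be opened for reading.
std::vector<std::string> missing_files(const std::vector<std::string>& paths) {
    std::vector<std::string> missing;
    for (const auto& p : paths)
        if (!std::ifstream(p).good()) missing.push_back(p);
    return missing;
}

namespace demo {
// Placeholders for the real functions in main.cpp; they only log the order.
std::vector<std::string> call_log;
void split_data_to_train_and_test() { call_log.push_back("split"); }
void global_model_train_and_model_detect_attack_free() { call_log.push_back("train"); }
void global_model_detect_under_various_attack() { call_log.push_back("attack"); }
}  // namespace demo

// Run the three steps only if every raw file is present.
bool run_pipeline(const std::string& raw_dir) {
    if (!missing_files(expected_raw_files(raw_dir)).empty()) return false;
    demo::split_data_to_train_and_test();                      // Step 2, call 1
    demo::global_model_train_and_model_detect_attack_free();   // Step 2, call 2
    demo::global_model_detect_under_various_attack();          // Step 2, call 3
    return true;
}
```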