TensorFlow version: 1.15.0
Python version: 2.7.18
cd dp_sgd
If the command below reports the error "IOError: [Errno 2] No such file or directory: '../datasets/hdfs/normalBlk_allREs_10w.logSeqs'", simply rerun it a few times and it will find the data file.
python train_dp.py
python train_dp.py --dpsgd True --noise_multiplier 1
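The second command presumably enables differentially private training: each example's gradient is clipped to a fixed L2 norm bound, and Gaussian noise scaled by --noise_multiplier is added before averaging. The following is a minimal numpy sketch of that step, not the script's actual implementation; the function and parameter names (dp_average_gradients, clip_norm, noise_multiplier) are illustrative assumptions:

```python
import numpy as np

def dp_average_gradients(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip each per-example gradient to L2 norm <= clip_norm, sum them,
    add Gaussian noise with std = noise_multiplier * clip_norm, then average."""
    if rng is None:
        rng = np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down only if the gradient exceeds the clipping bound.
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped.append(g * scale)
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)
```

With noise_multiplier=0 this reduces to plain gradient clipping; the noise scale grows with the clipping bound, which is why the two are tuned together.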
Training a model will automatically create a folder under "../models/hdfs" with a name in a format similar to:
dp_clip1.0_delta1e-05_sigma1.0_clip1.0
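The folder name encodes the training hyperparameters (clipping bound, privacy delta, noise sigma; the clipping bound appears twice). A hypothetical helper to pull those fields back out; the function name and regex are assumptions based on the pattern shown above, not part of the repo:

```python
import re

def parse_model_dir(name):
    """Extract clip, delta, and sigma from a model folder name like
    'dp_clip1.0_delta1e-05_sigma1.0_clip1.0' (clip is repeated at the end)."""
    m = re.match(r"dp_clip([\d.]+)_delta([\d.eE+-]+)_sigma([\d.]+)_clip([\d.]+)$", name)
    if not m:
        raise ValueError("unrecognized model folder name: %s" % name)
    return {
        "clip": float(m.group(1)),
        "delta": float(m.group(2)),
        "sigma": float(m.group(3)),
    }
```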
(Training a DP model may take hours; pre-trained models, available under "../models/hdfs/", can be used for testing instead.)
After a model is trained, suppose its path is "../models/hdfs/dp_clip1.0_delta1e-05_sigma1.0_clip1.0/epoch100_validLoss0.507_validAcc88.11_eps0.9585177585005272". To test this model and save the per-log detection results to "detection_result_of_new_model":
Because the entire test dataset is very large, you can vary "--sampleRate" to sample a smaller subset for testing, e.g., 0.001, 0.01, 0.1, or 1.0.
python load_and_test.py --sampleRate 1 --model ../models/hdfs/dp_clip1.0_delta1e-05_sigma1.0_clip1.0/epoch100_validLoss0.507_validAcc88.11_eps0.9585177585005272 > detection_result_of_new_model
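A sample rate of 0.01 tests on roughly 1% of the sequences, chosen at random. The idea can be sketched as below; this is an illustrative helper, not load_and_test.py's actual sampling code:

```python
import random

def sample_sequences(sequences, sample_rate, seed=42):
    """Keep each test log sequence independently with probability sample_rate;
    sample_rate >= 1.0 keeps everything."""
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    if sample_rate >= 1.0:
        return list(sequences)
    return [s for s in sequences if rng.random() < sample_rate]
```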
For the per-log detection result file "detection_result_of_new_model" generated above, compute the top-k results under different thresholds k:
python calcAcc.py detection_result_of_new_model
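Under a top-k criterion, a log entry is treated as normal if its actual next event is among the model's k most probable predictions, and flagged as anomalous otherwise. A hedged sketch of turning per-log rankings into detection metrics; the record format and function name are assumptions, not calcAcc.py's actual interface:

```python
def topk_metrics(records, k):
    """records: iterable of (rank_of_true_event, is_anomaly_label) pairs.
    A log is flagged anomalous if its true next event ranks outside the top-k.
    Returns (precision, recall, f1) of anomaly detection."""
    tp = fp = fn = 0
    for rank, is_anomaly in records:
        flagged = rank > k
        if flagged and is_anomaly:
            tp += 1        # true positive: flagged and truly anomalous
        elif flagged and not is_anomaly:
            fp += 1        # false positive: flagged but normal
        elif not flagged and is_anomaly:
            fn += 1        # false negative: missed anomaly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Raising k makes the detector more permissive: fewer false positives, but more missed anomalies.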
#################################################################################
datasets/: Contains the parsed HDFS log dataset used for training and testing.
dp_sgd/: Contains all the code.
final_results.tar.gz: Unpack it with "tar -xzvf final_results.tar.gz" to get the folder "./final_results", which contains the following result files.
results_*******: Per-log prediction result files produced by the LSTM; "sample0.1" or "sample1" in the filename indicates the sampling rate of the test logs (sample1 means all test data were used for detection).
calcAcc.py: Reads all result files whose names contain "sample1", processes the top-k results, and produces the anomaly detection metrics under different top-k criteria, stored in the file "final_full_metrics".
drawFig.py: Reads the full metrics from "final_full_metrics" and generates the two experimental figures in the paper: fpfn.pdf and fm.pdf.
models.tar.gz: Stores previously trained models; unpack it with "tar -xzvf models.tar.gz", and then all models inside "./models/hdfs/" can be used for testing. Note that the unpacked folder can be up to 2 GB.