Issue in interpretation of labels
enthu-sh opened this issue · 1 comment
I want to use the realAWSCloudwatch dataset for anomaly detection, but the anomaly labels are not clear. The labels folder contains two files, combined_labels.json and combined_windows.json, and the entries for the same data file do not match between them. For example:
"realAWSCloudwatch/ec2_cpu_utilization_24ae8d.csv": [
    [
        "2014-02-26 13:45:00.000000",
        "2014-02-27 06:25:00.000000"
    ],
    [
        "2014-02-27 08:55:00.000000",
        "2014-02-28 01:35:00.000000"
    ]
],
This is the entry in combined_windows.json, and
"realAWSCloudwatch/ec2_cpu_utilization_24ae8d.csv": [
    "2014-02-26 22:05:00",
    "2014-02-27 17:15:00"
],
this is the entry in combined_labels.json.
Why is there such a mismatch? Which file is the correct one to use?
The labels in combined_labels.json are the individual anomaly timestamps labeled by humans.
NAB uses anomaly windows for scoring because anomalies are temporal and can span a period of time. These windows are stored in combined_windows.json and are calculated from the individual labels, as described in 'Appendix B: Label combining algorithm' of the NAB whitepaper.
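If two labels are close together, their windows are combined into a single window. That does not appear to have happened in this example, though: there are two labels and two windows, and each label falls at the midpoint of its window.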
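To see how the two entries relate, a quick check along these lines may help. It is only a sketch: the timestamps are copied inline from the entries quoted above rather than read from the JSON files, and the helper names (`parse`, `windows_containing`, `window_midpoint`) are made up for illustration and are not part of NAB.

```python
from datetime import datetime

# Entries for realAWSCloudwatch/ec2_cpu_utilization_24ae8d.csv,
# copied from the issue text (not loaded from the NAB repository).
windows = [
    ["2014-02-26 13:45:00.000000", "2014-02-27 06:25:00.000000"],
    ["2014-02-27 08:55:00.000000", "2014-02-28 01:35:00.000000"],
]
labels = ["2014-02-26 22:05:00", "2014-02-27 17:15:00"]


def parse(ts):
    """Parse a timestamp with or without fractional seconds."""
    return datetime.fromisoformat(ts)


def windows_containing(label, windows):
    """Return the indices of all windows that contain the label timestamp."""
    t = parse(label)
    return [
        i
        for i, (start, end) in enumerate(windows)
        if parse(start) <= t <= parse(end)
    ]


def window_midpoint(window):
    """Return the midpoint of a [start, end] window as a datetime."""
    start, end = (parse(ts) for ts in window)
    return start + (end - start) / 2


for label in labels:
    print(label, "falls in window(s)", windows_containing(label, windows))

for window in windows:
    print(window, "midpoint:", window_midpoint(window))
```

For this entry, each label lands in exactly one window and coincides with that window's midpoint, so the two files describe the same anomalies at different granularity: points in combined_labels.json, scoring windows in combined_windows.json.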