numenta/NAB

Issue in interpretation of labels

enthu-sh opened this issue · 1 comments

I want to use the realAWS dataset for anomaly detection, but the anomaly labels are not clear. The labels folder contains two files: combined_labels.json and combined_windows.json. The entries for the same data file do not match between them. For example:

"realAWSCloudwatch/ec2_cpu_utilization_24ae8d.csv": [
[
"2014-02-26 13:45:00.000000",
"2014-02-27 06:25:00.000000"
],
[
"2014-02-27 08:55:00.000000",
"2014-02-28 01:35:00.000000"
]
],

This is the entry in combined_windows.json, and

"realAWSCloudwatch/ec2_cpu_utilization_24ae8d.csv": [
"2014-02-26 22:05:00",
"2014-02-27 17:15:00"
],

this is the entry in combined_labels.json.

Why is there such a mismatch? Which file is the correct one to use?

The labels represent individual ranges labeled by humans.

NAB uses anomaly windows for scoring because the anomalies are temporal and can span a period of time. These windows are stored in combined_windows.json and are calculated from the individual labels as described in 'Appendix B: Label combining algorithm' of the NAB whitepaper.

Sometimes if two labels are close together, they will be combined into one window, so that might have happened here.
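
To see how the two entries quoted in the question relate, here is a minimal sketch that checks which window each label falls into. It uses only the timestamps posted above; the helper names `parse` and `containing_window` are made up for illustration and are not part of NAB.

```python
from datetime import datetime

# Entries copied from the question for ec2_cpu_utilization_24ae8d.csv.
windows = [
    ("2014-02-26 13:45:00.000000", "2014-02-27 06:25:00.000000"),
    ("2014-02-27 08:55:00.000000", "2014-02-28 01:35:00.000000"),
]
labels = ["2014-02-26 22:05:00", "2014-02-27 17:15:00"]


def parse(ts):
    """Parse a timestamp; window entries carry microseconds, labels do not."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in ts else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(ts, fmt)


def containing_window(label, windows):
    """Return the first (start, end) window that contains the label, or None."""
    t = parse(label)
    for start, end in windows:
        if parse(start) <= t <= parse(end):
            return start, end
    return None


results = {}
for label in labels:
    win = containing_window(label, windows)
    results[label] = win
    if win is not None:
        start, end = parse(win[0]), parse(win[1])
        midpoint = start + (end - start) / 2
        print(label, "->", win, "| window midpoint:", midpoint)
    else:
        print(label, "-> not inside any window")
```

For this particular file, each label lands inside exactly one window and coincides with that window's midpoint, so the two files appear to describe the same two anomalies: one as a point, the other as a range around it.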