numenta/NAB

Labels (ground truth) mismatch

pramitchoudhary opened this issue · 2 comments

Just to be sure about the reported ground truth: there seems to be some confusion about which .json file is recommended for the ground truth labels:

  1. combined_labels.json
    or
  2. known_labels_v1.0.json

The difference noted so far:
For realAdExchange/exchange-2_cpc_results.csv: according to the data description, this file does not contain any outliers or anomalies; however, combined_labels.json seems to report one.
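For anyone who wants to check for other differences like this one, here is a minimal sketch of a comparison. It assumes both label files are JSON objects mapping a dataset path to a list of anomaly timestamp strings; the helper name `diff_labels` and the inline sample data (including the placeholder timestamps) are hypothetical and only stand in for the real files.

```python
def diff_labels(a, b):
    """Return {dataset: (only_in_a, only_in_b)} for datasets whose labels differ.

    Assumes `a` and `b` are dicts mapping dataset path -> list of timestamp strings.
    A dataset missing from one file is treated as having no labels in that file.
    """
    diffs = {}
    for name in sorted(set(a) | set(b)):
        ts_a = set(a.get(name, []))
        ts_b = set(b.get(name, []))
        if ts_a != ts_b:
            diffs[name] = (sorted(ts_a - ts_b), sorted(ts_b - ts_a))
    return diffs


# Hypothetical stand-ins for the two files; with the real files you would
# load each one with json.load() instead. Timestamps are placeholders.
combined = {
    "realAdExchange/exchange-2_cpc_results.csv": ["<timestamp A>"],
    "example/unchanged.csv": ["<timestamp B>"],
}
known = {
    "realAdExchange/exchange-2_cpc_results.csv": [],
    "example/unchanged.csv": ["<timestamp B>"],
}

for dataset, (only_first, only_second) in diff_labels(combined, known).items():
    print(dataset, "only in first:", only_first, "only in second:", only_second)
```

Datasets with identical labels in both inputs are omitted from the result, so only the mismatches are printed.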

Any suggestions or recommendations?

I believe the description in the README is incorrect. If you look at the raw human labels, two of the three labelers noted an anomaly in the same spot.
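As a rough illustration of that kind of agreement check (this is not the script used to build combined_labels.json), here is a minimal sketch. It assumes each labeler's output is a list of timestamp strings and that "the same spot" means an exact timestamp match; the function name `majority_labels` and the `min_votes` parameter are hypothetical.

```python
from collections import Counter


def majority_labels(labelers, min_votes=2):
    """Return the timestamps marked by at least `min_votes` labelers.

    `labelers` is a list with one list of timestamp strings per labeler.
    Each labeler counts at most once per timestamp.
    """
    counts = Counter(ts for labels in labelers for ts in set(labels))
    return sorted(ts for ts, votes in counts.items() if votes >= min_votes)


# Placeholder timestamps: two of three labelers agree on "t1".
print(majority_labels([["t1"], ["t1", "t2"], []]))
```

With exact matching, labels that are close in time but not identical would not count as agreement, so a real check would likely need some tolerance window.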

I'm also not sure why it's in the known_labels_v1.0.json file. The only datasets for which we knew the labels beforehand are in the realKnownCause and artificial subdirectories.

Thanks for the catch - to avoid future confusion, we should fix the README and the json file.

Thanks for the update @subutai.
So, to be sure, you would suggest using combined_labels.json?
Would it be possible to confirm the reported anomalies for other datasets, such as exchange-3_cpc_results.csv, as detailed in combined_labels.json?