Labels (ground truth) mismatch
pramitchoudhary opened this issue · 2 comments
Just to be sure about the reported ground truth: there seems to be some confusion about which .json file is recommended for the ground truth labels:
- combined_labels.json
- known_labels_v1.0.json
The difference noted so far:
For realAdExchange/exchange-2_cpc_results.csv: according to the data description, this file does not contain any outliers or anomalies; however, combined_labels.json seems to report one.
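For anyone who wants to reproduce the comparison, a minimal Python sketch along these lines lists every dataset whose labels differ between the two files. It assumes both files map relative CSV paths to lists of anomaly timestamp strings (I have not verified this for known_labels_v1.0.json). The helper names load_labels and diff_labels are hypothetical, and the file locations are placeholders to adjust to your checkout.

```python
import json


def load_labels(path):
    """Load a label file; ASSUMED to map CSV paths to lists of timestamp strings."""
    with open(path) as f:
        return json.load(f)


def diff_labels(a, b):
    """Return {dataset: (only_in_a, only_in_b)} for datasets whose labels differ.

    A dataset missing from one file is treated as having no labels there.
    """
    diffs = {}
    for name in sorted(set(a) | set(b)):
        ts_a = set(a.get(name, []))
        ts_b = set(b.get(name, []))
        if ts_a != ts_b:
            diffs[name] = (sorted(ts_a - ts_b), sorted(ts_b - ts_a))
    return diffs


if __name__ == "__main__":
    # Placeholder paths: point these at wherever the two files live locally.
    combined = load_labels("combined_labels.json")
    known = load_labels("known_labels_v1.0.json")
    for name, (only_combined, only_known) in diff_labels(combined, known).items():
        print(name)
        print("  only in combined_labels.json:  ", only_combined)
        print("  only in known_labels_v1.0.json:", only_known)
```

Comparing the timestamps as sets ignores ordering and duplicates, so only genuine label differences are reported.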
Any suggestions or recommendations?
I believe the description in the README is incorrect. If you look at the raw human labels, two of the three labelers noted an anomaly in the same spot.
I'm also not sure why it's in the known_labels_v1.0.json file. The only datasets for which we knew the labels beforehand are in the realKnownCause and artificial subdirectories.
Thanks for the catch - to avoid future confusion, we should fix the README and the json file.
Thanks for the update @subutai.
So to be sure, you would suggest using combined_labels.json.
Would it be possible to confirm the reported anomalies for other datasets, such as exchange-3_cpc_results.csv, as detailed in combined_labels.json?