Duplicated records in Harm-P dataset
Closed this issue · 1 comment
mingshanhee commented
Hi authors, thanks for releasing the dataset. I am keen on using the Harm-C and Harm-P datasets for my own research experiments. However, I found that the Harm-P dataset contains many duplicate records in its training, validation and test splits. To be precise, there are 1489, 5 and 155 duplicates in the training, validation and test splits, respectively (refer to the attached screenshots).
Could you please check the dataset available via the GitHub download link and provide a corrected version? Thank you in advance!
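For reference, here is a minimal sketch of how duplicates like these could be counted within a single split. It rests on several assumptions that do not come from the dataset itself: that each split is a JSON Lines file, that a record counts as a duplicate when its `image` and `text` fields both match an earlier record, and that those field names exist. The helpers `load_jsonl` and `count_duplicates` are hypothetical names used only for illustration.

```python
import json
from collections import Counter


def load_jsonl(path):
    """Read one JSON object per non-empty line (assumed file layout)."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def count_duplicates(records, key_fields=("image", "text")):
    """Count records that repeat the key of another record in the same split.

    key_fields is an assumption; adjust it to whatever identifies a record.
    A key that appears n times contributes n - 1 duplicates.
    """
    counts = Counter(tuple(rec.get(f) for f in key_fields) for rec in records)
    return sum(n - 1 for n in counts.values() if n > 1)


# Toy records standing in for one split (not real Harm-P data).
sample = [
    {"image": "a.png", "text": "x"},
    {"image": "a.png", "text": "x"},
    {"image": "b.png", "text": "y"},
]
print(count_duplicates(sample))  # 1
```

Running `count_duplicates(load_jsonl(path))` once per split file would give per-split counts. The exact numbers depend on which fields are used as the key.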
shiv6891 commented
Hi @mingshanhee, thanks for the heads up. The files are now updated. Please see the updated README and repo for further details.