A question about the data range of negative sampling
renli1024 opened this issue · 1 comment
renli1024 commented
Hi, first of all, thanks for the great work!
I noticed that during training you generate negative samples based on the train set only, so triples that appear only in the valid or test set can be treated as negatives by the model. These "false negative" samples may hurt the model's performance at evaluation time. In my opinion, maybe the valid set should also be taken into account for negative sampling during training?
Thanks for your explanation.
Edward-Sun commented
Hi Ren,
If the valid set is used for negative sampling in training, this is similar to using the valid set as positive samples. That would make the comparison with previous work unfair, since previous work only uses the training set for positive samples.
So we don't use the valid set, even for negative sampling.
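
To make the trade-off discussed above concrete, here is a minimal sketch of filtered negative sampling. It is not the repository's actual sampler: the function `sample_negative_tails`, the toy triples, and the entity count are all hypothetical and chosen only for illustration. It shows that filtering against the train set alone can draw a valid-only triple as a negative, while adding the valid set to the filter rules that triple out, which is exactly how information from the valid set would leak into training.

```python
import random


def sample_negative_tails(head, relation, num_entities, known_true, k, rng):
    """Draw k corrupted tails for (head, relation, ?), skipping known positives.

    Hypothetical helper for illustration; `known_true` is the set of triples
    that the sampler is allowed to treat as true.
    """
    negatives = []
    while len(negatives) < k:
        candidate = rng.randrange(num_entities)
        # Reject candidates that would form a triple known to be true.
        if (head, relation, candidate) not in known_true:
            negatives.append(candidate)
    return negatives


# Toy data (assumed): entity and relation ids are plain integers.
train = {(0, 0, 1), (0, 0, 2)}
valid = {(0, 0, 3)}  # true triple that never appears in the train set

rng = random.Random(0)

# Filter with the train set only: (0, 0, 3) can be drawn as a "false negative".
train_only = sample_negative_tails(0, 0, 5, train, 200, rng)

# Filter with train + valid: (0, 0, 3) is never drawn, so the sampler
# implicitly reveals that it is a true triple.
with_valid = sample_negative_tails(0, 0, 5, train | valid, 200, rng)

print(3 in train_only, 3 in with_valid)
```

In the second call the model never receives a negative signal for the valid triple, so the valid set effectively influences training. This is the sense in which using it for filtering resembles using it as positive data.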