Why do you separate negative head samples and negative tail samples?
renli1024 opened this issue · 5 comments
First, thanks for the great work.
I noticed in the code that you implement two data iterators, train_dataloader_head
and train_dataloader_tail
, which generate negative head samples and negative tail samples respectively. During training, these two iterators are fed into the model alternately. If my understanding is right, the model trains on each positive sample twice: once with negative head samples and once with negative tail samples. I'd like to know why you do negative sampling this way, instead of training on the negative head and negative tail samples together and back-propagating each positive sample only once, which seems more intuitive to me.
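For readers skimming the thread, the alternating scheme described above can be sketched roughly as follows. This is a dependency-free toy sketch, not the repo's actual code: the triples, entity names, and `corrupt` helper are all hypothetical, standing in for `train_dataloader_head` / `train_dataloader_tail`.

```python
from itertools import cycle

# Hypothetical toy graph (head, relation, tail); names are illustrative only.
triples = [("alice", "knows", "bob"), ("bob", "likes", "carol")]
entities = ["alice", "bob", "carol", "dave"]

def corrupt(triple, mode, num_neg=2):
    """Make negatives by replacing the head or the tail of a positive triple."""
    h, r, t = triple
    if mode == "head-batch":
        return [(e, r, t) for e in entities if e != h][:num_neg]
    else:  # tail-batch
        return [(h, r, e) for e in entities if e != t][:num_neg]

# Two iterators, one per corruption mode, standing in for
# train_dataloader_head and train_dataloader_tail.
head_iter = ((tr, corrupt(tr, "head-batch")) for tr in triples)
tail_iter = ((tr, corrupt(tr, "tail-batch")) for tr in triples)

# Alternate between them: each positive triple is visited twice per epoch,
# once with head negatives and once with tail negatives, with a separate
# backward pass for each visit.
steps = []
interleaved = (item for pair in zip(head_iter, tail_iter) for item in pair)
for mode, (pos, negs) in zip(cycle(["head-batch", "tail-batch"]), interleaved):
    steps.append((mode, pos, negs))
```

Each entry of `steps` would correspond to one optimization step in the alternating scheme.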
Thanks a lot for your reply.
Hi, thanks for the good question.
This separation is merely an easier way to implement negative sampling. What you propose is also a valid way to do it, but I haven't tried it. The method you describe might well perform better than the current implementation, and I believe it's worth a try.
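The alternative raised in the question, one batch mixing head and tail negatives so each positive gets a single backward pass, might look like this. Again a hypothetical sketch with made-up entity names, not code from the repository:

```python
entities = ["alice", "bob", "carol", "dave"]

def combined_negatives(triple, num_neg_per_side=1):
    """Build one batch containing both head- and tail-corrupted negatives
    for a single positive triple, so one loss/backward pass covers both."""
    h, r, t = triple
    head_negs = [(e, r, t) for e in entities if e != h][:num_neg_per_side]
    tail_negs = [(h, r, e) for e in entities if e != t][:num_neg_per_side]
    return head_negs + tail_negs

batch = combined_negatives(("alice", "knows", "bob"))
```

Here the positive triple would contribute to the loss once per step, against negatives of both kinds at the same time.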
Have you observed any differences when implementing only one of them compared to the other? The Local Closed-World Assumption (LCWA) only holds for what you call the tail-batch method of generating false triples; the head-batch method may not be a justified way of generating them. (That problem may be taken care of by the self-adversarial negative-sampling weight attached to each sample, though.)
Hi, thanks for the good question. We use both head-batch and tail-batch only because this is how the MRR is calculated. If MRR were calculated only on the tail-batch (following the LCWA), then I think it would be reasonable to train the model only on the tail-batch.
Thanks for the interesting detail. However, your recent ACL 2020 publication on re-evaluating baselines seems to refer only to tail-batch-corrupted triples for ranking in its notation. Is that correct? If so, did you use only tail-batch corruption during training there?
No. Although we may only describe the tail-batch case in the notation, both head-batch and tail-batch are evaluated in that paper, because a head-batch is just a tail-batch with the relation reversed.
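The equivalence stated in this last reply can be checked concretely: ranking candidate heads for (h, r, t) inspects the same candidate set as ranking candidate tails for the reversed triple (t, r⁻¹, h). A minimal sketch with hypothetical entities and relation names:

```python
entities = ["a", "b", "c"]
h, r, t = "a", "r", "b"

# Head-batch: rank candidate heads e for (e, r, t), excluding the true head.
head_candidates = [e for e in entities if e != h]

# Equivalent tail-batch on the reversed triple (t, r_inv, h):
# rank candidate tails e for (t, r_inv, e), excluding the true tail h.
rev_h, rev_r, rev_t = t, r + "_inv", h
tail_candidates_rev = [e for e in entities if e != rev_t]
```

The two candidate lists are identical, which is why evaluating both batch types is equivalent to tail-batch evaluation over a graph augmented with inverse relations.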