DeepGraphLearning/KnowledgeGraphEmbedding

Why do you separate negative head samples and negative tail samples?

renli1024 opened this issue · 5 comments

First of all, thanks for the great work.

I see in the code that you implement two data iterators, train_dataloader_head and train_dataloader_tail, which generate negative head samples and negative tail samples respectively. During training, these two iterators are alternately fed into the model. If my understanding is correct, the model trains on each positive sample twice: once with negative head samples and once with negative tail samples. I'd like to know why you do negative sampling this way, instead of training on the negative head and tail samples together and back-propagating each positive sample only once, which seems more intuitive to me. (A rough sketch of what I mean is below.)
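For reference, a minimal sketch of the alternating pattern described above, assuming two iterators that yield head-corrupted and tail-corrupted batches (the helper name and exact mechanics are illustrative, not the repo's actual implementation):

```python
def alternate_batches(head_iterator, tail_iterator):
    """Yield batches alternately from the head-batch and tail-batch iterators,
    so each positive triple is used twice per pass: once with corrupted heads
    and once with corrupted tails."""
    for head_batch, tail_batch in zip(head_iterator, tail_iterator):
        yield head_batch, 'head-batch'  # negatives replace the head entity
        yield tail_batch, 'tail-batch'  # negatives replace the tail entity
```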

Thanks a lot for your reply.

Hi, thanks for the good question.

This separation is simply an easier way to implement negative sampling. What you proposed is also a valid way to do negative sampling, but I haven't tried it. The method you described might even perform better than the current implementation, and I believe it's worth a try.
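For concreteness, here is one possible way the proposed alternative could look: sample head and tail corruptions for the same positive triples and back-propagate once. This is only a sketch under assumptions; `score_fn` is an assumed scoring function (higher = more plausible) that broadcasts a `(batch, n_neg)` entity tensor against `(batch,)` relation/entity tensors, and the loss is a plain negative-sampling loss rather than the repo's self-adversarial one.

```python
import torch
import torch.nn.functional as F

def combined_negative_step(score_fn, h, r, t, num_entities, n_neg=64):
    """Train on head and tail corruptions of the same positive triples
    in a single forward/backward pass (illustrative sketch only)."""
    neg_heads = torch.randint(num_entities, (h.size(0), n_neg))
    neg_tails = torch.randint(num_entities, (t.size(0), n_neg))

    pos_score = score_fn(h, r, t)                                  # (batch,)
    neg_score = torch.cat([score_fn(neg_heads, r, t),              # head corruption
                           score_fn(h, r, neg_tails)], dim=1)      # tail corruption

    # One backward pass covers both corruption directions.
    loss = -F.logsigmoid(pos_score).mean() - F.logsigmoid(-neg_score).mean()
    loss.backward()
    return loss
```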

Have you observed any differences when implementing only one compared to the other? The Local Closed-World Assumption (LCWA) only holds for what you call the tail-batch method of generating false triples, and the head-batch method may not be a justified way of generating false triples. (It may be that this problem is taken care of by the self-adversarial negative sampling weight attached to each sample, though.)

Hi, thanks for the good question. We use both head-batch and tail-batch only because this is how the MRR is calculated. If MRR is only calculated for the tail-batch (following the LCWA), then I think it's reasonable to only train the model on the tail-batch.
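To spell out the evaluation convention being referred to, here is a sketch of MRR computed over both corruption directions, so every test triple contributes two ranking queries. `filtered_rank` is an assumed helper that returns the filtered rank of the true entity for the given corruption side; the names are illustrative, not the repo's API.

```python
def mrr_both_directions(filtered_rank, test_triples):
    """MRR averaged over head-batch and tail-batch ranking queries."""
    reciprocal_ranks = []
    for h, r, t in test_triples:
        reciprocal_ranks.append(1.0 / filtered_rank(h, r, t, corrupt='head'))
        reciprocal_ranks.append(1.0 / filtered_rank(h, r, t, corrupt='tail'))
    return sum(reciprocal_ranks) / len(reciprocal_ranks)
```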

Thanks for the interesting detail. However, your recent ACL 2020 publication on re-evaluating baselines seems to refer only to tail-batch-corrupted triples for ranking in the notation. Is that correct? And for that paper, did you use only tail-batch corruption during training?

No. Although we might only describe the tail-batch case in the notation, both head-batch and tail-batch are evaluated in that paper, because head-batch is just tail-batch with the relations reversed.
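To illustrate that equivalence (a sketch only, not the repository's code): corrupting the head of (h, r, t) is the same ranking query as corrupting the tail of (t, r_inverse, h), where the inverse relation can be encoded by offsetting the relation id.

```python
def add_inverse_relations(triples, num_relations):
    """Augment a triple list with inverse relations so that head-batch queries
    on (h, r, t) become tail-batch queries on (t, r + num_relations, h)."""
    augmented = list(triples)
    for h, r, t in triples:
        augmented.append((t, r + num_relations, h))  # r + num_relations encodes r_inverse
    return augmented
```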