Question about the fairness of data augmentation
dbbice opened this issue · 2 comments
Thank you for your work. I read your paper carefully. It analyzes the bias of the KGC task from a new perspective and uses KG-Mixup for data augmentation.
I have a question. Under the mixing criterion, triples that share the tail entity to be predicted are selected as candidate triples for mixing. This means the tail entity is known in advance. Could this cause data leakage or make the setup unfair?
Looking forward to your reply.
Hi,
Because the augmentation is applied only to the training data, there is no leakage. All positive training samples are already known beforehand. Also, since we are mixing the tail entity with another entity, the result can be very different from anything the model has seen before.
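To make the no-leakage argument concrete, here is a minimal Python sketch, assuming integer-indexed triples and NumPy embedding tables. The helper names (`build_tail_index`, `sample_mixup_candidates`, `mix`) are hypothetical, and the mixing rule (a convex combination of concatenated head and relation embeddings, with the weight drawn from a Beta distribution as in standard mixup) is an assumption for illustration, not the authors' implementation. What it shows is that candidates are looked up in an index built only from training triples. A validation or test triple can therefore never be selected, and the "known" tail is simply the label of a training triple.

```python
from collections import defaultdict

import numpy as np


def build_tail_index(train_triples):
    """Map each tail entity to the TRAINING triples that end in it (hypothetical helper)."""
    index = defaultdict(list)
    for h, r, t in train_triples:
        index[t].append((h, r, t))
    return index


def sample_mixup_candidates(triple, tail_index, k, rng):
    """Pick up to k other training triples sharing the tail of `triple`.

    Only the training index is consulted, so no held-out triple can be chosen.
    """
    candidates = [c for c in tail_index[triple[2]] if c != triple]
    if not candidates:
        return []
    picks = rng.choice(len(candidates), size=min(k, len(candidates)), replace=False)
    return [candidates[i] for i in picks]


def mix(triple, candidate, ent_emb, rel_emb, lam):
    """Assumed mixing rule: convex combination of (head, relation) embeddings.

    Both triples share the same tail, so that tail is returned as the label.
    """
    h, r, t = triple
    h2, r2, _ = candidate
    x = np.concatenate([ent_emb[h], rel_emb[r]])
    x2 = np.concatenate([ent_emb[h2], rel_emb[r2]])
    return lam * x + (1.0 - lam) * x2, t


# Toy usage: entity 5 is the shared tail; the test triple is never indexed.
rng = np.random.default_rng(0)
train = [(0, 0, 5), (1, 1, 5), (2, 0, 5), (3, 2, 6)]
test = [(4, 1, 5)]
ent_emb = rng.normal(size=(7, 4))
rel_emb = rng.normal(size=(3, 4))

tail_index = build_tail_index(train)
cands = sample_mixup_candidates(train[0], tail_index, k=2, rng=rng)
lam = rng.beta(1.0, 1.0)  # mixup weight; the Beta(1, 1) choice is an assumption
mixed_x, label = mix(train[0], cands[0], ent_emb, rel_emb, lam)
```

Even though `test` contains a triple with the same tail (entity 5), it never appears among the candidates, because the index is built from `train` alone.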
Furthermore, it may be helpful to imagine an extreme scenario where the random value
Regards,
Harry
Thank you for your reply. I now understand the content of the paper.