Why mask the 1st hop?

Hi,

Thanks for releasing the code! I have a minor question about this piece of code:

multihop_dense_retrieval/mdr/retrieval/criterions.py

Lines 127 to 130 in 62eb242

    
           # mask the 1st hop 
        
           bsize = outputs["q"].size(0) 
        
           scores_1_mask = torch.cat([torch.zeros(bsize, bsize), torch.eye(bsize)], dim=1).to(outputs["q"].device) 
        
           scores_1_hop = scores_1_hop.float().masked_fill(scores_1_mask.bool(), float('-inf')).type_as(scores_1_hop)

What is the purpose of masking the 1st hop? Does it improve the final experimental results? Thanks!

Hi @yangky11, the mask is there to avoid treating the 2nd-hop supporting passage as a negative when scoring the 1st hop. The hop order is not always obvious, especially for comparison questions, so the 2nd-hop passage can be just as valid a 1st-hop answer. This gave some improvement in initial experiments.
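For anyone reading later, here is a minimal sketch of what the mask does. It is not the repository code: the random tensors `q`, `c1`, and `c2` are hypothetical stand-ins for the encoder outputs, and the column layout (all 1st-hop passages, then all 2nd-hop passages) is an assumption inferred from the shape of `scores_1_mask`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
bsize, dim = 3, 4

# Hypothetical stand-ins for the encoded questions and gold passages.
q = torch.randn(bsize, dim)   # question embeddings
c1 = torch.randn(bsize, dim)  # 1st-hop gold passages
c2 = torch.randn(bsize, dim)  # 2nd-hop gold passages

# Assumed layout: columns [0, bsize) are c1, columns [bsize, 2*bsize) are c2.
all_ctx = torch.cat([c1, c2], dim=0)
scores_1_hop = q @ all_ctx.t()  # shape (bsize, 2 * bsize)

# Same mask as in the quoted snippet: for row i, hide column bsize + i,
# i.e. the 2nd-hop passage belonging to the same question.
scores_1_mask = torch.cat([torch.zeros(bsize, bsize), torch.eye(bsize)], dim=1)
masked = scores_1_hop.masked_fill(scores_1_mask.bool(), float("-inf"))

# The 1st-hop target for row i is column i; the masked column gets zero
# probability, so it no longer competes as an in-batch negative.
target = torch.arange(bsize)
probs = F.softmax(masked, dim=1)
loss = F.cross_entropy(masked, target)
print(probs)
print(loss)
```

The 2nd-hop passages of the other questions in the batch (the off-diagonal entries of the right half) stay unmasked, so they still act as negatives.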

That makes sense. Thank you!
