jadore801120/attention-is-all-you-need-pytorch

Why do the patch_src & patch_trg functions in train.py drop the last example?

pluspluswu opened this issue · 1 comment

Hi,
I'm trying to train this model with my own dataloader, but when my data iterator yields batch-first batches of size [batch_size, seq_len], the patch_trg function in train.py returns trg_seq with size [seq_len, batch_size - 1] and gold with size seq_len * (batch_size - 1). I can take the transpose to make the matrix shapes work out, BUT why does this function drop the last example of each batch? Or am I using these patch functions incorrectly?

def patch_src(src, pad_idx):
    # torchtext iterators in this repo yield [seq_len, batch_size];
    # the model expects batch-first input
    src = src.transpose(0, 1)
    return src
def patch_trg(trg, pad_idx):
    trg = trg.transpose(0, 1)  # -> [batch_size, seq_len]
    # decoder input drops the last token; gold drops the first (shifted targets)
    trg, gold = trg[:, :-1], trg[:, 1:].contiguous().view(-1)
    return trg, gold
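For context, a minimal shape check (a toy tensor; torch is the only dependency) shows that when the input is [seq_len, batch_size], as the torchtext iterator in this repo produces, the slice removes the last *token* of each sequence; only when the input is already batch-first does it remove the last *example*:

```python
import torch

def patch_trg(trg, pad_idx):
    # copied from train.py: expects trg of shape [seq_len, batch_size]
    trg = trg.transpose(0, 1)
    trg, gold = trg[:, :-1], trg[:, 1:].contiguous().view(-1)
    return trg, gold

# seq-first batch, as torchtext yields it: 5 tokens x 3 examples
seq_first = torch.arange(15).view(5, 3)
trg, gold = patch_trg(seq_first, pad_idx=0)
print(trg.shape)   # torch.Size([3, 4]) -- all 3 examples kept, last token dropped
print(gold.shape)  # torch.Size([12])

# batch-first batch, as in the question: 3 examples x 5 tokens
batch_first = torch.arange(15).view(3, 5)
trg, gold = patch_trg(batch_first, pad_idx=0)
print(trg.shape)   # torch.Size([5, 2]) -- last example dropped instead
```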

Perhaps I should make this change so the function fits my own data, which is batch-first?

def new_patch_trg(trg, pad_idx):
    # assumes batch-first input: [batch_size, seq_len]
    trg, gold = trg[:, :-1], trg[:, 1:].contiguous().view(-1)
    trg = trg.transpose(0, 1)  # -> [seq_len - 1, batch_size]
    return trg, gold
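As a sanity check on this proposal (a sketch, assuming the batch really is [batch_size, seq_len]), slicing before the transpose now removes a token rather than an example:

```python
import torch

def new_patch_trg(trg, pad_idx):
    # proposed variant for batch-first input: [batch_size, seq_len]
    trg, gold = trg[:, :-1], trg[:, 1:].contiguous().view(-1)
    trg = trg.transpose(0, 1)  # -> [seq_len - 1, batch_size]
    return trg, gold

batch = torch.arange(15).view(3, 5)   # 3 examples x 5 tokens
trg, gold = new_patch_trg(batch, pad_idx=0)
print(trg.shape)   # torch.Size([4, 3]) -- all 3 examples kept
print(gold.shape)  # torch.Size([12])
```

Whether the final transpose belongs there depends on the shape the model's forward pass expects; if it takes batch-first input, the transpose can simply be omitted.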

Thanks

Explanation is here