wilile26811249/Fastformer-PyTorch

How to Mask?

Closed this issue · 3 comments

How do I mask the subword & padding information in this attention if I want to use it in GPT?
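For context, here is a minimal sketch of the usual padding-mask pattern for Fastformer-style additive attention (the helper `masked_softmax` and the names `logits`/`pad_mask` are hypothetical, not from this repo): fill the pre-softmax scores with a large negative value, then zero out any residual weight on padding. Causal masking for GPT is a harder problem, since Fastformer pools over the entire sequence.

```python
import torch

# Hypothetical helper (not from this repo): padding-mask the per-token
# scores that Fastformer softmaxes over the sequence dimension.
def masked_softmax(logits: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
    # logits:   (batch, seq_len) pre-softmax scores
    # pad_mask: (batch, seq_len) bool, True at real tokens, False at padding
    neg = torch.finfo(logits.dtype).min
    weights = torch.softmax(logits.masked_fill(~pad_mask, neg), dim=-1)
    # Zero any residual weight on padding; also guards rows that are all padding.
    return weights * pad_mask.to(weights.dtype)
```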

Hmm, I can't understand why global_key is only masked along the decode dimension. What does that mean?
And I think this masking method is not effective for sequence tasks.
By the way, are you sure that masked_fill will work?

```python
mask_value = torch.finfo(x.dtype).min  # most negative finite float32, ~ -3.4e38
...
global_key = p * beta_weight           # element-wise product with the masked values
```

The element-wise product will make the result all NaN.
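A standalone repro (illustrative, not code from the repo) of one plausible failure mode: once more than one position is filled with `torch.finfo(x.dtype).min`, the weighted sum overflows float32 to `-inf`, and ordinary arithmetic on `-inf` then yields NaN.

```python
import torch

mask_value = torch.finfo(torch.float32).min   # ~ -3.4028e38

# Two masked positions, each filled with mask_value and given a modest weight:
p = torch.tensor([mask_value, mask_value])
beta_weight = torch.tensor([0.6, 0.6])

global_key = (p * beta_weight).sum()
print(global_key)          # -inf: the weighted sum overflows float32
print(global_key * 0.0)    # nan: 0 * inf is undefined, and it propagates
```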

@xesdiny
Yes, the results are all NaN.
I fixed the error in the implementation of the mask part.
Can you check again? Thank you.