lucidrains/local-attention

A bug in torch.arange for long sequences with the fp16 data type

renll opened this issue · 1 comment

renll commented

pytorch/pytorch#81926

This line should use dtype=torch.long to support long sequences: fp16 cannot exactly represent every integer above 2048, so the position indices it generates start to collide for long inputs:

```python
ticker = torch.arange(t, device=device, dtype=dtype)[None, :]
```
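
For anyone hitting this, here is a minimal sketch (not part of the original report) of why the fp16 ticker breaks and how the proposed torch.long fix behaves; the variable names mirror the line quoted above:

```python
import torch

t = 4096

# fp16 carries 11 bits of significand precision, so integers above 2048
# are no longer all exactly representable: consecutive positions collide
ticker_fp16 = torch.arange(t, dtype=torch.float16)[None, :]
print(ticker_fp16[0, 2047:2052])
# e.g. tensor([2047., 2048., 2048., 2050., 2052.], dtype=torch.float16)
#                            ^ 2049 rounds back to 2048

# proposed fix: build the position indices in an integer dtype instead
# of reusing the model's compute dtype
ticker = torch.arange(t, dtype=torch.long)[None, :]
print(ticker[0, 2047:2052])
# tensor([2047, 2048, 2049, 2050, 2051])
```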

lucidrains commented

@renll thanks for identifying this bug!