jaywonchung/BERT4Rec-VAE-Pytorch

The loss is nan

rabbicat30 opened this issue · 1 comment

In some cases, such as when the sequence is short or mask_prob is small, it can happen that no position in a training sequence gets masked, and the loss then becomes NaN. How can I avoid this? I don't want the loss to be NaN; is adjusting mask_prob my only option?

Because I want to run a comparison experiment, I need the value of mask_prob to stay the same across runs.
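One way to avoid the NaN without changing mask_prob is to force at least one masked position per sequence whenever the random draw masks nothing. Below is a minimal sketch of that idea; it assumes a BERT4Rec-style random masking loop and uses a hypothetical `MASK_TOKEN` id and the `-100` ignore label (as in PyTorch's `CrossEntropyLoss` `ignore_index` default), not the repo's actual constants.

```python
import random

MASK_TOKEN = 0  # hypothetical mask token id; the repo derives its own


def mask_sequence(seq, mask_prob, rng=random):
    """Randomly mask items; if nothing was masked, force one mask
    so the loss is never averaged over zero positions (no NaN)."""
    tokens, labels = [], []
    for item in seq:
        if rng.random() < mask_prob:
            tokens.append(MASK_TOKEN)
            labels.append(item)   # predict the original item here
        else:
            tokens.append(item)
            labels.append(-100)   # ignored by the loss at this position
    if all(label == -100 for label in labels):
        # No position was masked: pick one at random so every
        # sequence contributes at least one valid loss term.
        i = rng.randrange(len(seq))
        tokens[i] = MASK_TOKEN
        labels[i] = seq[i]
    return tokens, labels
```

Since the expected number of masked positions barely changes, mask_prob itself can stay fixed across the compared runs.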