state-spaces/s4

S4 ListOps has NaN loss

lthilnklover opened this issue · 3 comments

First of all, thank you for the comprehensive code base for all variants of S4 models.

However, when I try to run the ListOps experiment with S4 (HYYT version), the train, test, and val losses all become NaN after 1 epoch.

I ran the following script:

python -m train experiment=lra/s4-listops wandb=null

The final accuracy is also way below the reported accuracy (train=0.17).

Is there something I have done wrong?

I came across the same problem, and decreasing the learning rate by a factor of 10 did not solve it.
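
For reference, the lower learning rate was passed as a Hydra override on the command line; the `optimizer.lr` key is an assumption based on the repo's config layout, so adjust it if the config names it differently:

python -m train experiment=lra/s4-listops optimizer.lr=1e-4 wandb=null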

Same problem here. I am using a completely different dataset for audio processing. I extracted the S4ND and S4 layers into a different neural network architecture, and I also got NaN after one epoch because self.log_dt in SSKernelNPLR becomes NaN. I believe this must happen during backpropagation, since it is not updated otherwise.
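
For anyone trying to pin this down: a minimal sketch (assuming a standard PyTorch training loop; `model`, `optimizer`, and `step` are placeholders, not names from this repo) for finding which parameter or gradient first goes NaN, e.g. to confirm whether log_dt blows up right after the update:

```python
import torch

# Surface the backward op that first produces NaN/Inf (slow; debugging only).
torch.autograd.set_detect_anomaly(True)

def check_nans(model, step):
    """Report any parameter or gradient that is NaN/Inf after an optimizer step."""
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            print(f"step {step}: parameter {name} contains NaN/Inf")
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"step {step}: gradient of {name} contains NaN/Inf")

# Inside the training loop (placeholders):
# loss.backward()
# optimizer.step()
# check_nans(model, step)  # e.g. would catch log_dt turning NaN right after the update
```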

Sorry for not responding to this. I don't know why this is happening. I haven't revisited these experiments in a long time, but I'm quite confident they were reproducible in the past. Perhaps something has changed in the libraries, or perhaps there are numerical issues on certain hardware.
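
Not a confirmed fix, but if hardware-specific numerics are the suspect, one quick thing to rule out is TF32 matmuls, which are enabled by default on Ampere+ GPUs and reduce precision; a sketch:

```python
import torch

# Speculative mitigation, not a confirmed fix for this issue:
# force strict float32 matmuls instead of TF32 on Ampere+ GPUs.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False

# Gradient clipping may also guard against rare blow-ups; if using PyTorch Lightning:
# trainer = pl.Trainer(gradient_clip_val=1.0, ...)
# (how the repo's trainer config exposes this is an assumption.)
```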