state-spaces/s4

S4 ListOps has NaN loss

lthilnklover opened this issue · 3 comments

First of all, thank you for the comprehensive code base for all variants of S4 models.

However, when I try to run the ListOps experiment with S4 (HYYT version), the train, test, and val losses all become NaN after 1 epoch.

I ran the following script:

python -m train experiment=lra/s4-listops wandb=null

The final accuracy is also way below the reported accuracy (train=0.17).

Is there something I have done wrong?

I came across the same problem, and decreasing the learning rate by a factor of 10 did not solve it.
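
For reference, the lower learning rate was passed as a Hydra override on the command line; the `optimizer.lr` key is an assumption based on the repo's config layout, so adjust it if the config names it differently:

python -m train experiment=lra/s4-listops optimizer.lr=1e-4 wandb=null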

Same problem here. I am using a completely different dataset for audio processing. I extracted the S4ND and S4 layers into a different neural network architecture, and I also got NaN after one epoch because self.log_dt in SSKernelNPLR becomes NaN. I believe this must happen during backpropagation, since it is not updated otherwise.
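
For anyone trying to pin this down: a minimal sketch (assuming a standard PyTorch training loop; `model`, `optimizer`, and `step` are placeholders, not names from this repo) for finding which parameter or gradient first goes NaN, e.g. to confirm whether log_dt blows up right after the update:

```python
import torch

# Surface the backward op that first produces NaN/Inf (slow; debugging only).
torch.autograd.set_detect_anomaly(True)

def check_nans(model, step):
    """Report any parameter or gradient that is NaN/Inf after an optimizer step."""
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            print(f"step {step}: parameter {name} contains NaN/Inf")
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"step {step}: gradient of {name} contains NaN/Inf")

# Inside the training loop (placeholders):
# loss.backward()
# optimizer.step()
# check_nans(model, step)  # e.g. would catch log_dt turning NaN right after the update
```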

Sorry for not responding to this. I don't know why this is happening. I haven't revisited these experiments in a long time, but I'm quite confident they were reproducible in the past. Perhaps something has changed in the libraries, or perhaps there are numerical issues on certain hardware.
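
Not a confirmed fix, but if hardware-specific numerics are the suspect, one quick thing to rule out is TF32 matmuls, which are enabled by default on Ampere+ GPUs and reduce precision; a sketch:

```python
import torch

# Speculative mitigation, not a confirmed fix for this issue:
# force strict float32 matmuls instead of TF32 on Ampere+ GPUs.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False

# Gradient clipping may also guard against rare blow-ups; if using PyTorch Lightning:
# trainer = pl.Trainer(gradient_clip_val=1.0, ...)
# (how the repo's trainer config exposes this is an assumption.)
```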