Large difference in inference results between forward() and step()

Question

billshoo opened this issue 7 months ago · 0 comments

Albert -

Thank you for the wonderful S4 model you've invented and kept improving.

I am getting a very large difference in inference results between forward() and step() for models trained with parameters like these:

Depth: 20-30
kernel_size: 400-800
mode_init: 'diag-inv'
discretization: 'zoh'
ar_transform: 'softplus'
dt_transform: 'relu'

My test sequences are time series with tens of millions of time steps, and I simply run step() on them one step at a time. Its predictions deviate more and more from those of forward(), and eventually lose predictive power altogether. forward(), however, maintains its predictive power with a fixed receptive field of roughly 25 (depth) x 500 (kernel size).

I wonder:

Are there any diagnostics I can run to confirm that I don't have a bug?

For example, for a model with kernel size 500, I tried to verify that the 500th output of step() matches the forward() output. My mental model is that step() effectively has a variable receptive field that keeps growing with the position in the sequence, while forward() uses a fixed kernel that cuts off at 500. This check seems to work only for depth=1: at depth 2, a step() call already receives inputs from the previous layer's step(), whose receptive field varies with position in the sequence. In contrast, with forward(), every layer's receptive field is fixed at 500.
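A minimal sketch of this kind of single-layer check, written in plain NumPy rather than against the S4 code base. Everything here is an assumption made for illustration: the toy diagonal SSM, the helper names (zoh_discretize, ssm_kernel, forward_truncated, step_all), and all numeric values. It compares a convolution with a kernel truncated at L taps against the recurrence, and also replays only the last L inputs through the recurrence from a zero state.

```python
import numpy as np

def zoh_discretize(A, B, dt):
    """Zero-order-hold discretization of a diagonal SSM (A, B are 1-D)."""
    A_bar = np.exp(dt * A)
    B_bar = (A_bar - 1.0) / A * B
    return A_bar, B_bar

def ssm_kernel(A_bar, B_bar, C, length):
    """Convolution kernel K[l] = Re(sum_n C_n * A_bar_n**l * B_bar_n)."""
    powers = A_bar[None, :] ** np.arange(length)[:, None]  # (length, N)
    return (powers * (B_bar * C)[None, :]).sum(axis=1).real

def forward_truncated(u, kernel):
    """Causal convolution that only sees the last len(kernel) inputs."""
    return np.convolve(u, kernel)[: len(u)]

def step_all(u, A_bar, B_bar, C):
    """Run the recurrence one input at a time, starting from a zero state."""
    state = np.zeros_like(A_bar)
    out = np.empty(len(u))
    for k, u_k in enumerate(u):
        state = A_bar * state + B_bar * u_k
        out[k] = (C * state).sum().real
    return out

# Toy setup: illustrative values only, not taken from a trained model.
rng = np.random.default_rng(0)
N, L, T = 4, 50, 400                      # state size, kernel size, sequence length
A = -0.5 + 1j * np.pi * np.arange(N)      # real part -1/2, as in the post
B = np.ones(N, dtype=complex)
C = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A_bar, B_bar = zoh_discretize(A, B, dt=0.01)
u = rng.standard_normal(T)

y_step = step_all(u, A_bar, B_bar, C)
y_fwd = forward_truncated(u, ssm_kernel(A_bar, B_bar, C, L))   # kernel cut at L
y_full = forward_truncated(u, ssm_kernel(A_bar, B_bar, C, T))  # untruncated kernel

# Gap between recurrence and truncated convolution after the first L outputs.
gap = np.max(np.abs(y_fwd[L:] - y_step[L:]))

# Replaying only the last L inputs from a zero state reproduces forward at t.
t = 300
window_out = step_all(u[t - L + 1 : t + 1], A_bar, B_bar, C)[-1]

print("first L outputs agree:", np.allclose(y_fwd[:L], y_step[:L]))
print("full-length kernel matches step:", np.allclose(y_full, y_step))
print("max gap after L steps:", gap)
print("windowed replay matches forward:", np.isclose(window_out, y_fwd[t]))
```

In this toy setting the recurrence matches the convolution exactly when the kernel is as long as the sequence, so any gap after the first L outputs comes from the truncated kernel tail rather than from a bug. The windowed replay is the depth-1 version of the check described above; it does not carry over to stacked layers, for the reason already given.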

If the difference turns out to be real, is there anything I can do to promote agreement between forward() and step()? step() inference has a huge performance advantage for my use case. Since the real part of the diag-inv matrix is initialized at -1/2, and a softplus constraint is imposed on the real part of the diagonal elements of A during training, I don't see how it can become unstable in autoregressive generation.
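A rough sketch of how the stability argument could be checked numerically. It assumes, without having confirmed this against the library, that ar_transform='softplus' means Re(A) = -softplus(raw) and that dt_transform='relu' means dt = max(raw, 0). The function pole_report and all sample values are hypothetical. Under zero-order hold the discrete pole magnitude is exp(dt * Re(A)), which cannot exceed 1, but it can sit so close to 1 that the impulse response has barely decayed by the time the kernel is cut off.

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)

def pole_report(raw_real, raw_dt, kernel_size):
    """Return (pole magnitude, remaining magnitude after kernel_size steps).

    Assumed parameterization (not verified against the S4 code):
      Re(A) = -softplus(raw_real)  -> always negative
      dt    = relu(raw_dt)         -> non-negative, can be exactly 0
    """
    a_real = -softplus(raw_real)
    dt = np.maximum(raw_dt, 0.0)
    pole_mag = np.exp(dt * a_real)    # |exp(dt * A)| under zero-order hold
    tail = pole_mag ** kernel_size    # decay of the impulse response at the cutoff
    return pole_mag, tail

# Illustrative values: Re(A) = -1/2 and four hypothetical raw dt parameters.
raw_real = np.log(np.expm1(0.5))              # softplus(raw_real) == 0.5
raw_dt = np.array([1e-1, 1e-3, 0.0, -0.2])
pole_mag, tail = pole_report(raw_real, raw_dt, kernel_size=500)

for d, p, r in zip(raw_dt, pole_mag, tail):
    print(f"raw dt={d:+.3f}  |pole|={p:.6f}  remaining after 500 steps={r:.3e}")
```

If this parameterization holds, the recurrence cannot blow up, since every pole magnitude is at most 1. A channel whose remaining magnitude at the cutoff is not close to zero, though, has a long tail that step() keeps integrating and that the truncated forward() kernel never sees, which would be one possible source of a growing disagreement without any true instability.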

Best regards,
Bill