lisa-lab/pylearn2

NanGuardMode false positive in GPU_mrg_uniform


TNick commented

This resulted from a discussion on the pylearn-users mailing list.

Long story short, NanGuardMode detects NaNs on every invocation of GPU_mrg_uniform.

@lamblin provided the following insight:

In more detail, GPU_mrg_uniform actually has an input, which is a shared variable containing the current state of the random number generator. When computing a new sample, the Op actually generates two outputs: a new value for the state, and the sample itself. There is a mechanism to automatically update the value of the shared state to the new expression every time a function using it is called.
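A toy sketch of the pattern described above, which makes no claim about Theano's actual implementation. An op takes the current state and returns both a new state and a sample, and a wrapper writes the new state back after each call, the way a shared-variable update would. `SharedState`, `rng_op`, and `compiled_function` are hypothetical names, and the generator is a plain linear congruential generator, not MRG31k3p.

```python
class SharedState:
    """Hypothetical stand-in for a Theano shared variable holding the RNG state."""

    def __init__(self, value):
        self.value = value


def rng_op(state):
    """Toy RNG op with one state input and two outputs: (new_state, sample).

    Uses a simple linear congruential generator for illustration, NOT MRG31k3p.
    """
    new_state = (1103515245 * state + 12345) % 2**31
    sample = new_state / 2**31  # float in [0, 1)
    return new_state, sample


def compiled_function(shared):
    """Mimics a compiled function whose update rule writes the new state back."""
    new_state, sample = rng_op(shared.value)
    shared.value = new_state  # the "automatic update" of the shared state
    return sample


s = SharedState(42)
first = compiled_function(s)
second = compiled_function(s)  # a different sample, because the state advanced
```

The caller never touches the state directly; it only sees samples, while the state input and the new-state output travel through the op on every call. That state is exactly the value NanGuardMode ends up inspecting.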

For MRG, the random state is a matrix of int32. However, when using the GPU, we want the state to be kept and updated in GPU memory for speed, but the current GPU back-end of Theano only supports float32 containers. Therefore, Theano cheats by masquerading a matrix of int32 as a matrix of float32, and the random generator itself knows that it should interpret them as int32.
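The masquerade is a reinterpretation of bits, not a numeric conversion. A minimal sketch using Python's standard `struct` module (the helper names are hypothetical): the same 32 bits read as an int32 and as a float32 give unrelated-looking values, and reading them back as int32 recovers the original integer.

```python
import struct


def int32_bits_as_float32(x):
    """Reinterpret the 32 bits of a signed int32 as an IEEE 754 float32."""
    return struct.unpack("<f", struct.pack("<i", x))[0]


def float32_bits_as_int32(f):
    """Reinterpret the 32 bits of a float32 as a signed int32."""
    return struct.unpack("<i", struct.pack("<f", f))[0]


# 0x3F800000 is the bit pattern of float32 1.0.
print(int32_bits_as_float32(0x3F800000))  # 1.0
# int32 1 becomes the smallest positive float32 subnormal, 2**-149.
print(int32_bits_as_float32(1))
# Round trip for an ordinary state value.
print(float32_bits_as_int32(int32_bits_as_float32(123456789)))  # 123456789
```

In this Python sketch the float32 is widened to a double in between. That is exact for ordinary values, but NaN bit patterns are not guaranteed to survive such a conversion, so code that stores int32 state in a float32 container has to copy raw bits rather than pass the values through floating-point arithmetic.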

The unfortunate side effect is that it looks like a matrix of float32, and that some legitimate values for int32 will have the same binary representation as a NaN in float32, which is probably what is going on in your case.
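A float32 is NaN when its 8 exponent bits are all ones and its 23 mantissa bits are not all zero, so a whole range of int32 values alias to NaN. The sketch below (hypothetical helper names, standard library only) checks this directly on the bits. It also estimates how likely a state matrix is to contain at least one such value, under the simplifying assumption, made only for illustration, that state entries are spread uniformly and independently over non-negative int32 values.

```python
import math
import struct


def looks_like_nan(x):
    """True if the int32 value x, read as float32 bits, is a NaN."""
    bits = x & 0xFFFFFFFF  # two's-complement bit pattern as unsigned
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return exponent == 0xFF and mantissa != 0


def is_nan_via_struct(x):
    """Same check, done by actually reinterpreting the bits as a float32."""
    return math.isnan(struct.unpack("<f", struct.pack("<i", x))[0])


# Non-negative int32 values whose bits form a NaN: 0x7F800001 .. 0x7FFFFFFF.
NAN_ALIASES = 0x7FFFFFFF - 0x7F800001 + 1  # 2**23 - 1
P_NAN = NAN_ALIASES / 2**31  # roughly 0.39% per entry


def prob_any_nan(n_entries):
    """Chance that at least one of n independent uniform entries aliases a NaN."""
    return 1.0 - (1.0 - P_NAN) ** n_entries
```

Under that uniformity assumption, a state of 1,000 entries has roughly a 98% chance of containing at least one NaN-looking value, which is consistent with NanGuardMode firing on nearly every call.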

TNick commented

What I need to find out is: is the sample output also int32, or is it float32?
I mean: is it safe to filter out GPU_mrg_uniform completely, or should we filter out only the state (both as input and output)?
From what I understand of GPU_mrg_uniform, they are all int32s.

TNick commented

Pull request #1466 fixes this.

The sample output is a float32 or float64, so it should not be ignored.
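A sketch of the filtering the thread converges on: skip only the bit-cast state (as both input and output) and keep checking the sample. This is not the code from PR #1466. The op-name table, the index positions, and the `find_nans` helper are hypothetical, and the position of the state (input 0, output 0, with the sample as output 1) is assumed from the description of the Op's outputs earlier in the thread.

```python
import math

# Hypothetical table: which input/output positions of an op hold int32 state
# stored in a float32 container. Positions are assumptions for illustration.
STATE_POSITIONS = {
    "GPU_mrg_uniform": {"inputs": {0}, "outputs": {0}},
}


def _has_nan(values):
    return any(math.isnan(v) for v in values)


def find_nans(op_name, inputs, outputs):
    """Return ("input" | "output", index) pairs that contain NaN.

    Values at the op's state positions are skipped, because a float32 NaN bit
    pattern there is a legitimate int32; every other value is still checked.
    """
    skip = STATE_POSITIONS.get(op_name, {"inputs": set(), "outputs": set()})
    flagged = []
    for kind, values_list, skipped in (
        ("input", inputs, skip["inputs"]),
        ("output", outputs, skip["outputs"]),
    ):
        for i, values in enumerate(values_list):
            if i not in skipped and _has_nan(values):
                flagged.append((kind, i))
    return flagged
```

Skipping the whole op instead would also hide a genuine NaN in the float32 or float64 sample, which is why only the state positions are excluded here.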