EdinburghNLP/nematus

translate.py outputs probabilities greater than one

Avmb opened this issue · 4 comments

Avmb commented

python $nematus/translate.py \
    -m $prefix.dev.npz \
    -i $file_base.$src -o $file_base.$src.output.dev \
    -k 1 -n -p 5 --suppress-unk --print-word-probabilities

results in something like:

ein Kampf der Republikaner gegen die Wiederwahl Obamas
1.98620128632 0.375202327967 0.935490012169 0.990142166615 5.79434633255 0.540984451771 0.961822271347 1.74049687386 0.97704654932

Any idea why this happens? Is it related to the length normalization?

Length normalization doesn't affect the word probabilities. This is strange; printing the vectors before and after the softmax (next_probs and logit on line 531) should help show what's going on.
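
For reference, a minimal self-contained sketch of that kind of instrumentation using Theano's Print op (the variable names are assumed to match the snippet further down; the exact spot in the Nematus source may differ):

import numpy as np
import theano
import theano.tensor as tensor
from theano.printing import Print

# Wrap the pre- and post-softmax variables in Print ops so their values
# are dumped every time the compiled function is evaluated.
logit = tensor.matrix('logit')
logit_dbg = Print('logit (before softmax)')(logit)
next_probs = tensor.nnet.softmax(logit_dbg)
next_probs = Print('next_probs (after softmax)')(next_probs)

f = theano.function([logit], next_probs)
f(np.random.randn(2, 5).astype(theano.config.floatX))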

Avmb commented

Apparently it's some bug in Theano's implementation of softmax.

Changing

next_probs = tensor.nnet.softmax(logit)

to

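# Numerically stable softmax: subtract the row-wise max, exponentiate, then normalize.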
logit_exp = tensor.exp(logit - logit.max(axis=1, keepdims=True))
next_probs = logit_exp / logit_exp.sum(axis=1, keepdims=True)

solves the issue, even though this is supposed to be the reference implementation of softmax according to the Theano documentation. I've also checked the logit values and they don't seem particularly large or small. Go figure.

Hm, can you quickly check whether you encounter the same bug on CPU, or when cuDNN is disabled (with dnn.enabled=False)? I just want to be sure whether it is a Theano or a cuDNN bug (if it is the latter, we can try upgrading to the newest version on our machines).
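
One way to run that check outside of Nematus is a small standalone comparison of the built-in op against the manual exp-normalize formulation (a sketch, assuming a plain Theano installation; the matrix shape is arbitrary):

import numpy as np
import theano
import theano.tensor as tensor

logit = tensor.matrix('logit')

# Built-in softmax vs. the manual workaround from the comment above.
builtin = tensor.nnet.softmax(logit)
logit_exp = tensor.exp(logit - logit.max(axis=1, keepdims=True))
manual = logit_exp / logit_exp.sum(axis=1, keepdims=True)

f = theano.function([logit], [builtin, manual])
x = (3 * np.random.randn(5, 30000)).astype(theano.config.floatX)
p_builtin, p_manual = f(x)

# Each row should sum to 1 and no entry should exceed 1 for either version.
print('builtin: max prob = %f, row sums = %s' % (p_builtin.max(), p_builtin.sum(axis=1)))
print('manual:  max prob = %f, row sums = %s' % (p_manual.max(), p_manual.sum(axis=1)))

Running the same script with THEANO_FLAGS=device=cpu, and on the GPU with dnn.enabled=False, should show whether only a particular backend configuration produces probabilities above one.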

If I understood you correctly, this seems to be a problem in Theano 0.9.0 that is fixed in Theano 0.10.b2, and it only affects the new CUDA backend, not the CPU version or the old backend. Reopen this if that isn't correct.