EdinburghNLP/nematus

translate.py outputs probabilities greater than one

Avmb opened this issue · 4 comments

Avmb commented

python $nematus/translate.py \
    -m $prefix.dev.npz \
    -i $file_base.$src -o $file_base.$src.output.dev \
    -k 1 -n -p 5 --suppress-unk --print-word-probabilities

results in something like:

ein Kampf der Republikaner gegen die Wiederwahl Obamas
1.98620128632 0.375202327967 0.935490012169 0.990142166615 5.79434633255 0.540984451771 0.961822271347 1.74049687386 0.97704654932

Any idea why this happens? Is it related to the length normalization?

Length normalization doesn't affect the word probabilities. This is strange; printing the vectors before and after the softmax (next_probs and logit on line 531) should help show what's going on.
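
For reference, a minimal self-contained sketch of that kind of instrumentation using Theano's Print op (the variable names are assumed to match the snippet further down; the exact spot in the Nematus source may differ):

import numpy as np
import theano
import theano.tensor as tensor
from theano.printing import Print

# Wrap the pre- and post-softmax variables in Print ops so their values
# are dumped every time the compiled function is evaluated.
logit = tensor.matrix('logit')
logit_dbg = Print('logit (before softmax)')(logit)
next_probs = tensor.nnet.softmax(logit_dbg)
next_probs = Print('next_probs (after softmax)')(next_probs)

f = theano.function([logit], next_probs)
f(np.random.randn(2, 5).astype(theano.config.floatX))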

Avmb commented

Apparently it's some bug in Theano's implementation of softmax.

Changing

next_probs = tensor.nnet.softmax(logit)

to

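# Numerically stable softmax: subtract the row-wise max, exponentiate, then normalize.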
logit_exp = tensor.exp(logit - logit.max(axis=1, keepdims=True))
next_probs = logit_exp / logit_exp.sum(axis=1, keepdims=True)

solves the issue, even though this is supposed to be the reference implementation of softmax according to the Theano documentation. I've also checked the logit values and they don't seem particularly large or small. Go figure.

Hm, can you quickly check whether you encounter the same bug on CPU, or when cuDNN is disabled (with dnn.enabled=False)? I just want to be sure whether it is a Theano or a cuDNN bug (if it is the latter, we can try upgrading to the newest version on our machines).
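
One way to run that check outside of Nematus is a small standalone comparison of the built-in op against the manual exp-normalize formulation (a sketch, assuming a plain Theano installation; the matrix shape is arbitrary):

import numpy as np
import theano
import theano.tensor as tensor

logit = tensor.matrix('logit')

# Built-in softmax vs. the manual workaround from the comment above.
builtin = tensor.nnet.softmax(logit)
logit_exp = tensor.exp(logit - logit.max(axis=1, keepdims=True))
manual = logit_exp / logit_exp.sum(axis=1, keepdims=True)

f = theano.function([logit], [builtin, manual])
x = (3 * np.random.randn(5, 30000)).astype(theano.config.floatX)
p_builtin, p_manual = f(x)

# Each row should sum to 1 and no entry should exceed 1 for either version.
print('builtin: max prob = %f, row sums = %s' % (p_builtin.max(), p_builtin.sum(axis=1)))
print('manual:  max prob = %f, row sums = %s' % (p_manual.max(), p_manual.sum(axis=1)))

Running the same script with THEANO_FLAGS=device=cpu, and on the GPU with dnn.enabled=False, should show whether only a particular backend configuration produces probabilities above one.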

If I understood you correctly, this seems to be a problem in Theano 0.9.0 that is fixed in Theano 0.10.b2, and it only affects the new CUDA backend, not the CPU version or the old backend. Reopen this if that isn't correct.