beam_attention_decoder ValueError
jmugan opened this issue · 7 comments
Cool code. In beam_attention_decoder, I get an error at line 615
s = math_ops.reduce_sum(
v[a] * math_ops.tanh(hidden_features[a] + y), [2, 3])
It looks like maybe y takes the beams into account but hidden_features[a] does not.
The stack trace looks like
File "/Users/jmugan/Box Sync/workspace/DeepLearning/Git/21CT_Translate/ObjectTranslatorTF/ncm_seq2seq.py", line 615, in attention
v[a] * math_ops.tanh(hidden_features[a] + y), [2, 3])
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 518, in binary_op_wrapper
return func(x, y, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 44, in add
return _op_def_lib.apply_op("Add", x=x, y=y, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2156, in create_op
set_shapes_for_outputs(ret)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1612, in set_shapes_for_outputs
shapes = shape_func(op)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1390, in _BroadcastShape
% (shape_x, shape_y))
ValueError: Incompatible shapes for broadcasting: (32, 2, 1, 300) and (320, 1, 1, 300)
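The broadcasting rule behind this error can be checked in plain Python. This is only a sketch, not TensorFlow's implementation: broadcast_shape is a hypothetical helper, and the beam size of 10 is an assumption inferred from 320 / 32. It suggests that the first dimension of y is batch_size * beam_size while hidden_features[a] still has batch_size.

```python
from itertools import zip_longest


def broadcast_shape(shape_x, shape_y):
    """Return the broadcast shape, or None if the shapes are incompatible.

    Hypothetical helper: dimensions are compared from the last one backwards,
    and two dimensions are compatible when they are equal or one of them is 1.
    """
    result = []
    for dx, dy in zip_longest(reversed(shape_x), reversed(shape_y), fillvalue=1):
        if dx == dy or dy == 1:
            result.append(dx)
        elif dx == 1:
            result.append(dy)
        else:
            return None
    return tuple(reversed(result))


# Shapes taken from the ValueError above.
hidden_shape = (32, 2, 1, 300)   # hidden_features[a]
y_shape = (320, 1, 1, 300)       # y

# Assumption: the leading 320 is batch_size * beam_size, so beam_size is 10.
beam_size = y_shape[0] // hidden_shape[0]

# Shape hidden_features[a] would have if it were repeated once per beam.
tiled_hidden_shape = (hidden_shape[0] * beam_size,) + hidden_shape[1:]

print(broadcast_shape(hidden_shape, y_shape))        # None
print(broadcast_shape(tiled_hidden_shape, y_shape))  # (320, 2, 1, 300)
```

If that reading is right, the addition would only work once hidden_features[a] is expanded per beam (or y is built without the beam dimension). How the beams are laid out inside the 320 rows is not visible from the trace.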
Did you run it with the Ubuntu dataset?
The beam decoder should be used only for evaluation.
Different dataset. Thanks for the response. I'll try it on just the evaluation.
Sure. I'll try it in the next few days and let you know how it comes out.
How do you tell which of the beams returned the best sequence? In line 253 of neural_conversation_model.py the output of step is
path, symbol, output_logits = model.step(sess, encoder_inputs, decoder_inputs,
                                         target_weights, bucket_id, True, beam_search)
Here, output_logits is not used, as far as I can tell. It is a list of length output_size where each entry is of length beam_size, but how is it different from symbol? Using symbol and path you put together the beam_size responses, but which is best? Why doesn't output_logits have their cost?
That depends on the task. In Smart Reply, they pick diverse replies, based on how different the replies are, using semi-supervised clustering. Right now the beams are sorted by lowest perplexity, but those might not be the best replies for what you are looking for.
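As a rough illustration of ranking beams by perplexity, here is a minimal sketch. It does not use the repository's actual data layout. It assumes symbols[t][k] is the token chosen at step t on beam k, paths[t][k] is the index of the parent beam at the previous step, and token_log_probs[k] holds the per-token log-probabilities of finished beam k. All three names, and the helper functions, are hypothetical.

```python
import math


def backtrack_beams(symbols, paths):
    """Rebuild one token sequence per final beam by following parent pointers."""
    num_steps = len(symbols)
    beam_size = len(symbols[-1])
    sequences = []
    for k in range(beam_size):
        tokens = []
        beam = k
        # Walk from the last step to the first, hopping to the parent beam.
        for t in range(num_steps - 1, -1, -1):
            tokens.append(symbols[t][beam])
            beam = paths[t][beam]
        sequences.append(tokens[::-1])
    return sequences


def perplexity(log_probs):
    """Perplexity of a sequence from its per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


# Toy data (made up for illustration): 2 steps, beam_size = 2.
symbols = [[5, 7], [8, 9]]
paths = [[0, 0], [1, 0]]
token_log_probs = [[-0.1, -0.2], [-1.0, -1.5]]

sequences = backtrack_beams(symbols, paths)

# Lowest perplexity first.
ranked = sorted(range(len(sequences)), key=lambda k: perplexity(token_log_probs[k]))
best = sequences[ranked[0]]

print(sequences)  # [[7, 8], [5, 9]]
print(best)       # [7, 8]
```

Sorting by perplexity rather than by summed log-probability normalizes for length, so longer replies are not penalized just for having more tokens. Whether that ordering matches the replies you want is task dependent, as noted above.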