pbhatia243/Neural_Conversation_Models

beam_attention_decoder ValueError

jmugan opened this issue · 7 comments

Cool code. In beam_attention_decoder, I get an error at line 615:

s = math_ops.reduce_sum(
              v[a] * math_ops.tanh(hidden_features[a] + y), [2, 3])

It looks like y takes the beams into account but hidden_features[a] does not: the trace below shows a batch dimension of 320 for one operand and 32 for the other.

The stack trace looks like this:

File "/Users/jmugan/Box Sync/workspace/DeepLearning/Git/21CT_Translate/ObjectTranslatorTF/ncm_seq2seq.py", line 615, in attention
    v[a] * math_ops.tanh(hidden_features[a] + y), [2, 3])
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 518, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 44, in add
    return _op_def_lib.apply_op("Add", x=x, y=y, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2156, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1612, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1390, in _BroadcastShape
    % (shape_x, shape_y))
ValueError: Incompatible shapes for broadcasting: (32, 2, 1, 300) and (320, 1, 1, 300)
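
If it helps, here is a minimal sketch of what I mean, using the shapes from the trace (beam_size = 320 / 32 = 10 is my inference; this is just an illustration of the kind of tiling I'd expect, not the repo's actual fix):

import tensorflow as tf

# Shapes taken from the error message; beam_size = 10 is inferred from 320 / 32.
batch_size, beam_size, attn_len, attn_size = 32, 10, 2, 300

# hidden_features[a] is still one row per batch element ...
hidden = tf.placeholder(tf.float32, [batch_size, attn_len, 1, attn_size])    # (32, 2, 1, 300)
# ... while the query y has already been expanded to batch_size * beam_size.
y = tf.placeholder(tf.float32, [batch_size * beam_size, 1, 1, attn_size])    # (320, 1, 1, 300)

# Repeating each batch row beam_size times makes the leading dimensions agree.
hidden_tiled = tf.reshape(
    tf.tile(tf.expand_dims(hidden, 1), [1, beam_size, 1, 1, 1]),
    [batch_size * beam_size, attn_len, 1, attn_size])                        # (320, 2, 1, 300)

s = tf.reduce_sum(tf.tanh(hidden_tiled + y), [2, 3])                         # broadcasts to (320, 2)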

Did you run it with the Ubuntu dataset?

Use the beam decoder only for evaluation.

Different dataset. Thanks for the response. I'll try it on just the evaluation.

@jmugan Could you please share your results?

Sure. I'll try it in the next few days and let you know how it comes out.

How do you tell which of the beams returned the best sequence? On line 253 of neural_conversation_model.py, the call to model.step is

path, symbol, output_logits = model.step(sess, encoder_inputs, decoder_inputs,
                                          target_weights, bucket_id, True, beam_search)

Here, output_logits is not used, as far as I can tell. It is a list of length output_size where each entry has length beam_size, but how is it different from symbol?

Using symbol and path, you can put together the beam_size responses, but which one is best? Why doesn't output_logits contain their costs?
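
For context, this is roughly how I'm stitching the responses together from path and symbol (a sketch of my understanding, not code from the repo; the layout of path and symbol is an assumption):

# Reconstruct beam_size candidate sequences by walking path/symbol backwards
# from the last decode step. Assumes path[t][k] is the parent beam index of
# beam k at step t and symbol[t][k] is the token that beam emitted.
def reconstruct_beams(path, symbol, beam_size):
    candidates = [[] for _ in range(beam_size)]
    curr = list(range(beam_size))
    for t in range(len(path) - 1, -1, -1):
        for k in range(beam_size):
            candidates[k].append(symbol[t][curr[k]])
            curr[k] = path[t][curr[k]]
    # Tokens were collected in reverse order, so flip each sequence.
    return [list(reversed(c)) for c in candidates]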

It depends on the task. In Smart Reply, they find diverse replies by measuring how different the replies are, using semi-supervised clustering. Right now the beams are sorted by lowest perplexity, but those might not be the best replies for what you are looking for.
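
For what it's worth, a minimal sketch of that perplexity-based ranking, assuming you have a list of per-token log-probabilities for each candidate (the repo's step output may not expose the scores in this exact form):

import math

# Rank beam candidates by perplexity, i.e. exp(-mean per-token log-probability).
# Lower perplexity ranks first. candidate_logprobs is assumed to be a list of
# per-token log-probability lists, one list per candidate.
def rank_by_perplexity(candidates, candidate_logprobs):
    def perplexity(logprobs):
        return math.exp(-sum(logprobs) / max(len(logprobs), 1))
    scored = sorted(zip(candidates, candidate_logprobs),
                    key=lambda pair: perplexity(pair[1]))
    return [cand for cand, _ in scored]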