pbhatia243/Neural_Conversation_Models

beam_attention_decoder ValueError

jmugan opened this issue · 7 comments

Cool code. In beam_attention_decoder, I get an error at line 615:

s = math_ops.reduce_sum(
              v[a] * math_ops.tanh(hidden_features[a] + y), [2, 3])

It looks like y takes the beams into account but hidden_features[a] does not: the trace below shows a batch dimension of 320 for one operand and 32 for the other.

The stack trace looks like this:

File "/Users/jmugan/Box Sync/workspace/DeepLearning/Git/21CT_Translate/ObjectTranslatorTF/ncm_seq2seq.py", line 615, in attention
    v[a] * math_ops.tanh(hidden_features[a] + y), [2, 3])
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 518, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 44, in add
    return _op_def_lib.apply_op("Add", x=x, y=y, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2156, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1612, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1390, in _BroadcastShape
    % (shape_x, shape_y))
ValueError: Incompatible shapes for broadcasting: (32, 2, 1, 300) and (320, 1, 1, 300)
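
If it helps, here is a minimal sketch of what I mean, using the shapes from the trace (beam_size = 320 / 32 = 10 is my inference; this is just an illustration of the kind of tiling I'd expect, not the repo's actual fix):

import tensorflow as tf

# Shapes taken from the error message; beam_size = 10 is inferred from 320 / 32.
batch_size, beam_size, attn_len, attn_size = 32, 10, 2, 300

# hidden_features[a] is still one row per batch element ...
hidden = tf.placeholder(tf.float32, [batch_size, attn_len, 1, attn_size])    # (32, 2, 1, 300)
# ... while the query y has already been expanded to batch_size * beam_size.
y = tf.placeholder(tf.float32, [batch_size * beam_size, 1, 1, attn_size])    # (320, 1, 1, 300)

# Repeating each batch row beam_size times makes the leading dimensions agree.
hidden_tiled = tf.reshape(
    tf.tile(tf.expand_dims(hidden, 1), [1, beam_size, 1, 1, 1]),
    [batch_size * beam_size, attn_len, 1, attn_size])                        # (320, 2, 1, 300)

s = tf.reduce_sum(tf.tanh(hidden_tiled + y), [2, 3])                         # broadcasts to (320, 2)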

Did you run it with the Ubuntu dataset?

Use the beam decoder only for evaluation.

Different dataset. Thanks for the response. I'll try it on just the evaluation.

@jmugan Could you please share your results?

Sure. I'll try it in the next few days and let you know how it comes out.

How do you tell which of the beams returned the best sequence? On line 253 of neural_conversation_model.py, the call to model.step is

path, symbol, output_logits = model.step(sess, encoder_inputs, decoder_inputs,
                                          target_weights, bucket_id, True, beam_search)

Here, output_logits is not used, as far as I can tell. It is a list of length output_size where each entry has length beam_size, but how is it different from symbol?

Using symbol and path, you can put together the beam_size responses, but which one is best? Why doesn't output_logits contain their costs?
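
For context, this is roughly how I'm stitching the responses together from path and symbol (a sketch of my understanding, not code from the repo; the layout of path and symbol is an assumption):

# Reconstruct beam_size candidate sequences by walking path/symbol backwards
# from the last decode step. Assumes path[t][k] is the parent beam index of
# beam k at step t and symbol[t][k] is the token that beam emitted.
def reconstruct_beams(path, symbol, beam_size):
    candidates = [[] for _ in range(beam_size)]
    curr = list(range(beam_size))
    for t in range(len(path) - 1, -1, -1):
        for k in range(beam_size):
            candidates[k].append(symbol[t][curr[k]])
            curr[k] = path[t][curr[k]]
    # Tokens were collected in reverse order, so flip each sequence.
    return [list(reversed(c)) for c in candidates]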

It depends on the task. In Smart Reply, they find diverse replies by measuring how different the replies are, using semi-supervised clustering. Right now the beams are sorted by lowest perplexity, but those might not be the best replies for what you are looking for.
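
For what it's worth, a minimal sketch of that perplexity-based ranking, assuming you have a list of per-token log-probabilities for each candidate (the repo's step output may not expose the scores in this exact form):

import math

# Rank beam candidates by perplexity, i.e. exp(-mean per-token log-probability).
# Lower perplexity ranks first. candidate_logprobs is assumed to be a list of
# per-token log-probability lists, one list per candidate.
def rank_by_perplexity(candidates, candidate_logprobs):
    def perplexity(logprobs):
        return math.exp(-sum(logprobs) / max(len(logprobs), 1))
    scored = sorted(zip(candidates, candidate_logprobs),
                    key=lambda pair: perplexity(pair[1]))
    return [cand for cand, _ in scored]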