julianser/hed-dlg-truncated

A bug in the code?

Closed this issue · 8 comments

If I understood it correctly, there are these two approaches:

  1. You always initialize the decoder hidden state with a projected context-embedding (reset=True). This means you feed a newly generated response back to the encoders to generate a new context embedding for the new turn. This embedding is, after projection, used to initialize the decoder for the next turn.

  2. You initialize the decoder only when generating the first turn and from then on the decoder continues with its own hidden state for each consecutively generated response (reset=False).

I would say that those important inter-turn dependencies are encoded during the utterance- and context-encoder steps, and that the decoder does not have much to do with them.
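To make the difference between the two approaches concrete, here is a rough, framework-agnostic sketch in plain Python/numpy; the names `generate_dialogue`, `encode_context`, `decode_turn`, and `project_context` are placeholders for illustration, not functions from this repo:

```python
import numpy as np

def project_context(context_embedding, W, b):
    # Single NN layer mapping the context embedding to a decoder initial state.
    return np.tanh(context_embedding @ W + b)

def generate_dialogue(n_turns, encode_context, decode_turn, W, b, reset=True):
    # encode_context and decode_turn stand in for the utterance/context
    # encoders and the decoder of the real model.
    history = []
    hd = None
    for _ in range(n_turns):
        # The encoders always see the full history, including generated turns.
        context_embedding = encode_context(history)
        if reset or hd is None:
            # Approach 1 (reset=True): re-initialize the decoder from the
            # projected context embedding at every turn.
            hd = project_context(context_embedding, W, b)
        # Approach 2 (reset=False): otherwise the decoder simply keeps its
        # own hidden state from the previous turn.
        response, hd = decode_turn(hd, context_embedding)
        history.append(response)  # fed back to the encoders on the next turn
    return history
```

In both variants the generated response re-enters the encoders, which is why neither approach produces turn-independent responses.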

Did you also train the model with the reset_utterance_decoder_at_end_of_utterance flag set the same way?

Oh, I thought reset=True meant you reset the hidden states, so the next turn would not be affected by previous turns (which would be a bad thing) and would be generated as a completely independent response, or something like that.

In both cases, turns are never independent. You have to distinguish between resetting the decoder hidden state and resetting the utterance-encoder hidden state.

Even if you reset the utterance-encoder hidden state between turns, inter-turn dependencies are still captured by the context-encoder (as it uses the utterance-embeddings as input).

The context encoder's hidden state is then used (after a single NN layer) to initialize the decoder hidden state. The decoder can then follow either approach 1 or 2, and in both cases generation is not turn-independent.
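A minimal numpy sketch of that flow; the weights and the gru_like update below are made up for illustration, while the actual model uses GRU/LSTM cells and its own parameter names:

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, ctx_dim, dec_dim = 8, 16, 16

def gru_like(prev_state, x, W_in, W_rec):
    # Simplified recurrent update standing in for a real GRU/LSTM cell.
    return np.tanh(x @ W_in + prev_state @ W_rec)

W_utt_in, W_utt_rec = rng.normal(size=(emb_dim, ctx_dim)), rng.normal(size=(ctx_dim, ctx_dim))
W_ctx_in, W_ctx_rec = rng.normal(size=(ctx_dim, ctx_dim)), rng.normal(size=(ctx_dim, ctx_dim))
W_proj, b_proj = rng.normal(size=(ctx_dim, dec_dim)), np.zeros(dec_dim)

turns = [rng.normal(size=(5, emb_dim)), rng.normal(size=(7, emb_dim))]  # toy word embeddings

hs_context = np.zeros(ctx_dim)
for turn in turns:
    # Utterance encoder: its hidden state is reset at the start of every turn,
    # so each turn gets a fresh utterance embedding.
    hs_utt = np.zeros(ctx_dim)
    for word_emb in turn:
        hs_utt = gru_like(hs_utt, word_emb, W_utt_in, W_utt_rec)
    # Context encoder: never reset between turns, so it accumulates
    # inter-turn dependencies from the sequence of utterance embeddings.
    hs_context = gru_like(hs_context, hs_utt, W_ctx_in, W_ctx_rec)

# Single NN layer projecting the context state into the decoder's initial hidden state.
hd_init = np.tanh(hs_context @ W_proj + b_proj)
```

The key point is that hs_context carries information across turns regardless of how the utterance encoder or the decoder is reset.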

@dimeldo It looks like there is one "zero_mask" too many in https://github.com/julianser/hed-dlg-truncated/blob/master/search.py#L147-151.

Does it work if you try this instead:

```python
new_hd = self.compute_decoder_encoding(
    enlarged_context, enlarged_reversed_context, self.max_len, zero_mask,
    numpy.zeros((self.model.bs), dtype='float32'), ran_vector, ones_mask)
```

Thanks, it's working now. Closed.