A bug in the code?
If I understood it correctly, there are these two approaches:
- You always initialize the decoder hidden state with a projected context embedding (reset=True). This means you feed a newly generated response back to the encoders to generate a new context embedding for the new turn. This embedding is, after projection, used to initialize the decoder for the next turn.
- You initialize the decoder only when generating the first turn; from then on, the decoder continues with its own hidden state for each consecutively generated response (reset=False).
I would say that those important inter-turn dependencies are encoded during the utterance- and context-encoder steps, and that the decoder does not contribute much to capturing them.
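To make the two approaches concrete, here is a minimal sketch of the decision as I understand it. Everything except the flag name reset_utterance_decoder_at_end_of_utterance is a placeholder of mine, not the repo's actual API:

```python
import numpy as np

def next_decoder_init(hd_prev, context_embedding, W_proj, b_proj,
                      reset_utterance_decoder_at_end_of_utterance):
    """Pick the decoder's initial hidden state for the next turn (toy sketch)."""
    if reset_utterance_decoder_at_end_of_utterance:
        # Approach 1 (reset=True): re-initialize from the projected context
        # embedding, which already summarizes all previous turns.
        return np.tanh(W_proj @ context_embedding + b_proj)
    # Approach 2 (reset=False): carry over the decoder's own hidden state
    # from the previous turn.
    return hd_prev
```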
Did you also train the model with the same flag reset_utterance_decoder_at_end_of_utterance?
Oh, I thought that reset=True means you reset the hidden states, so the next turn would not be affected by previous turns (which would be a bad thing) and would be generated as a completely independent response.
In both cases, turns are never independent. You have to distinguish between resetting the decoder hidden state and resetting the utterance-encoder hidden state.
Even if you reset the utterance-encoder hidden state between turns, inter-turn dependencies are still captured by the context-encoder (as it uses the utterance-embeddings as input).
The context encoder's hidden state is then used (after a single NN layer) to initialize the decoder hidden state. The decoder can then follow either approach 1 or 2, and in both cases generation is not turn-independent.
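To illustrate why the dependency survives an utterance-encoder reset, here is a toy context-encoder RNN (illustrative names only, not the repo's code): its state at turn t is a function of every earlier utterance embedding, and that state is what gets projected into the decoder.

```python
import numpy as np

def context_encoder_states(utterance_embeddings, W_in, W_rec, b):
    """Toy RNN over per-turn utterance embeddings.

    Even if the utterance encoder is reset before every turn, hs[t] still
    depends on utterance_embeddings[0..t], so a decoder initialized from
    hs[t] sees the whole dialogue history.
    """
    h = np.zeros(W_rec.shape[0])
    hs = []
    for u in utterance_embeddings:  # one embedding per turn
        h = np.tanh(W_in @ u + W_rec @ h + b)
        hs.append(h)
    return hs
```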
@dimeldo It looks like there is one "zero_mask" too many in https://github.com/julianser/hed-dlg-truncated/blob/master/search.py#L147-151.
Does it work if you try this instead:
```python
new_hd = self.compute_decoder_encoding(enlarged_context, enlarged_reversed_context, self.max_len, zero_mask, numpy.zeros((self.model.bs), dtype='float32'), ran_vector, ones_mask)
```
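For anyone comparing with the original: the linked lines pass zero_mask twice, so the call has one positional argument too many. A commented-out sketch of the broken version (the duplicate's exact position here is my assumption; see the linked range for the real line):

```python
# Broken call (sketch): zero_mask appears twice, one positional argument
# too many for compute_decoder_encoding.
# new_hd = self.compute_decoder_encoding(enlarged_context, enlarged_reversed_context,
#     self.max_len, zero_mask, zero_mask,  # <- duplicated zero_mask (position assumed)
#     numpy.zeros((self.model.bs), dtype='float32'), ran_vector, ones_mask)
```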
Thanks, it's working now. Closed.