self_attention in decoder for inference
I have trained the model without any errors. Next, I ran inference with:
python translate.py -model experiments/checkpoints/MIT_mixed_augm/MIT_mixed_augm_model_step_300000.pt -src data/${dataset}/src-test.txt -output experiments/results/test.txt -batch_size 64 -replace_unk -max_length 200 -fast
In my understanding, the following code concatenates the previous inputs with the current input to form all_input, which is then fed into self.self_attn. The shape of all_input should therefore be [batch_size x current_step x model_dim].
if previous_input is not None:
    all_input = torch.cat((previous_input, input_norm), dim=1)
    dec_mask = None
if self.self_attn_type == "scaled-dot":
    query, attn = self.self_attn(all_input, all_input, input_norm,
                                 mask=dec_mask,
                                 layer_cache=layer_cache,
                                 type="self")
But in my experiment, the shape of all_input is always [batch_size x 1 x model_dim] and does not change across steps. I don't know why, and it has confused me for a long time. Have you run into this problem before? Thank you.
I found the reason: I used -fast in my predict command. So what is the meaning of -fast in this command?
The underlying version of OpenNMT-py had two beam search implementations:
https://github.com/pschwllr/MolecularTransformer/blob/master/onmt/opts.py#L486
With -fast, you are using the one that should be faster in decoding the outputs.
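A likely explanation for the constant shape, offered as an assumption rather than something confirmed in this thread: the faster beam search probably passes layer_cache instead of previous_input, so each decoder layer only receives the current step while the keys and values of earlier steps are kept in the cache. The toy sketch below uses plain Python lists, a single sequence with no batch dimension, and made-up projections (project, attend, and both step functions are hypothetical names, not OpenNMT-py code). It shows that both strategies give the same attention output even though the cached one always sees an input of length 1.

```python
import math

def project(vec, scale):
    # Stand-in for a learned linear projection (hypothetical: just scales the vector).
    return [scale * x for x in vec]

def attend(query, keys, values):
    # Scaled dot-product attention for one query vector over lists of keys/values.
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def step_with_previous_input(previous_input, current):
    # Non-cached path: concatenate all inputs so far and re-project every one of them.
    all_input = previous_input + [current]
    keys = [project(x, 0.5) for x in all_input]
    values = [project(x, 2.0) for x in all_input]
    return attend(project(current, 1.0), keys, values), all_input

def step_with_cache(cache, current):
    # Cached path: the layer only sees the current step; history lives in the cache.
    cache["keys"].append(project(current, 0.5))
    cache["values"].append(project(current, 2.0))
    return attend(project(current, 1.0), cache["keys"], cache["values"])

inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three decoding steps, model_dim = 2
previous_input, cache = [], {"keys": [], "values": []}
input_lengths = []  # (length seen by non-cached path, length seen by cached path)
for current in inputs:
    out_full, previous_input = step_with_previous_input(previous_input, current)
    out_cached = step_with_cache(cache, current)
    input_lengths.append((len(previous_input), 1))
    # Both paths attend over the same keys/values, so the outputs agree.
    assert all(abs(a - b) < 1e-9 for a, b in zip(out_full, out_cached))
```

In the non-cached path the input grows by one step each time, which matches the [batch_size x current_step x model_dim] expectation above. In the cached path the growth happens inside the cache instead, so the input the layer sees stays at one step.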