self_attention in decoder for inference

Question

Closed this issue 3 years ago · 2 comments

I trained the model without any errors. I then ran inference with:

python translate.py -model experiments/checkpoints/MIT_mixed_augm/MIT_mixed_augm_model_step_300000.pt -src data/${dataset}/src-test.txt -output experiments/results/test.txt -batch_size 64 -replace_unk -max_length 200 -fast

As I understand it, the following code concatenates the previous inputs with the current input to form all_input, which is then fed into self.self_attn. The shape of all_input should therefore be [batch_size x current_step x model_dim].

    if previous_input is not None:
        all_input = torch.cat((previous_input, input_norm), dim=1)
        dec_mask = None
    if self.self_attn_type == "scaled-dot":
        query, attn = self.self_attn(all_input, all_input, input_norm,
                                     mask=dec_mask,
                                     layer_cache=layer_cache,
                                     type="self")

But in my experiment, the shape of all_input is always [batch_size x 1 x model_dim] and does not change across steps. I don't know why, and it has confused me for a long time. Have you encountered this before? Thank you.

Answer 1 · 2021-08-18T08:08:19.000Z

I found the reason: I used -fast in my prediction command. So what does -fast mean in this command?

Answer 2 · 2021-08-19T07:47:03.000Z

The underlying version of OpenNMT-py had two beam search implementations:
https://github.com/pschwllr/MolecularTransformer/blob/master/onmt/opts.py#L486

With -fast, you are using the one that should decode the outputs faster.
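
A likely reason for the constant [batch_size x 1 x model_dim] shape, assuming the -fast path in this version keeps past keys and values in layer_cache rather than passing previous_input: all_input then holds only the current step, and the history grows inside the cache. The sketch below is not OpenNMT-py code. It uses nested Python lists in place of tensors, and every function name in it is hypothetical. The "self_keys" entry only mirrors the idea of a per-layer cache.

```python
# Plain-Python sketch: nested lists [batch][time][dim] stand in for tensors.
# All helper names are hypothetical and are not OpenNMT-py APIs.

def shape(x):
    """Return (batch, time, dim) for a nested list."""
    return (len(x), len(x[0]), len(x[0][0]))

def cat_time(a, b):
    """Concatenate along the time axis, like torch.cat((a, b), dim=1)."""
    return [seq_a + seq_b for seq_a, seq_b in zip(a, b)]

def new_token(batch_size, model_dim, value):
    """Input for one decoding step: [batch_size x 1 x model_dim]."""
    return [[[float(value)] * model_dim] for _ in range(batch_size)]

def decode_without_cache(steps, batch_size=2, model_dim=4):
    """Re-feed all previous inputs, so all_input grows with the step."""
    previous_input, shapes = None, []
    for step in range(steps):
        input_norm = new_token(batch_size, model_dim, step)
        all_input = input_norm
        if previous_input is not None:
            all_input = cat_time(previous_input, input_norm)
        shapes.append(shape(all_input))
        previous_input = all_input
    return shapes

def decode_with_cache(steps, batch_size=2, model_dim=4):
    """Keep past keys in a cache, so all_input stays at length 1."""
    layer_cache = {"self_keys": None}
    input_shapes, key_shapes = [], []
    for step in range(steps):
        input_norm = new_token(batch_size, model_dim, step)
        all_input = input_norm  # assumption: previous_input is None here
        keys = all_input  # stand-in for the key projection
        if layer_cache["self_keys"] is not None:
            keys = cat_time(layer_cache["self_keys"], keys)
        layer_cache["self_keys"] = keys
        input_shapes.append(shape(all_input))
        key_shapes.append(shape(keys))
    return input_shapes, key_shapes

print(decode_without_cache(3))
print(decode_with_cache(3))
```

Under these assumptions, the first function reports a time dimension of 1, 2, 3 for all_input, while the second keeps all_input at 1 and grows only the cached keys. Attention still sees the full history in both cases.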