encode() missing 1 required positional argument: 'src_mask' - Beam Decode

Hi, thank you for making the code open source!

I am trying to train a g2p-based model with beam-search decoding, but I get the following error. Please see the logs below for full details.

FYI, the code works fine with greedy decoding. Kindly advise.

(base) [aagarwal@ip-0A000427 neural-transducer]$ python src/train.py --train data/100hrs-youtube.train --dev data/100hrs-youtube.dev --test data/100hrs-youtube.test --epochs 100 --dataset g2p --arch transformer --model models/v2-beam-search-decoding/v2 --decode beam
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: seed - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: train - ['data/100hrs-youtube.train']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dev - ['data/100hrs-youtube.dev']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: test - ['data/100hrs-youtube.test']
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: model - 'models/v2-beam-search-decoding/v2'
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: load - ''
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: bs - 20
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: epochs - 100
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_steps - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: warmup_steps - 4000
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: total_eval - -1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: optimizer - <Optimizer.adam: 'adam'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: scheduler - <Scheduler.reducewhenstuck: 'reducewhenstuck'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: lr - 0.001
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: min_lr - 1e-05
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: momentum - 0.9
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: beta1 - 0.9
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: beta2 - 0.999
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: estop - 1e-08
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: cooldown - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: patience - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: discount_factor - 0.5
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_norm - 0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: gpuid - []
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: loglevel - 'info'
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: saveall - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: shuffle - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: cleanup_anyway - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dataset - <Data.g2p: 'g2p'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_seq_len - 128
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: max_decode_len - 128
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: init - ''
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: dropout - 0.2
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: embed_dim - 100
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: nb_heads - 4
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: src_layer - 1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: trg_layer - 1
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: src_hs - 200
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: trg_hs - 200
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: label_smooth - 0.0
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: tie_trg_embed - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: arch - <Arch.transformer: 'transformer'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: nb_sample - 2
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: wid_siz - 11
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: indtag - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: decode - <Decode.beam: 'beam'>
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: mono - False
INFO - 10/18/20 14:31:37 - 0:00:00 - command line argument: bestacc - False
INFO - 10/18/20 14:31:37 - 0:00:00 - src vocab size 45
INFO - 10/18/20 14:31:37 - 0:00:00 - trg vocab size 44
INFO - 10/18/20 14:31:37 - 0:00:00 - src vocab ['<PAD>', '<s>', '<\\s>', '<UNK>', '"b', '"g', '"h', '"i', '"j', '"k', '"m', '"n', '"s', '"z', "'", 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'ß', 'ä', 'ö', 'ü']
INFO - 10/18/20 14:31:37 - 0:00:00 - trg vocab ['<PAD>', '<s>', '<\\s>', '<UNK>', "'", ',"', '-', '.', '\\', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '¨', 'ß', 'ä', 'ç', 'è', 'é', 'ö', 'ü', 'ș']
INFO - 10/18/20 14:31:37 - 0:00:00 - model: Transformer(
                                       (src_embed): Embedding(45, 100, padding_idx=0)
                                       (trg_embed): Embedding(44, 100, padding_idx=0)
                                       (position_embed): SinusoidalPositionalEmbedding()
                                       (encoder): TransformerEncoder(
                                         (layers): ModuleList(
                                           (0): TransformerEncoderLayer(
                                             (self_attn): MultiheadAttention(
                                               (out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
                                             )
                                             (linear1): Linear(in_features=100, out_features=200, bias=True)
                                             (dropout): Dropout(p=0.2, inplace=False)
                                             (linear2): Linear(in_features=200, out_features=100, bias=True)
                                             (norm1): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                             (norm2): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                             (activation_dropout): Dropout(p=0.2, inplace=False)
                                           )
                                         )
                                         (norm): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                       )
                                       (decoder): TransformerDecoder(
                                         (layers): ModuleList(
                                           (0): TransformerDecoderLayer(
                                             (self_attn): MultiheadAttention(
                                               (out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
                                             )
                                             (multihead_attn): MultiheadAttention(
                                               (out_proj): _LinearWithBias(in_features=100, out_features=100, bias=True)
                                             )
                                             (linear1): Linear(in_features=100, out_features=200, bias=True)
                                             (dropout): Dropout(p=0.2, inplace=False)
                                             (linear2): Linear(in_features=200, out_features=100, bias=True)
                                             (norm1): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                             (norm2): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                             (norm3): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                             (activation_dropout): Dropout(p=0.2, inplace=False)
                                           )
                                         )
                                         (norm): LayerNorm((100,), eps=1e-05, elementwise_affine=True)
                                       )
                                       (final_out): Linear(in_features=100, out_features=44, bias=True)
                                       (dropout): Dropout(p=0.2, inplace=False)
                                     )
INFO - 10/18/20 14:31:37 - 0:00:00 - number of parameter 216544
INFO - 10/18/20 14:31:37 - 0:00:00 - maximum training 269700 steps (100 epochs)
INFO - 10/18/20 14:31:37 - 0:00:00 - evaluate every 1 epochs
INFO - 10/18/20 14:31:37 - 0:00:00 - At 0-th epoch with lr 0.001000.
100%|| 2697/2697 [01:10<00:00, 38.40it/s]
INFO - 10/18/20 14:32:47 - 0:01:11 - Running average train loss is 1.5452647511058266 at epoch 0
INFO - 10/18/20 14:32:47 - 0:01:11 - At 1-th epoch with lr 0.001000.
100%|| 2697/2697 [01:06<00:00, 40.65it/s]
INFO - 10/18/20 14:33:54 - 0:02:17 - Running average train loss is 1.218658867061779 at epoch 1
100%|| 338/338 [00:02<00:00, 128.70it/s]
INFO - 10/18/20 14:33:56 - 0:02:19 - Average dev loss is 0.9772854196073035 at epoch 1
  0%|| 0/6741 [00:00<?, ?it/s]
Exception ignored in: <generator object StandardG2P.read_file at 0x2af3a8d8b3d0>
RuntimeError: generator ignored GeneratorExit
Traceback (most recent call last):
  File "src/train.py", line 350, in <module>
    main()
  File "src/train.py", line 346, in main
    trainer.run(start_epoch, decode_fn=decode_fn)
  File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/trainer.py", line 373, in run
    eval_res = self.evaluate(DEV, epoch_idx, decode_fn)
  File "src/train.py", line 255, in evaluate
    decode_fn)
  File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/util.py", line 194, in evaluate_all
    pred, _ = decode_fn(model, src)
  File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/decoding.py", line 64, in __call__
    trg_eos=self.trg_eos)
  File "/share/pretzel1/exp1/aagarwal/neural-transducer/src/decoding.py", line 364, in decode_beam_search
    enc_hs = transducer.encode(src_sentence)
TypeError: encode() missing 1 required positional argument: 'src_mask'

Adding "src_mask" to the call on line 364 did not help either; new errors pop up.

enc_hs = transducer.encode(src_sentence, src_mask)

This is an example from my training data:

a c h z i g     a c h z g
v e r g l e i c h       v e r l i i c h
j o d l e r f e s t     j o d l e r f e s t
r o h r z u c k e r     g u t s c h
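Judging from these examples, each line holds one word pair: the source spelling and its target, each written as space-separated characters, with what looks like a tab between the two sides. Assuming exactly that layout (inferred from the examples above, not taken from the repository's own reader), a minimal Python sketch for parsing such lines into character lists is:

```python
from typing import Iterable, Iterator, List, Tuple


def parse_pair(line: str) -> Tuple[List[str], List[str]]:
    """Split one 'source<TAB>target' line into two character lists.

    Assumption: a tab separates source from target, and the characters on
    each side are separated by spaces (e.g. 'a c h z i g<TAB>a c h z g').
    """
    src, trg = line.rstrip("\n").split("\t", 1)
    return src.split(), trg.split()


def read_pairs(lines: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (source_chars, target_chars) for every non-blank line."""
    for line in lines:
        if line.strip():
            yield parse_pair(line)
```

For example, parse_pair("a c h z i g\ta c h z g") returns (['a', 'c', 'h', 'z', 'i', 'g'], ['a', 'c', 'h', 'z', 'g']). A line without a tab raises ValueError, which is a quick way to spot malformed rows.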

Hi! Beam search decoding is currently not supported for the transformer: a naive implementation of beam search for the transformer would be much slower, and the gain was relatively small in preliminary experiments.

Thank you @shijie-wu for the information.

It would be helpful if you could help me with the following questions:

Which model from this list (soft, hard, approxihard, softinputfeed, largesoftinputfeed, approxihardinputfeed, hardmono, hmm, hmmfull, transformer, universaltransformer, tagtransformer, taguniversaltransformer) would you suggest for a transliteration task (see the examples in my comment above)?
The current greedy decoding for the transformer returns only one output (it always takes the most probable token). Is there any way I can reuse the beam decoding code implemented for the other architectures and integrate it into the transformer pipeline?
In the end, I want several candidate outputs for each input (3-4 would be good). Any suggestions on how to achieve this with the current implementation?
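To make the greedy vs. beam difference concrete: greedy decoding keeps only the single most probable token at each step, while beam search keeps the beam_size best partial hypotheses and can therefore return several complete outputs ranked by score. The sketch below is a generic, framework-free illustration; toy_model and the EOS token are hypothetical stand-ins for a real decoder, it is not the neural-transducer implementation, and it ranks by raw summed log-probability without length normalization.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # hypothetical end-of-sequence token for this toy example

Hyp = Tuple[Tuple[str, ...], float]  # (tokens, summed log-probability)


def beam_search(
    next_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_size: int = 3,
    max_len: int = 32,
    top_k: int = 3,
) -> List[Hyp]:
    """Return up to top_k finished hypotheses, best first.

    beam_size=1 reduces to greedy decoding (argmax at every step).
    Scores are raw summed log-probabilities, without length normalization.
    """
    beams: List[Hyp] = [((), 0.0)]
    finished: List[Hyp] = []
    for _ in range(max_len):
        # Extend every open hypothesis by every possible next token.
        candidates: List[Hyp] = []
        for prefix, score in beams:
            for tok, lp in next_log_probs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        # Keep only the beam_size best extensions; those ending in EOS are done.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == EOS:
                finished.append((seq[:-1], score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len, if any
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:top_k]


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical next-token distributions standing in for a real decoder."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {EOS: 0.3, "c": 0.7},
        ("b",): {EOS: 0.9, "c": 0.1},
        ("a", "c"): {EOS: 1.0},
        ("b", "c"): {EOS: 1.0},
    }
    return {tok: math.log(p) for tok, p in table[prefix].items()}
```

With beam_size=1 the search behaves greedily and yields a single output, ("a", "c") with probability 0.42. With beam_size=3 and top_k=3 it returns ("a", "c"), ("b",) and ("a",) with probabilities 0.42, 0.36 and 0.18, which is the kind of ranked candidate list asked for above. The max_len and beam_size parameters play the same role as the decoding length and beam size discussed in the reply below.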

Check out the new master for beam search support with the transformer. For transliteration, I would recommend the transformer model. To speed up beam search, try a smaller maximum decoding length (--max_decode_len 32) or a smaller beam size (--decode_beam_size 3). To get the top-k outputs, you would need to modify the return value of the beam search function so that it returns the top-k predictions instead of only the best one:

neural-transducer/src/decoding.py (lines 469 to 470 in aeb9e60):

    max_output = sorted(finish_beams, key=score)[0]
    return list(map(int, max_output.partial_sent.split())), []

Then write them to the output file in the following function:

neural-transducer/src/train.py (lines 272 to 284 in aeb9e60):

    pred, _ = decode_fn(self.model, src)
    dist = util.edit_distance(pred, trg.view(-1).tolist()[1:-1])

    src_mask = dummy_mask(src)
    trg_mask = dummy_mask(trg)
    data = (src, src_mask, trg, trg_mask)
    loss = self.model.get_loss(data).item()

    trg = self.data.decode_target(trg)[1:-1]
    pred = self.data.decode_target(pred)
    fp.write(
        f'{" ".join(pred)}\t{" ".join(trg)}\t{loss}\t{dist}\n')
    cnt += 1
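A minimal sketch of those two changes follows. It assumes, from the excerpts above only, that finish_beams holds objects with a space-separated partial_sent string and that sorting by score in ascending order puts the best hypothesis first. Beam, top_k_outputs and format_top_k are hypothetical names, and the " | " separator is an arbitrary choice, not an existing output format of the repository.

```python
from collections import namedtuple
from typing import Callable, List, Sequence

# Hypothetical stand-in for the beam objects in decoding.py; only the
# `partial_sent` attribute (space-separated token ids) is used by top_k_outputs.
Beam = namedtuple("Beam", ["partial_sent", "log_prob"])


def top_k_outputs(finish_beams: Sequence, score: Callable, k: int = 3) -> List[List[int]]:
    """Return token ids of the k best finished beams instead of only the best one.

    Mirrors `sorted(finish_beams, key=score)[0]`: ascending order by `score`,
    so the first element is still the best hypothesis.
    """
    ranked = sorted(finish_beams, key=score)[:k]
    return [list(map(int, beam.partial_sent.split())) for beam in ranked]


def format_top_k(preds: List[List[str]], trg: List[str], loss: float, dist: int) -> str:
    """One line per source word: the k hypotheses joined by ' | ', then the reference, loss, distance."""
    hyps = " | ".join(" ".join(p) for p in preds)
    return f"{hyps}\t{' '.join(trg)}\t{loss}\t{dist}\n"
```

In the evaluation loop, decode_beam_search could then return (top_k_outputs(finish_beams, score, k), []), dist could be computed on pred[0], and each hypothesis could be decoded with [self.data.decode_target(p) for p in pred] before passing the result to format_top_k. Greedy decoding still returns a single list, so the loop would need to handle both return shapes.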

Thank you for the pointer. I will try to implement it.

I am closing the ticket for the time being and will reopen it if I get stuck. Thanks again! 👍
