hirofumi0810/neural_sp

conformer transducer

jinggaizi opened this issue · 1 comment

Hi,
Have you ever run an example of a Conformer transducer? I tried to run one on AISHELL but ran into a problem.
First, I copied the LibriSpeech configuration -- lstm_rnnt_bpe1000.yaml -- and it worked fine.
Then I replaced the Transformer decoder with an LSTM in examples/aishell/s5/conf/asr/conformer_m.yaml as follows:
conv_batch_norm: false
conv_layer_norm: false
enc_type: conv_conformer
conformer_kernel_size: 3
enc_n_layers: 12
transformer_enc_pe_type: relative ###
transformer_enc_d_model: 256
transformer_enc_d_ff: 2048 ###
transformer_enc_n_heads: 4
dec_type: lstm_transducer
dec_n_units: 1024
dec_n_projs: 0
dec_n_layers: 1
dec_bottleneck_dim: 512
emb_dim: 512
tie_embedding: false
ctc_fc_list: "512"
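For reference, a quick sanity check that the merged YAML parses and that the decoder really switched to the transducer variant (just a sketch; the path and the PyYAML call are an example, not part of the neural_sp recipe):

```python
import yaml

# load the edited config and confirm the encoder/decoder types
with open('examples/aishell/s5/conf/asr/conformer_m.yaml') as f:
    conf = yaml.safe_load(f)

assert conf['enc_type'] == 'conv_conformer'
assert conf['dec_type'] == 'lstm_transducer'
print(conf['dec_n_units'], conf['dec_n_layers'])  # expect 1024 1
```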
It errors out when I run the training script:

============================================================================
ASR Training stage (stage:4)

Original utterance num: 360294
Removed 1 utterances (threshold)
Removed 0 utterances (for CTC)
Original utterance num: 14326
Removed 0 utterances (threshold)
Removed 0 utterances (for CTC)
Original utterance num: 7176
Removed 0 empty utterances
0%| | 0/360293 [00:00<?, ?it/s]/search/speech/jingbojun/exp/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/rnn.py:179: RuntimeWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
self.dropout, self.training, self.bidirectional, self.batch_first)
21%|██████████████████████████████████████ | 75792/360293 [31:07<1:16:18, 62.14it/s][]
[]
[]
[]
[]
[]
[]
[]
Traceback (most recent call last):
File "/search/speech/jingbojun/exp/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 547, in
save_path = pr.runcall(main)
File "/search/speech/jingbojun/exp/neural_sp/tools/neural_sp/miniconda/lib/python3.7/cProfile.py", line 121, in runcall
return func(*args, **kw)
File "/search/speech/jingbojun/exp/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 321, in main
teacher, teacher_lm)
File "/search/speech/jingbojun/exp/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 400, in train
teacher=teacher, teacher_lm=teacher_lm)
File "/search/speech/jingbojun/exp/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/search/speech/jingbojun/exp/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/search/speech/jingbojun/exp/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/search/speech/jingbojun/exp/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
raise output
File "/search/speech/jingbojun/exp/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
output = module(*input, **kwargs)
File "/search/speech/jingbojun/exp/neural_sp/tools/neural_sp/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/search/speech/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 264, in forward
loss, observation = self._forward(batch, task, teacher, teacher_lm)
File "/search/speech/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 274, in forward
eout_dict = self.encode(batch['xs'], 'all')
File "/search/speech/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 398, in encode
xs = pad_list([np2tensor(x, self.device).float() for x in xs], 0.)
File "/search/speech/jingbojun/exp/neural_sp/neural_sp/models/torch_utils.py", line 73, in pad_list
xs_pad = xs[0].new_zeros(bs, max_time, *xs[0].size()[1:]).fill_(pad_value)
IndexError: list index out of range
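
The error seems to come from nn.DataParallel: when the leftover batch has fewer utterances than GPUs, one replica receives an empty list (the [] lines above), and pad_list then indexes xs[0] on that replica. A small illustration of the suspected behaviour (not neural_sp code; split_batch is only a rough stand-in for the scatter step):

```python
import torch

def split_batch(xs, n_gpus):
    # rough stand-in for how DataParallel scatters a list of tensors
    chunk = (len(xs) + n_gpus - 1) // n_gpus
    return [xs[i:i + chunk] for i in range(0, chunk * n_gpus, chunk)]

batch = [torch.randn(5, 80) for _ in range(3)]  # 3 leftover utterances
replicas = split_batch(batch, n_gpus=4)         # e.g. 4 visible GPUs
print([len(x) for x in replicas])               # [1, 1, 1, 0] -> one empty chunk
replicas[-1][0]                                 # IndexError: list index out of range
```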

I set the number of training sentences to a multiple of N * batch size, where N is the number of GPUs, and then it works.
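
In case it helps, here is a sketch of that workaround (trim_to_multiple and the numbers are made up for illustration; apply the same idea to whatever utterance list the recipe reads):

```python
def trim_to_multiple(utterances, batch_size, n_gpus):
    # keep only as many utterances as fill complete batches on every GPU
    usable = (len(utterances) // (batch_size * n_gpus)) * (batch_size * n_gpus)
    return utterances[:usable]

utts = list(range(360293))                               # count from the log above
train = trim_to_multiple(utts, batch_size=32, n_gpus=4)  # example values
print(len(utts) - len(train), "utterances dropped")
```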