espnet/espnet_onnx

onnxruntime::BroadcastIterator::Append(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 211 by 421

QMZ321 opened this issue · 14 comments

When I try to run the inference demo, I get an error.
import librosa
from espnet_onnx import Speech2Text

speech2text = Speech2Text(model_dir='/root/autodl-tmp/.cache/espnet_onnx/librispeech_100-asr-conformer-aed')

wav_file = '../wav_test/121-121726-0000.wav'
y, sr = librosa.load(wav_file, sr=16000)
nbest = speech2text(y)

Error info:
/root/miniconda3/envs/espnet-onnx/lib/python3.8/site-packages/espnet_onnx/utils/abs_model.py:63: UserWarning: Inference will be executed on the CPU. Please provide gpu providers. Read How to use GPU on espnet_onnx in readme in detail.
warnings.warn(
2023-04-15 19:34:26.289677552 [E:onnxruntime:, sequential_executor.cc:368 Execute] Non-zero status code returned while running Add node. Name:'Add_398' Status Message: /hdd/doc/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:523 void onnxruntime::BroadcastIterator::Append(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 211 by 421

Traceback (most recent call last):
File "inference.py", line 9, in
nbest = speech2text(y)
File "/root/miniconda3/envs/espnet-onnx/lib/python3.8/site-packages/espnet_onnx/asr/asr_model.py", line 79, in call
enc, _ = self.encoder(speech=speech, speech_length=lengths)
File "/root/miniconda3/envs/espnet-onnx/lib/python3.8/site-packages/espnet_onnx/asr/model/encoders/encoder.py", line 70, in call
self.forward_encoder(feats, feat_length)
File "/root/miniconda3/envs/espnet-onnx/lib/python3.8/site-packages/espnet_onnx/asr/model/encoders/encoder.py", line 87, in forward_encoder
self.encoder.run(["encoder_out", "encoder_out_lens"], {
File "/root/miniconda3/envs/espnet-onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 192, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Add node. Name:'Add_398' Status Message: /hdd/doc/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:523 void onnxruntime::BroadcastIterator::Append(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 211 by 421

Please help me. Thank you very much!

I found that the problem is not with loading the wav file, because when I switched to the TIMIT model from the demo, the same wav file worked normally.

I guess there is a problem with the export of the LibriSpeech conformer model. Please help me!

After trying to load 'kamo-naoyuki/timit_asr_train_asr_raw_word_valid.acc.ave' and 'kamo-naoyuki/librispeech_asr_train_asr_conformer5_raw_bpe5000_frontend_confn_fft400_frontend_confhop_length160_scheduler_confwarmup_steps25000_batch_bins140000000_optim_conflr0.0015_initnone_sp_valid.acc.ave' from espnet_model_zoo, I now guess the problem is with the espnet_onnx export_from_zip path, because the error does not happen when I use export_from_pretrained.

Hi @QMZ321, thank you for reporting your issue.
Am I right that you could successfully export the model kamo-naoyuki/timit_asr_train_asr_... with the export_from_pretrained method? Basically, export_from_pretrained and export_from_zip run the same process, so I don't think we have an issue with export_from_zip if export_from_pretrained works.
The error happens at the Add_398 node, so it might be an issue with the positional encoding. If possible, would you install Netron and visualize the model structure around Add_398 for further debugging? We need to figure out exactly which operation is causing this problem. (You can use Ctrl+F to search for the node in Netron.)
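For example, a minimal way to open the exported encoder in Netron (the cache directory is the one from your Speech2Text call above; the exact file name inside it is an assumption, so adjust it to whatever your export produced):

# pip install netron
import netron

# Hypothetical path; point this at the .onnx file produced by the export.
netron.start(
    '/root/autodl-tmp/.cache/espnet_onnx/librispeech_100-asr-conformer-aed/'
    'full/encoder.onnx'
)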

Hi @Masao-Someki, thank you for your reply!
This is the result of the Netron visualization:
[screenshot of the graph around the Add_398 node]

Thank you. It seems that the sequence lengths differ in the following part: the last dimension of matrix_ac is 211, while that of matrix_bd is 421 (note that 421 = 2 * 211 - 1, the length produced by the new-style relative positional encoding).
Would you check your rel_shift version? It seems that the model uses the latest version of the relative position embedding, while legacy_rel_shift is used during the attention calculation.

# compute matrix b and matrix d
# (batch, head, time1, time1)
matrix_bd = torch.matmul(q_with_bias_v, p.transpose(-2, -1))
if self.is_legacy:
    matrix_bd = self.legacy_rel_shift(matrix_bd)
else:
    matrix_bd = self.rel_shift(matrix_bd)
scores = (matrix_ac + matrix_bd) / math.sqrt(
    self.d_k
)  # (batch, head, time1, time2)
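To illustrate the mismatch, here is a minimal standalone sketch (plain PyTorch, not the espnet_onnx code itself): the new-style relative positional encoding yields 2 * time1 - 1 positions, so if legacy_rel_shift is applied instead of rel_shift, matrix_bd keeps its last dimension of 421 and the addition fails exactly like the ONNX Runtime error above.

import torch

time1 = 211
batch, head = 1, 4

# matrix_ac is (batch, head, time1, time1)
matrix_ac = torch.zeros(batch, head, time1, time1)

# With new-style RelPositionalEncoding, p has 2 * time1 - 1 positions,
# so matrix_bd is (batch, head, time1, 421) until rel_shift trims it.
matrix_bd = torch.zeros(batch, head, time1, 2 * time1 - 1)

try:
    scores = matrix_ac + matrix_bd  # fails: 211 vs 421 on the last axis
except RuntimeError as e:
    print(e)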

@Masao-Someki Sorry, I'm a newbie and don't know how to check the rel_shift version. This is the config I used during ESPnet training; I don't know whether it helps.

This is the config I used:
https://github.com/espnet/espnet/blob/master/egs2/librispeech_100/asr1/conf/tuning/train_asr_conformer_lr2e-3_warmup15k_amp_nondeterministic.yaml

Is it this one?
[screenshot]

@QMZ321
It seems that the configuration is correct. Just for clarification, which versions of PyTorch, onnx, onnxruntime, and espnet_onnx do you use?
If you installed espnet_onnx via pip, would you clone this repository and check whether the issue still happens with the latest script?

@Masao-Someki

It seems that the configuration is correct. Just for clarification, which versions of PyTorch, onnx, onnxruntime, and espnet_onnx do you use?

My versions are: pytorch=1.13.1, onnx=1.11.0, onnxruntime=1.11.1.espnet, espnet_onnx=0.1.10.

If you installed espnet_onnx via pip, would you clone this repository and check whether the issue still happens with the latest script?

After cloning this repository, I ran python setup.py install and tried again, but I still get the same problem.

@QMZ321
Would you check the types of the position embedding and multi-head attention classes?
With that configuration, the position embedding should be the RelPositionalEncoding class, and the attention should be the RelPositionMultiHeadedAttention class.
I think this issue happens with the LegacyRelPositionMultiHeadedAttention class.
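As a first check, here is a minimal sketch for printing those classes (assuming model is the torch ESPnet2 model you load before exporting; the attribute layout is the usual ESPnet2 one, so adjust if yours differs):

# List every positional-encoding and attention module in the encoder so
# that Legacy* vs. non-legacy classes are easy to spot.
for name, module in model.encoder.named_modules():
    cls = type(module).__name__
    if 'PositionalEncoding' in cls or 'MultiHeadedAttention' in cls:
        print(name, '->', cls)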

@Masao-Someki
For debugging, I extracted the training part separately:
[screenshot]
I inserted a breakpoint to check the type of rel_pos_type:
[screenshot]
The result is as follows:
[screenshot of the breakpoint output]

@Masao-Someki
I found that as long as the optimize option is disabled,

from espnet_onnx.export import ASRModelExport

m = ASRModelExport()
# m = ASRModelExport('/root/autodl-tmp/.cache/espnet_onnx')
m.export_from_zip(
  '/root/autodl-tmp/projects/espnet_onnx_project/model/espnet/asr_train_asr_conformer_lr2e-3_warmup15k_amp_nondeterministic_raw_en_bpe5000_sp_valid.acc.ave.zip',
  tag_name='librispeech_100-asr-conformer-aed',
  quantize=True,
  # optimize=True
)

it works fine.
[screenshot of the successful export]
But I think the optimize option is essential for me. Please help me.

@Masao-Someki
Maybe it's a problem with the custom onnxruntime? The optimize option depends on the custom onnxruntime.
And I'm not sure whether my custom onnxruntime is correct, because the link below doesn't work:
[screenshot of the broken link]
Instead, I used the latest version of the custom onnxruntime from the releases page:
[screenshot of the releases page]

@QMZ321 Sorry for the late reply.
Would you set the use_ort_for_espnet configuration to True, if you have not set it?

from espnet_onnx.export import ASRModelExport
m = ASRModelExport()
m.set_export_config(
    max_seq_len=5000,
    use_ort_for_espnet=True,
)
m.export_from_pretrained(tag_name, quantize=False, optimize=True)

If this fixes your issue, then I will modify some documents to mention this configuration...
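Applied to your export_from_zip call from earlier in this thread, the same configuration would presumably look like this (the zip path and tag name are the ones you posted above):

from espnet_onnx.export import ASRModelExport

m = ASRModelExport()
m.set_export_config(
    max_seq_len=5000,
    use_ort_for_espnet=True,  # per the suggestion above, required for optimize
)
m.export_from_zip(
    '/root/autodl-tmp/projects/espnet_onnx_project/model/espnet/asr_train_asr_conformer_lr2e-3_warmup15k_amp_nondeterministic_raw_en_bpe5000_sp_valid.acc.ave.zip',
    tag_name='librispeech_100-asr-conformer-aed',
    quantize=True,
    optimize=True,
)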

@Masao-Someki
Thank you for your reply!
Using your method, I successfully solved the problem!
Thank you very much!
👍 👍 👍 :)