hitachi-speech/EEND

about the input_dim in infer stage

Wang-Zeyu opened this issue · 2 comments

Thanks for open-sourcing the EEND code.

Something is wrong with my model at the inference stage. My training data is sampled at 16 kHz, so I use the "logmel" feature. At inference, however, the computed input_dim is 3855 (it should be 40*15 = 600).

In the "get_input_dim" function, the feature type "logmel" falls through to the "else" branch. I don't understand the calculation in that branch, so I'm asking for help. (For now I just force input_dim to the expected value.)


infer.yaml
sampling_rate: 16000
frame_size: 400
frame_shift: 160
input_transform: logmel

def get_input_dim(
    frame_size,
    context_size,
    transform_type,
):
    if transform_type.startswith('logmel23'):
        frame_size = 23
    else:
        fft_size = 1 << (frame_size - 1).bit_length()
        frame_size = int(fft_size / 2) + 1
    input_dim = (2 * context_size + 1) * frame_size
    return input_dim
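
The 3855 can be reproduced by tracing the "else" branch by hand. The sketch below assumes context_size = 7, which is not stated in the config above but is implied by the expected 40*15 (2*7 + 1 = 15 spliced frames). The branch rounds frame_size up to the next power of two and counts the one-sided FFT bins, so the per-frame dimension becomes the number of STFT bins, not the number of mel bands.

```python
def get_input_dim(frame_size, context_size, transform_type):
    # Same logic as the function quoted above, reproduced to trace the numbers.
    if transform_type.startswith('logmel23'):
        frame_size = 23
    else:
        fft_size = 1 << (frame_size - 1).bit_length()
        frame_size = int(fft_size / 2) + 1
    return (2 * context_size + 1) * frame_size


context_size = 7  # assumption: inferred from the expected 40*15, not from the config

# Step 1: round frame_size=400 up to the next power of two.
fft_size = 1 << (400 - 1).bit_length()
# Step 2: one-sided spectrum size for that FFT length.
n_bins = fft_size // 2 + 1
# Step 3: splice 2*context_size + 1 frames together.
stft_dim = (2 * context_size + 1) * n_bins

print(fft_size, n_bins, stft_dim)                    # 512 257 3855
print(get_input_dim(400, context_size, 'logmel'))    # 3855, "logmel" hits the else branch
print((2 * context_size + 1) * 40)                   # 600, the dimension expected for 40 mel bands
```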

Thanks for the report. Yes, it's a bug. The "else" branch assumes STFT features, so frame_size has to be set to 40 when logmel features are used. I fixed the code as follows. Could you try it?
95216f0
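
For readers who cannot open the commit, here is a minimal sketch of what a fix along the lines described could look like. It is an illustration based on the explanation in this thread, not a copy of the commit; the name get_input_dim_fixed is hypothetical, and the value 40 for plain "logmel" comes from the discussion here. The 'logmel23' check has to come before the generic 'logmel' check, because both strings start with 'logmel'.

```python
def get_input_dim_fixed(frame_size, context_size, transform_type):
    """Hypothetical corrected variant: handle 40-band logmel explicitly."""
    if transform_type.startswith('logmel23'):
        # 23 mel bands (checked first, since it also starts with 'logmel')
        feat_dim = 23
    elif transform_type.startswith('logmel'):
        # 40 mel bands, as stated in this thread
        feat_dim = 40
    else:
        # STFT features: one-sided bins of the next power-of-two FFT size
        fft_size = 1 << (frame_size - 1).bit_length()
        feat_dim = fft_size // 2 + 1
    # Each input vector is the centre frame plus context_size frames on each side.
    return (2 * context_size + 1) * feat_dim


# context_size=7 is an assumption inferred from the expected 40*15.
print(get_input_dim_fixed(400, 7, 'logmel'))    # 600
print(get_input_dim_fixed(400, 7, 'logmel23'))  # 345
```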

It works, thanks a lot :) !