hao-qiang opened this issue 8 months ago · 1 comment
https://github.com/alibaba-damo-academy/3D-Speaker/blob/9a455b3e429519aae91a63f36ae82f9b41423ad5/egs/3dspeaker/speaker-diarization/local/prepare_subseg_json.py#L47
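When splitting a segment into sub-segments, the last sub-segment at the end of the audio can be shorter than subseg_dur. Should it instead be taken backward from the end so that it still spans subseg_dur, i.e. subseg_st = min(ed-subseg_dur, subseg_st)? With the current code, if the resulting sub-segment is extremely short, will the embedding model raise an error?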
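To make the proposal concrete, here is a minimal sketch of a sliding-window split with the suggested end alignment. It is not the repository's code: the function name `split_subsegments`, the default values for `subseg_dur` and `subseg_shift`, and the extra `max(st, ...)` guard (for segments shorter than `subseg_dur`) are all assumptions made for illustration.

```python
def split_subsegments(st, ed, subseg_dur=1.5, subseg_shift=0.75):
    """Hypothetical helper: split [st, ed] into overlapping sub-segments.

    The last window is shifted backward so that it still spans
    subseg_dur, as proposed in this issue. Default values are
    illustrative assumptions, not the repository's settings.
    """
    subsegs = []
    subseg_st = st
    while subseg_st < ed:
        subseg_ed = min(subseg_st + subseg_dur, ed)
        if subseg_ed == ed:
            # Proposed fix: take the final window backward from the end.
            # max(st, ...) is an added guard so that the start never
            # moves before the segment start when ed - st < subseg_dur.
            subseg_st = max(st, min(ed - subseg_dur, subseg_st))
        subsegs.append((subseg_st, subseg_ed))
        if subseg_ed >= ed:
            break
        subseg_st += subseg_shift
    return subsegs


if __name__ == "__main__":
    for seg in split_subsegments(0.0, 4.0):
        print(seg)
```

With these assumed values, a 4.0 s segment ends with the window (2.5, 4.0) rather than the 1.0 s remainder (3.0, 4.0). A segment shorter than `subseg_dur` is still returned as a single short window, so a minimum-duration check before embedding extraction may still be needed.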
Fixed, thanks for the suggestion! https://github.com/alibaba-damo-academy/3D-Speaker/blob/main/egs/3dspeaker/speaker-diarization/local/prepare_subseg_json.py#L47-L48