modelscope/3D-Speaker

For ERes2NetV2 performance on short-duration wavs

JiJiJiang opened this issue · 2 comments

Thank you for the well-designed ERes2Net model and for making it open source.

As you mentioned, the V2 version of ERes2Net (ERes2NetV2) improves ERes2Net's ability to extract features from short-duration audio.
Are there any experimental results that support this conclusion?

If so, ERes2NetV2 could be a better choice for diarization with a traditional clustering-based system. In that setting, we usually extract speaker embeddings with a sliding window of short duration, e.g., 1.5 s.
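To make the sliding-window setup concrete, here is a minimal sketch of how a recording might be cut into fixed 1.5 s windows before embedding extraction. It assumes 16 kHz audio and a 0.75 s hop, and the thread states neither of these. The function name `sliding_windows` is hypothetical. Feeding each window to a speaker-embedding model such as ERes2NetV2 is not shown.

```python
from typing import List, Tuple


def sliding_windows(num_samples: int, sample_rate: int = 16000,
                    win_sec: float = 1.5, hop_sec: float = 0.75) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of fixed-length analysis windows.

    Assumptions (not from the original thread): 16 kHz audio, 0.75 s hop.
    The last window is shifted back so it ends at the final sample, which
    keeps the tail of the recording covered by a full-length window.
    """
    win = int(round(win_sec * sample_rate))
    hop = int(round(hop_sec * sample_rate))
    if num_samples <= win:
        # Recording no longer than one window: use it as a single segment.
        return [(0, num_samples)]
    windows = []
    start = 0
    while start + win <= num_samples:
        windows.append((start, start + win))
        start += hop
    if windows[-1][1] < num_samples:
        # Add one final full-length window aligned to the end of the audio.
        windows.append((num_samples - win, num_samples))
    return windows


if __name__ == "__main__":
    # A 4 s recording at 16 kHz yields five overlapping 1.5 s windows.
    for s, e in sliding_windows(4 * 16000):
        print(f"{s / 16000:.2f}s - {e / 16000:.2f}s")
```

Every window has the same short duration, so the embedding model's robustness on 1.5 s inputs directly affects the quality of the embeddings that the clustering step sees.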

Our experiments have thoroughly validated this conclusion, and the paper will be made publicly available in June. Thank you for your interest. You are also invited to join the 3D-Speaker technical sharing session tonight at 8 pm, which you can access through this link: https://mp.weixin.qq.com/s/uwvVUIDb0eaAHlfWiuwEoQ.

OK, I see. Thank you for your answer. Looking forward to your talk and paper.