For ERes2NetV2 performance on short-duration wavs

Question

JiJiJiang opened this issue 7 months ago · 2 comments

Thank you for designing the ERes2Net model and making it open source.

As you mention, ERes2NetV2 improves on ERes2Net's ability to extract features from short-duration speech.
Are there any experimental results that support this conclusion?

If so, the ERes2NetV2 model could be a better fit for the diarization task in a traditional clustering-based system. In that setting, we usually extract speaker embeddings over a sliding window of, e.g., 1.5 s.
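
To make the sliding-window setup concrete, here is a minimal sketch of how a recording might be cut into 1.5 s windows before embedding extraction. Only the 1.5 s window length comes from the setup described above; the 16 kHz sample rate, the 0.75 s hop, and the names `sliding_windows`, `extract_window_embeddings`, and `embed_fn` are assumptions for illustration and are not part of the 3D-Speaker API.

```python
def sliding_windows(num_samples, sample_rate=16000, win_s=1.5, hop_s=0.75):
    """Return (start, end) sample indices of fixed-length windows.

    sample_rate and hop_s are assumed values; only the 1.5 s window
    length is taken from the question.
    """
    win = int(round(win_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if num_samples <= 0:
        return []
    # A recording no longer than one window is kept as a single segment.
    if num_samples <= win:
        return [(0, num_samples)]
    windows = [(s, s + win) for s in range(0, num_samples - win + 1, hop)]
    # Add one last full-length window so the tail of the recording is covered.
    if windows[-1][1] < num_samples:
        windows.append((num_samples - win, num_samples))
    return windows


def extract_window_embeddings(samples, embed_fn, sample_rate=16000,
                              win_s=1.5, hop_s=0.75):
    """Apply embed_fn (a hypothetical speaker-embedding callable) per window."""
    return [
        embed_fn(samples[start:end])
        for start, end in sliding_windows(len(samples), sample_rate, win_s, hop_s)
    ]
```

In a clustering-based diarization system, the per-window embeddings returned here would then be clustered by speaker, which is why the quality of embeddings from such short segments matters.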

Answer 1 · 2024-05-22T08:08:21.000Z

Our experiments have thoroughly validated this conclusion, and the paper will be released in June. Thank you for your interest. You are invited to join the 3D-Speaker technical sharing session tonight at 8 pm. You can access the meeting through this link: https://mp.weixin.qq.com/s/uwvVUIDb0eaAHlfWiuwEoQ.

Answer 2 · 2024-05-22T08:13:08.000Z

OK, I see. Thank you for your answer. I look forward to your talk and paper.