Using encoder as speaker embedding extractor

Question

ehsienmu opened this issue a year ago · 4 comments

Is it possible to obtain a speaker embedding without timbre using just the encoder? For example, features are first extracted from the utterances using the encoder, and a classification model is then used to turn them into a speaker representation.

ehsienmu commented a year ago

Thank you.

Answer 1 · 2023-10-21T19:12:54.000Z

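Sorry, I've read this a dozen times and still don't understand the question. How can something without timbre still be a speaker embedding? And how would a classification model turn it into a speaker embedding?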

Answer 2 · 2023-10-22T04:30:05.000Z

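Hello, and thank you for your reply. I may not have described this clearly before. I want to explore whether pitch and rhythm can be used to determine which speaker a piece of audio belongs to, which is how I came across your paper. So I'd like to ask whether I can first obtain pitch and rhythm embeddings from the encoder you designed, and then use them for a speaker embedding or speaker verification task. I'm not sure whether this is feasible.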

Answer 3 · 2023-10-22T05:42:16.000Z

It's not impossible, but my pitch and rhythm embeddings are not guaranteed to be completely free of timbre information.
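
For anyone trying the pipeline discussed in this thread, here is a rough sketch of the verification step only. It is not code from this repository: the frame-level vectors below are hand-written placeholders standing in for whatever pitch/rhythm embeddings the encoder produces, and `mean_pool`, `cosine_similarity`, `same_speaker`, and the `0.7` threshold are all hypothetical choices made for illustration. Mean pooling plus cosine scoring is just one common baseline for speaker verification, assumed here rather than taken from the paper.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level embeddings (T x D) into one utterance-level vector (D)."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_speaker(
    frames_a: Sequence[Sequence[float]],
    frames_b: Sequence[Sequence[float]],
    threshold: float = 0.7,  # arbitrary illustrative value, would need tuning
) -> bool:
    """Accept the pair as the same speaker if pooled embeddings are similar enough."""
    score = cosine_similarity(mean_pool(frames_a), mean_pool(frames_b))
    return score >= threshold


# Placeholder data: in practice these would be the encoder's pitch/rhythm
# embeddings for each utterance, not hand-written numbers.
enrolled = [[0.9, 0.1, 0.0], [1.1, -0.1, 0.0]]
probe_same = [[1.0, 0.05, 0.0]]
probe_other = [[0.0, 0.0, 1.0]]

print(same_speaker(enrolled, probe_same))
print(same_speaker(enrolled, probe_other))
```

Keep the caveat above in mind: if verification works well on these embeddings, that alone does not show pitch and rhythm are doing the work, since some timbre information may still be present in them.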