[Feature request] Speakers should be grouped by languages

Question

Closed 5 months ago · 1 comment

Some models, such as XTTS, are both multi-speaker and multilingual. However, when you retrieve the list of speakers a model has, you only get the names. As far as I can tell from reading through the code of both SpeakerManager and LanguageManager, no speaker-to-language data is exposed there, if it's even loaded at all. Some users could infer that a speaker with a name like Gilberto Mathias is probably good for Spanish, but why leave that up to assumption? If the speakers are tagged with languages somewhere, it would be nice to access that from the code; otherwise, perhaps they could be labeled manually?

Answer 1 · 2024-11-06T08:15:56.000Z

Technically, you can use any speaker with any language, but you have a point. Ideally, model authors would choose speaker names accordingly.

Since XTTS was originally trained by Coqui, we have no access to that information here and can't add it. In theory the XTTS speakers could also have been created from multiple actual speakers, including from different languages.
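
Since that metadata isn't available, manual labeling would have to happen on the user side. Here is a minimal sketch of what that could look like, assuming you keep your own hand-maintained mapping. The `SPEAKER_LANGUAGES` table, the `group_speakers_by_language` helper, and the placeholder speaker names are all hypothetical and not part of the library, and the language tags are guesses for illustration rather than anything shipped with XTTS.

```python
from collections import defaultdict

# Hypothetical, hand-maintained labels. These tags are illustrative guesses,
# not metadata that ships with XTTS.
SPEAKER_LANGUAGES = {
    "Gilberto Mathias": ["es"],
    "Example Speaker A": ["en", "de"],
}


def group_speakers_by_language(speaker_names, labels, unlabeled_key="unlabeled"):
    """Group speaker names by language tag using a user-supplied mapping.

    Speakers missing from `labels` (or mapped to an empty list) are collected
    under `unlabeled_key` so they stay visible and can still be used.
    """
    groups = defaultdict(list)
    for name in speaker_names:
        languages = labels.get(name) or [unlabeled_key]
        for lang in languages:
            groups[lang].append(name)
    # Sort the names within each group for stable, readable output.
    return {lang: sorted(names) for lang, names in groups.items()}


# `speakers` stands in for the list of names retrieved from the model.
speakers = ["Gilberto Mathias", "Example Speaker A", "Example Speaker B"]
grouped = group_speakers_by_language(speakers, SPEAKER_LANGUAGES)
```

Because any speaker can be used with any language, the grouping is only a hint for picking a voice. Unlabeled speakers are kept in their own group rather than dropped.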