CosyVoice2-0.5B does not have spk2info.pt

Question



When I run webui.py, it reports that the variable sft_spk is an empty list. I then found that spk2info.pt is missing from the model directory.

Answer 1 · 2024-12-16T11:43:43.000Z

It looks like spk2info.pt is missing from the pretrained_models/CosyVoice2-0.5B directory, which is why webui.py reports that the sft_spk variable is an empty list.

To resolve this, unzip the provided spk2info.zip file and place the extracted spk2info.pt in the pretrained_models/CosyVoice2-0.5B directory. The file holds the speaker information that webui.py reads to build its list of built-in speakers (sft_spk), so without it that list is empty.
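A minimal sketch of the extraction step, using only the Python standard library. It assumes the archive contains a member whose file name is spk2info.pt (at the top level or inside a folder); the function name install_spk2info is hypothetical, and the paths should be adjusted to wherever you downloaded the zip.

```python
import shutil
import zipfile
from pathlib import Path


def install_spk2info(zip_path, model_dir):
    """Extract spk2info.pt from zip_path into model_dir and return its new path.

    Hypothetical helper; assumes the archive holds a member named spk2info.pt,
    either at the top level or inside a subfolder.
    """
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    target = model_dir / "spk2info.pt"
    with zipfile.ZipFile(zip_path) as zf:
        # Match on the base name so a wrapping folder inside the zip does not matter.
        members = [n for n in zf.namelist() if Path(n).name == "spk2info.pt"]
        if not members:
            raise FileNotFoundError(f"spk2info.pt not found in {zip_path}")
        # Copy the member's bytes straight to the target, without recreating folders.
        with zf.open(members[0]) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target
```

For example, install_spk2info("spk2info.zip", "pretrained_models/CosyVoice2-0.5B") writes pretrained_models/CosyVoice2-0.5B/spk2info.pt; after that, rerun webui.py and check whether the speaker list is populated.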

Answer 2 · 2024-12-16T16:44:38.000Z

2.0 doesn't have built-in speakers.

Answer 3 · 2024-12-17T08:12:09.000Z

To provide a more seamless out-of-the-box experience like CosyVoice1, would it be possible to include a spk2info.pt file in the CosyVoice2 model files to provide built-in voices?
CosyVoice2's model files: https://modelscope.cn/models/iic/CosyVoice2-0.5B/files
CosyVoice1's model files: https://modelscope.cn/models/iic/CosyVoice-300M/files

Answer 4 · 2024-12-17T08:15:10.000Z

I downloaded spk2info.zip and ran webui.py successfully. However, the voice-cloning quality is noticeably worse than with the previous model.

Answer 5 · 2024-12-17T09:49:02.000Z

The model isn't trained on speakers; it zero-shot clones the voice directly from the WAV you pass in. v1 and v2 work differently.

Answer 6 · 2024-12-17T10:03:17.000Z

Thank you. We observed that CosyVoice2 inherits from CosyVoice. Given this inheritance, are the speaker-related functions (like list_avaliable_spks, inference_sft, inference_instruct...) still available in CosyVoice2 as long as the spk2info.zip file is present?

Answer 7 · 2024-12-17T14:46:20.000Z

Read the paper: the two versions are different, including in architecture, even though a lot of the code is shared. The README shows how to run inference with v2, and there are no built-in speakers.
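To illustrate the inheritance point raised in Answer 6, here is a toy sketch in plain Python. The classes ToyCosyVoice and ToyCosyVoice2 are hypothetical stand-ins, not the real cosyvoice classes: they only show that a method inherited by a subclass stays callable, while what it returns depends on the speaker table that was loaded. Whether the real CosyVoice2 produces good speech through SFT or instruct inference with a borrowed spk2info.pt is a separate question that, per Answer 7, comes down to the v2 architecture rather than the class hierarchy.

```python
class ToyCosyVoice:
    """Hypothetical v1-style stand-in with a table of built-in speakers."""

    def __init__(self, spk2info=None):
        # Mimics a speaker table loaded from spk2info.pt: speaker ID -> info dict.
        self.spk2info = dict(spk2info or {})

    def list_avaliable_spks(self):
        # Named as quoted in Answer 6; returns the speaker IDs in the table.
        return list(self.spk2info.keys())

    def inference_sft(self, text, spk_id):
        # Fail clearly when the requested speaker is not in the table.
        if spk_id not in self.spk2info:
            raise KeyError(f"unknown speaker {spk_id!r}; available: {self.list_avaliable_spks()}")
        return f"<speech for {text!r} as {spk_id}>"


class ToyCosyVoice2(ToyCosyVoice):
    """Hypothetical v2-style stand-in: inherits every method but ships no speakers."""

    def inference_zero_shot(self, text, prompt_text, prompt_wav):
        # The voice comes from the prompt audio, not from a speaker ID.
        return f"<speech for {text!r} cloned from {prompt_wav}>"
```

Here ToyCosyVoice2().list_avaliable_spks() returns [], which is the empty sft_spk from the original question: the method exists only because it is inherited, and it has nothing to list until a speaker table is supplied.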