Audio samples?
blx0102 opened this issue · 2 comments
blx0102 commented
Great work here!
It seems you have already combined voicebox with spear-tts. Could you provide some audio samples of the results?
lucidrains commented
@blx0102 Lucas has already shared some early audio samples with me. Seems to work.
lucasnewman commented
@blx0102 It's still early days here as we dial in the training and inference, but here's an early sample with the prompt that was used for the semantic tokens. This is a 93M param model that's done about 200k training steps @ effective batch size of 16 on LibriTTS-R.