collabora/WhisperSpeech

7. Train the final models

jpc opened this issue · 4 comments

jpc commented

Once all the bugs are ironed out (#4), we have a text to semantic model (#9), we improve the speech codec (#10) and we have more high-quality data (#11) we will train final models that should match (or even exceed) the quality Google showed in their SPEAR TTS demo page.

zolero commented

Hi @jpc, how's the project going? I've been following it for a while now. Have you achieved some cool results already? Looking forward to it!

152334H commented

Just FYI, this zolero person is looking to voraciously monetise your models. You may want to release checkpoints under a non-commercial license if that bothers you.

jpc commented

@zolero Have you seen the new JFK speech resynthesized from just text (in the "wrong" voice for now) in the README? We are working on multi-speaker support and on scaling the models, so in the next two weeks we should be able to show much better results.

@152334H Thanks for the heads-up, but at Collabora we are into Open Source precisely because other people can also benefit from our work. We'd love to support them with commercial contracts, but it's not our mission to stop them by using non-commercial licenses or by switching to an open-core model.

jpc commented

We are no longer embarrassed by the quality of the models, so we've reached the MVP stage. :)

But stay tuned, we'll continue improving models, performance, controllability, API, etc.