collabora/WhisperSpeech

7. Train the final models

jpc opened this issue · 4 comments

jpc commented

Once all the bugs are ironed out (#4), we have a text to semantic model (#9), we improve the speech codec (#10) and we have more high-quality data (#11) we will train final models that should match (or even exceed) the quality Google showed in their SPEAR TTS demo page.

zolero commented

Hi @jpc, how's the project going? I've been following it for a while now. Have you achieved some cool results already? Looking forward to it!

152334H commented

Just FYI, this zolero person is looking to voraciously monetise your models. You may want to release checkpoints under a non-commercial license if that bothers you.

jpc commented

@zolero Have you seen the new JFK speech resynthesized from just text (in the "wrong" voice for now) in the README? We are working on multi-speaker support and on scaling the models, so in the next two weeks we should be able to show much better results.

@152334H Thanks for the heads-up, but at Collabora we are into Open Source precisely because other people can also benefit from our work. We'd love to support them with commercial contracts, but it's not our mission to stop them by using non-commercial licenses or by switching to an open-core model.

jpc commented

We are no longer embarrassed by the quality of the models, so we've reached the MVP stage. :)

But stay tuned, we'll continue improving models, performance, controllability, API, etc.