kyutai-labs/hibiki

Training Code

Opened this issue · 6 comments

Hey, sorry if I'm missing something, but I can't find the training code in this repo.

We are not releasing the training code at the moment. We are planning on releasing training or fine-tuning code for Moshi and Hibiki in the near future.

Hi, I'm a graduate researcher at Johns Hopkins. Would it make sense to wait for your training code, or should I attempt to reproduce it from your paper? I thought the paper was very well written! Thanks for your contributions!

Regards,
Caz

Just to mention that we've now released a codebase for fine-tuning Moshi: kyutai-labs/moshi-finetune. This could be adapted to train a Hibiki-like model; the most complicated part will be finding or generating a translation dataset for the use case you want to consider.
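For illustration, such a dataset could be organized as a simple manifest of aligned source/target audio pairs. This is a purely hypothetical sketch of one way to lay out the pairs, not the actual data format expected by moshi-finetune (check that repo's docs for the real schema); the directory names and field names are placeholders:

```python
import json
from pathlib import Path

def build_manifest(source_dir: str, target_dir: str, out_path: str) -> None:
    """Write one JSONL line per aligned source/target utterance pair.

    Assumes source and target clips share filenames across the two
    directories; 'source_audio'/'target_audio' are made-up field names.
    """
    source_files = sorted(Path(source_dir).glob("*.wav"))
    with open(out_path, "w") as f:
        for src in source_files:
            tgt = Path(target_dir) / src.name
            if tgt.exists():
                f.write(json.dumps({
                    "source_audio": str(src),
                    "target_audio": str(tgt),
                }) + "\n")

build_manifest("data/source_lang", "data/target_lang", "translation_pairs.jsonl")
```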

@LaurentMazare In the fine-tuning code, Mimi is kept frozen. But Mimi was only trained on English, and WavLM was also trained on English. Would fine-tuning only the LM blend the semantics of another language into the audio tokens? I suppose Mimi can be used for other languages then, as you've mentioned with j-moshi adapting Moshi to Japanese?

Indeed, j-moshi uses Mimi for Japanese without any fine-tuning afaik, despite Mimi not having been trained on it. Mimi seems fairly resilient to different languages and usable in multilingual settings. If in doubt, you can easily encode and decode some audio snippets in your target language to check that the reconstruction sounds ok.
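A round-trip check could look like the following minimal sketch, based on the usage shown in the moshi package's README; the random tensor is a placeholder, so substitute a real 24 kHz mono clip in the language you want to test:

```python
import torch
from huggingface_hub import hf_hub_download
from moshi.models import loaders

# Fetch the Mimi weights from the Hugging Face hub and load the codec.
mimi_weight = hf_hub_download(loaders.DEFAULT_REPO, loaders.MIMI_NAME)
mimi = loaders.get_mimi(mimi_weight, device="cpu")
mimi.set_num_codebooks(8)  # Moshi uses 8 of Mimi's codebooks.

# Placeholder: 10 s of random "audio"; replace with a real clip,
# shaped [batch, channels=1, samples] at 24 kHz.
wav = torch.randn(1, 1, 24000 * 10)

with torch.no_grad():
    codes = mimi.encode(wav)      # [batch, n_codebooks, frames]
    decoded = mimi.decode(codes)  # reconstructed waveform

# Listen to `decoded` (or compare it against `wav`) to judge
# how well Mimi handles the language in question.
```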

Thank you for the suggestion.

Is it possible to share the total training time for the model? The HF repo mentions that 48 H100s were used, but the total time wasn't given.

I'm especially interested in the audio pretraining and the speech translation training stages. If not, could you at least share the total training time (excluding fine-tuning)?

I want to adapt it to my local language and open-source that version, so I need a rough estimate, since I will probably need to change the temporal transformer.