TTS
yukiarimo opened this issue · 8 comments
Hello. Do you know how to turn this: https://github.com/nivibilla/build-nanogpt into TTS instead of audio-to-audio?
Hey @yukiarimo, I am trying to do that too. Is there any progress on your side? I have made some progress on audio-to-audio:
- at first it was just noise
- then reduced noise
- now there is no noise, but it sounds like bird voices, I guess.
- working on the next upgrade, so I might post about it here.
If you are interested in working on it with me, let me know.
thanks
So, I also found two things:
- STT https://keras.io/examples/audio/transformer_asr/
- TTS https://github.com/tttzof351/SimpleTransfromerTTS
Enjoy!! :)
Gonna try it out! But how is that “without tokenizer”?
I think you are talking about audio-to-audio, so for that I built my own tokenizer hehe :'D
So, the concept behind the tokenizer is batches of data. Take the combined audio (say 50 MB for now) and convert it to a mel spectrogram. The mel spectrogram values are scaled and quantized to a range of integers, so encoding turns the mel spectrogram into a sequence of integers and decoding maps that sequence of integers back to mel spectrogram values.
In more general terms: at, say, second 1 we have some mel spectrogram data encoded as integers, just as we had for text:
input: print(encode("hii there"))
output: [46, 47, 47, 1, 58, 46, 43, 56, 43]
input: print(decode(encode("hii there")))
output: hii there
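A minimal sketch of the scale-and-quantize idea described above, not the code from the Colab. It assumes the mel spectrogram is already available as a NumPy array of dB values with shape (n_mels, n_frames); the names `encode_mel`, `decode_mel`, `N_LEVELS` and the [-80, 0] dB range are hypothetical choices made for illustration, and a random array stands in for real audio.

```python
import numpy as np

N_LEVELS = 256  # hypothetical vocabulary size (number of quantization levels)


def encode_mel(mel_db, lo, hi, n_levels=N_LEVELS):
    """Scale mel values from [lo, hi] to [0, 1], quantize, and flatten frame by frame."""
    scaled = (np.clip(mel_db, lo, hi) - lo) / (hi - lo)
    quantized = np.round(scaled * (n_levels - 1)).astype(np.int64)
    # Transpose first so the token sequence runs through time, one frame after another.
    return quantized.T.ravel()


def decode_mel(tokens, lo, hi, n_mels, n_levels=N_LEVELS):
    """Map integer tokens back to approximate mel values with shape (n_mels, n_frames)."""
    scaled = np.asarray(tokens, dtype=np.float64) / (n_levels - 1)
    return (scaled * (hi - lo) + lo).reshape(-1, n_mels).T


# Stand-in for a real mel spectrogram: 80 mel bands, 10 frames, values in [-80, 0] dB.
rng = np.random.default_rng(0)
mel = rng.uniform(-80.0, 0.0, size=(80, 10))

tokens = encode_mel(mel, lo=-80.0, hi=0.0)
restored = decode_mel(tokens, lo=-80.0, hi=0.0, n_mels=80)
max_error = np.abs(restored - mel).max()
```

Unlike the text example, the round trip here is lossy: each value comes back only to within half a quantization step, so more levels mean a larger vocabulary but a closer reconstruction.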
Let me know if you can contribute on top of this, thanks.
I will send you the Colab link for this, where it’s working for me. Thanks
Hi, @yukiarimo here is the link: https://colab.research.google.com/drive/1NHFi8y1GCIUR4Nv0yguGVwOk2q0-JOEu?usp=sharing
But take a look at the attached images of train and test loss etc. in https://github.com/tttzof351/SimpleTransfromerTTS. They show that it takes nearly 400K iterations to generate good results.
If you still have issues, just let me know.
Thanks,