karpathy/build-nanogpt

TTS

yukiarimo opened this issue · 8 comments

Hello. Do you know how to turn this: https://github.com/nivibilla/build-nanogpt into TTS instead of audio-to-audio?

Hey @yukiarimo, I am trying to do that too. Is there any progress on your side? I have made some progress on audio-to-audio:

  • at first it was just noise
  • then reduced noise
  • now there is no noise, but the output sounds like bird calls, I guess
  • I am working on the next upgrade, so I might post about it here

If you are interested in working on it with me, let me know.

thanks

Gonna try it out! But how is that “without tokenizer”?

I think you are talking about audio-to-audio, so for that I built my own tokenizer hehe :'D

So, the concept behind the tokenizer is batches of data. Take the combined audio (say 50 MB for now) and convert it to a mel spectrogram, then encode the mel spectrogram into a sequence of integers; decoding turns the sequence of integers back into a mel spectrogram. The mel spectrogram values are scaled and quantized to a range of integers, and the encode and decode steps map back and forth between those integers and the mel spectrogram values.
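A minimal sketch of that scale-and-quantize idea, not the exact code from my notebook. It assumes the mel spectrogram is already computed as a NumPy array in dB (random data stands in for it here), and the names `N_LEVELS`, `lo`, and `hi` and the value range of -80 to 0 dB are hypothetical choices for illustration:

```python
import numpy as np

N_LEVELS = 256  # hypothetical vocabulary size (number of integer tokens)

def encode(mel, lo, hi, n_levels=N_LEVELS):
    """Scale mel values from [lo, hi] to [0, n_levels - 1] and round to integers."""
    scaled = (mel - lo) / (hi - lo)
    tokens = np.clip(np.round(scaled * (n_levels - 1)), 0, n_levels - 1)
    # Flatten frame by frame so the model sees one time step after another.
    return tokens.astype(np.int64).T.reshape(-1)

def decode(tokens, n_mels, lo, hi, n_levels=N_LEVELS):
    """Map integer tokens back to approximate mel values of shape (n_mels, n_frames)."""
    frames = tokens.reshape(-1, n_mels).T
    return frames.astype(np.float32) / (n_levels - 1) * (hi - lo) + lo

# Stand-in for a real mel spectrogram: 80 mel bins, 100 frames, values in dB.
rng = np.random.default_rng(0)
mel_db = rng.uniform(-80.0, 0.0, size=(80, 100)).astype(np.float32)

tokens = encode(mel_db, lo=-80.0, hi=0.0)
restored = decode(tokens, n_mels=80, lo=-80.0, hi=0.0)
max_err = float(np.abs(restored - mel_db).max())
print(tokens[:10], max_err)
```

Unlike the text case, the round trip is lossy: the restored values differ from the originals by up to half a quantization step, so more levels give a closer reconstruction at the cost of a larger vocabulary.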

In more general terms: at second 1, say, we have some mel spectrogram data encoded as integers, just like we had for text:

input: print(encode("hii there"))
output: [46, 47, 47, 1, 58, 46, 43, 56, 43]
input: print(decode(encode("hii there")))
output: hii there
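For reference, here is a small sketch of the character-level tokenizer behind that text example. It assumes the vocabulary is the 65-character set of the Tiny Shakespeare dataset, which is consistent with the IDs shown above; the names `stoi` and `itos` are just illustrative:

```python
import string

# Assumed vocabulary: the 65 characters of Tiny Shakespeare, in sorted order.
chars = sorted("\n !$&',-.3:;?" + string.ascii_uppercase + string.ascii_lowercase)

stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer
itos = {i: ch for i, ch in enumerate(chars)}  # integer -> character

def encode(text):
    """Turn a string into a list of integer token IDs."""
    return [stoi[ch] for ch in text]

def decode(ids):
    """Turn a list of integer token IDs back into a string."""
    return "".join(itos[i] for i in ids)

print(encode("hii there"))
print(decode(encode("hii there")))
```

The audio tokenizer plays the same role, except that each integer stands for a quantized mel spectrogram value instead of a character.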

Let me know if you can contribute on top of this. Thanks.

I will send you the link to the Colab notebook where it's working for me. Thanks.

Hi @yukiarimo, here is the link: https://colab.research.google.com/drive/1NHFi8y1GCIUR4Nv0yguGVwOk2q0-JOEu?usp=sharing

But take a look at the attached images of train and test loss etc. in this repo: https://github.com/tttzof351/SimpleTransfromerTTS. They show that it takes nearly 400K iterations to generate good results.

If you still have issues, just let me know.

Thanks,