nerdaxic/glados-voice-assistant

GLaDOS' Voice

Closed this issue · 23 comments

Hey, I love your project and am working on turning my Pi into GLaDOS.... just not as elegantly as you have. I love the cleanness of your code; however, I feel like the voice is a little off. I found uberduck.ai, which has a voice generator for GLaDOS as well as an API. I've managed to get a working script; it's fairly quick and the voice is well balanced. I'm not sure how to contribute yet as I'm still pretty new to git, but would love to help if you're interested.

Would be awesome to get something like uberduck to run locally with the proper model. The current setup is only a prototype and takes way too long to process. Local caching helps, but it's not a good solution.
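The caching idea is nothing fancy - generated clips just get keyed by a hash of the phrase, roughly like the sketch below. generate_audio and play_wav are placeholders for the actual TTS call and playback, not functions from this repo:

import hashlib
import os

CACHE_DIR = "audio/cache"

def speak(phrase):
    # Key each clip by a hash of the phrase so repeated lines are instant
    key = hashlib.md5(phrase.lower().encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".wav")

    if not os.path.isfile(path):
        # Placeholder for the actual uberduck/TTS call returning wav bytes
        audio = generate_audio(phrase)
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(path, "wb") as f:
            f.write(audio)

    play_wav(path)  # placeholder for whatever plays the wav file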

Help is welcome here!

I would love to know how to create a local model! I did a lot of reading to try and figure it out, but it would mean creating our own model, which I also read way too much about, and sadly it was a little over my head.

Are you talking about your current setup being a prototype? Local caching certainly helps, but you're right, a better option would be a local model. Still, it is an AWESOME prototype that you have created!

I see that uberduck has development software on GitHub to build your own models. Sadly, it's still over my head. They've already generated the GLaDOS model, so maybe they could bundle it for this project?

Thank you. Multiple people have informed me about the existence of Larynx - I have a lot to learn here, but it seems like I'm going to continue this. I want to try to get the TTS engine out as its own Lego.

@eternalliving the local GLaDOS TTS model has now been implemented. Unfortunately it no longer runs on the Raspberry Pi, due to the CPU missing some instruction sets needed to run PyTorch.

If you want to play with the stand-alone version, check out:
🔗 https://github.com/nerdaxic/glados-tts

Sweet, I'll look into it and do some testing. From a quick bit of reading, it looks like PyTorch can run on the Pi, but needs a 64-bit OS to do it. It also looks like there is now an active release of Raspberry Pi OS 64-bit. I might have to grab an extra SD card and do some testing, if you haven't already exhausted that option. For now, I'll load your TTS on my Windows machine and see how I make out.

I tried running it on a Raspberry Pi 4 v1.4 for shits and giggles, on the 64-bit OS, and it turns out the script won't run because the CPU is missing support for the AVX2 instruction set.

RuntimeError: Unknown qengine

Also, I would imagine it would be quite slow on the RPi anyway; my i7 takes like 10 seconds to generate a weather forecast.
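If anyone else wants to sanity-check their board before trying, a quick (Linux-only) way is to see whether the CPU even advertises AVX2 in /proc/cpuinfo - just an illustration, not something the project itself does:

# Quick check: does this CPU advertise the AVX2 instructions that
# PyTorch's quantized kernels want? Linux only.
def has_avx2():
    try:
        with open("/proc/cpuinfo") as f:
            return "avx2" in f.read()
    except FileNotFoundError:
        return False

print("AVX2 supported:", has_avx2())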

Thanks for the info, I guess I'll leave it alone and stick to running it on a PC. I really didn't expect TTS to take that long on a local setup; using Uberduck gives me quicker results (although not offline). I would've thought you'd be able to get close to real-time on a PC and that the RPi would come in around the 5-10 second mark.

On a desktop with a GPU, the current TTS runs pretty much in real time. My voice assistant runs on an old laptop with a beefy CPU - but it's still not as powerful as a GPU.

Felt kind of stupid having a full desktop PC running 24/7 just to turn the lights off now and then 😁

So I gave it a try.... I'd only done a base install of Python on my laptop, so there were a few extras I had to install. I was able to get it to run on the CPU after adding an __init__.py file to the utils subfolder. The code is set to only use the CPU; I've modified it to try and get it to run on CUDA, but am having no luck. I have a 2070 in the laptop, so I'm hoping to get closer to real time than what the CPU gives.

I'm hitting this error:
Could not run 'aten::quantized_gru.input' with arguments from the 'CUDA' backend

My system has CUDA 11.6 but PyTorch is only built for 11.3; I'm not sure if this is what's causing the issue, or if it's something else. Have you had luck running the TTS on a GPU?
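In case it helps narrow it down, this is roughly what I'm checking on my end to confirm what PyTorch actually sees - standard PyTorch calls, nothing project-specific:

import torch

# What the installed PyTorch build was compiled against vs. what the driver offers
print("torch version:  ", torch.__version__)
print("built for CUDA: ", torch.version.cuda)
print("CUDA available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:         ", torch.cuda.get_device_name(0))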

Runs fine on my desktop with an RTX 2080 and on my laptop (CPU only).
I have not come across that error before - please let me know if you manage to solve it!

Yes, I am able to get it to run just on the CPU. I was hoping to get it to run on the GPU though. Using vocoder.to('cuda') does not result in the GPU being engaged. With a long sentence I can watch the CPU amp up, but the GPU stays silent through it all.

I think the error I was getting was because I changed glados.cpu() to glados.cuda() in hopes it would help. I also had to update tools.py to use the CUDA devices, which gave a different error, so that didn't work either.

It does work on CPU only so I guess I'm sticking to that for now.
Thanks

What kind of GPU does your machine have? Does it support vulkan or cuda?
My laptop GPU does not and thus always runs on CPU.

I have an RTX 2070 that supports CUDA...
NVIDIA-SMI 511.79 Driver Version: 511.79 CUDA Version: 11.6
However, watching the performance monitor, it clearly only uses the CPU for both the Tacotron forwarding and the HiFiGAN.
I'm guessing it could be a version incompatibility? PyTorch is currently built for CUDA 11.3. I haven't tried downgrading my CUDA as I didn't think it would matter.

I had some weird issues with the RTX 2080; reinstalling PyTorch solved the issue. Could be worth a try? CUDA version 10.something.

I think I figured it out....

The model was loading to the CPU, as it seems that's where it was saved from.
Changing: vocoder = torch.jit.load('models/vocoder-gpu.pt')
to: vocoder = torch.jit.load('models/vocoder-gpu.pt', map_location='cuda')

That got the vocoder loaded in the right place. Then I updated the following so the tts_output also ends up on CUDA when available:
mel = tts_output['mel_post'].to(device)
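Putting the two changes together, the relevant bit of my local copy looks roughly like this - the variable names are from memory and the CPU fallback is my own addition, so it may not match the repo exactly:

import torch

# Use the GPU when it's actually usable, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# map_location loads the TorchScript vocoder onto the chosen device instead
# of wherever the checkpoint happened to be saved from
vocoder = torch.jit.load('models/vocoder-gpu.pt', map_location=device)

def vocode(tts_output):
    # tts_output is the dict coming out of the Tacotron forward pass;
    # move the mel spectrogram onto the same device as the vocoder
    mel = tts_output['mel_post'].to(device)
    return vocoder(mel)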

This got me running off the GPU. I saw some very interesting things, though. The more you use it, the faster Tacotron gets, while HiFiGAN stays slow for a few runs and then suddenly pops into super speed:
Input: this is just the beginning
Forward Tacotron took 421.36406898498535ms
HiFiGAN took 4153.507947921753ms
Input: and the more we go on the faster you get
Forward Tacotron took 119.78602409362793ms
HiFiGAN took 3635.7500553131104ms
Input: but it seems at some point it tops out
Forward Tacotron took 137.9852294921875ms
HiFiGAN took 3906.981945037842ms
Input: if i can get it to go a little more
Forward Tacotron took 115.82279205322266ms
HiFiGAN took 37.75596618652344ms
Input: then we get almost realtime speeds
Forward Tacotron took 147.10712432861328ms
HiFiGAN took 15.622138977050781ms
Input: and I don't understand that at all
Forward Tacotron took 115.81277847290039ms
HiFiGAN took 31.24213218688965ms
Input: but thats what happens
Forward Tacotron took 115.86642265319824ms
HiFiGAN took 15.621423721313477ms
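(For what it's worth, those numbers come from just wrapping each stage in a timer, roughly like this - run_tacotron and run_vocoder are placeholders for the real forward passes:)

import time

def timed(label, fn, *args):
    # Time a single stage and print it in milliseconds, matching the log format above
    start = time.time()
    result = fn(*args)
    print(f"{label} took {(time.time() - start) * 1000}ms")
    return result

# run_tacotron and run_vocoder stand in for the actual model calls
text = "this is just the beginning"
tts_output = timed("Forward Tacotron", run_tacotron, text)
audio = timed("HiFiGAN", run_vocoder, tts_output)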

If I repeat a line, or after many inputs, it crashes with:
Error 259 for command:
play output.wav wait
The driver cannot recognize the specified command parameter.

Those are my observations. I like it when it's running at 15ms!!! But it doesn't seem to last long :(

These modifications might only be needed for CUDA 11.x, but it might be worth a try to see if they are backwards compatible with CUDA 10.x.

Interesting... Thanks for your post!
I shall play around with that a bit later to see if I can make mine super speed 😆

I played around with this for way too long, but got it to work flawlessly at around 15ms!! I'll try and create a pull request for it later today. There are a few changes that I made and a few observations that I can share.

Many thanks @eternalliving!
I tested out your fork and it worked flawlessly, as you said. I merged it to main and replicated your changes into engine.py.

I also saw HiFiGAN running in a few milliseconds after it had processed a few samples - and then going back to slow. It is probably something to do with the model being loaded into GPU memory and then dumped. Would be interesting to find a way to keep it in VRAM, if that is indeed the issue.
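If it does turn out to be the model going cold, one thing that might be worth trying is a small keep-alive loop that pushes a dummy mel through the vocoder every so often - completely untested idea on my side, and the dummy tensor shape is a guess that may need adjusting:

import threading
import time
import torch

def keep_vocoder_warm(vocoder, device, interval=20):
    # Run a tiny dummy mel through the vocoder every `interval` seconds so the
    # weights and kernels stay resident in VRAM between real requests
    def loop():
        # 80 mel bins x a handful of frames; assumed shape, adjust to the export
        dummy = torch.zeros(1, 80, 16, device=device)
        while True:
            with torch.no_grad():
                vocoder(dummy)
            time.sleep(interval)

    threading.Thread(target=loop, daemon=True).start()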

Are you still having problems with HiFiGAN going back to slow? I've run over a hundred samples and it stays between 6ms-20ms. It does use over 2GB of RAM and over 500MB of VRAM! But it doesn't seem to unload it at any point. I'm not sure what would be causing it to offload the VRAM, unless you're running out, but I highly doubt that.

Yeah, if I keep generating samples it keeps going quickly.
But if I take a 30-second break or so, it takes over 1500ms again.

Hmmm... I'm not having that issue, even after leaving it for an hour or so. Even after hibernating, the first run only takes around 300ms, then it's back to 20ms. I wonder if there are power settings or RAM-saving settings on your GPU that are offloading it?

I got the TTS consistently running in 100-300ms on my workstation with the RTX 2080. The secret was to install PyTorch with CUDA 11.3.

The spell:
pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

Yeah, that's the version I had in my tests as well, although I still get 40-60ms for all executions... (that's now including both Taco and HiFi). Running a 2070 in a laptop. I think 100-300ms is still an acceptable amount of time though, especially if it's hosted locally.