This is an API for the extremely awesome VoiceCraft repository. The API lets you clone voices from a single example .wav file and then generate speech from text using those voices.
Currently, the API is only supported on Ubuntu/Debian systems, but that may change in the future if enough people want other distributions.
The VoiceCraft API can easily be installed (on Ubuntu/Debian) by running the following command:
curl -fsSL https://raw.github.com/CalvesGEH/VoiceCraftAPI/main/install.sh | sh
This will download and run the installation script, which creates a new user named voicecraftapi
and configures the API, VoiceCraft, and a systemd service that runs on startup. You need sudo
access to run this, as it needs to install packages and create the systemd service.
The VoiceCraft API can also be built manually, for development or any other reason.
git clone https://github.com/CalvesGEH/VoiceCraftAPI.git
cd VoiceCraftAPI
./install_voicecraftapi.sh
# You can then run the API if desired
./run_api.sh
Once installed, you can also manually edit and install the systemd service.
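For example, assuming the unit installed by the script is named voicecraftapi.service (the actual name may differ, so check /etc/systemd/system/), it can be managed like any other systemd unit:
# The unit name below is an assumption; adjust it to match what the install script created.
sudo systemctl edit voicecraftapi.service     # add an override with your changes
sudo systemctl daemon-reload                  # reload unit files after manual edits
sudo systemctl restart voicecraftapi.service  # restart the API with the new settings
sudo systemctl status voicecraftapi.service   # confirm the service came back up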
I have also provided a script which will uninstall VoiceCraftAPI, except for the required APT packages; those can be uninstalled manually if you'd like.
curl -fsSL https://raw.github.com/CalvesGEH/VoiceCraftAPI/main/uninstall.sh | sh
The API only exposes a handful of endpoints, and all of them can be tested/accessed from http://<YOUR_SERVER_IP>:8245/docs. The /docs endpoint also includes information about each endpoint and is usually a good place to start if you are confused.
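If the API is built on FastAPI, which the interactive /docs page suggests (this is an assumption, not something stated by the project), the raw OpenAPI schema is usually also served at /openapi.json and can be handy for scripting:
# Assumed default FastAPI schema path; skip this if your instance doesn't expose it.
curl http://<YOUR_SERVER_IP>:8245/openapi.json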
This endpoint creates a new voice from a given .wav audio file and optional configuration parameters. The .wav file should be roughly 6-12 seconds long and sampled at 16000Hz. The name of the new voice will exactly match the name of the .wav file (frank.wav will create the voice frank).
You can include a transcript, but if one is not given, the API will automatically transcribe the clip for you. The automatic transcription has always been correct in my experience, but you can check the logs if you think it's wrong.
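A minimal curl sketch of creating a voice, assuming the endpoint is /newvoice (as used later in this README) and that the file is sent as multipart form data; the field names audio_file and transcript are guesses, so confirm the exact parameter names on the /docs page:
# 'audio_file' and 'transcript' are assumed field names; verify them under /docs.
curl -F 'audio_file=@frank.wav' \
     -F 'transcript=Optional transcript of the clip.' \
     http://<YOUR_SERVER_IP>:8245/newvoice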
This endpoint simply returns a list of all available voices for inference.
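Something like the following should return that list, assuming the endpoint lives at a path such as /voices (the real path is listed on the /docs page):
# Hypothetical path; check /docs for the actual one.
curl http://<YOUR_SERVER_IP>:8245/voices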
This endpoint will edit the saved inference parameters of a voice. Check your server's /docs endpoint for the available parameters.
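As a rough sketch only, assuming the voice name goes in the path and the parameters are sent as form fields; both the path /editparameters/<VOICE> and the field name temperature below are placeholders, not the API's documented names:
# Path and parameter names here are hypothetical; consult /docs for the real ones.
curl -d 'temperature=1.0' http://<YOUR_SERVER_IP>:8245/editparameters/<VOICE>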
This endpoint will generate audio as the given voice using the target_text provided. It returns a streaming response containing a .wav audio file.
It is extremely simple to use once the API is up and running. I start by finding a suitable voice clip for the character I want to clone and then edit the file to ensure it is a .wav at 16000Hz (a quick ffmpeg one-liner for this is included at the end of this section). Then, I navigate to http://<YOUR_SERVER_IP>:8245/docs and test the /newvoice endpoint directly in the browser, giving it my .wav file and executing the request. Now that the voice is created, I can make a request to the server to generate audio:
curl --output generated.wav -d 'target_text=It is crazy how easy it is to use VoiceCraft api!' http://<YOUR_SERVER_IP>:8245/generateaudio/<VOICE>
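If your source clip is not already a 16000Hz .wav, ffmpeg (assuming it is installed; the input filename below is just an example) can do the conversion and trimming in one step:
# Convert to mono 16 kHz .wav and keep roughly the first 10 seconds of the clip.
ffmpeg -i source_clip.mp3 -ac 1 -ar 16000 -t 10 frank.wav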