/NeuralTextToAudio

Text-prompt-steered synthetic audio generators


Colab notebooks for text-to-audio generators

User-friendly Colab notebooks for various text-prompt-steered synthetic audio generators.

Available notebooks:


AudioLDM: Text-to-Audio Generation with Latent Diffusion Models


Paper: Text-to-Audio Generation with Latent Diffusion Models

Colab for AudioLDM. Generates audio from a text description. This is probably the beginning of a "Stable Diffusion of audio". Currently it can produce 16 kHz audio only.
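To make the 16 kHz limit concrete, here is a minimal sketch of saving a generated waveform as a 16 kHz mono WAV file using only the Python standard library. It is not taken from the AudioLDM notebook: the helper names `write_wav_16k` and `placeholder_tone` are hypothetical, the sine tone merely stands in for model output, and the sketch assumes the generator returns float samples in the range [-1.0, 1.0]. At this sample rate the file cannot represent frequencies above 8 kHz (half the sample rate), which is why results may sound duller than 44.1 kHz audio.

```python
import math
import struct
import wave

SAMPLE_RATE = 16_000  # output rate stated for AudioLDM in this README


def write_wav_16k(path, samples):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM at 16 kHz."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip to the valid range
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))


def placeholder_tone(seconds=1.0, freq_hz=440.0):
    """Hypothetical stand-in for model output: a plain sine tone."""
    n = int(SAMPLE_RATE * seconds)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


if __name__ == "__main__":
    write_wav_16k("example_16k.wav", placeholder_tone())
    # Half the sample rate is the highest frequency the file can carry.
    print("Highest representable frequency:", SAMPLE_RATE / 2, "Hz")
```

In a real notebook the list returned by `placeholder_tone` would be replaced by the samples the model produces, assuming they are first converted to plain floats.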


TorToiSe: Text-to-speech


Paper: TorToiSe - Spending Compute for High Quality TTS

Colab for TorToiSe text-to-speech voice-cloning. This notebook takes a text string and an audio file (or files) of a speaker's voice, and attempts to synthesize the text using the given voice. Currently works with English text only.


MubertAI Text-to-Music

UPDATE: the Mubert API now appears to require a (paid) API key.


Colab for MubertAI Text-to-Music. Generates music from a text description by combining predefined blocks that are (afaik) created by the community. See the source repository for details such as licensing.


TTS Voice Cloning


Paper: Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

Colab for Real-Time-Voice-Cloning text-to-speech voice-cloning. This notebook takes a text string and an audio file of a speaker's voice, and attempts to synthesize the text using the given voice. Fair warning: results are not great.