Audio AI Telegram Bot

This is a Telegram Bot frontend for processing audio with several AI tools:

Coqui AI for text to speech
Whisper for speech to text
MDX23v2 for music separation
RVC WebUI for retrieval based voice conversion
Audio WebUI for retrieval based voice conversion training
AudioCraft for generating music and audio

The bot displays the progress (if available) and further information during processing by responding to the message with the prompt. Requests are queued, only one gets processed at a time.

The bot uses the Telegram Bot API. Rendered data are not saved on disk. Tested on Linux, but should be able to run on other operating systems.

Compiling

You'll need Go installed on your computer. Install a recent package of golang. Then:

go get github.com/nonoo/audio-ai-telegram-bot
go install github.com/nonoo/audio-ai-telegram-bot

This will typically install audio-ai-telegram-bot into $HOME/go/bin.

Or just enter go build in the cloned Git source repo directory.

Prerequisites

Create a Telegram bot using BotFather and get the bot's token.

Coqui AI

Follow the installations steps and make sure the tts command is available.
Create a shell script in the Coqui AI directory with the following contents:
Copy the scripts/tts.sh shell script to the repo directory
Set this shell script as the TTS binary for the bot using the -tts-bin command line argument.

Whisper

Follor the installation steps and make sure the whisper command is available.
Copy the scripts/whisper.sh shell script to the repo directory
Set this shell script as the STT binary for the bot using the -stt-bin command line argument.

MDX23v2

Clone the MDX23v2 repo
Enter into the cloned directory
python3 -m venv env
pip install -r requirements.txt
Copy the scripts/mdx.sh shell script to the repo directory
Set this shell script as the MDX binary for the bot using the -mdx-bin command line argument.

RVC WebUI

Clone the RVC WebUI repo
Enter into the cloned directory
python3 -m venv env
pip install -r requirements.txt
Copy the scripts/rvc.sh shell script to the repo directory
Set this shell script as the RVC binary for the bot using the -rvc-bin command line argument.
Set the RVC model path directory using the -rvc-model-path command line argument. This is usually located at rvc/assets/weights

Audio WebUI

Clone the Audio WebUI repo
Follow the installation instructions
Copy the scripts/rvc-train.py and scripts/rvc-train.sh to the Audio WebUI directory
Set the rvc-train.sh shell script as the RVC train binary for the bot using the -rvc-train-bin command line argument.

AudioCraft

Follow the installation steps

Musicgen

Set the scripts/musicgen.sh shell script as the Musicgen binary for the bot using the -musicgen-bin command line argument.

Audiogen

Set the scripts/audiogen.sh shell script as the Audiogen binary for the bot using the -audiogen-bin command line argument.

Running

You can get the available command line arguments with -h. Mandatory arguments are:

-bot-token: set this to your Telegram bot's token
-tts-bin: path of the TTS binary -stt-bin: path to the STT binary -mdx-bin: path to the MDX binary -rvc-bin: path to the RVC binary -rvc-model-path: path to the RVC weights directory -musicgen-bin: path to the Musicgen binary -audiogen-bin: path to the Audiogen binary

Set your Telegram user ID as an admin with the -admin-user-ids argument. Admins will get a message when the bot starts.

Other user/group IDs can be set with the -allowed-user-ids and -allowed-group-ids arguments. IDs should be separated by commas.

You can get Telegram user IDs by writing a message to the bot and checking the app's log, as it logs all incoming messages.

All command line arguments can be set through OS environment variables. Note that using a command line argument overwrites a setting by the environment variable. Available OS environment variables are:

BOT_TOKEN
ALLOWED_USERIDS
ADMIN_USERIDS
ALLOWED_GROUPIDS
TTS_BIN
TTS_DEFAULT_MODEL
STT_BIN
MDX_BIN
RVC_BIN
RVC_MODEL_PATH
RVC_DEFAULT_MODEL
RVC_TRAIN_BIN
RVC_TRAIN_DEFAULT_BATCH_SIZE
RVC_TRAIN_DEFAULT_EPOCHS
MUSICGEN_BIN
AUDIOGEN_BIN

Supported commands

/aaitts (-m [model]) [prompt] - text to speech
/aaitts-models - list text to speech models
/aaistt (-lang [language]) - speech to text
/aaimdx (-f) - music and voice separation (-f enables full output including instrument and bassline tracks)
/aairvc (model) (-m [model]) (-p [pitch]) (-method [method]) (-filter-radius [v]) (-index-rate [v]) (-rms-mix-rate [v]) - retrieval based voice conversion
/aairvc-train (model) (-m [model]) (-method [method]) (-batch-size [v]) (-epochs [v]) (-delete) - retrieval based voice conversion training
/aairvc-models - list rvc models
/aaimusicgen (-l [sec]) [prompt] - generate music based on given audio file and prompt
/aaiaudiogen (-l [sec]) [prompt] - generate audio
/aaicancel - cancel current req
/aaihelp - show this help

You can also use the ! command character instead of /.

You don't need to enter the /aaitts command if you send a prompt to the bot using a private chat.

Donations

If you find this bot useful then buy me a beer. :)