/piper-recording-studio

Local voice recording for creating Piper datasets

Primary LanguageJavaScriptMIT LicenseMIT

Piper Recording Studio

Local tool for recording yourself to train a Piper text to speech voice.

Screen shot

Sponsored by Nabu Casa

Tutorial

See a video tutorial by Thorsten Müller

Docker

docker run -it -p 8000:8000 -v '/path/to/output:/app/output' rhasspy/piper-recording-studio

Visit http://localhost:8000 to select a language and start recording.

Add --help to see more options.

Building

docker build . -t rhasspy/piper-recording-studio

Installing without Docker

git clone https://github.com/rhasspy/piper-recording-studio.git
cd piper-recording-studio/

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt

Running without Docker

python3 -m piper_recording_studio

Visit http://localhost:8000 to select a language and start recording.

Prompts are in the prompts/ directory with the following format:

  • Language directories are named <language name>_<language code>
  • Each .txt in a language directory contains lines with:
    • <id>\t<text> or
    • text (id is automatically assigned based on line number)

Output audio is written to output/

See --debug for more options.

Exporting

Install ffmpeg:

sudo apt-get install ffmpeg

Install exporting dependencies:

python3 -m pip install -r requirements_export.txt

Export recordings for a language to a Piper-compatible dataset (LJSpeech format):

python3 -m export_dataset output/<language>/ /path/to/dataset

Requires a non-Docker install. If you used Docker to record your dataset, you may need to adjust the permissions of the output directory:

sudo chown -R "$(id -u):$(id -u)" output/

See --help for more options. You may need to adjust the silence detection parameters to correctly remove button clicks and keypresses.

Multi-User Mode

python3 -m piper_recording_studio --multi-user

Now a "login code" will be required to record. A directory output/user_<code>/<language> must exist for each user and language.