Clone a voice and output speech in another language with the original voice.
Python 3.7 is required because of the TensorFlow version used in this project.
Create a virtual environment:
python3 -m venv pyvenv
Activate the virtual environment:
Windows: ./pyvenv/Scripts/activate
macOS/Linux: source pyvenv/bin/activate
To deactivate the virtual environment:
deactivate
Note: Your Python virtual environment may cause issues when running the UI.
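Because the TensorFlow version pins this project to Python 3.7, it can help to confirm which interpreter the activated environment actually uses. The following is a minimal sketch, not a CogNative utility; the `python_version_ok` helper is a hypothetical name.

```python
import sys

# Python release the README requires (TensorFlow in this project needs 3.7).
REQUIRED = (3, 7)


def python_version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True if the interpreter's major.minor version equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    found = "%d.%d.%d" % tuple(sys.version_info[:3])
    if python_version_ok():
        print("OK: running Python", found)
    else:
        print("Expected Python %d.%d but found %s; recreate pyvenv with Python 3.7."
              % (REQUIRED[0], REQUIRED[1], found))
```

Run it with the environment activated so that it reports the virtual environment's interpreter rather than the system one.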
3. Install ffmpeg.
Once downloaded, extract the archive and add <ffmpeg folder path>/bin to your PATH environment variable.
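To confirm that ffmpeg is reachable from PATH, the sketch below uses the standard-library `shutil.which` lookup and then runs `ffmpeg -version`. It is an illustration only; `find_ffmpeg` is a hypothetical helper, not part of CogNative.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path to the ffmpeg executable found on PATH, or None."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found; add <ffmpeg folder path>/bin to PATH and open a new terminal.")
    else:
        # `ffmpeg -version` prints build information; its first line names the version.
        result = subprocess.run([path, "-version"], stdout=subprocess.PIPE,
                                universal_newlines=True)
        lines = result.stdout.splitlines()
        print(lines[0] if lines else "ffmpeg found at " + path)
```

PATH changes only apply to terminals opened after the change, so rerun the check in a new shell.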
4. Install PyTorch by choosing the following options in the installation selector on the PyTorch website, then running the command it generates:
- PyTorch Build: Stable (1.11.0).
- Your OS: the OS on which you will run CogNative (Windows or Linux recommended).
- Package: the package installer you are using (pip recommended).
- Language: Python.
- Compute Platform: CUDA 11.3 recommended; if you don't have a GPU, pick CPU.
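After installing, a quick check can show whether PyTorch imports and, for a CUDA build, whether it sees the GPU. This is a sketch rather than a CogNative utility; `torch_status` is a hypothetical name. If `torch.cuda.is_available()` is False on a machine with a GPU, the CPU build may have been installed or the driver may not support the chosen CUDA version.

```python
import importlib.util


def torch_status():
    """Describe whether PyTorch is importable and whether it can use a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."
    import torch  # imported lazily so the check also works without PyTorch

    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return "PyTorch %s, CUDA available (%s)" % (torch.__version__, name)
    return "PyTorch %s, CPU only" % torch.__version__


if __name__ == "__main__":
    print(torch_status())
```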
5. Install the remaining dependencies:
pip3 install -r requirements.txt
6. Install the models.
Once downloaded, place the model files (*.pt) in CogNative/CogNative/models/RTVC/saved_models/default.
Download the taco_pretrained folder and place it (the folder itself, not only its contents) in CogNative/CogNative/models/RTVCSwedish/synthesizer/saved_models/swedish.
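A small layout check can catch misplaced model files before the first run. This sketch assumes it is run from the repository root; since the README's "CogNative/CogNative/..." prefix appears to include the repository folder itself, the paths below drop the first "CogNative/", which matches the vocoder path printed in the CLI logs. `check_models` is a hypothetical helper.

```python
from pathlib import Path

# Paths from the install steps, relative to the repository root (assumed layout:
# the README's leading "CogNative/" is the repository folder itself).
ENGLISH_MODELS = Path("CogNative/models/RTVC/saved_models/default")
SWEDISH_TACO = Path(
    "CogNative/models/RTVCSwedish/synthesizer/saved_models/swedish/taco_pretrained"
)


def check_models(root="."):
    """Return a list of problems with the model layout; an empty list means it looks right."""
    root = Path(root)
    problems = []
    english_dir = root / ENGLISH_MODELS
    if not english_dir.is_dir() or not any(english_dir.glob("*.pt")):
        problems.append("no *.pt model files in %s" % english_dir)
    taco_dir = root / SWEDISH_TACO
    if not taco_dir.is_dir():
        problems.append("missing folder %s" % taco_dir)
    return problems


if __name__ == "__main__":
    issues = check_models()
    print("\n".join(issues) if issues else "Model folders look OK.")
```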
- Follow the steps to set up Google Cloud credentials.
- Add your Google credentials to credentials.json in the top-level directory. A template file, credentials.json.template, is provided there; your credentials.json should contain the same key/value pairs.
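To check that credentials.json has the same keys as the template, the top-level keys of the two files can be compared. This sketch assumes both files contain a single JSON object; `credential_key_diff` is a hypothetical helper, and it compares keys only, not values.

```python
import json
from pathlib import Path


def credential_key_diff(creds_path="credentials.json",
                        template_path="credentials.json.template"):
    """Compare the top-level keys of credentials.json with those of the template.

    Returns (missing, extra): keys the template has that credentials.json lacks,
    and keys in credentials.json that the template does not list.
    """
    with open(template_path, encoding="utf-8") as f:
        template = json.load(f)
    with open(creds_path, encoding="utf-8") as f:
        creds = json.load(f)
    missing = sorted(set(template) - set(creds))
    extra = sorted(set(creds) - set(template))
    return missing, extra


if __name__ == "__main__":
    if Path("credentials.json").is_file() and Path("credentials.json.template").is_file():
        missing, extra = credential_key_diff()
        print("missing keys:", missing or "none")
        print("unexpected keys:", extra or "none")
    else:
        print("Run from the top-level directory with both JSON files present.")
```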
Start from the CogNative root directory.
To launch the GUI, run:
python -m CogNative.testUI.UI
If a required flag is omitted, the CLI prompts for its value, and the prompt must be answered before processing continues. Examples follow.
- Display Help Message:
python -m CogNative.main -help
CogNative CLI Flags:
-sampleAudio <PATH>: audio file of voice to clone
-synType <text, audio>: synthesis mode either given input text or by transcribing audio file
[-dialogueAudio] <PATH>: for audio synType, audio file of dialogue to speak
[-dialogueText] <PATH>: for text synType, text string of dialogue to speak
-out <PATH>: output audio file path
-useExistingEmbed <y/yes/n/no>: Uses saved embedding of previously used voice samples if enabled and present.
- Generate cloned voice from sample voice and text input:
python -m CogNative.main -sampleAudio CogNative/examples/MatthewM66.wav -synType text -dialogueText "The turbo-encabulator has now reached a high level of development, and it's being successfully used in the operation of novertrunnions." -out cmdExampleText.wav -useExistingEmbed y
Loaded encoder "english_encoder.pt" trained to step 1564501
Synthesizer using device: cuda
Building Wave-RNN
Trainable Parameters: 4.481M
Loading model weights at CogNative\models\RTVC\saved_models\default\vocoder.pt
Synthesizing...
Clone output to cmdExampleText.wav
- Generate cloned voice from sample voice and audio input file:
python -m CogNative.main -sampleAudio CogNative\examples\MatthewM66.wav -synType audio -dialogueAudio CogNative\examples\BillMaher22.wav -out cmdExampleAudio.wav -useExistingEmbed n
Loaded encoder "english_encoder.pt" trained to step 1564501
Synthesizer using device: cuda
Building Wave-RNN
Trainable Parameters: 4.481M
Loading model weights at CogNative\models\RTVC\saved_models\default\vocoder.pt
Loading requested file...
Synthesizing...
Clone output to cmdExampleAudio.wav
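To drive the CLI from another Python script, the commands above can be assembled as an argument list using only the documented flags. Passing every flag avoids the interactive prompts, and passing the dialogue text as a single list element avoids shell quoting issues. `build_clone_command` is a hypothetical helper, not part of CogNative.

```python
import sys


def build_clone_command(sample_audio, out, dialogue_text=None, dialogue_audio=None,
                        use_existing_embed=False):
    """Build the argument list for `python -m CogNative.main`.

    Exactly one of dialogue_text / dialogue_audio must be given; it selects -synType.
    """
    if (dialogue_text is None) == (dialogue_audio is None):
        raise ValueError("give exactly one of dialogue_text or dialogue_audio")
    cmd = [sys.executable, "-m", "CogNative.main", "-sampleAudio", sample_audio]
    if dialogue_text is not None:
        cmd += ["-synType", "text", "-dialogueText", dialogue_text]
    else:
        cmd += ["-synType", "audio", "-dialogueAudio", dialogue_audio]
    cmd += ["-out", out, "-useExistingEmbed", "y" if use_existing_embed else "n"]
    return cmd


if __name__ == "__main__":
    cmd = build_clone_command("CogNative/examples/MatthewM66.wav", "cmdExampleText.wav",
                              dialogue_text="Hello from a cloned voice.",
                              use_existing_embed=True)
    print(cmd)
    # To run it from the repository root: subprocess.run(cmd, check=True)
```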
The AutoTranslate script translates audio from a supported language into English. To use it on Windows, either drag and drop an audio file onto the script, or place a shortcut to the script in %AppData%\Microsoft\Windows\SendTo\ and use the "Send To" context menu on the audio file you want translated. In both cases, a new .wav file is written to the same folder, named after the original file with "_" and the destination language appended. On other platforms the same CLI flags apply, but context menu integration depends on which packages are installed.
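The output naming rule can be sketched as follows. Two details are assumptions, not confirmed by the description above: that the original extension is dropped before appending the suffix, and how the destination language is written (a code such as "en" is used here). `autotranslate_output_path` is a hypothetical helper.

```python
from pathlib import Path


def autotranslate_output_path(input_path, dest_language):
    """Return the assumed output path: same folder, original stem + "_" + language, .wav."""
    src = Path(input_path)
    return src.with_name("%s_%s.wav" % (src.stem, dest_language))


if __name__ == "__main__":
    print(autotranslate_output_path("recordings/interview.mp3", "en"))
```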
- Create your own branch:
git branch yourname-feature-name
- Open a pull request that clearly explains your branch
- Reference the issues your pull request addresses
- Always use "Squash and merge".
This style guide keeps code style consistent throughout the project. Format your code with the Black Python formatter:
Single file: black <python-file-name>
All files: black .
This GitHub repository serves as the foundation of our voice cloning module.
See license here.
This GitHub repository was used to train the Swedish synthesizer.
Real-Time-Voice-Cloning Swedish
- Andrew Fennell
- Austin Currington
- Xingjian Hao
- Connor Tisdel
- Jacob Smith
- Aref Sadeghi