Models trained with this repository can speak Japanese even when trained on a Korean dataset, and can speak Korean even when trained on a Japanese dataset.
This works by transcribing Japanese pronunciation into Korean and Korean pronunciation into Japanese, which reduces the unstable voice output produced by the existing multilingual IPA cleaners.
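The cross-transcription idea can be pictured as a character-level table that rewrites Japanese kana as Hangul syllables with a similar sound, so that presumably a Korean-trained model receives text in a script it has already learned. This is a toy sketch assuming a one-to-one syllable mapping: the table KANA_TO_HANGUL and the function kana_to_hangul are hypothetical, and the repository's real conversion (which must also handle cases such as ん and long vowels) may work quite differently.

```python
# Toy illustration of cross-script pronunciation transcription.
# HYPOTHETICAL: this tiny table is not the mapping JK-VITS actually uses.
KANA_TO_HANGUL = {
    "さ": "사",
    "く": "쿠",
    "ら": "라",
    "か": "카",
    "な": "나",
}


def kana_to_hangul(text: str) -> str:
    """Replace each kana with a Hangul syllable of similar pronunciation.

    Characters missing from the table are passed through unchanged.
    """
    return "".join(KANA_TO_HANGUL.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(kana_to_hangul("さくら"))  # 사쿠라
```

A real converter would need context-sensitive rules rather than a flat dictionary, which is why this is only an illustration of the direction of the conversion.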
- A Windows/Linux system with a minimum of 16GB RAM.
- A GPU with at least 12GB of VRAM.
- Python >= 3.8
- Anaconda installed.
- PyTorch installed.
- CUDA 11.7 installed.
PyTorch install command:
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
CUDA 11.7 Install:
https://developer.nvidia.com/cuda-11-7-0-download-archive
cuDNN (for CUDA 11.x) Install:
https://developer.nvidia.com/rdp/cudnn-archive
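After installing PyTorch, CUDA and cuDNN, a short script can confirm that the environment matches the requirements listed above. This is a minimal sketch, not part of the repository: the helper names python_ok, vram_ok and report are hypothetical, and it only relies on standard PyTorch calls (torch.version.cuda, torch.backends.cudnn.version(), torch.cuda.is_available(), torch.cuda.get_device_properties()).

```python
import sys

# From the requirements list. Decimal gigabytes are used because GPUs sold
# as "12GB" can report slightly less than 12 GiB of total memory.
MIN_VRAM_GB = 12


def python_ok(version_info=sys.version_info) -> bool:
    """True if the interpreter is Python 3.8 or newer."""
    return tuple(version_info[:2]) >= (3, 8)


def vram_ok(total_bytes: int, min_gb: int = MIN_VRAM_GB) -> bool:
    """True if the GPU reports at least `min_gb` decimal gigabytes."""
    return total_bytes >= min_gb * 10**9


def report() -> None:
    """Print the Python, PyTorch, CUDA, cuDNN and GPU memory status."""
    print("Python >= 3.8:", python_ok())
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
    print("cuDNN:", torch.backends.cudnn.version())
    if not torch.cuda.is_available():
        print("CUDA is not available to PyTorch")
        return
    props = torch.cuda.get_device_properties(0)
    print(f"GPU 0: {props.name}, {props.total_memory / 10**9:.1f} GB,",
          "meets 12GB:", vram_ok(props.total_memory))


if __name__ == "__main__":
    report()
```

With the install command above, torch.__version__ should read 1.13.1+cu117 and torch.version.cuda should read 11.7.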
- Create an Anaconda environment:
conda create -n jk python=3.8
- Activate the environment:
conda activate jk
- Clone this repository to your local machine:
git clone https://github.com/kdrkdrkdr/JK-VITS.git
- Navigate to the cloned directory:
cd JK-VITS
- Install the necessary dependencies:
pip install -r requirements.txt
pip install -U pyopenjtalk==0.2.0 --no-build-isolation
- Place the audio files as follows. WAV (.wav) files are supported. The sample rate of the audio must be 44100 Hz.
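Because every file must be 44100 Hz, it can help to scan the dataset before training. The sketch below uses only the standard-library wave module, which reads uncompressed PCM WAV files; files in other encodings would need a library such as soundfile instead. The function name find_bad_sample_rates is hypothetical, and the search for "*.wav" is case-sensitive on Linux.

```python
import sys
import wave
from pathlib import Path

REQUIRED_SR = 44100  # sample rate required for training audio


def find_bad_sample_rates(root, required_sr=REQUIRED_SR):
    """Return (path, rate) for every .wav under `root` whose rate differs."""
    bad = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            rate = wf.getframerate()
        if rate != required_sr:
            bad.append((str(path), rate))
    return bad


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, rate in find_bad_sample_rates(root):
        print(f"{path}: {rate} Hz (expected {REQUIRED_SR})")
```

Files it reports need to be resampled to 44100 Hz with an audio tool of your choice before training.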
- Set up the config.
  - If you train with a Japanese dataset, refer to configs/ja.json.
  - If you train with a Korean dataset, refer to configs/ko.json.
  - Create your own config file based on one of these two files.
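One way to create your own config is to copy configs/ko.json or configs/ja.json and override only the fields you need. This is a sketch under a loud assumption: the section and key names data, training_files and validation_files follow the original VITS config layout and may not match this repository's files, so check them against configs/ko.json first. The function make_config and the output name configs/my_ko.json are hypothetical.

```python
import json
from pathlib import Path


def make_config(base_path, out_path, overrides):
    """Copy a base JSON config and apply overrides such as {"data": {...}}.

    Each top-level section in `overrides` is merged into the matching
    section of the base config; other sections are kept unchanged.
    """
    config = json.loads(Path(base_path).read_text(encoding="utf-8"))
    for section, values in overrides.items():
        config.setdefault(section, {}).update(values)
    Path(out_path).write_text(
        json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return config


if __name__ == "__main__":
    make_config(
        "configs/ko.json",
        "configs/my_ko.json",  # hypothetical output name
        {"data": {  # ASSUMED key names, verify against configs/ko.json
            "training_files": "filelists/train.txt",
            "validation_files": "filelists/val.txt",
        }},
    )
```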
- Write transcripts.
  - If you train with a Japanese dataset, use the following format as a reference:
path/to/XXX.wav|[JA]こんにちは。[JA]
  - If you train with a Korean dataset, use the following format as a reference:
path/to/XXX.wav|[KO]안녕하세요.[KO]
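If your transcripts live in a spreadsheet or another script, lines in the format above can be generated programmatically. This is a minimal sketch; transcript_line, write_filelist and LANG_TAGS are hypothetical helpers, and it assumes one path|text entry per line with the text wrapped in matching [JA] or [KO] tags, as in the examples.

```python
from pathlib import Path

LANG_TAGS = {"ja": "JA", "ko": "KO"}  # language code -> tag used in filelists


def transcript_line(wav_path, text, lang):
    """Format one entry as path|[TAG]text[TAG]."""
    tag = LANG_TAGS[lang]
    if "|" in text:
        # '|' separates the audio path from the transcript.
        raise ValueError("text must not contain '|'")
    return f"{wav_path}|[{tag}]{text}[{tag}]"


def write_filelist(out_path, rows, lang):
    """Write (wav_path, text) pairs to a UTF-8 filelist, one entry per line."""
    lines = [transcript_line(p, t, lang) for p, t in rows]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, write_filelist("filelists/train.txt", [("path/to/XXX.wav", "안녕하세요.")], "ko") writes the Korean example line shown above.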
- Preprocess (g2p) your own dataset. This converts the transcripts in your filelists into phonemes.
python preprocess.py --filelists filelists/train.txt filelists/val.txt
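Before running preprocess.py, a quick format check can catch malformed lines and missing audio files early. This is a hypothetical pre-check, not part of the repository's preprocess.py; check_filelist and LINE_RE are illustrative names, it only accepts the [JA] and [KO] tags shown above, and it resolves relative audio paths against the directory you run it from.

```python
import re
import sys
from pathlib import Path

# path|[JA]text[JA] or path|[KO]text[KO]; the closing tag must match the opening tag.
LINE_RE = re.compile(r"^(?P<path>[^|]+)\|\[(?P<tag>JA|KO)\](?P<text>.+)\[(?P=tag)\]$")


def check_filelist(path):
    """Return (line_number, message) for each problem found in a filelist."""
    problems = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        m = LINE_RE.match(line)
        if m is None:
            problems.append((no, "does not match path|[TAG]text[TAG]"))
        elif not Path(m.group("path")).is_file():
            problems.append((no, f"audio file not found: {m.group('path')}"))
    return problems


if __name__ == "__main__":
    # e.g. python check_filelists.py filelists/train.txt filelists/val.txt
    for filelist in sys.argv[1:]:
        for no, msg in check_filelist(filelist):
            print(f"{filelist}:{no}: {msg}")
```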
- You can download a pretrained model and use it for fine-tuning.
  - If you train with a Japanese dataset, use japanese_pretrained_dataset (Completed).
  - If you train with a Korean dataset, use korean_pretrained_dataset (Completed).
python train.py -c configs/ko.json -m ko
See inference.ipynb.
You can also listen to the Korean and Japanese samples.
For more information, please refer to the following repositories: