This is an implementation of the paper "FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis".
Yifan Hu, Rui Liu *, Guanglai Gao, Haizhou Li.
You can download the dataset from DailyTalk.
This project uses conda to manage all dependencies. Install Anaconda first if you have not already done so.
# Clone the repo
git clone https://github.com/walker-hyf/FCTalker.git
cd $PROJECT_ROOT_DIR
# Install dependencies
conda env create -f ./environment.yaml
# Activate the installed environment
conda activate FCTalker
Run
python3 prepare_align.py --dataset DailyTalk
for some preparations.
For the forced alignment, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences.
Pre-extracted alignments for the datasets are provided here.
You have to unzip the files into preprocessed_data/DailyTalk/TextGrid/. Alternatively, you can run the aligner yourself. Please note that our pretrained models are not trained with supervised duration modeling (they are trained with learn_alignment: True).
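Before preprocessing, it can help to confirm that the alignments were actually unpacked. The following is a minimal sketch, not part of the repository: `count_textgrids` is a hypothetical helper, and it assumes the alignment files use the `.TextGrid` extension that MFA normally produces.

```shell
# Hypothetical helper (not part of FCTalker): count alignment files under a directory.
# Assumes the unzipped alignments end in ".TextGrid".
count_textgrids() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        # Directory missing: report zero files and signal failure.
        echo 0
        return 1
    fi
    # Search recursively; strip the padding some wc implementations add.
    find "$dir" -type f -name '*.TextGrid' | wc -l | tr -d ' '
}

# Usage: report how many alignment files are in the expected location.
n=$(count_textgrids preprocessed_data/DailyTalk/TextGrid/) || true
echo "found $n TextGrid files"
```

A count of 0 suggests the archive was extracted somewhere else or not at all.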
After that, run the preprocessing script with
python3 preprocess.py --dataset DailyTalk
Train your model with
python3 train.py --dataset DailyTalk
Useful options:
- Currently only single GPU training is supported.
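On a multi-GPU machine, one way to keep training on a single device is to restrict which GPU the process can see through the standard CUDA environment variable `CUDA_VISIBLE_DEVICES`. The sketch below is an illustration only: `train_single_gpu` and `DRY_RUN` are hypothetical names introduced here, not options of `train.py`.

```shell
# Hypothetical wrapper (not part of FCTalker): run training with one visible GPU.
# $1 = GPU index (default 0), $2 = dataset name (default DailyTalk).
train_single_gpu() {
    gpu_id="${1:-0}"
    dataset="${2:-DailyTalk}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command instead of executing it.
        echo "CUDA_VISIBLE_DEVICES=$gpu_id python3 train.py --dataset $dataset"
    else
        CUDA_VISIBLE_DEVICES="$gpu_id" python3 train.py --dataset "$dataset"
    fi
}

# Usage: show the command that would be run for GPU 0.
DRY_RUN=1 train_single_gpu 0 DailyTalk
```

Dropping `DRY_RUN=1` from the last line starts the actual training run from the project root.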
Only batch inference is supported, because generating a turn may require the contextual history of the conversation. Try
python3 synthesize.py --source preprocessed_data/DailyTalk/val_*.txt --restore_step RESTORE_STEP --mode batch --dataset DailyTalk
to synthesize all utterances in preprocessed_data/DailyTalk/val_*.txt.
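Because `val_*.txt` is an unquoted glob, the shell expands it before `synthesize.py` receives the argument, so it may be worth checking what the pattern matches first. This is a small sketch for sh/bash; `list_val_files` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of FCTalker): list files matching val_*.txt in a directory.
list_val_files() {
    dir="${1:-preprocessed_data/DailyTalk}"
    for f in "$dir"/val_*.txt; do
        # In sh/bash an unmatched glob stays as the literal pattern, so check existence.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}

# Usage: print the validation file(s) the synthesis command would receive.
list_val_files preprocessed_data/DailyTalk
```

If nothing is printed, the preprocessing step has probably not been run yet.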
To cite this repository:
@article{hu2022fctalker,
title={FCTalker: Fine and coarse grained context modeling for expressive conversational speech synthesis},
author={Hu, Yifan and Liu, Rui and Gao, Guanglai and Li, Haizhou},
journal={arXiv preprint arXiv:2210.15360},
year={2022}
}
E-mail: hyfwalker@163.com