A compact audio-to-phoneme aligner for singing voice.
The datasets used in our experiments are Opencpop and NamineRitsu; you can also experiment with your own dataset.
Once your data is prepared, follow these steps:
- Create a virtual environment.
- Define the dataloader and the collate function in utils/data_utils.py. You can inherit from the existing classes.
- Import your dataloader in train.py and change trainset, valset, and collate_fn in the prepare_dataloaders function.
- Prepare a file named phone_set.json containing the phone set of your dataset and put it at the root of data_dir.
- Change data_dir to your data path in hparams.py.
- Run this command to start training:
```shell
CUDA_VISIBLE_DEVICES=0 python train.py --output_directory experiments/exp_name/ --log_directory tensorboard_logs
```
- Run this command to start inference:
```shell
CUDA_VISIBLE_DEVICES=0 python infer_prob.py --checkpoint_path experiments/exp_name/checkpoint_name \
    --output_dir experiments/exp_name/
```
If you find this work useful, please cite:
```bibtex
@inproceedings{zheng2023compact,
  title={A Compact Phoneme-To-Audio Aligner for Singing Voice},
  author={Zheng, Meizhen and Bai, Peng and Shi, Xiaodong},
  booktitle={International Conference on Advanced Data Mining and Applications},
  pages={183--197},
  year={2023},
  organization={Springer}
}
```