This is the phoneme duration extractor for Hi-Fi TTS dataset. The scripts are modified from the LJSpeech data processing scripts provided in NEMO.
python reorganize_hifitts.py --hifitts_dir=datasets/hi_fi_tts_v0 --reorg_hifitts_dir=datasets/hifitts_reorg
./extract_hifitts_phonemes_and_durs.sh datasets/hifitts_reorg
The extracted phone durations will be saved to alignments
dir.