Feb 10th Meeting
realzza opened this issue · 0 comments
Todos
- Check: Does VAD change speech data in data prep? (P1)
  No. The VAD step only computes the VAD information and stores it in the dumpdir, in the file `vad.scp`. It marks the non-speech segments so that they can be excluded from training. The resulting gaps could affect our reconstruction loss, but excluding those segments can improve the quality of the synthesized audio. It is a tradeoff we need to be aware of.
- Keep VITS with xvector and VAD training
-
  - No, the decoded wav sample rate is still `22050`. Trying the following steps.
    - check the training process
    - check the `tts_inference.py` file on sample rate usage.
  - Inference jobs have not been eligible for submission since Feb 13th. Couldn't decode to check whether the output meets the requirement.
  - Applied retrained model. Speaker information is integrated!
    `/ocean/projects/cis210027p/zzhou5/espnet/egs2/librispeech_100/tts_vits/exp/16k_xvector/tts_beta_lib100_vits_tts_all16k_char_xvector/decode_with_trained_16k_vocoder`
- If 3 does not work, consult Jiatong (P2)
- Run inference w/o trained vocoder
- Integrate VITS model in cyclic systems (P3)
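
For the VAD item (P1), a minimal sketch of the idea that VAD only marks frames and the exclusion happens afterwards. This is not the ESPnet implementation: the function name `drop_non_speech_frames` and the per-frame 0/1 flag layout are assumptions made for illustration, and the actual `vad.scp` format is not shown here.

```python
def drop_non_speech_frames(frames, vad_flags):
    """Return only the frames whose VAD flag is 1 (speech).

    Hypothetical helper: assumes one 0/1 flag per frame.
    The input list is not modified.
    """
    if len(frames) != len(vad_flags):
        raise ValueError("frames and vad_flags must have the same length")
    return [frame for frame, flag in zip(frames, vad_flags) if flag == 1]


# Toy example: five frames, two of them marked as non-speech.
frames = ["f0", "f1", "f2", "f3", "f4"]
vad_flags = [0, 1, 1, 0, 1]

kept = drop_non_speech_frames(frames, vad_flags)
print(kept)    # ['f1', 'f2', 'f4']
print(frames)  # original data is unchanged
```

The original frames stay untouched, which matches the answer above: the speech data itself is not changed, only what gets used for training.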
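
For the sample rate check, a small sketch that reads the rate from the WAV headers in a decode directory once inference jobs can run again. The helper names are hypothetical, the 16000 Hz target is an assumption taken from the `16k` in the experiment path, and the standard library `wave` module only handles PCM WAV files, so this assumes the decoded outputs are PCM.

```python
import wave
from pathlib import Path


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def find_mismatched_wavs(wav_dir, expected_rate=16000):
    """Return (path, rate) pairs for WAV files whose rate differs from expected_rate.

    expected_rate=16000 is an assumed target, not a value confirmed by the notes.
    """
    mismatched = []
    for wav_path in sorted(Path(wav_dir).rglob("*.wav")):
        rate = wav_sample_rate(wav_path)
        if rate != expected_rate:
            mismatched.append((wav_path, rate))
    return mismatched
```

Calling `find_mismatched_wavs(decode_dir)` on the decode output directory returns an empty list if every file is at the expected rate, and otherwise lists the files that are still at another rate such as 22050.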