carpedm20/multi-speaker-tacotron-tensorflow

How can we exploit forced alignments?

yoosif0 opened this issue · 0 comments

Thank you so much for the work you have done on your Tacotron implementation. I have a question, if I may.
I have a speech corpus with forced time alignments. For each audio sample, I have a file that looks like this:

```
0.471000 121 sil
0.618000 121 Z
0.666000 121 i
0.716750 121 n
0.852974 121 a:
0.910125 121 z
0.987444 121 a
1.070000 121 t
1.130000 121 u
1.182000 121 l
```

What is the best Tacotron implementation that can exploit this alignment information?
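For what it's worth, the file format above is straightforward to load into per-phone durations, which is the form most duration-aware models want. Here is a minimal parsing sketch; it assumes the first column is each segment's *end* time in seconds (so a segment starts where the previous one ended), the second is an utterance ID, and the third is the phone label, with `sil` marking silence. The `Phone` class and `parse_alignment` name are just illustrative, not from any particular repo.

```python
from dataclasses import dataclass

@dataclass
class Phone:
    start: float  # segment start time in seconds (assumed)
    end: float    # segment end time in seconds (first column)
    label: str    # phone label; "sil" marks silence

def parse_alignment(lines):
    """Parse lines of the form '<end_time> <utt_id> <phone>'.

    Assumption: the first column is the segment END time, so each
    segment begins where the previous one ended.
    """
    phones = []
    start = 0.0
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        end, label = float(parts[0]), parts[2]
        phones.append(Phone(start, end, label))
        start = end
    return phones

sample = [
    "0.471000 121 sil",
    "0.618000 121 Z",
    "0.666000 121 i",
]
for p in parse_alignment(sample):
    print(f"{p.label}: {p.start:.3f}-{p.end:.3f}s "
          f"(duration {p.end - p.start:.3f}s)")
```

From here, the per-phone durations (`end - start`) can be converted into frame counts at the model's hop size, e.g. to supervise or initialize an attention alignment.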