lumaku/ctc-segmentation

What do timings denote?

zgerrard opened this issue · 1 comment

Hi, do the timings returned by the ctc_segmentation() function denote the time when the corresponding character starts, or the time at the middle of that character (where its probability is highest)?

Thank you.

In traditional hybrid DNN/HMM ASR, phoneme classes span a duration of multiple time frames. In CTC-based ASR, characters instead "occur" at individual frames. So rather than marking where a character starts, the timings denote the most probable time of "occurrence" of a character. This corresponds to a_t in the paper.
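To make the idea of a character "occurring" at one frame more concrete, here is a minimal sketch. It is not the ctc_segmentation algorithm, which aligns a given transcript to the CTC output; it only applies plain greedy CTC decoding to a made-up posterior matrix. The names occurrence_times, FRAME_DURATION_S and BLANK, the 0.04 s frame duration, and the convention that index 0 is the blank token are all hypothetical assumptions for illustration.

```python
from typing import List, Sequence, Tuple

FRAME_DURATION_S = 0.04  # hypothetical: seconds per CTC output frame
BLANK = 0  # assumption: index of the CTC blank token


def occurrence_times(
    probs: Sequence[Sequence[float]],
    vocab: Sequence[str],
    frame_duration: float = FRAME_DURATION_S,
) -> List[Tuple[str, float]]:
    """Greedy CTC sketch: return (character, time in seconds) per emitted character."""
    # Winning label of each frame (greedy decoding).
    best = [max(range(len(row)), key=row.__getitem__) for row in probs]
    results = []
    t = 0
    while t < len(best):
        label = best[t]
        start = t
        # Collapse a run of repeated labels into a single emission.
        while t < len(best) and best[t] == label:
            t += 1
        if label != BLANK:
            # The character is assigned to the single frame of its run
            # where it is most probable, not to a start/end interval.
            peak = max(range(start, t), key=lambda f: probs[f][label])
            results.append((vocab[label], peak * frame_duration))
    return results


# Toy posteriors for 6 frames over the vocabulary ["<blank>", "h", "i"].
probs = [
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],    # "h"
    [0.1, 0.8, 0.1],    # "h", more probable here
    [0.8, 0.1, 0.1],    # blank
    [0.1, 0.1, 0.8],    # "i"
    [0.9, 0.05, 0.05],  # blank
]
print(occurrence_times(probs, ["<blank>", "h", "i"]))
# [('h', 0.08), ('i', 0.16)]
```

Each timing here is a single frame index multiplied by the frame duration, which mirrors the point above: a CTC character gets one most probable point in time rather than a span with a start and a middle.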