parlance/ctcdecode

How to use timesteps?

blankspark opened this issue · 1 comment

I have noticed that the output of ctcdecode includes timesteps, which, according to the description, can be used as an alignment.
But the timesteps I get have shape (BATCHSIZE, N_BEAMS, N_TIMESTEPS), and I don't know how to use them.

timesteps - Shape: BATCHSIZE x N_BEAMS

The timestep at which the nth output character has peak probability. Can be used as alignment between the audio and the transcript.

Thanks in advance.
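A minimal sketch of reading `timesteps` for one beam. It assumes, as the observed shape suggests, that `timesteps` is padded along its last axis exactly like `beam_results`, so only the first `out_lens[b][k]` entries of a beam are meaningful, and entry `i` is the peak frame of output character `i`. The inputs are the tensors from `beam_results, beam_scores, timesteps, out_lens = decoder.decode(output)`, and `labels` is the label list passed to `CTCBeamDecoder`. The helper name `char_alignment` and the `seconds_per_frame` default are hypothetical: the real frame duration depends on your acoustic model's hop length and time subsampling (for example, a 10 ms hop with 2x subsampling gives 0.02 s per output frame).

```python
from typing import List, Sequence, Tuple


def char_alignment(
    beam_results,
    timesteps,
    out_lens,
    labels: Sequence[str],
    batch_idx: int = 0,
    beam_idx: int = 0,
    seconds_per_frame: float = 0.02,  # ASSUMPTION: model-dependent frame stride
) -> List[Tuple[str, int, float]]:
    """Return (character, peak frame, time in seconds) for one decoded beam.

    Works with nested lists or torch tensors, since every element is
    converted with int().
    """
    n = int(out_lens[batch_idx][beam_idx])
    # Entries past out_len are padding and carry no meaning.
    chars = beam_results[batch_idx][beam_idx][:n]
    steps = timesteps[batch_idx][beam_idx][:n]
    return [
        (labels[int(c)], int(t), int(t) * seconds_per_frame)
        for c, t in zip(chars, steps)
    ]
```

For the top beam of the first utterance, `char_alignment(beam_results, timesteps, out_lens, labels)` gives one (character, frame, seconds) tuple per decoded character; the times are peak positions, not start or end boundaries.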

@blankspark, did you ever figure out how to use them? I am looking to get word-level time alignments, but I don't know how to calculate this information from the timesteps returned by ctcdecode.
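One possible heuristic for word-level alignment, sketched under stated assumptions: take one beam's valid characters and peak frames (`beam_results[b][k][:n]` and `timesteps[b][k][:n]` with `n = out_lens[b][k]`), split on the space label, and use the first character's peak as the word start and the last character's peak plus one frame as the word end. The function name `word_alignment`, the `space_label` default, the "+1 frame" end rule, and `seconds_per_frame` are all hypothetical choices, not something ctcdecode provides. Because CTC outputs tend to be spiky, these boundaries are approximate.

```python
from typing import List, Sequence, Tuple


def word_alignment(
    chars: Sequence[int],
    steps: Sequence[int],
    labels: Sequence[str],
    space_label: str = " ",  # ASSUMPTION: words are separated by this label
    seconds_per_frame: float = 0.02,  # ASSUMPTION: model-dependent frame stride
) -> List[Tuple[str, float, float]]:
    """Group one beam's characters into (word, start_s, end_s) tuples.

    start = peak frame of the word's first character
    end   = peak frame of the word's last character + 1 frame (heuristic)
    """
    words: List[Tuple[str, float, float]] = []
    current: List[str] = []
    start = end = 0

    def flush() -> None:
        # Emit the word collected so far, if any.
        if current:
            words.append((
                "".join(current),
                start * seconds_per_frame,
                (end + 1) * seconds_per_frame,
            ))
            current.clear()

    for c, t in zip(chars, steps):
        ch, frame = labels[int(c)], int(t)
        if ch == space_label:
            flush()  # a space closes the current word
            continue
        if not current:
            start = frame  # first character of a new word
        current.append(ch)
        end = frame
    flush()  # close the final word
    return words
```

Leading, trailing, and repeated spaces are skipped, so they never produce empty words.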