revdotcom/speech-datasets

Ground truth transcriptions contain no timestamps

snakers4 opened this issue · 3 comments

Hi,

Ground truth transcriptions contain no timestamps, e.g.:

Also, it is strange - the outputs of other systems (e.g. Google, Amazon) contain timestamps, whereas your system's outputs are in a different format.

Is all of this a bug, or a feature?
Can your dataset be used as-is, without pulling in extra dependencies / tools?

Best,
Alex

Hi there,

With respect to the transcriptions, that's actually a choice we made. Our method of generating timestamps wouldn't be perfect, and we'd have to leave some tokens without timing information. As a result, we decided to provide the ground truth transcriptions as-is, without any timestamps.

With respect to our outputs, that's a great catch - I'll make a PR to put them into the same format as the other systems for ease of use.

The dataset can definitely be used as-is without extra tools! We recommend using our fstalign tool for the sake of reproducibility and for the features it provides for WER calculation. For example, it facilitates calculating WER by entity class. But feel free to use whichever tool works best for your use case.
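
For anyone who wants a dependency-free baseline for the WER calculation mentioned above, here is a minimal sketch of word-level WER using a plain edit distance. It is not how fstalign computes WER: `word_error_rate` is a hypothetical helper name, and the sketch assumes whitespace tokenization with no text normalization and no entity-class breakdown.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    Assumption: tokens are obtained by splitting on whitespace, with no
    casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Numbers from a sketch like this will generally not match those produced by a tool that applies its own normalization and alignment rules, so results are only comparable when every system is scored the same way.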

Best,
Miguel

We've updated the output directories of our models to include a directory with the nlp format to match the other system outputs.

I'll be closing this issue for the time being, but if you have any more questions, feel free to comment again or open a new issue!

Best,
Miguel

Many thanks