Whisper Encoder (extracted from pretrained) with a Linear on top and solve using CTC criterion
Primary LanguagePython