PyTorch Implementation: Convolution-augmented Transformer for Speech Recognition
Conformer is an automatic speech recognition (ASR) model proposed by Google.
The paper introduces only an encoder model. However, I implemented both an encoder and a decoder using PyTorch.
The encoder is implemented as a Conformer, following the paper; the decoder is currently a placeholder ('Something').
(Which model to use for the decoder has not been decided yet.)
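To give a feel for what "convolution-augmented" means in the encoder, here is a minimal sketch of a Conformer-style convolution module: layer norm, a pointwise convolution with GLU, a depthwise convolution, batch norm, Swish, a second pointwise convolution, dropout, and a residual connection. This is an illustration only and not necessarily how this repository implements it; the class name `ConvModuleSketch` and the default values for `dim`, `kernel_size`, and `dropout` are assumptions made for the example.

```python
import torch
import torch.nn as nn


class ConvModuleSketch(nn.Module):
    """Illustrative Conformer-style convolution module (hypothetical name)."""

    def __init__(self, dim: int = 144, kernel_size: int = 31, dropout: float = 0.1):
        super().__init__()
        # Assumed defaults; an odd kernel size keeps the time length unchanged.
        self.norm = nn.LayerNorm(dim)
        # Pointwise conv doubles the channels so GLU can gate them back to `dim`.
        self.pointwise_in = nn.Conv1d(dim, 2 * dim, kernel_size=1)
        self.glu = nn.GLU(dim=1)
        # Depthwise conv: groups=dim gives one filter per channel.
        self.depthwise = nn.Conv1d(
            dim, dim, kernel_size, padding=kernel_size // 2, groups=dim
        )
        self.batch_norm = nn.BatchNorm1d(dim)
        self.swish = nn.SiLU()  # SiLU is the Swish activation
        self.pointwise_out = nn.Conv1d(dim, dim, kernel_size=1)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time, dim)
        y = self.norm(x).transpose(1, 2)  # (batch, dim, time) for Conv1d
        y = self.glu(self.pointwise_in(y))
        y = self.swish(self.batch_norm(self.depthwise(y)))
        y = self.dropout(self.pointwise_out(y))
        return x + y.transpose(1, 2)  # residual connection, back to (batch, time, dim)
```

The module preserves the input shape, so it can be placed between the self-attention and feed-forward parts of an encoder block without any reshaping.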
from conformer.trainer import Trainer
Trainer().fit(...)
from conformer.predictor import Predictor
Predictor(model_path='path/to/').eval(...)