hitachi-speech/EEND

Smoothing the activations at the output of the transformer

zaouk opened this issue · 0 comments

zaouk commented

Hey there,
I was wondering whether you ran into any issues with smoothing the speaker activations predicted by the Transformer model. An encoder-only Transformer tends to output speaker activations that are less smooth over time than those produced by recurrent models such as BiLSTMs.
Did you resort to any tricks to smooth the Transformer's output activations, or was this not an issue at all?
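
To make the question concrete, here is a minimal sketch of the kind of post-processing I have in mind: threshold the frame-level activations, then run a median filter along the time axis for each speaker. The function name `smooth_activations`, the 0.5 threshold, and the 11-frame window are assumptions chosen for illustration, not values taken from this repository.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def smooth_activations(probs, kernel=11, threshold=0.5):
    """Binarize speaker activations, then median-filter each speaker track.

    probs: array of shape (T, n_speakers) with values in [0, 1].
    kernel: odd window length in frames (illustrative default).
    threshold: decision threshold (illustrative default).
    """
    if kernel % 2 != 1:
        raise ValueError("kernel must be odd")
    decisions = (np.asarray(probs) > threshold).astype(np.int64)
    half = kernel // 2
    # Repeat the edge frames so the output keeps the input length T.
    padded = np.pad(decisions, ((half, half), (0, 0)), mode="edge")
    # windows has shape (T, n_speakers, kernel).
    windows = sliding_window_view(padded, kernel, axis=0)
    # The median of an odd-length 0/1 window is a majority vote.
    return np.median(windows, axis=-1).astype(np.int64)


if __name__ == "__main__":
    # One speaker, nine frames, with isolated flips in the raw decisions.
    probs = np.array([0.9, 0.9, 0.1, 0.9, 0.9, 0.1, 0.1, 0.8, 0.1]).reshape(-1, 1)
    print(smooth_activations(probs, kernel=3)[:, 0])
```

A filter like this removes isolated one-frame flips, but it also erases genuine segments shorter than about half the window, so the window length trades smoothness against the ability to detect short turns.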