janvainer/speedyspeech

About SSIM loss


Hi, thanks for your work. After reading the paper, I am unsure whether your proposed loss (MAE+SSIM) produces blurrier spectrograms or whether it is the other way around. Unfortunately, the following sentence is not clear to me:

"We compared our decoder to a variant trained without using
the SSIM loss. This produced blurrier spectrograms, but the
difference in audio quality was not noticeable."

Thank you in advance!

Hi, thanks for your interest in SpeedySpeech. Training without the SSIM loss resulted in blurrier spectrograms, so the combined MAE+SSIM loss gave sharper spectrograms and should sound a bit better. However, the difference in voice clarity was not really noticeable to me.
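
For readers unfamiliar with the idea, here is a rough sketch of what a combined MAE+SSIM loss could look like. It is not the SpeedySpeech implementation: the function names (`mae`, `global_ssim`, `mae_ssim_loss`) are hypothetical, the two terms are assumed to be weighted equally, the spectrogram is assumed to be flattened and normalized to [0, 1], and SSIM is computed from global statistics rather than the local sliding windows that SSIM implementations normally use.

```python
def _mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def mae(pred, target):
    """Mean absolute error between two equally long sequences."""
    return _mean(abs(p - t) for p, t in zip(pred, target))


def global_ssim(pred, target, data_range=1.0, k1=0.01, k2=0.03):
    """Simplified SSIM from global means, variances and covariance.

    Assumption: one window covering the whole input; real SSIM
    averages this quantity over many local windows.
    """
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_p, mu_t = _mean(pred), _mean(target)
    var_p = _mean((p - mu_p) ** 2 for p in pred)
    var_t = _mean((t - mu_t) ** 2 for t in target)
    cov = _mean((p - mu_p) * (t - mu_t) for p, t in zip(pred, target))
    numerator = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return numerator / denominator


def mae_ssim_loss(pred, target, data_range=1.0):
    """MAE plus (1 - SSIM); equal weighting is an assumption."""
    return mae(pred, target) + (1.0 - global_ssim(pred, target, data_range))


# Toy "spectrogram" values: a sharp pattern and a blurred copy of it.
target = [0.0, 1.0, 0.0, 1.0]
blurry = [0.25, 0.75, 0.25, 0.75]

print(mae_ssim_loss(target, target))  # identical inputs give (near) zero loss
print(mae_ssim_loss(blurry, target))  # the blurred copy is penalized by both terms
```

In this toy case the blurred copy has the same mean as the target but lower contrast, so the SSIM term adds a penalty on top of the MAE, which is the kind of pressure toward sharper output described above.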

Thanks for your fast answer. Indeed, adding the SSIM loss resulted in less blurry spectrograms. Thanks for your work!