bfs18/rfwave

How much of the performance boost is due to Rectified Flow?

Closed this issue · 2 comments

Hi, this model looks great. I was wondering whether you know how much of the performance gain can be attributed to using Rectified Flow, as opposed to the GAN that Vocos uses.

Hi there, thanks for your interest. It's hard to quantify the impact precisely, because I haven't implemented a multi-band scheme and waveform equalization on GAN-based vocoders myself, so I have no like-for-like comparison. Still, Rectified Flow definitely warrants attention: Stability AI is currently advancing it for text-to-image applications. For more detail, you might want to take a look at their paper, available at https://arxiv.org/abs/2403.03206

Thank you for your reply! Maybe I'll try implementing a GAN version if I have time 😆