In audio signal processing, source separation is a central challenge with significant implications for applications such as music remixing, automatic transcription, and noise reduction. The task requires not only distinguishing the individual sources in a mixture but also preserving their quality. Traditional approaches have relied primarily on Mean Squared Error (MSE) loss, optimizing for mathematical precision; however, a low MSE does not necessarily align with human auditory perception. By integrating psycho-acoustic principles, our proposed methods aim to produce separations that sound more natural and exhibit fewer audible artifacts, even at the cost of larger numerical deviations than MSE-based methods would permit. Through several implementation strategies for perceptual loss, our research seeks to bridge the gap between quantitative metrics and the qualitative listening experience, enhancing the practical utility of audio source separation.
Here's an example of what a psycho-acoustic loss might look like in PyTorch. This is a minimal sketch rather than the exact method: it weights the spectral error by the standard A-weighting curve as a rough proxy for human loudness sensitivity, and the class name `PsychoAcousticLoss` along with all hyperparameters are illustrative assumptions:
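```python
import torch
import torch.nn as nn


class PsychoAcousticLoss(nn.Module):
    """Spectral loss weighted by the A-weighting curve as a rough proxy
    for human loudness sensitivity. A minimal sketch: it omits masking
    effects and other elements of a full psycho-acoustic model."""

    def __init__(self, n_fft=1024, hop_length=256, sample_rate=44100):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # Center frequency (Hz) of each STFT bin.
        freqs = torch.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
        self.register_buffer("window", torch.hann_window(n_fft))
        self.register_buffer("weights", self._a_weighting(freqs))

    @staticmethod
    def _a_weighting(freqs):
        # Linear-scale A-weighting response per bin, normalized to a
        # maximum of 1 so the most audible band carries full weight.
        f2 = freqs.clamp(min=1e-6) ** 2
        ra = (12194.0**2 * f2**2) / (
            (f2 + 20.6**2)
            * torch.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
            * (f2 + 12194.0**2)
        )
        return ra / ra.max()

    def forward(self, estimate, target):
        # Magnitude spectrograms, shape (batch, freq_bins, frames).
        est_mag = torch.stft(estimate, self.n_fft, self.hop_length,
                             window=self.window, return_complex=True).abs()
        tgt_mag = torch.stft(target, self.n_fft, self.hop_length,
                             window=self.window, return_complex=True).abs()
        # Penalize spectral error in proportion to perceptual sensitivity.
        w = self.weights.view(1, -1, 1)
        return torch.mean(w * (est_mag - tgt_mag) ** 2)


# Usage: drop in wherever an MSE loss on waveforms was used.
loss_fn = PsychoAcousticLoss(sample_rate=44100)
estimate = torch.randn(4, 44100, requires_grad=True)  # batch of 1 s clips
target = torch.randn(4, 44100)
loss = loss_fn(estimate, target)
loss.backward()
```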
Here's a closer look at the design choices behind this loss:
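In this sketch, the A-weighting curve emphasizes the mid frequencies (roughly 1–6 kHz) where human hearing is most sensitive, so spectral errors there are penalized more heavily than errors at very low or very high frequencies. Because the loss compares magnitude spectrograms, phase errors go unpenalized; this matches the ear's relative insensitivity to absolute phase, but it also means phase-related artifacts would need a separate term if they prove audible. A fuller psycho-acoustic loss would additionally model frequency and temporal masking, suppressing the penalty on errors that a nearby loud component would hide; that refinement is omitted here for brevity. Finally, every operation in the sketch is differentiable, so it can replace an MSE term in an existing training loop without other changes.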