AppleHolic/source_separation

logstft vs. linear stft

Closed this issue · 5 comments

Hi, this is a great implementation of the complex U-Net. Congrats. I wonder why you chose to use the logstft instead of the linear STFT as done here. Did you observe better performance?

Just a small note: you could have used MUSDB18 instead of DSD100 for singing voice. It's a bit larger. By the way, did you evaluate your results using museval?

Cheers
Fabian

@faroit

  1. The linear STFT produces values that are too large to train the model stably. When I built my first model, I ran into gradient explosion with the linear STFT, so I simply moved to log space to solve it.

  2. Not yet, and thanks for pointing that out. I will train a new model on the MUSDB18 dataset and evaluate it with museval soon.

I will follow up on the second point and report the progress in this issue.

Thanks
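To make the point about value ranges concrete, here is a minimal sketch of log compression on complex STFT bins. It assumes the compression is log(1 + |X|) applied to the magnitude with the phase left unchanged; the repository's actual logstft transform may differ, and the function names and sample bins are hypothetical. Plain Python is used so it runs without dependencies; in practice the same idea would be applied to tensors.

```python
import cmath
import math


def log_compress(stft_bins):
    """Replace each bin's magnitude with log(1 + magnitude), keeping its phase."""
    compressed = []
    for z in stft_bins:
        mag, phase = cmath.polar(z)
        compressed.append(cmath.rect(math.log1p(mag), phase))
    return compressed


def log_expand(log_bins):
    """Invert log_compress: magnitude becomes exp(magnitude) - 1, phase is kept."""
    expanded = []
    for z in log_bins:
        mag, phase = cmath.polar(z)
        expanded.append(cmath.rect(math.expm1(mag), phase))
    return expanded


# Hypothetical bins whose magnitudes span several orders of magnitude.
bins = [complex(0.001, 0.0), complex(30.0, -40.0), complex(-600.0, 800.0)]
compressed = log_compress(bins)
restored = log_expand(compressed)

# Magnitudes before and after compression.
print([abs(z) for z in bins])
print([abs(z) for z in compressed])
```

The compressed magnitudes fall in a much narrower range than the linear ones, which is the property the reply above relies on to avoid exploding gradients. Because the mapping is invertible, the linear spectrum can be recovered before the inverse STFT.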

The linear STFT produces values that are too large to train the model stably. When I built my first model, I ran into gradient explosion with the linear STFT, so I simply moved to log space to solve it.

I see. Have you considered using mean/std normalization instead of, or in addition to, the log transform?

No, I didn't consider using mean/std normalization. That also seems like it could help the results. I will try it as well in my next experiment.
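For reference, mean/std normalization could look roughly like the sketch below. It assumes the statistics are computed once over training-set magnitudes and reused at inference time; the function names and sample values are hypothetical and not taken from the repository.

```python
import statistics


def fit_stats(values):
    """Compute the mean and population standard deviation of training magnitudes."""
    return statistics.fmean(values), statistics.pstdev(values)


def normalize(values, mean, std, eps=1e-8):
    """Shift to zero mean and scale to unit variance; eps guards against std == 0."""
    return [(v - mean) / (std + eps) for v in values]


def denormalize(values, mean, std, eps=1e-8):
    """Undo normalize so the model output returns to the original scale."""
    return [v * (std + eps) + mean for v in values]


# Hypothetical linear STFT magnitudes from a training set.
train_mags = [0.5, 12.0, 300.0, 45.0, 3.0, 980.0]
mean, std = fit_stats(train_mags)
normed = normalize(train_mags, mean, std)
recovered = denormalize(normed, mean, std)

print(normed)
```

Unlike the log transform, this is a linear rescaling, so it keeps the relative spread between small and large bins. That is one reason the two might be combined, with normalization applied after the log step.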

A quick note on testing:

  1. with AudioSet: 2.4
  2. without AudioSet: 2.37

But I got better results on the test data with AudioSet.

This continues in issue #16 and others, so I am closing this one.

  • Adding MUSDB