logstft vs. linear stft

Question

Closed this issue 5 years ago · 5 comments

Hi, this is a great implementation of the complex U-Net. Congrats. I wonder why you chose to use the log STFT instead of the linear STFT as done here. Did you observe better performance?

Just a small note: you could have used MUSDB18 instead of DSD100 for singing voice. It's a bit larger. By the way, did you evaluate your results using museval?

Cheers
Fabian

Answer 1 · 2019-11-14T14:32:22.000Z

@faroit

The linear STFT produces values that are too large to train the model stably. When I built the first model, I ran into exploding gradients with the linear STFT, so I simply moved to log space to solve that.
I have not evaluated with museval yet, thanks for pointing it out. I will train a new model on the MUSDB18 dataset and evaluate it with museval soon.
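
As an illustration of the log-space idea above, here is a minimal sketch, not the repository's actual code. It assumes the compression is `log1p` applied to the magnitude with the phase left untouched; the helper names `stft`, `log_compress` and `log_expand` are hypothetical.

```python
import numpy as np


def stft(signal, n_fft=512, hop=128):
    """Naive STFT: Hann-windowed frames followed by a real FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)


def log_compress(spec):
    """Assumed compression: log1p on the magnitude, phase unchanged."""
    mag = np.abs(spec)
    phase = np.angle(spec)
    return np.log1p(mag) * np.exp(1j * phase)


def log_expand(log_spec):
    """Inverse of log_compress: expm1 on the magnitude, phase unchanged."""
    mag = np.abs(log_spec)
    phase = np.angle(log_spec)
    return np.expm1(mag) * np.exp(1j * phase)


rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)  # stand-in for one second of audio
linear = stft(audio)
compressed = log_compress(linear)

print("max |linear STFT|:", np.abs(linear).max())
print("max |log STFT|   :", np.abs(compressed).max())
```

Because `log1p(x) < x` for every positive `x`, the compressed spectrogram always has a smaller peak magnitude, and `log_expand` recovers the linear STFT before the inverse transform.
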

I will follow up on the second point and report the progress in this issue.

Thanks

Answer 2 · 2019-11-14T15:15:01.000Z

> The linear STFT produces values that are too large to train the model stably. When I built the first model, I ran into exploding gradients with the linear STFT, so I simply moved to log space to solve that.

I see. Have you considered using mean/std normalization instead/additionally?
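
For reference, mean/std normalization of a magnitude spectrogram could look roughly like the sketch below. It assumes statistics are computed per frequency bin over the time axis; the function names and the random stand-in data are hypothetical, and in practice the statistics would be fitted on the training set.

```python
import numpy as np


def fit_stats(mag, eps=1e-8):
    """Per-frequency-bin mean and std over the time axis of (frames, bins)."""
    mean = mag.mean(axis=0, keepdims=True)
    std = mag.std(axis=0, keepdims=True) + eps  # eps avoids division by zero
    return mean, std


def normalize(mag, mean, std):
    """Shift and scale each bin to roughly zero mean and unit variance."""
    return (mag - mean) / std


def denormalize(norm, mean, std):
    """Undo normalize() so the model output is back on the original scale."""
    return norm * std + mean


rng = np.random.default_rng(0)
# Stand-in for a linear magnitude spectrogram with large values.
mag = np.abs(rng.standard_normal((200, 257))) * 1000.0

mean, std = fit_stats(mag)
norm = normalize(mag, mean, std)
print("range before:", mag.min(), mag.max())
print("range after :", norm.min(), norm.max())
```

This keeps the input on the linear scale while bounding its range, so it can be used instead of, or on top of, log compression.
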

Answer 3 · 2019-11-14T15:50:51.000Z

No, I didn't consider using mean/std normalization. That also looks like it could improve the result. I will try it in my next experiment as well.

Answer 4 · 2019-11-28T11:35:34.000Z

A brief comment on testing.

PESQ evaluation will be implemented and reported.
- reference repo : https://github.com/ludlows/python-pesq

Validation score (audio resampled from 22.05 kHz to 16 kHz):

with audioset : 2.4
without audioset : 2.37

That said, I got a better result on the test data with AudioSet.
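
The resample-then-score step described above might be sketched as follows. This is an assumption about the procedure, not the code used for the numbers in this comment: it assumes the `pesq` package from the reference repo and `scipy` are installed, and that wide-band mode (`"wb"`) is used. The names `resample_ratio` and `pesq_at_16k` are hypothetical.

```python
from fractions import Fraction


def resample_ratio(src_rate, dst_rate):
    """Return integer (up, down) factors for polyphase resampling."""
    ratio = Fraction(dst_rate, src_rate)
    return ratio.numerator, ratio.denominator


def pesq_at_16k(reference, estimate, src_rate=22050, dst_rate=16000):
    """Resample both signals to dst_rate, then score them with PESQ.

    The optional dependencies are imported lazily so that the ratio
    helper can be used without them.
    """
    from pesq import pesq
    from scipy.signal import resample_poly

    up, down = resample_ratio(src_rate, dst_rate)
    ref = resample_poly(reference, up, down)
    est = resample_poly(estimate, up, down)
    # Assumed mode: wide-band PESQ, which expects 16 kHz input.
    return pesq(dst_rate, ref, est, "wb")


print(resample_ratio(22050, 16000))  # (320, 441)
```

`pesq_at_16k(clean, separated)` would then return one score per utterance, to be averaged over the validation set.
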

Answer 5 · 2019-11-28T16:18:14.000Z

This will continue in issue #16 and others, so I am closing this one.

Adding MUSDB