Questions regarding reproducing training and inference results

Question

Closed this issue 4 months ago · 2 comments

Dear authors,

Thank you for the great paper and for providing open-source code. I would like to clarify some details regarding your training and inference/evaluation.

Inference/evaluation

I used your model checkpoint and the VB-demand dataset that you shared through Google Drive to run evaluation on the VB-demand test set. Here are the results I obtained:
pesq: 3.4957
csig: 4.65187
cbak: 3.86279
covl: 4.13774

These results differ slightly from the ones you shared in #9, which are:
pesq: 3.4957
csig: 4.72751
cbak: 3.95033
covl: 4.22494

I am unsure what causes the differences in the csig, cbak, and covl scores, and wonder whether you have any clues about it.
For your information, I used librosa to load the test audio at a 16 kHz sampling rate and pysepm.composite to compute these scores. My pysepm version is 0.1.
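
To make the size of the gap concrete, here is a minimal Python sketch that computes the per-metric difference from the two sets of numbers quoted above. The dictionary and function names are illustrative only and do not come from the repository.

```python
# Scores I obtained with pysepm (values quoted above).
mine = {"pesq": 3.4957, "csig": 4.65187, "cbak": 3.86279, "covl": 4.13774}

# Scores reported in #9 (values quoted above).
reported = {"pesq": 3.4957, "csig": 4.72751, "cbak": 3.95033, "covl": 4.22494}


def metric_gaps(a, b):
    """Return b - a for every metric, rounded to 5 decimal places."""
    return {name: round(b[name] - a[name], 5) for name in a}


gaps = metric_gaps(mine, reported)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.5f}")
```

PESQ matches exactly, while csig, cbak, and covl are each lower by less than 0.1 in my run. This suggests, though it does not prove, that the enhanced audio is the same and that the difference comes from how the composite measures are computed.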

Training

In Section 3.1 of your Interspeech paper, it is written that "The learning rate was set initially to 0.0005 and halved every 30 epoch". Regarding this statement, may I clarify whether you stopped training after every 30 epochs, halved the learning rate in the config file, and then resumed training?

May I also clarify whether the checkpoint you provided is the best checkpoint or the last checkpoint from the 100 epochs of training?

Thank you for reading and I hope to hear back from you.

Answer 1 · 2024-05-27T10:55:20.000Z

Hi,

For your first question, the tool I used in the code to calculate the objective metrics was directly inherited from CMGAN.
I also noticed that its results differ slightly from those calculated by the pysepm package, but to ensure a fair comparison with CMGAN, I used the tools they provided.

For your second question, in the conference version, we halved the learning rate every 30 epochs.
However, in subsequent work, we found that using an exponential decay method yields better results, so the method provided in this repository is the latter.
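
For illustration, the two schedules can be written as plain functions of the epoch index, which also shows that neither requires stopping and resuming training by hand. This is only a sketch: the initial rate 0.0005 and the 30-epoch halving come from the paper, while the function names and the decay factor `gamma=0.99` are assumptions made for the example, not necessarily the values used in this repository.

```python
def step_halving_lr(epoch, base_lr=0.0005, step=30):
    """Conference version: halve the learning rate every `step` epochs."""
    return base_lr * 0.5 ** (epoch // step)


def exponential_decay_lr(epoch, base_lr=0.0005, gamma=0.99):
    """Exponential decay: multiply the learning rate by `gamma` each epoch.

    gamma=0.99 is a hypothetical value chosen for illustration.
    """
    return base_lr * gamma ** epoch


# Compare the two schedules at a few epochs.
for epoch in (0, 29, 30, 60, 99):
    print(epoch, step_halving_lr(epoch), exponential_decay_lr(epoch))
```

In a PyTorch training loop, the same behavior is available through `torch.optim.lr_scheduler.StepLR` (with `step_size=30`, `gamma=0.5`) and `torch.optim.lr_scheduler.ExponentialLR`, so the schedule is applied automatically as training proceeds.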

Lastly, the checkpoint I put in this repository is the best checkpoint.

Answer 2 · 2024-05-27T13:18:59.000Z

Thank you very much for addressing my questions.