Evaluation Protocol

Question

MoayedHajiAli opened this issue 5 months ago · 2 comments

Hello,
I noticed in your code that n_candidate_per_text is set to 3 by default. Was that setting used during evaluation? It is not mentioned in the paper.
Additionally, which CLAP backbone did you use to calculate the CLAP score when comparing with other methods such as Make-an-Audio and audio-ldm? The scale of the numbers is very different. Thank you for your help.

Best,

Answer 1 · 2024-06-07T02:12:40.000Z

Hi,
For evaluation, we neither generated 3 samples per text nor selected the best one. The 630k-audioset-best checkpoint was used to report the scores.
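
To make the distinction concrete, here is a minimal sketch of the two protocols, assuming the CLAP score is the mean cosine similarity between audio and text embeddings. The function names and array shapes are hypothetical and not taken from the repository; the embeddings are assumed to have been computed beforehand with a single CLAP checkpoint (here, 630k-audioset-best), since scores from different checkpoints are not directly comparable.

```python
import numpy as np


def cosine_rows(a, b):
    """Cosine similarity along the last axis (inputs are broadcast)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)


def clap_score_single(audio_emb, text_emb):
    """One generated sample per text, no selection.

    audio_emb: (N, D) audio embeddings, text_emb: (N, D) text embeddings.
    """
    return float(np.mean(cosine_rows(audio_emb, text_emb)))


def clap_score_best_of_k(audio_emb, text_emb):
    """K candidates per text, keeping the best-matching one.

    audio_emb: (N, K, D), text_emb: (N, D). Shown only for contrast:
    picking the maximum can only raise the score relative to scoring
    any single candidate.
    """
    sims = cosine_rows(audio_emb, text_emb[:, None, :])  # shape (N, K)
    return float(np.mean(np.max(sims, axis=1)))
```

Under this sketch, the protocol described above corresponds to clap_score_single; clap_score_best_of_k is what a setting such as n_candidate_per_text = 3 with best-candidate selection would amount to.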

Answer 2 · 2024-06-07T17:50:54.000Z

Thank you for your response!