
Evaluation Protocol

MoayedHajiAli opened this issue · 2 comments

I noticed in your code that n_candidate_per_text is set to 3 by default. Was that setting used during evaluation? It is not mentioned in the paper.
Additionally, which CLAP backbone did you use to calculate the CLAP score when comparing with other methods such as Make-an-Audio and AudioLDM? The scale of the reported numbers is very different. Thank you for your help.


For evaluation, we neither generated 3 samples per text nor selected the best one. The reported CLAP scores were computed with the 630k-audioset-best checkpoint.
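
To make the distinction concrete, here is a minimal sketch of how scoring a single sample per text differs from best-of-n selection. It is not the repository's evaluation code. It assumes the CLAP score is the cosine similarity between a text embedding and an audio embedding, and the names `cosine`, `clap_score`, and `select_best` are hypothetical. The toy 2-D vectors stand in for real CLAP embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clap_score(text_emb, audio_embs, select_best=False):
    """Score one prompt against its generated candidates.

    select_best=False: score only the first candidate (single-sample protocol).
    select_best=True:  score the best candidate (best-of-n reranking).
    """
    sims = [cosine(text_emb, a) for a in audio_embs]
    return max(sims) if select_best else sims[0]


# Toy embeddings (assumed values, for illustration only).
text_emb = [1.0, 0.0]
candidates = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

single = clap_score(text_emb, candidates[:1])                   # one sample per text
best_of_3 = clap_score(text_emb, candidates, select_best=True)  # pick best of 3
print(f"single: {single:.4f}, best-of-3: {best_of_3:.4f}")
```

Best-of-n selection can only raise the score relative to a single sample, so the two protocols should not be mixed when comparing methods. Likewise, scores computed with different CLAP checkpoints live in different embedding spaces and may not be directly comparable.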

Thank you for your response!