haoheliu/AudioLDM-training-finetuning

Embed mode for AudioLDM model

NZqian opened this issue · 0 comments

It seems that the model is conditioned on the text embedding in the config, while the paper concludes that it is better to use the audio embedding. Which one is better?