I want to generate audio from an image or text. Which model should I use? Thanks
WilTay1 opened this issue · 1 comment
WilTay1 commented
I want to generate audio from an image or text. Which model should I use? Thanks
Zeqiang-Lai commented
Sorry, this repo currently only contains models for generating images from audio or other modalities.
For text to audio, you could use https://huggingface.co/docs/diffusers/api/pipelines/audio_diffusion