Zeqiang-Lai/Anything2Image

I want to generate audio from an image or text. Which model should I use? Thanks

WilTay1 opened this issue · 1 comment

Sorry, this repo currently only contains models for generating images from audio or other modalities, not the other way around.

For text-to-audio, you could use https://huggingface.co/docs/diffusers/api/pipelines/audio_diffusion
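
As a rough starting point, here is a minimal sketch of text-to-audio with diffusers. It assumes the AudioLDM pipeline (`AudioLDMPipeline`) and the `cvssp/audioldm-s-full-v2` checkpoint rather than the page linked above, so treat the model choice, step count, and clip length as assumptions to adjust. The `write_wav` and `text_to_audio` helpers are hypothetical names used only for this example.

```python
import struct
import wave

SAMPLE_RATE = 16000  # assumption: AudioLDM checkpoints produce 16 kHz mono audio


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # clip to avoid integer overflow
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))


def text_to_audio(prompt, out_path="out.wav", seconds=5.0, steps=10):
    """Generate a short clip from a text prompt and save it as WAV."""
    # Imported here so write_wav can be used without diffusers installed.
    from diffusers import AudioLDMPipeline

    # Assumption: this checkpoint; it is downloaded on first use.
    pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm-s-full-v2")
    audio = pipe(
        prompt,
        num_inference_steps=steps,
        audio_length_in_s=seconds,
    ).audios[0]
    write_wav(out_path, audio)
    return out_path
```

Calling `text_to_audio("rain falling on a tin roof")` should write `out.wav` in the current directory; moving the pipeline to a GPU with `pipe.to("cuda")` will likely speed things up considerably.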