Create realistic AI-generated images from human voice
Leveraging OpenAI Whisper and Stable Diffusion in a cloud-native application powered by Jina
Under the hood, the Whisper and Stable Diffusion models are wrapped into Executors, which turns each of them into a self-contained microservice. The two microservices are chained into a Flow. The Flow exposes a gRPC endpoint that accepts a DocumentArray as input.
This is an example of a multimodal application that can be built with Jina.
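For illustration, here is a minimal sketch of what such a Flow definition (flow.py) could look like; the Executor names and config paths are assumptions based on the repository layout below, not necessarily the actual code:

from jina import Flow

# Chain the two Executors: Whisper transcribes the audio, Stable Diffusion
# turns the transcription into images; the Flow serves both over gRPC.
flow = (
    Flow(protocol='grpc', port=54322)
    .add(name='whisper', uses='executors/whisper/config.yml')
    .add(name='stablediffusion', uses='executors/stablediffusion/config.yml')
)

if __name__ == '__main__':
    with flow:
        flow.block()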
- Install requirements:
pip install -r requirements.txt
pip install -r executors/stablediffusion/requirements.txt
pip install -r executors/whisper/requirements.txt
- Start the Jina Flow (you need a Hugging Face token and must accept the Stable Diffusion license terms to download the model weights; otherwise, provide the weights to the Executor yourself):
JINA_MP_START_METHOD=spawn HF_TOKEN=YOUR_HF_TOKEN python flow.py
- Alternatively, you can deploy the Flow on JCloud. To do so, edit flow.yml and put your HF token in it:
pip install jcloud
jc login
jc deploy flow.yml
- Start the Gradio UI (a sketch of such a UI follows this step):
python ui.py
or, if you deployed the Flow on JCloud:
python ui.py --host grpcs://FLOW_ID.wolf.jina.ai
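For reference, here is a minimal sketch of a Gradio UI that sends an audio file to the Flow and displays the returned images; the --host flag and default port mirror the commands above, while the Gradio components and the generate helper are assumptions rather than the actual ui.py:

import argparse

import gradio as gr
from docarray import Document
from jina import Client


def generate(audio_path, host):
    # Send the audio file to the Flow and collect the generated images
    client = Client(host=host)
    docs = client.post('/', inputs=[Document(uri=audio_path)])
    images = []
    for match in docs[0].matches:
        match.load_uri_to_image_tensor()
        images.append(match.tensor)
    return images


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--host', default='localhost:54322')
    args = parser.parse_args()

    # Audio in (as a file path), gallery of generated images out
    demo = gr.Interface(
        fn=lambda audio: generate(audio, args.host),
        inputs=gr.Audio(type='filepath'),
        outputs=gr.Gallery(),
    )
    demo.launch()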
- Or talk to the backend directly with the Jina Client:
from jina import Client
from docarray import Document

# Connect to the locally running Flow (use the grpcs:// address if deployed on JCloud)
client = Client(host='localhost:54322')

# Send an audio file; the Flow transcribes it and generates matching images
docs = client.post('/', inputs=[Document(uri='audio.wav')])

# The generated images are attached as matches of the input Document
for img in docs[0].matches:
    img.load_uri_to_image_tensor()
docs[0].matches.plot_image_sprites()
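If you would rather save the results to disk than plot them, each match's loaded image tensor can be written out (assuming the docarray 0.x API used above; the filenames are arbitrary):

for i, img in enumerate(docs[0].matches):
    img.save_image_tensor_to_file(f'result_{i}.png')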