Streaming intermediate images?
sabetAI opened this issue · 15 comments
Is it possible to publish an update of the model that supports streaming intermediate images during reverse diffusion, i.e. with an iterator? It would greatly improve the UX if users could see their image forming while they wait for the process to finish.
This isn't a diffusion model so that wouldn't work
Diffusion models iteratively update the image over multiple steps. These intermediate iterates can be streamed out (e.g. see the GLIDE demo). 'Reverse diffusion' is simply the image generation step ('diffusion' is the noising process used during training), which is what your model is doing during inference. Can you update the code to output intermediate images?
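What I mean by streaming iterates, as a minimal sketch: expose each update step through a generator so a UI can render frames as they arrive. The `denoise_step` function here is a hypothetical stand-in, not the actual model API.

```python
import numpy as np

def denoise_step(image, rng):
    # Hypothetical stand-in for one reverse-diffusion update step.
    return image * 0.5 + rng.random(image.shape) * 0.5

def generate_with_intermediates(num_steps=8, size=16, seed=0):
    """Yield the partial image after every update step."""
    rng = np.random.default_rng(seed)
    image = rng.random((size, size))  # placeholder initial noise
    for step in range(num_steps):
        image = denoise_step(image, rng)
        yield step, image.copy()

# A UI loop can consume intermediates as they are produced:
frames = [img for _, img in generate_with_intermediates()]
```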
Using the term 'reverse diffusion' might have caused some confusion with what I was asking.
This model is not like glide or VQGAN+CLIP.
DALL-E works on an entirely different principle. The image is generated with tiny squares (tokens), square by square, from left to right and top to bottom. It does not change the whole image at every iteration like diffusion models do. At every iteration it just fills in another tiny bit of the empty area with a completely finished tiny portion of the final image.
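To illustrate the fill order being described, here's a toy sketch of raster-scan autoregressive generation over a small token grid (the real model uses a 32×32 grid of image tokens; the token values here are placeholders):

```python
def raster_fill(rows=4, cols=4):
    """Fill a token grid one cell at a time, left to right, top to bottom."""
    grid = [[None] * cols for _ in range(rows)]
    order = []
    for i in range(rows * cols):
        r, c = divmod(i, cols)  # raster-scan position of the i-th token
        grid[r][c] = i          # the finished token for this cell
        order.append((r, c))
    return grid, order

grid, order = raster_fill()
# order[0] == (0, 0), order[-1] == (3, 3)
```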
Ah good point @iScriptLex, I made assumptions about the model architecture. Even if it's outputting autoregressively, tokens can still be streamed out to incrementally update a canvas a piece at a time. The main use-case here is to show intermediate results to the user, as waiting kills the UX.
It might be possible to generate the images each time a row of tokens is decoded, and use some kind of blank token for the missing rows
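The blank-token idea above could look something like this sketch, where `decode` is a hypothetical stand-in for the VQGAN detokenizer and `BLANK` is an assumed placeholder token id, neither taken from the actual codebase:

```python
import numpy as np

BLANK = 0        # assumed placeholder token id
ROWS, COLS = 16, 16

def decode(token_grid):
    # Stand-in detokenizer: maps each token id to a gray value.
    return np.asarray(token_grid, dtype=float) / 255.0

def partial_image(token_rows):
    """Pad the missing rows with BLANK tokens and decode the full grid."""
    padding = [[BLANK] * COLS] * (ROWS - len(token_rows))
    return decode(list(token_rows) + padding)

# e.g. 4 of 16 rows decoded so far:
img = partial_image([[100] * COLS] * 4)
```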
@kuprel yes exactly. Also would it be more efficient just to stream rows of tokens and have the client handle everything else? Want to minimize latency that streaming may add.
this would still look cool while it's loading, but I worry about latency and bandwidth; wouldn't a loading bar or something work just as well?
Ok, I got it working in the colab; now I just have to figure out how to get it on replicate. An intermediate image count of 8 only adds a couple of seconds to the overall decoding time on the P100.
I merged it. You can try it in the colab. Hopefully will get it onto replicate by tomorrow
Ok it's live on replicate now