kuprel/min-dalle

Changing image size?

patrickjonesdotca opened this issue · 8 comments

Would love to know where to change the code to allow for larger image sizes than 256x256.
Better yet would be the ability to change them from the Colab.

Some people are using this model for upscaling: https://replicate.com/jingyunliang/swinir

It's possible to do a sliding window on image tokens; I have an implementation for ruDALL-E. We can go up to 1024x512.
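In case it helps to see the shape of the idea, here is a minimal sketch of a sliding window over the token grid. `generate_tokens` is a hypothetical stand-in for the model's conditional sampler (the real ruDALL-E API differs), and the constants assume ruDALL-E's 32x32 token layout:

```python
import numpy as np

GRID = 32     # ruDALL-E decodes a 32x32 token grid into a 256x256 image
STRIDE = 16   # advance half a window per step, keeping 16 columns of overlap

def sliding_window_generate(generate_tokens, text_tokens, total_cols=64):
    """Generate a token grid wider than the native GRID columns.

    64 columns of tokens would decode to a 512px-wide image. `generate_tokens`
    must fill in the unsampled (-1) entries of the window while conditioning
    on the already-sampled overlap columns.
    """
    canvas = np.full((GRID, total_cols), -1, dtype=np.int64)
    for col in range(0, total_cols - GRID + 1, STRIDE):
        window = canvas[:, col:col + GRID]
        canvas[:, col:col + GRID] = generate_tokens(text_tokens, window)
    return canvas
```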

How does it look?

> https://replicate.com/jingyunliang/swinir

I've looked at this, but it seems to lose a significant amount of detail. I use a lot of photographic prompt modifiers, and the results end up looking smeared.

@neverix, ruDALL-E doesn't use pass-through recurrent attention, so the result depends only on the input token sequence.
But DALL-E mini uses the attention state as a parameter, and this attention context changes recurrently during generation, so I doubt it can process sliding windows effectively.
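To make the difference concrete, here is a toy NumPy sketch of decoding with a key/value cache (not the actual min-dalle code): each step attends over the entire accumulated cache, so a window's output depends on everything generated before it, not just the tokens visible inside the window.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decode_step(x, cache_k, cache_v, Wq, Wk, Wv):
    """One autoregressive step with a key/value cache.

    The attention spans the entire accumulated cache, so this step's output
    depends on every previously generated token, which is why a sliding
    window over the tokens alone can't reproduce it.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    cache_k = np.concatenate([cache_k, k[None]])  # cache grows by one row
    cache_v = np.concatenate([cache_v, v[None]])
    attn = softmax(q @ cache_k.T / np.sqrt(q.shape[-1]))
    return attn @ cache_v, cache_k, cache_v
```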

There are many ways to increase image size.

  1. You can just use ImageMagick with a Jinc filter:
    convert input.png -filter jinc -resize 512 output.png
    Source image: [image]
    Result: [image]

  2. Use ffmpeg with the xbr filter:
    ffmpeg -i input.png -vf "xbr=2" output.png
    Result: [image]

  3. Use any VQGAN model that supports decoding tokens into images of different sizes (just encode->decode to double size); see the sketch after this list. Result: [image]

  4. Use RealESRGAN. They published several models and even compiled NCNN binaries for Windows and Linux, so you can run the upscaler from the command line without any Python or CUDA environment; see the example command after this list. Result: [image]
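A rough sketch of one plausible reading of option 3 (naively upscale first, then let the VQGAN re-render the result). `load_vqgan` is a hypothetical placeholder, and the `encode`/`decode` signatures follow the style of taming-transformers' VQModel, where `encode` returns `(quant, loss, info)`; exact APIs vary between releases.

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image

vqgan = load_vqgan("model.ckpt")  # hypothetical loader; see taming-transformers for real loading code

img = Image.open("input.png").convert("RGB")
img = img.resize((img.width * 2, img.height * 2), Image.BICUBIC)  # naive 2x upscale first

x = TF.to_tensor(img).unsqueeze(0) * 2 - 1  # (1, 3, H, W) in [-1, 1]
with torch.no_grad():
    quant, _, _ = vqgan.encode(x)   # image -> discrete latents
    out = vqgan.decode(quant)       # latents -> image; the VQGAN re-renders the upscale

out = ((out.clamp(-1, 1) + 1) / 2).squeeze(0)
TF.to_pil_image(out).save("output.png")
```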
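And for option 4, the NCNN build is a single binary; a typical invocation looks like this (flag names as in the realesrgan-ncnn-vulkan releases, check -h for your version):

    realesrgan-ncnn-vulkan -i input.png -o output.png -n realesrgan-x4plus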

@iScriptLex Well, I implemented the functional caching, so I would know 😅. It's passed around in a similar way, and nothing needs to be changed in the current DALL-E mini codebase to incorporate it (so it can live in a Colab notebook).

It is true that it's not as good as the version without caching, but fixing that can only happen with #74.
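For anyone unfamiliar, "functional caching" is the pattern from the sketch above: the cache is an explicit value that each step takes and returns, so the caller threads it through the loop instead of the model mutating hidden state. Roughly (reusing `decode_step` from that sketch; the width `d` and toy weights are assumptions):

```python
import numpy as np

d = 64                              # model width (toy value)
Wq = Wk = Wv = np.eye(d)            # stand-in projection weights
cache_k = np.empty((0, d))
cache_v = np.empty((0, d))
for x in np.random.randn(10, d):    # stand-in for the embedded token sequence
    out, cache_k, cache_v = decode_step(x, cache_k, cache_v, Wq, Wk, Wv)
# The caches were threaded through each call, never mutated behind the scenes.
```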

@kuprel Here's a sample 384px generation without upscaling
[image]