kuprel/min-dalle

Changing image size?

patrickjonesdotca opened this issue · 8 comments

Would love to know where to change the code to allow for larger image sizes than 256x256.
Better yet would be the ability to change them from the Colab.

Some people are using this model for upscaling: https://replicate.com/jingyunliang/swinir

It's possible to do a sliding window on image tokens; I have an implementation for ruDALL-E. We can go up to 1024x512.
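In case it helps to see the shape of the idea, here is a minimal sketch of a sliding window over the token grid. `generate_tokens` is a hypothetical stand-in for the model's conditional sampler (the real ruDALL-E API differs), and the constants assume ruDALL-E's 32x32 token layout:

```python
import numpy as np

GRID = 32     # ruDALL-E decodes a 32x32 token grid into a 256x256 image
STRIDE = 16   # advance half a window per step, keeping 16 columns of overlap

def sliding_window_generate(generate_tokens, text_tokens, total_cols=64):
    """Generate a token grid wider than the native GRID columns.

    64 columns of tokens would decode to a 512px-wide image. `generate_tokens`
    must fill in the unsampled (-1) entries of the window while conditioning
    on the already-sampled overlap columns.
    """
    canvas = np.full((GRID, total_cols), -1, dtype=np.int64)
    for col in range(0, total_cols - GRID + 1, STRIDE):
        window = canvas[:, col:col + GRID]
        canvas[:, col:col + GRID] = generate_tokens(text_tokens, window)
    return canvas
```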

How does it look?

> https://replicate.com/jingyunliang/swinir

I've looked at this, but it seems to lose a significant amount of detail. I use a lot of photographic prompt modifiers, and the results end up looking smeared.

@neverix, ruDALL-E doesn't use pass-through recurrent attention, so the result depends only on the input token sequence.
But DALL-E mini uses the attention state as a parameter, and this attention context changes recurrently during generation, so I doubt it can process sliding windows effectively.
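To make the difference concrete, here is a toy NumPy sketch of decoding with a key/value cache (not the actual min-dalle code): each step attends over the entire accumulated cache, so a window's output depends on everything generated before it, not just the tokens visible inside the window.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decode_step(x, cache_k, cache_v, Wq, Wk, Wv):
    """One autoregressive step with a key/value cache.

    The attention spans the entire accumulated cache, so this step's output
    depends on every previously generated token, which is why a sliding
    window over the tokens alone can't reproduce it.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    cache_k = np.concatenate([cache_k, k[None]])  # cache grows by one row
    cache_v = np.concatenate([cache_v, v[None]])
    attn = softmax(q @ cache_k.T / np.sqrt(q.shape[-1]))
    return attn @ cache_v, cache_k, cache_v
```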

There are many ways to increase image size.

  1. You can just use ImageMagick with a Jinc filter:
    convert input.png -filter jinc -resize 512 output.png
    Source image: [image]
    Result: [image]

  2. Use ffmpeg with the xbr filter:
    ffmpeg -i input.png -vf "xbr=2" output.png
    Result: [image]

  3. Use any VQGAN model that supports decoding tokens into images of different sizes (just encode->decode to double size); see the sketch after this list. Result: [image]

  4. Use RealESRGAN. They published several models and even compiled NCNN binaries for Windows and Linux, so you can run the upscaler from the command line without any Python or CUDA environment; see the example command after this list. Result: [image]
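A rough sketch of one plausible reading of option 3 (naively upscale first, then let the VQGAN re-render the result). `load_vqgan` is a hypothetical placeholder, and the `encode`/`decode` signatures follow the style of taming-transformers' VQModel, where `encode` returns `(quant, loss, info)`; exact APIs vary between releases.

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image

vqgan = load_vqgan("model.ckpt")  # hypothetical loader; see taming-transformers for real loading code

img = Image.open("input.png").convert("RGB")
img = img.resize((img.width * 2, img.height * 2), Image.BICUBIC)  # naive 2x upscale first

x = TF.to_tensor(img).unsqueeze(0) * 2 - 1  # (1, 3, H, W) in [-1, 1]
with torch.no_grad():
    quant, _, _ = vqgan.encode(x)   # image -> discrete latents
    out = vqgan.decode(quant)       # latents -> image; the VQGAN re-renders the upscale

out = ((out.clamp(-1, 1) + 1) / 2).squeeze(0)
TF.to_pil_image(out).save("output.png")
```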
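And for option 4, the NCNN build is a single binary; a typical invocation looks like this (flag names as in the realesrgan-ncnn-vulkan releases, check -h for your version):

    realesrgan-ncnn-vulkan -i input.png -o output.png -n realesrgan-x4plus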

@iScriptLex Well, I implemented the functional caching, so I would know 😅. It's passed around in a similar way, and nothing needs to be changed in the current DALL-E mini codebase to incorporate it (so it can live in a Colab notebook).

It is true that it's not as good as the version without caching, but fixing that can only happen with #74.
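For anyone unfamiliar, "functional caching" is the pattern from the sketch above: the cache is an explicit value that each step takes and returns, so the caller threads it through the loop instead of the model mutating hidden state. Roughly (reusing `decode_step` from that sketch; the width `d` and toy weights are assumptions):

```python
import numpy as np

d = 64                              # model width (toy value)
Wq = Wk = Wv = np.eye(d)            # stand-in projection weights
cache_k = np.empty((0, d))
cache_v = np.empty((0, d))
for x in np.random.randn(10, d):    # stand-in for the embedded token sequence
    out, cache_k, cache_v = decode_step(x, cache_k, cache_v, Wq, Wk, Wv)
# The caches were threaded through each call, never mutated behind the scenes.
```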

@kuprel Here's a sample 384px generation without upscaling
[image]