Inpainting.mp4
- The input image is enlarged to provide more room for the inpainting process. This step can be skipped if the input image does not feature a close-up view of the object.
- A mask is created for the input image. Since the input images have a white background, the object is concealed with a black mask while the background is left unmasked, so only the background is repainted.
- The input image, mask, and text prompt are passed to the inpainting model `stabilityai/stable-diffusion-2-inpainting` via Hugging Face `diffusers` (a sketch of this loop follows the list).
- The inpainted images, originally of size 512 x 512, are expanded by adding 50 pixels in all four directions. This results in an image of size 562 x 562, which is subsequently resized back to 512 x 512.
- The resized image is then masked.
- The masked image, along with a text prompt, goes through another inpainting pass.
- This entire procedure is repeated iteratively to generate a sequence of 9 images, thereby creating a zooming-out effect.
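The snippet below is a minimal sketch of this loop using the `diffusers` inpainting pipeline. The threshold-based mask construction, padding value, file names, and prompt are illustrative assumptions, not the repository's actual code; the real implementation lives in the scripts under `src/`.

```python
# Minimal sketch of the zoom-out inpainting loop (assumed setup).
import numpy as np
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

def background_mask(img, thresh=240):
    # White-background mask: object pixels stay black (preserved),
    # background pixels become white (the region the model repaints).
    gray = np.array(img.convert("L"))
    return Image.fromarray(np.where(gray > thresh, 255, 0).astype(np.uint8))

def zoom_out_step(img, prompt, pad=50, size=512):
    # Pad the previous frame with a white border, resize back to 512x512 so the
    # scene shrinks, then inpaint the white region around it.
    expanded = ImageOps.expand(img, border=pad, fill="white")
    resized = expanded.resize((size, size))
    mask = background_mask(resized)
    return pipe(prompt=prompt, image=resized, mask_image=mask).images[0]

prompt = "Bicycle on a street"                      # illustrative prompt
first = Image.open("input.png").convert("RGB").resize((512, 512))
frames = [pipe(prompt=prompt, image=first, mask_image=background_mask(first)).images[0]]
for _ in range(8):                                  # 9 images in total -> zoom-out sequence
    frames.append(zoom_out_step(frames[-1], prompt))
```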
High-quality GIFs for each example are available here
| Input Image | Inpainted Image | Prompt |
|---|---|---|
| ![]() | ![]() | Product in a kitchen used in meal preparation (`output.mp4`) |
| ![]() | ![]() | Bicycle on a street (`output.mp4`) |
| ![]() | ![]() | Toaster placed on kitchen stand (`output.mp4`) |
| ![]() | ![]() | Chair behind a table in a study room (`output.mp4`) |
| ![]() | ![]() | Tent in a forest (`output.mp4`) |
| ![]() | ![]() | A bottle of whisky on stand of a bar (`output.mp4`) |
The model fails to inpaint images involving humans.
| Input Image | Inpainted Image | Prompt |
|---|---|---|
| ![]() | ![]() | Person standing in a hall meeting people |
| ![]() | ![]() | Person standing in a hall meeting people |
```
conda env create -f inpainting.yml
```

Or set up the environment manually:

```
conda create -n inpainting python=3.11
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install diffusers
pip install transformers
pip install opencv-python
```
```
cd src
python main.py --prompt "<your text prompt>" --image_path "<path to your image>" --upscale True
```
By default, the `upscale` parameter is set to `False`. Setting `upscale` to `True` is intended for images where the main object occupies a large portion of the frame: it reduces the relative size of the main object, which improves the inpainting process. A rough sketch of this preprocessing follows.
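The script's actual preprocessing is not shown here; as a hedged guess at what enlarging the input might look like, the canvas can be expanded with a white border so the object covers less of the frame before inpainting. The border ratio, fill colour, and function name below are assumptions.

```python
# Hypothetical sketch of an --upscale-style preprocessing step: enlarge the
# canvas around the object, then return to the 512x512 working resolution.
from PIL import Image, ImageOps

def upscale_canvas(img: Image.Image, border_ratio: float = 0.25) -> Image.Image:
    border = int(img.width * border_ratio)          # illustrative ratio
    padded = ImageOps.expand(img, border=border, fill="white")
    return padded.resize((512, 512))

# Usage: shrunk = upscale_canvas(Image.open("input.png").convert("RGB"))
```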
Use the inpainted image to generate a zooming-out video. `video.py` generates the frames for the video, and `render.py` assembles a GIF from those frames. You can adjust `duration` to change the smoothness of the videos (see the sketch after the commands below).
```
cd src
python video.py --prompt "<your text prompt>" --image_path "<path to inpainted image>"
python render.py --path "<path to folder containing generated frames>"
```
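As a rough idea of what the GIF-rendering step involves, frames can be assembled with Pillow, where `duration` is the per-frame display time in milliseconds (smaller values give a smoother, faster animation). This is a hedged sketch, not the exact code of `render.py`; the frame folder and values are illustrative.

```python
# Minimal sketch of assembling generated frames into a GIF with Pillow.
import glob
from PIL import Image

frames = [Image.open(p) for p in sorted(glob.glob("frames/*.png"))]
frames[0].save(
    "output.gif",
    save_all=True,
    append_images=frames[1:],
    duration=40,   # ms per frame; tweak to change smoothness
    loop=0,        # loop forever
)
```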
For the experiments, the prompts used for video generation were the same as those used during the inpainting process.
If you find this work useful, please cite:
```bibtex
@InProceedings{Rombach_2022_CVPR,
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10684-10695}
}
```