SHI-Labs/Versatile-Diffusion

Release VD-Basic

litevex opened this issue · 12 comments

For people only interested in the variation part, it would be good to make the VD-Basic checkpoint public.

Exactly. I just want to play with the variation tool; it's a lot of fun.

We will release the basic model in later updates. Please stay tuned.

The image-variation part alone is a 3GB+ UNet, just like justinpinkney's (I made it into TorchScript JITs before: https://huggingface.co/Larvik/sd470k_imgemb/tree/main).
Loading a 12GB weight file just for that is unnecessary.

Also, the cond image gets resized to 224x224 anyway, so making it a [1,3,512,512] tensor is also unnecessary.

A short script to build the image cond:

import os
from io import BytesIO

import requests
import torch
from PIL import Image
from torchvision import transforms

def load_im(im_path):
    # Accept a URL or a local path; normalize pixel values to [-1, 1].
    if im_path.startswith("http"):
        response = requests.get(im_path)
        response.raise_for_status()
        im = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        im = Image.open(im_path).convert("RGB")
    tforms = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop((224, 224)),
        transforms.ToTensor(),
    ])
    inp = tforms(im).unsqueeze(0)
    return inp * 2 - 1

if not os.path.isfile('imgemb.pt'):
    !wget https://huggingface.co/Larvik/imgemb_t1/resolve/main/imgemb.pt
    # alternative: https://huggingface.co/Larvik/imgemb/resolve/main/imgemb.pt

# CLIP image encoder exported as a TorchScript module; yields a [1,257,768] cond.
imgemb = torch.jit.load('imgemb.pt').float()
c = imgemb.norm(imgemb.proj_all(imgemb(imgemb.preproc(load_im('xipooh.jpg')))))

# for justinpinkney's (a single [1,768] embedding):
# c = imgemb.proj(imgemb(imgemb.preproc(load_im('xipooh.jpg'))))

Some results: https://imgur.com/a/7YvwUmI
The first one is the cond image; the others are variations. It doesn't seem to work well with non-square outputs (the variations are 704x768).

I also made a notebook for the free-tier 12GB-system-RAM Colab:
https://colab.research.google.com/github/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_vrs.ipynb

Basically, without the meta-device trick (https://pytorch.org/torchdistx/latest/fake_tensor.html), PyTorch needs roughly 2x the model size in system RAM to load a model and its weights: one copy for the freshly constructed module and one for the checkpoint's state dict.
This one likely even goes to 2.5x, because it loads the VAE weights and the (textual?) CLIP weights again separately, even though they are already inside that 12GB weight file.

The notebook above extracts the weight blobs to disk and loads only the ones it needs, which makes it possible to run on a 12GB-system-RAM Colab (even without a Colab GPU).
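For reference, a minimal sketch of the meta-device idea using newer PyTorch built-ins (SomeUNet is a hypothetical stand-in for the actual module class, and load_state_dict(..., assign=True) needs PyTorch >= 2.1):

import torch

with torch.device("meta"):
    model = SomeUNet()  # parameters are meta tensors: shapes only, no real RAM

state = torch.load("unet_only.pt", map_location="cpu")  # the single real copy
# assign=True swaps the checkpoint tensors into the module instead of
# copying them into preallocated storage, so peak RAM stays near 1x.
model.load_state_dict(state, assign=True)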

Hello! Do I just have to run this code to get variations of a given image? Any way to contact you (Discord?).

The image-variation part alone is a 3GB+ UNet

Is it possible to split out the other UNets too? (image+text guidance, image2text, etc.)

Yes, it is possible. You can cut out a subnetwork for text-to-image only or image-variation only from either vd-dc or vd-official.
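A hedged sketch of such a cut, assuming the combined checkpoint keeps each subnetwork under its own key prefix (the prefix below is a guess; inspect sd.keys() to find the real one):

import torch

# Load the combined checkpoint on CPU; some checkpoints nest the
# weights under a 'state_dict' key.
sd = torch.load('vd-official.pth', map_location='cpu')
sd = sd.get('state_dict', sd)

# Keep only the keys belonging to the flow you want. 'model.' is an
# assumed prefix for the image-variation UNet; verify against sd.keys().
prefix = 'model.'
sub = {k[len(prefix):]: v for k, v in sd.items() if k.startswith(prefix)}
torch.save(sub, 'vd_image_variation_unet.pth')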

Your colab is giving an error in cell 3. This is the error: NameError: name 'not_txtALL' is not defined

I haven't had a chance to work on and release a colab yet. Or did I misunderstand you?

@xingqian2018 Sorry, I meant this colab: https://colab.research.google.com/github/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_vrs.ipynb

(screenshot: notebook cells annotated in green, yellow, and red)

The green ones are must-click; replace the yellow one with your own image, and click the red one when it has finished.
It isn't tested on GPU instances yet; if it breaks somewhere on GPU, you'll have to fix it by hand.

I expected anyone who uses Colab to be able to read basic Python, though, and to know that clicking "show code" on a cell shows its code.

@demosch
I don't really use Discord, though; I have some posts on the LAION server under the name ThugaterDios.

In my main notebook (https://github.com/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_SR_jited.ipynb),
both justinpinkney's [1,768] and this [257,768] image-conditioned SD are ready to use.
It supports custom width/height, concatenating multiple cond images together, all the k-samplers, etc., but errr... it's not quite user friendly.

For now, to use the [257,768] image-cond weights from vd-official.pth, you have to add _imgemb_vrs to the model entries and use a pre-compiled binary prompt.
Attachments: tbb0, tbb1, binaryprompts.zip

Some results from concatenating two cond images together (pooh toy and Firefox logo), i.e. from a [514,768] image cond:
https://imgur.com/a/6KyeyM0
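For illustration, a sketch of what that concatenation amounts to, reusing imgemb and load_im from the script earlier in this thread (the file names are placeholders):

import torch

# Two [1,257,768] conds joined along the token axis give a [1,514,768] cond;
# the UNet's cross-attention accepts a variable-length cond sequence.
c1 = imgemb.norm(imgemb.proj_all(imgemb(imgemb.preproc(load_im('pooh.jpg')))))
c2 = imgemb.norm(imgemb.proj_all(imgemb(imgemb.preproc(load_im('firefox.png')))))
c = torch.cat([c1, c2], dim=1)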

@lucasbr15 @TabuaTambalam
Thanks for sharing. This colab was not created by our team; I think they truncated some code from this GitHub repo. We are working on a colab demo, in parallel with the HuggingFace one, with all supported functions. Please stay tuned.

@litevex A new Versatile Diffusion codebase has been pushed, and you can now easily segment out the single-flow model you need (i.e. VD-Basic).