SHI-Labs/Versatile-Diffusion

Release VD-Basic

litevex opened this issue · 12 comments

For people only interested in the variation part, it would be good to make the VD-Basic checkpoint public.

Exactly. I just want to play with the variation tool; it's a lot of fun.

We will release the basic model in later updates. Please stay tuned.

The image-variation part alone is a 3GB+ UNet, just like justinpinkney's (I made it into TorchScript JITs before: https://huggingface.co/Larvik/sd470k_imgemb/tree/main).
Loading a 12GB weight file just for that is unnecessary.

Also, the cond image gets resized to 224x224 anyway, so making it a [1,3,512,512] tensor is also unnecessary.

A short script to build the image cond:

import os
from io import BytesIO

import requests
import torch
from PIL import Image
from torchvision import transforms

def load_im(im_path):
    # Accept a URL or a local path; normalize pixel values to [-1, 1].
    if im_path.startswith("http"):
        response = requests.get(im_path)
        response.raise_for_status()
        im = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        im = Image.open(im_path).convert("RGB")
    tforms = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop((224, 224)),
        transforms.ToTensor(),
    ])
    inp = tforms(im).unsqueeze(0)
    return inp * 2 - 1

if not os.path.isfile('imgemb.pt'):
    !wget https://huggingface.co/Larvik/imgemb_t1/resolve/main/imgemb.pt
    # alternative: https://huggingface.co/Larvik/imgemb/resolve/main/imgemb.pt

# CLIP image encoder exported as a TorchScript module; yields a [1,257,768] cond.
imgemb = torch.jit.load('imgemb.pt').float()
c = imgemb.norm(imgemb.proj_all(imgemb(imgemb.preproc(load_im('xipooh.jpg')))))

# for justinpinkney's (a single [1,768] embedding):
# c = imgemb.proj(imgemb(imgemb.preproc(load_im('xipooh.jpg'))))

Some results: https://imgur.com/a/7YvwUmI
The first one is the cond image; the others are variations. It doesn't seem to work well with non-square outputs (the variations are 704x768).

I also made a notebook for the free-tier 12GB-system-RAM Colab:
https://colab.research.google.com/github/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_vrs.ipynb

Basically, without the meta-device trick (https://pytorch.org/torchdistx/latest/fake_tensor.html), PyTorch needs roughly 2x the model size in system RAM to load a model and its weights: one copy for the freshly constructed module and one for the checkpoint's state dict.
This one likely even goes to 2.5x, because it loads the VAE weights and the (textual?) CLIP weights again separately, even though they are already inside that 12GB weight file.

The notebook above extracts the weight blobs to disk and loads only the ones it needs, which makes it possible to run on a 12GB-system-RAM Colab (even without a Colab GPU).
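For reference, a minimal sketch of the meta-device idea using newer PyTorch built-ins (SomeUNet is a hypothetical stand-in for the actual module class, and load_state_dict(..., assign=True) needs PyTorch >= 2.1):

import torch

with torch.device("meta"):
    model = SomeUNet()  # parameters are meta tensors: shapes only, no real RAM

state = torch.load("unet_only.pt", map_location="cpu")  # the single real copy
# assign=True swaps the checkpoint tensors into the module instead of
# copying them into preallocated storage, so peak RAM stays near 1x.
model.load_state_dict(state, assign=True)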

Hello! Do I just have to run this code to get variations of a given image? Any way to contact you (Discord?).

The image-variation part alone is a 3GB+ UNet

Is it possible to split out the other UNets too? (image+text guidance, image2text, etc.)

Yes, it is possible. You can cut out a subnetwork for text-to-image only or image-variation only from either vd-dc or vd-official.
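A hedged sketch of such a cut, assuming the combined checkpoint keeps each subnetwork under its own key prefix (the prefix below is a guess; inspect sd.keys() to find the real one):

import torch

# Load the combined checkpoint on CPU; some checkpoints nest the
# weights under a 'state_dict' key.
sd = torch.load('vd-official.pth', map_location='cpu')
sd = sd.get('state_dict', sd)

# Keep only the keys belonging to the flow you want. 'model.' is an
# assumed prefix for the image-variation UNet; verify against sd.keys().
prefix = 'model.'
sub = {k[len(prefix):]: v for k, v in sd.items() if k.startswith(prefix)}
torch.save(sub, 'vd_image_variation_unet.pth')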

Your colab is giving an error in cell 3. This is the error: NameError: name 'not_txtALL' is not defined

I haven't had a chance to work on and release a colab yet. Or did I misunderstand you?

@xingqian2018 Sorry, I meant this colab: https://colab.research.google.com/github/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_vrs.ipynb

(screenshot: notebook cells annotated in green, yellow, and red)

The green ones are must-click; replace the yellow one with your own image, and click the red one when it has finished.
It isn't tested on GPU instances yet; if it breaks somewhere on GPU, you'll have to fix it by hand.

I expected anyone who uses Colab to be able to read basic Python, though, and to know that clicking "show code" on a cell shows its code.

@demosch
I don't really use Discord, though; I have some posts on the LAION server under the name ThugaterDios.

In my main notebook (https://github.com/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_SR_jited.ipynb),
both justinpinkney's [1,768] and this [257,768] image-conditioned SD are ready to use.
It supports custom width/height, concatenating multiple cond images together, all the k-samplers, etc., but errr... it's not quite user friendly.

For now, to use the [257,768] image-cond weights from vd-official.pth, you have to add _imgemb_vrs to the model entries and use a pre-compiled binary prompt.
Attachments: tbb0, tbb1, binaryprompts.zip

Some results from concatenating two cond images together (pooh toy and Firefox logo), i.e. from a [514,768] image cond:
https://imgur.com/a/6KyeyM0
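For illustration, a sketch of what that concatenation amounts to, reusing imgemb and load_im from the script earlier in this thread (the file names are placeholders):

import torch

# Two [1,257,768] conds joined along the token axis give a [1,514,768] cond;
# the UNet's cross-attention accepts a variable-length cond sequence.
c1 = imgemb.norm(imgemb.proj_all(imgemb(imgemb.preproc(load_im('pooh.jpg')))))
c2 = imgemb.norm(imgemb.proj_all(imgemb(imgemb.preproc(load_im('firefox.png')))))
c = torch.cat([c1, c2], dim=1)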

@lucasbr15 @TabuaTambalam
Thanks for sharing. This colab was not created by our team; I think they truncated some code from this GitHub repo. We are working on a colab demo, in parallel with the HuggingFace one, with all supported functions. Please stay tuned.

@litevex A new Versatile Diffusion codebase has been pushed, and you can now easily segment out the single-flow model you need (i.e. VD-Basic).