ali-vilab/Cones-V2

Cannot reproduce the results

Kyfafyd opened this issue · 9 comments

Thanks for sharing the code!
I am trying to reproduce the composition of a dog and a mug myself.
My training command is as follows:

accelerate launch train_cones2.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base"  \
  --instance_data_dir="./data/dog" \
  --instance_prompt=dog \
  --token_num=1 \
  --output_dir="cones_v2_output/dog_image" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=4000 \
  --loss_rate_first=1e-2 \
  --loss_rate_second=1e-3 \
  --enable_xformers_memory_efficient_attention
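Before digging into training quality, it can help to confirm that the instance data directory actually contains the training images the script will see. A minimal stdlib sketch (the directory name follows the command above; the helper is generic and not part of the Cones-V2 code):

```python
import glob
import os

def list_training_images(instance_data_dir):
    """Collect common image files from the --instance_data_dir folder."""
    exts = ("*.jpg", "*.jpeg", "*.png", "*.webp")
    files = []
    for ext in exts:
        files.extend(glob.glob(os.path.join(instance_data_dir, ext)))
    return sorted(files)

# An empty list here means training would see no data for "dog".
images = list_training_images("./data/dog")
print(f"found {len(images)} training images")
```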

Once training finishes, I use the residual.pt from the 4000th iteration for inference, with the following inference config:

[
    {
        "prompt":"a mug and a dog on the beach",
        "residual_dict": {
            "dog":"cones_v2_output/dog_video/residual_3000.pt",
            "cat":"cones_v2_output/cat_image/residual.pt",
            "mug":"residuals/mug.pt",
            "flower":"residuals/flower.pt",
            "sunglasses":"residuals/sunglasses.pt",
            "lake":"residuals/lake.pt",
            "barn":"residuals/barn.pt"
        },
        "color_context":{
            "255,192,0":["mug",2.5],
            "255,0,0":["dog",2.5]
        },
        "guidance_steps":50,
        "guidance_weight":0.08,
        "weight_negative":-1e8,
        "layout":"layouts/layout_example.png",
        "subject_list":[["mug",2],["dog",5]]
    }
]
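A config like this is easy to break silently (a residual path that does not exist, a color key that maps to a subject missing from subject_list). Here is a small stdlib sketch that sanity-checks those fields before inference; the key names come from the config above, and nothing here is part of the Cones-V2 API:

```python
import json
import os

# A trimmed copy of the inference config shown above.
CONFIG = """
[
    {
        "prompt": "a mug and a dog on the beach",
        "residual_dict": {
            "dog": "cones_v2_output/dog_video/residual_3000.pt",
            "mug": "residuals/mug.pt"
        },
        "color_context": {
            "255,192,0": ["mug", 2.5],
            "255,0,0": ["dog", 2.5]
        },
        "subject_list": [["mug", 2], ["dog", 5]]
    }
]
"""

def check_entry(entry):
    """Return a list of problems found in one config entry."""
    problems = []
    residuals = entry.get("residual_dict", {})
    subjects = [name for name, _ in entry.get("subject_list", [])]
    # Every subject used for guidance needs a residual embedding on disk.
    for name in subjects:
        if name not in residuals:
            problems.append(f"subject '{name}' has no residual_dict entry")
        elif not os.path.exists(residuals[name]):
            problems.append(f"missing file: {residuals[name]}")
    # Color keys must be 'R,G,B' strings mapping to a listed subject.
    for key, (name, _weight) in entry.get("color_context", {}).items():
        parts = key.split(",")
        if len(parts) != 3 or not all(
            p.strip().isdigit() and 0 <= int(p) <= 255 for p in parts
        ):
            problems.append(f"bad color key: {key}")
        if name not in subjects:
            problems.append(f"color {key} maps to unknown subject '{name}'")
    return problems

for entry in json.loads(CONFIG):
    for problem in check_entry(entry):
        print("WARN:", problem)
```

Running this before inference surfaces a wrong residual path (e.g. pointing at dog_video instead of dog_image) immediately instead of mid-generation.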

The generated image is below; the dog is not promising (while the mug, which uses the provided mug.pt, looks good):
[image]

However, using the provided dog.pt, I can get good results:
[image]

Could you please help me figure this out? Or are there any tips for obtaining good results?

Sorry for the late reply. Have you tried other prompts, or generating a single dog using layout guidance?

Thanks for your response! All of the following results use layout guidance.
With the prompt "A dog on the grass", I get the following:
[image]

With the prompt "Photo of a dog", I get the following:
[image]

I have also tried changing the learning rate to 2e-5 and increasing the steps to 10000, with the following dog and mug:
[image]

It seems the images you uploaded are corrupted. I will retrain "dog" locally with your command and see the results.

Sorry about that, I do not know why this happened... But I can view the images on my phone. Thanks for your response.

I ran into the same issue. Using your provided .pt file gives great results, but my own training result fails.

This is my result (a cat and a dog on the beach):
[image]

Any updates on this? 👀

Sorry for the late reply. We tried retraining the white dog and obtained satisfactory results, but we did not use '--enable_xformers_memory_efficient_attention'. Have you tried training without enabling memory-efficient attention?

I have not tried training without memory-efficient attention because of an OOM issue. Could you please show your retrained results?

Here are some results:

[images]