ali-vilab/Cones-V2

Cannot reproduce the results

Kyfafyd opened this issue · 9 comments

Thanks for sharing the code!
I am trying to reproduce the composition of a dog and a mug myself.
My training command is as follows:

accelerate launch train_cones2.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base"  \
  --instance_data_dir="./data/dog" \
  --instance_prompt=dog \
  --token_num=1 \
  --output_dir="cones_v2_output/dog_image" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=4000 \
  --loss_rate_first=1e-2 \
  --loss_rate_second=1e-3 \
  --enable_xformers_memory_efficient_attention
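Before digging into training quality, it can help to confirm that the instance data directory actually contains the training images the script will see. A minimal stdlib sketch (the directory name follows the command above; the helper is generic and not part of the Cones-V2 code):

```python
import glob
import os

def list_training_images(instance_data_dir):
    """Collect common image files from the --instance_data_dir folder."""
    exts = ("*.jpg", "*.jpeg", "*.png", "*.webp")
    files = []
    for ext in exts:
        files.extend(glob.glob(os.path.join(instance_data_dir, ext)))
    return sorted(files)

# An empty list here means training would see no data for "dog".
images = list_training_images("./data/dog")
print(f"found {len(images)} training images")
```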

Once training finishes, I use the residual.pt from the 4000th iteration for inference, with the following inference config:

[
    {
        "prompt":"a mug and a dog on the beach",
        "residual_dict": {
            "dog":"cones_v2_output/dog_video/residual_3000.pt",
            "cat":"cones_v2_output/cat_image/residual.pt",
            "mug":"residuals/mug.pt",
            "flower":"residuals/flower.pt",
            "sunglasses":"residuals/sunglasses.pt",
            "lake":"residuals/lake.pt",
            "barn":"residuals/barn.pt"
        },
        "color_context":{
            "255,192,0":["mug",2.5],
            "255,0,0":["dog",2.5]
        },
        "guidance_steps":50,
        "guidance_weight":0.08,
        "weight_negative":-1e8,
        "layout":"layouts/layout_example.png",
        "subject_list":[["mug",2],["dog",5]]
    }
]
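A config like this is easy to break silently (a residual path that does not exist, a color key that maps to a subject missing from subject_list). Here is a small stdlib sketch that sanity-checks those fields before inference; the key names come from the config above, and nothing here is part of the Cones-V2 API:

```python
import json
import os

# A trimmed copy of the inference config shown above.
CONFIG = """
[
    {
        "prompt": "a mug and a dog on the beach",
        "residual_dict": {
            "dog": "cones_v2_output/dog_video/residual_3000.pt",
            "mug": "residuals/mug.pt"
        },
        "color_context": {
            "255,192,0": ["mug", 2.5],
            "255,0,0": ["dog", 2.5]
        },
        "subject_list": [["mug", 2], ["dog", 5]]
    }
]
"""

def check_entry(entry):
    """Return a list of problems found in one config entry."""
    problems = []
    residuals = entry.get("residual_dict", {})
    subjects = [name for name, _ in entry.get("subject_list", [])]
    # Every subject used for guidance needs a residual embedding on disk.
    for name in subjects:
        if name not in residuals:
            problems.append(f"subject '{name}' has no residual_dict entry")
        elif not os.path.exists(residuals[name]):
            problems.append(f"missing file: {residuals[name]}")
    # Color keys must be 'R,G,B' strings mapping to a listed subject.
    for key, (name, _weight) in entry.get("color_context", {}).items():
        parts = key.split(",")
        if len(parts) != 3 or not all(
            p.strip().isdigit() and 0 <= int(p) <= 255 for p in parts
        ):
            problems.append(f"bad color key: {key}")
        if name not in subjects:
            problems.append(f"color {key} maps to unknown subject '{name}'")
    return problems

for entry in json.loads(CONFIG):
    for problem in check_entry(entry):
        print("WARN:", problem)
```

Running this before inference surfaces a wrong residual path (e.g. pointing at dog_video instead of dog_image) immediately instead of mid-generation.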

The generated image is below; the dog is not promising (while the mug, which uses the provided mug.pt, looks good):
[image]

However, using the provided dog.pt, I can get good results:
[image]

Could you please help me figure this out? Or are there any tips for obtaining good results?

Sorry for the late reply. Have you tried other prompts, or generating a single dog using layout guidance?

Thanks for your response! All of the following results use layout guidance.
With the prompt "A dog on the grass", I get the following:
[image]

With the prompt "Photo of a dog", I get the following:
[image]

I have also tried changing the learning rate to 2e-5 and increasing the steps to 10000, with the following dog and mug:
[image]

It seems the images you uploaded are corrupted. I will retrain "dog" locally with your command and see the results.

Sorry about that, I do not know why this happened... But I can view the images on my phone. Thanks for your response.

I ran into the same issue. Using your provided .pt file gives great results, but my own training result fails.

This is my result (a cat and a dog on the beach):
[image]

Any updates on this? 👀

Sorry for the late reply. We tried retraining the white dog and obtained satisfactory results, but we did not use '--enable_xformers_memory_efficient_attention'. Have you tried training without enabling memory-efficient attention?

I have not tried training without memory-efficient attention because of an OOM issue. Could you please show your retrained results?

Here are some results:

[images]