tumurzakov/AnimateDiff

How to train new motion patterns with large-domain-gap datasets?

sunnyHelen opened this issue · 4 comments

Hi, thanks a lot for sharing the training code. If I need to learn new motion patterns from my own datasets, which are very different from normal realistic videos, how should I train the model? Do you have any insights? Is train.py used for fine-tuning the motion module? And is train_lora.py for training the LoRA module that is presented as the domain adapter LoRA in the AnimateDiff file? Should I use train_lora.py to adapt to my own datasets' domain and then use train.py to learn the new motion patterns?

If you can give me some guidance, I would really appreciate it.

Thanks a lot.

Hello

Everything depends on your dataset.

Read how Open-Sora and Stability AI prepare their datasets:

https://github.com/hpcaitech/Open-Sora/blob/main/docs/report_01.md
https://github.com/hpcaitech/Open-Sora/blob/main/docs/report_02.md
https://github.com/hpcaitech/Open-Sora/blob/main/docs/report_03.md
https://arxiv.org/pdf/2311.15127

Read the sections on dataset preparation carefully; it is a multistep process.

You must categorize your motions and write precise prompts.
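As a rough illustration of what categorizing and prompting could look like, here is a minimal sketch that groups clips by motion category and flags captions that never name their motion. The `Clip` fields, the category names, and the file paths are all hypothetical; none of this is taken from the Open-Sora or Stability AI pipelines.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Clip:
    path: str    # hypothetical video file path
    motion: str  # motion category assigned during labeling
    prompt: str  # caption used as the training prompt


def group_by_motion(clips):
    """Return {motion: [clips]} so each category can be handled separately."""
    groups = defaultdict(list)
    for clip in clips:
        groups[clip.motion].append(clip)
    return dict(groups)


def vague_prompts(clips):
    """Return clips whose prompt never mentions their motion category."""
    return [c for c in clips if c.motion.lower() not in c.prompt.lower()]


# Made-up example data.
clips = [
    Clip("clips/0001.mp4", "spin", "a paper cutout figure doing a slow spin"),
    Clip("clips/0002.mp4", "spin", "a paper cutout figure"),
    Clip("clips/0003.mp4", "zoom in", "camera zoom in on a paper cutout tree"),
]
groups = group_by_motion(clips)
flagged = vague_prompts(clips)  # clips whose captions need rewriting
```

A substring check like this is only a crude first pass; it catches captions that describe the subject but forget the motion.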

I think it is better to train one concept per LoRA. Don't fine-tune the motion module itself, because it will forget what it already learned. My experiments show that you need about 100 epochs to learn a motion well. Take motion module v3. It has the best quality (for 24 frames). If you need more frames, you must train a motion module yourself, but that is very expensive. I trained 48- and 96-frame models; it took weeks of training and the quality was so-so, because my dataset is poor.
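To get a feel for what 100 epochs means in optimizer steps, here is a back-of-the-envelope sketch. The dataset size, batch size, and gradient accumulation values are made up for illustration and are not numbers from this thread.

```python
import math


def total_steps(num_clips, epochs, batch_size, grad_accum=1):
    """Optimizer steps needed to pass over the dataset `epochs` times."""
    steps_per_epoch = math.ceil(num_clips / (batch_size * grad_accum))
    return steps_per_epoch * epochs


# Hypothetical single-motion dataset of 200 clips, batch size 4.
steps = total_steps(num_clips=200, epochs=100, batch_size=4)
```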

About training: I train with my own framework, latentflow. The scripts in this repo are quite obsolete, because diffusers now trains with peft. Take a look at the train script: I add LoRA to all attention layers, in both the UNet and the motion module.
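A minimal sketch of how "all attention layers" might be selected by name. It assumes the attention projections are called `to_q`, `to_k`, `to_v`, and `to_out.0`, as in diffusers attention blocks, and that the motion module follows the same naming; the module paths below are hypothetical, so check them against your own model before relying on this.

```python
# Projection names used by attention blocks in diffusers-style UNets.
# Assumption: the motion module uses the same names; verify on your model.
ATTENTION_SUFFIXES = ("to_q", "to_k", "to_v", "to_out.0")


def lora_targets(module_names, suffixes=ATTENTION_SUFFIXES):
    """Pick the module names that end in an attention projection name."""
    return [
        name for name in module_names
        if any(name == s or name.endswith("." + s) for s in suffixes)
    ]


# Hypothetical names; in practice they would come from model.named_modules().
names = [
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0",
    "down_blocks.0.motion_modules.0.transformer_blocks.0.attention_blocks.0.to_k",
    "down_blocks.0.resnets.0.conv1",
    "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj",
]
targets = lora_targets(names)  # attention projections only, UNet and motion module
```

With peft, a list like `targets` (or just the suffixes) is the kind of value passed as `target_modules` to `LoraConfig`. Whether this matches what the latentflow train script does is not confirmed here.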

Thanks a lot for your quick reply. I really appreciate it. Do you mean I should train separate LoRAs, first for domain adaptation and then for one motion pattern? I think I may need a motion with 94 frames.

You must go step by step. First train one motion in a LoRA. I trained 96 frames at 512x288 resolution. It took close to 24 GB. If you get reasonably good results, generate many samples and add them to your dataset. After n rounds with LoRAs you will have a much bigger dataset and can train on it. Think of it as distillation.
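The step-by-step loop described above could be sketched roughly like this. The `train_lora`, `generate`, and `keep` callables are hypothetical placeholders for your own training, sampling, and quality-review code, and the explicit quality filter is an assumption added for the sketch.

```python
def bootstrap_dataset(dataset, train_lora, generate, keep,
                      rounds=3, samples_per_round=100):
    """Grow `dataset` by repeatedly training a LoRA and keeping good samples.

    train_lora(dataset) -> model    (hypothetical training call)
    generate(model, n)  -> samples  (hypothetical sampling call)
    keep(sample)        -> bool     (quality filter, e.g. manual review)
    """
    dataset = list(dataset)  # copy so the caller's list is not modified
    for _ in range(rounds):
        model = train_lora(dataset)
        samples = generate(model, samples_per_round)
        dataset.extend(s for s in samples if keep(s))
    return dataset


# Toy stand-ins so the loop can be run end to end.
grown = bootstrap_dataset(
    dataset=["real_0", "real_1"],
    train_lora=lambda data: len(data),  # the "model" is just the dataset size
    generate=lambda model, n: [f"gen_{model}_{i}" for i in range(n)],
    keep=lambda s: s.endswith(("0", "1")),  # keeps 2 of every 4 samples
    rounds=2,
    samples_per_round=4,
)
```

In a real run the filter matters most: poor samples fed back into the dataset would be learned in the next round.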

Thanks a lot for your kind advice. I'll try.