
Latte Text to Video Training

Among open-source video generation models, Latte is by far the closest to Sora.

The original Latte repo doesn't provide text-to-video training code. We reproduced the paper and implemented text-to-video training on top of it.

Please see the paper for more details:

Latte: Latent Diffusion Transformer for Video Generation

(Figure: the architecture of Latte)

Improvements

The following improvements are implemented to the training code:

  • added support for gradient accumulation (config: gradient_accumulation_steps)
  • added validation sample generation (config: validation) to produce test videos during training
  • added wandb support
  • added classifier-free guidance training (config: cfg_random_null_text_ratio), sketched below
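For reference, classifier-free guidance training works by randomly replacing a caption with the empty string, so the model also learns the unconditional distribution that guidance interpolates against at sampling time. A minimal sketch of the idea in Python (cfg_random_null_text_ratio comes from the config above; the function name and prompt-list interface are illustrative assumptions, not the repo's actual code):

import random

def maybe_drop_text(prompts, cfg_random_null_text_ratio):
    # With probability cfg_random_null_text_ratio, replace each prompt with
    # the empty string so the model learns an unconditional branch.
    # At sampling time this enables classifier-free guidance:
    #   eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    return ["" if random.random() < cfg_random_null_text_ratio else p
            for p in prompts]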

Step 1: set up the environment

First, download and set up the repo:

git clone https://github.com/lyogavin/Latte_t2v_training.git
conda env create -f environment.yml
conda activate latte
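Optionally, confirm that PyTorch can see your GPU before moving on (a generic sanity check, not specific to this repo):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"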

If you find it too complicated to set up the environment and resolve all the package versions, CUDA drivers, etc., you can try our vast.ai template here.

Step 2: download pretrained model

You can download the pretrained model as follows:

sudo apt-get install git-lfs # or: sudo yum install git-lfs
git lfs install

git clone --depth=1 --no-single-branch https://huggingface.co/maxin-cn/Latte /root/pretrained_Latte/
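If you'd rather not deal with git-lfs, the same checkpoint can be fetched with the huggingface_hub Python package (an alternative path we're suggesting here, not the repo's documented one):

from huggingface_hub import snapshot_download

# Download the full maxin-cn/Latte snapshot into the same directory
# the git clone command above uses.
snapshot_download(repo_id="maxin-cn/Latte", local_dir="/root/pretrained_Latte/")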

Step 3: prepare training data

Put the video files in a directory and create a CSV file that specifies the prompt for each video.

The CSV file has two columns, video_file_name and prompt:

video_file_name,prompt
VIDEO_FILE_001.mp4,PROMPT_001
VIDEO_FILE_002.mp4,PROMPT_002
...
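If your videos all sit in one folder, a small script can draft the CSV for you. This is a sketch under two assumptions: the columns are comma-separated and named as above, and you will replace the placeholder prompts with real captions afterwards:

import csv
from pathlib import Path

video_folder = Path("videos")  # hypothetical path; point at your data

with open("train_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["video_file_name", "prompt"])
    for video in sorted(video_folder.glob("*.mp4")):
        # Placeholder prompt; replace with a real caption for each clip.
        writer.writerow([video.name, f"PROMPT_FOR_{video.stem}"])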

Step 4: config

The config file is configs/t2v/t2v_img_train.yaml and it's pretty self-explanatory.

A few config entries to note:

  • point video_folder and csv_path to your training data
  • point pretrained_model_path to the t2v_required_models directory of the downloaded model
  • point pretrained to the t2v.pt file in the downloaded model
  • optionally change text_prompt under the validation section to your own test prompts; every ckpt_every steps during training, videos are generated from these prompts and published to wandb for you to check (see the excerpt after this list)
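Putting those entries together, the relevant part of configs/t2v/t2v_img_train.yaml might look like the excerpt below. The field names are taken from the bullets above; the values and the exact nesting are illustrative assumptions, so keep the rest of the shipped file as-is:

# excerpt with illustrative values only
video_folder: /data/videos
csv_path: /data/train_data.csv
pretrained_model_path: /root/pretrained_Latte/t2v_required_models
pretrained: /root/pretrained_Latte/t2v.pt
gradient_accumulation_steps: 4
cfg_random_null_text_ratio: 0.1
ckpt_every: 1000
validation:
  text_prompt:
    - "a cat playing with a ball of yarn"
    - "waves crashing on a rocky beach"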

Step 5: train!

./run_img_t2v_train.sh
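If you enabled the wandb logging mentioned above, log in once before launching the run (standard wandb CLI usage):

pip install wandb
wandb login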

Cloud GPUs

We recommend vast.ai GPUs for training.

We find it good value: low prices, good network speed, and a wide range of GPUs to choose from, all professionally optimized for AI training.

Feel free to use our template here, where the environment is already set up.

Inference

Refer to the original repo for inference instructions.

Stay Connected with Us

  • WeChat public account
  • WeChat group
  • Discord
  • Tech Blog
  • Website
  • Little RedBook

Contribution

Buy me a coffee please! 🙏

"Buy Me A Coffee"

By: Anima AI
