Among open-source video generation models, Latte is by far the closest to Sora. The original Latte repo didn't provide text-to-video training code, so we reproduced the paper and implemented text-to-video training based on it.
Please find more details in the paper.
The following improvements are implemented in the training code:

- Added gradient accumulation support (config: `gradient_accumulation_steps`).
- Added validation sample generation (config: `validation`) to produce test videos during training.
- Added wandb support.
- Added classifier-free guidance training (config: `cfg_random_null_text_ratio`); see the sketch after this list.
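To make the accumulation and classifier-free guidance items concrete, here is a minimal, runnable sketch of how random null-text dropping and gradient accumulation typically fit into a training loop. It is illustrative only: the model, text encoder, and loss are toy stand-ins rather than the repo's actual code, and the two config values shown are assumed defaults.

```python
# Illustrative sketch only (toy model/loss), not the repo's training code.
import random
import torch
import torch.nn as nn

cfg_random_null_text_ratio = 0.1   # assumed value: fraction of prompts replaced by ""
gradient_accumulation_steps = 4    # assumed value: optimizer steps every N micro-batches

model = nn.Linear(16, 16)          # toy stand-in for the Latte transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def encode_text(prompts):
    # Toy "text encoder": empty prompts map to an all-zero embedding,
    # mimicking the unconditional branch used by classifier-free guidance.
    return torch.stack([torch.zeros(16) if p == "" else torch.randn(16)
                        for p in prompts])

for step in range(8):
    prompts = [f"a video prompt {step}-{i}" for i in range(2)]
    # CFG training: randomly replace some prompts with the empty string so the
    # model also learns the unconditional distribution.
    prompts = ["" if random.random() < cfg_random_null_text_ratio else p
               for p in prompts]
    cond = encode_text(prompts)
    target = torch.randn(2, 16)                 # toy regression target
    loss = nn.functional.mse_loss(model(cond), target)
    # Gradient accumulation: scale the loss and only step the optimizer
    # every `gradient_accumulation_steps` micro-batches.
    (loss / gradient_accumulation_steps).backward()
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```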
First, download and set up the repo:

```bash
git clone https://github.com/lyogavin/Latte_t2v_training.git
cd Latte_t2v_training
conda env create -f environment.yml
conda activate latte
```
If setting up the environment and sorting out all the package versions, CUDA drivers, etc. feels too complicated, you can try our vast.ai template here.
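Once the environment is activated, a quick way to confirm that PyTorch was installed with working CUDA support (an assumption about your setup, but needed for training) is:

```python
# Quick environment check: confirm PyTorch imports and can see a CUDA GPU.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```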
You can download the pretrained model as follows:
```bash
sudo apt-get install git-lfs  # or: sudo yum install git-lfs
git lfs install
git clone --depth=1 --no-single-branch https://huggingface.co/maxin-cn/Latte /root/pretrained_Latte/
```
Put the video files in a directory and create a CSV file that specifies the prompt for each video.

The CSV file format:
| video_file_name | prompt |
|---|---|
| VIDEO_FILE_001.mp4 | PROMPT_001 |
| VIDEO_FILE_002.mp4 | PROMPT_002 |
| ... | ... |
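If you have many videos, a short script can write this CSV for you. This is just a sketch: the header names mirror the table above and the rows are placeholders, so double-check both against the repo's dataset loader before training.

```python
# Sketch: write the training metadata CSV (column names assumed from the table above).
import csv

rows = [
    ("VIDEO_FILE_001.mp4", "PROMPT_001"),   # placeholder file names and prompts
    ("VIDEO_FILE_002.mp4", "PROMPT_002"),
]

with open("train_videos.csv", "w", newline="") as f:   # example output path
    writer = csv.writer(f)
    writer.writerow(["video_file_name", "prompt"])
    writer.writerows(rows)
```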
The config is in `configs/t2v/t2v_img_train.yaml` and is largely self-explanatory.
A few config entries to note:
- Point `video_folder` and `csv_path` to the path of the training data.
- Point `pretrained_model_path` to the `t2v_required_models` directory of the downloaded model.
- Point `pretrained` to the `t2v.pt` file in the downloaded model.
- You can change `text_prompt` under the `validation` section to your own validation prompts. During training, every `ckpt_every` steps it will generate test videos from these prompts and publish them to wandb for you to check out. (A small config sanity check is sketched after this list.)
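Before launching training, an optional script can confirm that the paths you edited actually exist. It assumes the keys sit at the top level of the YAML with the names listed above (and that `text_prompt` lives under `validation`); adjust the lookups if the real config nests them differently.

```python
# Optional sanity check of configs/t2v/t2v_img_train.yaml (key names assumed as above).
import os
import yaml   # pip install pyyaml

with open("configs/t2v/t2v_img_train.yaml") as f:
    cfg = yaml.safe_load(f)

for key in ("video_folder", "csv_path", "pretrained_model_path", "pretrained"):
    path = cfg.get(key)
    status = "OK" if path and os.path.exists(str(path)) else "MISSING"
    print(f"{key}: {path} -> {status}")

validation = cfg.get("validation") or {}
print("validation text_prompt:", validation.get("text_prompt"))
print("ckpt_every (videos generated every N steps):", cfg.get("ckpt_every"))
```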
Then start training:

```bash
./run_img_t2v_train.sh
```
We recommend vast.ai GPUs for training: low prices, good network speed, a wide range of GPUs to choose from, and environments professionally optimized for AI training. Feel free to use our template here, where the environment is ready to use.
Refer to the original repo for how to run inference.
Buy me a coffee please! 🙏