VinAIResearch/LFM

About training time. I found that it takes two hours to complete 4 epochs.

sumorday opened this issue · 7 comments

Is this speed normal? At two hours per 4 epochs, wouldn't it take roughly 250 hours, i.e., about 10 days, to complete all 500 epochs? Is this expected for the CelebA (256x256) dataset (celeb256_dit.txt)?

[Screenshot, 2024-03-11]

If you are using the DiT architecture, you should install torch>=2.0; flash attention will let you train faster, though it sacrifices some performance. Alternatively, you could run the encoder over the training data to get the latents before training LFM, which lets you use a larger batch size. Hope these tricks help you.
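For reference, a minimal sketch of the pre-encoding idea, assuming a diffusers AutoencoderKL (sd-vae-ft-mse) and a torchvision ImageFolder layout; the repo's own first_stage_model wrapper and dataset code may differ:

```python
# Hedged sketch: pre-encode images to VAE latents once, so LFM training can
# skip the encoder and spend the freed memory on a larger batch size.
# Assumes diffusers + torchvision; paths and model choice are illustrative.
import torch
from diffusers import AutoencoderKL
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

device = "cuda"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(device).eval()

tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # map pixels to [-1, 1]
])
loader = DataLoader(datasets.ImageFolder("data/celeba_256", tfm),
                    batch_size=64, num_workers=8)

all_latents = []
with torch.no_grad():
    for x, _ in loader:
        # 3x256x256 images -> 4x32x32 latents (f=8), scaled as in LDM
        z = vae.encode(x.to(device)).latent_dist.sample() * 0.18215
        all_latents.append(z.cpu())

torch.save(torch.cat(all_latents), "celeba256_latents.pt")
```

Training would then load these saved latents directly, so the VAE never runs inside the training loop.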


Thank you for the response. I will try adjusting the torch version (I'm mainly concerned about compatibility issues).
As for running the encoder, isn't the autoencoder already used in train_flow_latent.py? Or do I need to configure something to run the encoder on the training data separately?


[Two screenshots, 2024-03-13]
Is the "--f" flag here automatically using an autoencoder?

[Screenshot, 2024-03-15]
Is it this setting here, changing false to true?

@hao-pt Hello, Quandao suggested that I enable the encoder to speed up training. I would like to ask: can I train the encoder-decoder from scratch by changing the first_stage_model_train value here from false to true? Looking forward to your answer. (It seems I don't need the pretrained model, since I am using different datasets and have added other methods.)

The pretrained autoencoder is used to improve training efficiency and model performance. Hence, first_stage_model is run in inference mode only, without further training. Training an autoencoder from scratch is outside the scope of this work. As long as your dataset follows statistics similar to those of natural images, there is no need to retrain the autoencoder.
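To make the inference-only role concrete, here is a hedged sketch of one latent flow-matching training step with a frozen first-stage encoder (the names velocity_model and flow_matching_step are illustrative, not the repo's actual API, and the time convention may differ from the paper's):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def encode_to_latent(vae, x):
    # First-stage model is frozen: inference only, no gradients, no optimizer step.
    return vae.encode(x).latent_dist.sample() * 0.18215

def flow_matching_step(velocity_model, vae, x, optimizer):
    z1 = encode_to_latent(vae, x)                       # data latent
    z0 = torch.randn_like(z1)                           # noise sample
    t = torch.rand(z1.size(0), device=z1.device)        # random time in [0, 1]
    t_ = t.view(-1, 1, 1, 1)
    zt = (1.0 - t_) * z0 + t_ * z1                      # linear interpolation path
    target_v = z1 - z0                                  # constant target velocity
    loss = F.mse_loss(velocity_model(zt, t), target_v)  # regress the velocity field
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Only velocity_model receives gradients; the autoencoder's weights never change, which is why retraining it is unnecessary as long as the data resembles natural images.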

run encoder on the training data to get latent data before training lfm

Thank you. Because I saw another comment saying "run encoder on the training data to get latent data before training lfm," I'm wondering whether I need to set up a separate encoder. So, as long as the command being run is !bash ./bash_scripts/run.sh ./test_args/celeb256_dit.txt, it will be the correct Flow Matching in Latent Space, right?
