haoheliu/AudioLDM-training-finetuning

Requirements


Hello!

I am a beginner and wanted to try your finetuning script, so I ran it on Google Colab. As a test, I prepared train, test and eval datasets containing only a single audio file with a caption. When I ran the script, it consumed about 15 GB of GPU VRAM and the process was killed. Is this normal behaviour? If so, do you know how much VRAM I would need to finetune this model on a more reasonably sized dataset (about 1000 rows)?

Thank you in advance for your reply!

I would also like to add that I haven't changed any hyperparameters and used the audioldm_train/config/2023_08_23_reproduce_audioldm/audioldm_original_medium.yaml config. Also, here is a screenshot of the moment the process was killed:

image
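
As a rough sanity check on the 15 GB figure, here is a back-of-the-envelope sketch of the memory that training needs before any audio is loaded. It assumes fp32 weights and an Adam-style optimizer that keeps two extra states per parameter, and it ignores activations, frozen components and framework overhead, so it is only a lower bound. The function name and the parameter count below are hypothetical placeholders, not values taken from this repository.

```python
def estimate_training_memory_gib(num_params, bytes_per_param=4, optimizer_states=2):
    """Lower-bound memory estimate in GiB for trainable parameters.

    Counts one copy each for weights and gradients, plus `optimizer_states`
    copies for the optimizer (2 for Adam-style optimizers). Activations are
    NOT included, so real usage will be higher.
    """
    copies = 1 + 1 + optimizer_states  # weights + gradients + optimizer states
    total_bytes = num_params * bytes_per_param * copies
    return total_bytes / 1024**3


# Hypothetical parameter count, for illustration only
# (not the actual size of the model in this repository).
example_params = 400_000_000
print(f"~{estimate_training_memory_gib(example_params):.1f} GiB before activations")
```

This part of the footprint depends only on the number of trainable parameters, not on how many rows the dataset has, which would be consistent with a one-sample dataset still using a lot of VRAM. Batch size and audio length would then drive the activation memory on top of it.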