mk-minchul/AdaFace

Out of memory when setting custom_num_class = 2059906

martinenkoEduard opened this issue · 14 comments

I am trying to train on WebFace42M, but it says "out of memory".
I am using an NVIDIA GeForce RTX 3080 Ti with 12 GB of GPU memory.

When I lower custom_num_class it works fine.

cfivek commented

I might be able to help you. I was dealing with some memory issues as well, but sorted them out with different batch sizes, mixed precision, and swap memory settings. I am also looking to train it on other datasets. If you want some help, you can send me an email at vdapple@proton.me

I sent you an email!

It says - RuntimeError: CUDA out of memory. Tried to allocate 3.93 GiB (GPU 0; 11.76 GiB total capacity; 9.73 GiB already allocated; 1.03 GiB free; 9.76 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
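
The figures in that message can be checked directly. Here is a minimal sketch in plain Python (no PyTorch needed) that pulls them out and compares them. `parse_oom` is a hypothetical helper written for this thread, not a PyTorch function.

```python
import re

# Numeric part of the error message quoted above.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 3.93 GiB "
    "(GPU 0; 11.76 GiB total capacity; 9.73 GiB already allocated; "
    "1.03 GiB free; 9.76 GiB reserved in total by PyTorch)"
)

def parse_oom(message):
    """Extract the GiB figures from an OOM message of this shape (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {k: float(re.search(p, message).group(1)) for k, p in patterns.items()}

stats = parse_oom(MESSAGE)
# Memory held in PyTorch's cache but not in use; a large gap would hint at fragmentation.
cached_unused = stats["reserved"] - stats["allocated"]
# What would be in use if the failed allocation had succeeded.
needed = stats["allocated"] + stats["requested"]

print(f"reserved but unused: {cached_unused:.2f} GiB")
print(f"needed: {needed:.2f} GiB, card total: {stats['total']:.2f} GiB")
```

Reserved and allocated memory are almost equal here, so fragmentation (the `max_split_size_mb` hint) is unlikely to be the cause. The single 3.93 GiB request simply does not fit next to the 9.73 GiB already in use on an 11.76 GiB card.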

cfivek commented

I didn't get the email, but did you try running nvidia-smi in the cmd window to see what's eating up the memory?

If the GPU only has 12 GB, I would try running it with maybe batch size 64 and mixed precision (16-bit).

I tried it.
Even with batch size 2 it says out of memory.

It says the same for both ir50 and ir_101.

How does increasing classnum increase memory consumption?

Yep. I already tried the minimum batch size and mixed precision.

cfivek commented

Increasing the number of classes enlarges the output layer, which needs more memory. I have two 4090s (48 GB in total) and would give it a shot to see if that's enough, but I don't have that dataset.
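
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a 512-dimensional embedding, fp32 parameters, and SGD with momentum (one gradient and one momentum buffer per weight). None of these are stated in this thread, so treat them as assumptions. `head_gib` is a made-up helper name.

```python
GIB = 1024 ** 3

def head_gib(num_classes, embedding_dim=512, bytes_per_param=4):
    """Size in GiB of one (num_classes x embedding_dim) classifier weight matrix."""
    return num_classes * embedding_dim * bytes_per_param / GIB

# fp32 weight matrix for custom_num_class = 2059906 (embedding_dim=512 is an assumption).
weights = head_gib(2_059_906)
# Assumption: SGD with momentum keeps a gradient and a momentum buffer,
# each the same shape as the weights.
training_total = 3 * weights

print(f"weights: {weights:.2f} GiB")
print(f"weights + grad + momentum: {training_total:.2f} GiB")
print(f"fp16 copy of the weights: {head_gib(2_059_906, bytes_per_param=2):.2f} GiB")
```

Under these assumptions a single copy of the weights comes to about 3.93 GiB, the same size as the allocation that failed in the error message, and the three copies together already exceed the 11.76 GiB on the card. None of this depends on batch size, which would explain why batch size 2 still fails. Mixed precision does not necessarily help either, since autocast setups typically keep the parameters themselves in fp32.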

https://www.face-benchmark.org/
I got it by sending a request by email.

I have sent you another email.
I can share this dataset, but it is huge, around 300 GB.

cfivek commented

For some reason I'm not getting your emails; I checked spam too. If you have some other way you want to chat, let me know. What about trying my other email, vladskies@gmail.com? Or if you want to give me yours, I can send you one.

Is it possible to share the dataset? 🙏🏼
If yes, my email address is:

hassan-miqdad@hotmail.com

Thank you so much 🙏🏼

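You could consider using the partial_fc method from insightface; it can solve your problem. It does require some code changes, though. I have added adaface to insightface and successfully solved this problem.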
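
For readers unfamiliar with it: as far as I understand, Partial FC uses only a sampled subset of class centers at each step (and splits the full set of centers across GPUs). The sampling idea alone can be sketched in plain Python. This is an illustration only, not insightface's implementation, and `sample_class_centers` and its parameters are made-up names.

```python
import random

def sample_class_centers(batch_labels, num_classes, sample_rate, rng):
    """Pick the class centers to use for one training step (illustrative only).

    Every class present in the batch is kept (positives); the rest of the
    budget is filled with randomly chosen other classes (negatives).
    Returns the sorted sampled class ids and the batch labels remapped
    to positions inside that sampled list.
    """
    positives = set(batch_labels)
    budget = max(int(num_classes * sample_rate), len(positives))
    sampled = set(positives)
    while len(sampled) < budget:
        sampled.add(rng.randrange(num_classes))
    sampled = sorted(sampled)
    position = {cls: i for i, cls in enumerate(sampled)}
    return sampled, [position[label] for label in batch_labels]

rng = random.Random(0)
labels = [5, 1_999_999, 42, 5]  # hypothetical class ids for a batch of four images
sampled, remapped = sample_class_centers(labels, 2_059_906, 0.1, rng)
print(len(sampled), remapped)
```

Only the sampled rows take part in the softmax for that step, so the logits and the gradient work scale with the sample size rather than the full class count. The full set of centers still has to be stored somewhere, which is where splitting it across GPUs comes in.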

zws98 commented

May I ask how you integrated partial fc? Could you explain it?