Face clustering runs out of memory
blackmore190 opened this issue · 4 comments
I have a lot of faces and am running Nextcloud AIO with an external model. When I run the background job via occ, it crashes with this message:
```
61580 faces found for clustering
PHP Fatal error: Allowed memory size of 4294967296 bytes exhausted (tried to allocate 536870920 bytes) in /var/www/html/custom_apps/facerecognition/lib/BackgroundJob/Tasks/CreateClustersTask.php on line 330
```
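For scale: the number of unordered face pairs grows quadratically with the number of faces. For the 61,580 faces reported in the log, a naive all-pairs comparison would have to consider:

$$\binom{61580}{2} = \frac{61580 \times 61579}{2} = 1{,}896{,}017{,}410 \text{ pairs}$$

Storing even one 8-byte float per pair would take 1,896,017,410 × 8 = 15,168,139,280 bytes (about 15.2 GB), well above the 4,294,967,296-byte (4 GiB) limit in the error, before counting any PHP array overhead.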
Is it possible to split the clustering into smaller batches that fit within the allowed memory size?
Thanks!
@blackmore190 I had the same problem. I looked at the code the error message points to, and it seems the program allocates memory for every pair of faces (skipping only pairs whose Euclidean distance exceeds the threshold) in order to run the Chinese Whispers algorithm. This doesn't work for a large number of faces.
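To make the memory argument concrete, here is a minimal Python sketch of the approach described above. It assumes face embeddings are plain coordinate tuples and that the clustering threshold is a maximum Euclidean distance. `build_edges` and `chinese_whispers` are hypothetical names, and the clustering step is a simplified, unweighted Chinese Whispers for illustration, not the PHP code in `CreateClustersTask.php`.

```python
import math
import random


def build_edges(embeddings, threshold):
    """Return every (i, j) pair whose Euclidean distance is below threshold.

    All n*(n-1)/2 pairs are visited; the list grows with the number of
    pairs kept, so a looser threshold keeps more edges in memory.
    """
    edges = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(embeddings[i], embeddings[j]) < threshold:
                edges.append((i, j))
    return edges


def chinese_whispers(n, edges, iterations=20, seed=0):
    """Simplified, unweighted Chinese Whispers over n nodes.

    Each node starts in its own class, then repeatedly adopts the most
    common class among its neighbours (keeping its own class on a tie).
    """
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    labels = list(range(n))
    order = list(range(n))
    rng = random.Random(seed)
    for _ in range(iterations):
        rng.shuffle(order)  # visit nodes in a random order each round
        changed = False
        for node in order:
            if not neighbours[node]:
                continue  # isolated face: stays a singleton cluster
            counts = {}
            for nb in neighbours[node]:
                counts[labels[nb]] = counts.get(labels[nb], 0) + 1
            top = max(counts.values())
            if counts.get(labels[node], 0) == top:
                continue  # current class is already among the most common
            # Deterministic tie-break: smallest label among the most common.
            labels[node] = min(lab for lab, c in counts.items() if c == top)
            changed = True
        if not changed:
            break  # no node changed class: converged
    return labels


# Two tight groups of three faces plus one outlier.
faces = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
edges = build_edges(faces, threshold=0.5)
labels = chinese_whispers(len(faces), edges)
```

The edge list is the part that grows with the number of close pairs; the label propagation itself only needs one neighbour list per face.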
Without changing how the algorithm works, the only thing we can do is lower the clustering threshold (in the admin settings panel). I have 118k images and 54k faces, and needed to lower the threshold to 0.32 to complete the clustering on a machine with 11 GB of RAM. The results are still pretty good.
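Because only pairs closer than the threshold are kept, a lower threshold shrinks the edge list. A rough way to compare candidate thresholds before running the background job is to count close pairs in a random sample of embeddings and scale by the ratio of total pairs to sampled pairs. This is a hypothetical helper (`estimate_edges` is not part of the app), assuming embeddings are available as coordinate tuples and the threshold is a maximum Euclidean distance.

```python
import math
import random
from itertools import combinations


def estimate_edges(embeddings, thresholds, sample_size=500, seed=0):
    """Estimate how many pairs fall under each threshold for the full set.

    Hypothetical helper: counts close pairs in a random sample, then scales
    by C(n, 2) / C(s, 2). When sample_size >= n the counts are exact.
    """
    n = len(embeddings)
    rng = random.Random(seed)
    sample = rng.sample(embeddings, min(sample_size, n))
    s = len(sample)
    if s < 2:
        return {t: 0 for t in thresholds}
    dists = [math.dist(a, b) for a, b in combinations(sample, 2)]
    scale = math.comb(n, 2) / math.comb(s, 2)
    return {t: round(sum(d < t for d in dists) * scale) for t in thresholds}


points = [(0, 0), (0.3, 0), (0, 0.4), (1, 1)]
estimate = estimate_edges(points, [0.32, 0.45, 0.6])
```

The estimate is only as good as the sample; it is meant for comparing thresholds against each other, not for predicting exact memory use.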
I do think the current implementation of the algorithm doesn't scale to very large numbers of images, so I hope a different algorithm can be used to address the issue.
Hi both,
You can try this commit, which will be part of an upcoming update next week:
58e3e0e
I was focused on improving the speed of clustering, but it should also help in the case described in the last comment, since the comparison is also done in batches. Note, however, that all the faces must still be loaded into memory, which consumes a lot of resources.
@matiasdelellis Thank you for this change. It worked well for me. With over 55K faces, clustering would not complete with a 2 GB memory limit, but it is now reliable and takes less time to finish than it previously took to fail.
Some tuning of the batch size seemed to help:
- With the suggested `clustering_batch_size` setting of 1000, about 32K clusters were found.
- Batch size 5000, 30K clusters
- Batch size 8000, 28K clusters
- Batch size 20000, 26K clusters (this looks like it will stay within a 1 GB memory limit)
The main benefit was a noticeable improvement in clustering quality as the batch size increased.
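One possible reading of the numbers above, offered only as a hypothetical model since the thread does not say exactly how commit 58e3e0e partitions the work: if faces were compared only against other faces in the same batch, the number of pairs examined would grow with the batch size, so larger batches could link more faces and produce fewer, larger clusters. The helper below (`pairs_compared` is an illustrative name) counts pairs under that assumption.

```python
def pairs_compared(n_faces, batch_size):
    """Pairs examined if faces are only compared within fixed-size batches.

    Hypothetical model: assumes each batch is compared independently,
    which is not confirmed for the app's actual batching.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    full, rest = divmod(n_faces, batch_size)
    per_full = batch_size * (batch_size - 1) // 2
    return full * per_full + rest * (rest - 1) // 2


# Batch sizes reported above, for 55,000 faces.
for size in (1000, 5000, 8000, 20000):
    print(size, pairs_compared(55_000, size))
```

Under this model, a batch of 1000 would examine 27,472,500 of the 1,512,472,500 possible pairs for 55,000 faces, while a batch of 20000 would examine 512,472,500.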