FFCV accrues massive overhead with heavy transform pipelines
akashc1 opened this issue · 1 comments
akashc1 commented
Hi,
I've noticed that FFCV can deliver nice speedups for individual steps. For context:
- I have lots of images of shape 256 x 256 x 3
- I use many augmentations, some even repeated as I'm using an SSL method which relies on this for performance
- My baseline `__getitem__` call:
  - read a JPEG image from a NFS
  - decode JPEG
  - perform augmentations
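For readers following along, a baseline of this shape might look roughly like the sketch below. It is not the actual code from this issue: the class name `BaselineDataset` and the `decode` and `transforms` arguments are hypothetical stand-ins for whatever JPEG decoder and augmentation functions are really used.

```python
from pathlib import Path


class BaselineDataset:
    """Map-style dataset sketch: read bytes, decode, then augment."""

    def __init__(self, paths, decode, transforms=()):
        # `decode` and `transforms` are hypothetical callables supplied by the
        # caller (e.g. a JPEG decoder and a list of augmentation functions).
        self.paths = [Path(p) for p in paths]
        self.decode = decode
        self.transforms = list(transforms)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        raw = self.paths[index].read_bytes()  # 1. read the file (e.g. from NFS)
        image = self.decode(raw)              # 2. decode the JPEG bytes
        for transform in self.transforms:     # 3. apply augmentations in order
            image = transform(image)
        return image
```

Keeping the three stages as separate lines makes it straightforward to time each one on its own, which is what the per-stage comparison relies on.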
Noticing with FFCV:
- Dataloading on its own is 3-4x faster (reading from disk + JPEG decoding)
- Transforms on their own are 3-4x faster after optimization on my end (due to using `numba` or other tricks)
- E2E dataloading with FFCV is 2-3x slower than my baseline.
I profiled it quite a bit and it seems to be due to FFCV overhead. I can see in `htop` that even when using something like 16 dataloader workers, most CPU cores are actually idle.
Has anybody else experienced something similar, or have any tips on debugging or addressing this?
Thank you!
bordesf commented
Thanks for opening this issue. Can you define what you mean by "lots of images" or "heavy transform" pipelines? Is it 1B images and 20 transforms, or just 1M images with 4 transforms? In addition, it would be easier if you could share a snippet of your code so we can see whether this is reproducible on our side.
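When putting together a reproducible snippet, it may help to measure both loaders with the same harness so the E2E numbers are directly comparable. The following is a minimal sketch using only the standard library; `time_loader` is a hypothetical helper name, and it assumes each loader is a plain Python iterable of batches.

```python
import time

_END = object()  # sentinel, so a batch that happens to be None is still counted


def time_loader(loader, max_batches=100, warmup=5):
    """Iterate `loader` and return (batches_timed, elapsed_seconds).

    The first `warmup` batches are consumed but not timed, so that worker
    startup and any one-off compilation do not skew the result.
    """
    it = iter(loader)
    for _ in range(warmup):
        if next(it, _END) is _END:
            return 0, 0.0  # loader ran out during warmup

    count = 0
    start = time.perf_counter()
    for _ in range(max_batches):
        if next(it, _END) is _END:
            break
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed
```

Running this once over the baseline loader and once over the FFCV loader, with the same batch size and number of workers, gives a batches-per-second figure (`count / elapsed`) for each that can be posted alongside the code.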