cluster_weighted_diounms crashing ultralytics yolov5
bsugerman opened this issue · 6 comments
I've found that using your NMS in yolov5 gives me better AP scores, contrary to an issue also raised in this repo. However, I'm finding that sometimes when I use it, the testing phase on the validation set causes CUDA to run out of memory no matter how small the batch size is. I have 11 GB of CUDA RAM, and training might be using only 6 GB of it. Halfway through the NMS calculation, I get an error like this:
Traceback:
```
method = 'cluster_weighted_diounms'
line 675, in non_max_suppression
    iou = box_diou(boxes, boxes).triu_(diagonal=1)
line 356, in box_diou
    c = ((crb - clt) ** 2).sum(dim=2)
RuntimeError: CUDA out of memory. Tried to allocate ...
```
I know this is somewhat vague and I can provide more concrete details, but I thought I would start by asking whether you think these lines could cause some sort of memory overflow in the middle of running test.py from the yolov5 library?
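For context, the two lines in the traceback build pairwise N × N tensors, so memory grows quadratically with the number of candidate boxes that reach NMS. Below is a minimal sketch of a pairwise DIoU computation in that style, reconstructed from the traceback's variable names rather than the actual source:

```python
import torch

def box_diou(boxes1, boxes2):
    """boxes are (N, 4) / (M, 4) in (x1, y1, x2, y2) format; returns an (N, M) DIoU matrix."""
    # pairwise intersection corners: each tensor below is (N, M, 2)
    lt = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])
    rb = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]  # (N, M)

    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    iou = inter / (area1[:, None] + area2[None, :] - inter + 1e-7)

    # smallest enclosing box; its squared diagonal is the `c` from the traceback
    clt = torch.min(boxes1[:, None, :2], boxes2[None, :, :2])
    crb = torch.max(boxes1[:, None, 2:], boxes2[None, :, 2:])
    c = ((crb - clt) ** 2).sum(dim=2)  # (N, M)

    # squared distance between box centers
    ctr1 = (boxes1[:, :2] + boxes1[:, 2:]) / 2
    ctr2 = (boxes2[:, :2] + boxes2[:, 2:]) / 2
    d = ((ctr1[:, None, :] - ctr2[None, :, :]) ** 2).sum(dim=2)  # (N, M)

    return iou - d / (c + 1e-7)

boxes = torch.rand(3000, 4)
boxes[:, 2:] += boxes[:, :2]  # make x2 > x1 and y2 > y1
print(box_diou(boxes, boxes).shape)  # torch.Size([3000, 3000])
```

With 30,000 surviving candidates, the `c` matrix alone would occupy 30,000² × 4 bytes ≈ 3.6 GB in fp32, with several same-sized intermediates alive at once, which is consistent with an OOM partway through NMS.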
@bsugerman that's interesting. Do you have a fork or branch we could look at?
What AP scores did you get with the default NMS, merge NMS, and Cluster NMS?
Some of the methods that use matmuls can create very large intermediate tensors, which is why we cap the box count at 300 here:
https://github.com/ultralytics/yolov5/blob/b42e8a531b8eb12b2f71c00ddc4dea187633f7c7/utils/general.py#L607
Boxes past 300 are removed here:
https://github.com/ultralytics/yolov5/blob/b42e8a531b8eb12b2f71c00ddc4dea187633f7c7/utils/general.py#L657-L658
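For illustration, a minimal sketch of that kind of confidence-based cap (hypothetical variable names, not the exact utils/general.py code): only the top-scoring candidates survive into the quadratic pairwise step.

```python
import torch

max_nms = 300  # the cap referenced above

# x: (n, 6) candidate detections for one image: [x1, y1, x2, y2, conf, cls]
x = torch.rand(1000, 6)  # dummy predictions for illustration

if x.shape[0] > max_nms:
    # sort by confidence (column 4), descending, and keep only the top max_nms
    x = x[x[:, 4].argsort(descending=True)[:max_nms]]

print(x.shape)  # torch.Size([300, 6])
```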
@bsugerman I experimented a bit with the YOLOv5 NMS function today, and observed that I could remove the .float() conversion. This may help reduce memory consumption by allowing more fp16 ops in place of fp32 ops.
This is implemented here:
ultralytics/yolov5#814
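As a rough illustration of why dropping the cast helps (a sketch, not the PR's actual diff): an (n, n) pairwise matrix kept in fp16 uses half the memory that a .float() upcast would force.

```python
import torch

n = 300  # boxes surviving the cap above
iou_fp16 = torch.zeros(n, n, dtype=torch.float16)
iou_fp32 = iou_fp16.float()  # what the removed .float() cast would force

print(iou_fp16.element_size() * iou_fp16.nelement())  # 180000 bytes
print(iou_fp32.element_size() * iou_fp32.nelement())  # 360000 bytes
```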
@bsugerman yeah, I ran into the same problem before when I turned on test augmentation. I was running testing on 2 GTX 1080 Ti GPUs, and GPU 0 always used a lot of memory. But when I turned off test augmentation, the out-of-memory error did not happen.
Yes, I had made that change a while ago because it didn't make sense to keep float32. :-)
@bsugerman ah I see. If you implement any other improvements that appear useful, feel free to submit PRs as well! Regarding your multi-GPU issues, you should simply try creating a new Python 3.8 venv and going from there. Conda can be problematic.