XinzeLee/PolygonObjectDetection

I found a bug and improved it with a huge performance boost

18112330636 opened this issue · 5 comments

The core idea of this model is to regress, for each quadrilateral, the offsets of its four vertices from the center point. To simplify the problem, the vertices of the quad are sorted such that y3, y4 >= y1, y2; x1 <= x2; x4 <= x3. But there is a flaw: if a diagonal of the quadrilateral is parallel to the x-axis, then y2 = y3, and the method becomes ambiguous, because the order of the left and right vertices on that diagonal is undefined. This ambiguity greatly reduces the model's performance. Moreover, the vertex-sorting method is written with three for loops, which makes training very slow. Before the improvement, my data required 1000 epochs of training to reach 94 mAP@0.5; after the improvement, it needs only 120 epochs to reach 96 mAP@0.5. Also, the unimproved model cannot handle scaling and rotation; after the improvement, with the degrees of polygon_random_perspective set to 90 and the scale set to 0.5, the model still reaches 96 mAP at about 150 epochs. Below is my code. I want to change jobs in Shanghai, China; anyone interested can contact me. I am 28 years old with three years of CV experience. Phone: 18112330636.
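To make the ambiguity concrete, here is a small hand-made illustration (my own example, not from the repo): a diamond whose horizontal diagonal produces a y tie, so two different vertex orders both satisfy the sorting restrictions, yet they give different regression targets.

```python
# Hand-made illustration (not from the repo): a diamond with a horizontal
# diagonal. The two vertices on that diagonal share the same y, so two
# different vertex orders both satisfy the restrictions
# "y3, y4 >= y1, y2; x1 <= x2; x4 <= x3".
diamond_order_a = [(1.0, 0.0), (2.0, 1.0), (1.0, 2.0), (0.0, 1.0)]
diamond_order_b = [(0.0, 1.0), (1.0, 0.0), (2.0, 1.0), (1.0, 2.0)]

def satisfies_restrictions(pts):
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = pts
    return min(y3, y4) >= max(y1, y2) and x1 <= x2 and x4 <= x3

# Both orders are "valid", yet they assign the same quad different targets.
print(satisfies_restrictions(diamond_order_a),
      satisfies_restrictions(diamond_order_b))  # -> True True
```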

```python
import torch

def order_corners(boxes):
    """
    Return sorted corners for loss.py::class Polygon_ComputeLoss::build_targets.
    Sorted corners have the following restrictions:
        y3, y4 >= y1, y2; x1 <= x2; x4 <= x3
    """
    if boxes.shape[0] == 0:
        return torch.empty(0, 8, device=boxes.device)

    boxes = boxes.view(-1, 4, 2)
    x = boxes[..., 0]
    y = boxes[..., 1]
    # Sort the four y values of every box; the two smallest become the
    # bottom pair (indices 0, 1) and the two largest the top pair (2, 3).
    y_sorted, y_indices = torch.sort(y)
    idx = torch.arange(0, y.shape[0], dtype=torch.long, device=boxes.device)
    complete_idx = idx[:, None].repeat(1, 4)
    x_sorted = x[complete_idx, y_indices]
    # Within the bottom pair sort x ascending (x1 <= x2); within the top
    # pair sort x descending (x4 <= x3), then reorder y to match.
    x_sorted[:, :2], x_bottom_indices = torch.sort(x_sorted[:, :2])
    x_sorted[:, 2:4], x_top_indices = torch.sort(x_sorted[:, 2:4], descending=True)
    y_sorted[idx, :2] = y_sorted[idx, :2][complete_idx[:, :2], x_bottom_indices]
    y_sorted[idx, 2:4] = y_sorted[idx, 2:4][complete_idx[:, 2:4], x_top_indices]
    # Special case: a diagonal parallel to the x-axis gives y2 == y3, which
    # leaves the order of x2 and x3 undefined; swap them when x2 > x3.
    special = (y_sorted[:, 1] == y_sorted[:, 2]) & (x_sorted[:, 1] > x_sorted[:, 2])
    if idx[special].shape[0] != 0:
        x_sorted_1 = x_sorted[idx[special], 1].clone()
        x_sorted[idx[special], 1] = x_sorted[idx[special], 2]
        x_sorted[idx[special], 2] = x_sorted_1
    return torch.stack((x_sorted, y_sorted), dim=2).view(-1, 8).contiguous()
```
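As a quick sanity check (my own test, not part of the repo), the vectorized function can be exercised on a shuffled axis-aligned rectangle; the function is repeated here so the snippet runs standalone:

```python
import torch

def order_corners(boxes):
    # (Same function as above, repeated so this snippet runs standalone.)
    if boxes.shape[0] == 0:
        return torch.empty(0, 8, device=boxes.device)
    boxes = boxes.view(-1, 4, 2)
    x = boxes[..., 0]
    y = boxes[..., 1]
    y_sorted, y_indices = torch.sort(y)
    idx = torch.arange(0, y.shape[0], dtype=torch.long, device=boxes.device)
    complete_idx = idx[:, None].repeat(1, 4)
    x_sorted = x[complete_idx, y_indices]
    x_sorted[:, :2], x_bottom_indices = torch.sort(x_sorted[:, :2])
    x_sorted[:, 2:4], x_top_indices = torch.sort(x_sorted[:, 2:4], descending=True)
    y_sorted[idx, :2] = y_sorted[idx, :2][complete_idx[:, :2], x_bottom_indices]
    y_sorted[idx, 2:4] = y_sorted[idx, 2:4][complete_idx[:, 2:4], x_top_indices]
    special = (y_sorted[:, 1] == y_sorted[:, 2]) & (x_sorted[:, 1] > x_sorted[:, 2])
    if idx[special].shape[0] != 0:
        x_sorted_1 = x_sorted[idx[special], 1].clone()
        x_sorted[idx[special], 1] = x_sorted[idx[special], 2]
        x_sorted[idx[special], 2] = x_sorted_1
    return torch.stack((x_sorted, y_sorted), dim=2).view(-1, 8).contiguous()

# A 2x1 rectangle with shuffled corners: (2,1), (0,0), (0,1), (2,0).
boxes = torch.tensor([[2., 1., 0., 0., 0., 1., 2., 0.]])
print(order_corners(boxes))
# The result should list the corners as bottom-left, bottom-right,
# top-right, top-left, i.e. satisfy y3, y4 >= y1, y2; x1 <= x2; x4 <= x3.
```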

Can you kindly paste your comparison results here? I will take a look and modify the code if necessary. Thanks!

I have checked your code, and I found that you have indeed replaced the for loops with a matrix-slicing (vectorized) version. This feature is great and I will update the code. For the diagonal-parallel problem, does it have a significant impact on the performance? Can you show me your training process and results before and after the modification?
Thanks! Will mention your contribution on this repo!

before improvement.txt
after improvement.txt
```python
special = (y_sorted[:, 1] == y_sorted[:, 2]) & (x_sorted[:, 1] > x_sorted[:, 2])
if idx[special].shape[0] != 0:
    x_sorted_1 = x_sorted[idx[special], 1].clone()
    x_sorted[idx[special], 1] = x_sorted[idx[special], 2]
    x_sorted[idx[special], 2] = x_sorted_1
```

### When a diagonal of the quadrilateral is parallel to the x-axis, the order of x2 and x3 is undefined
My improvement essentially enhances the model, not just the training speed: the mAP obtained with the original method is lower than with my method, and the training time is several times longer. Above are my two training results. The first file is the training result with the original order_corners, and the second file is the result with my improved version. It can be clearly seen that my changes improve the performance of the model. Both runs used polygon_random_perspective with degrees 90 and scale 0.5. With such a large angle range, quadrilaterals whose diagonal is parallel to the x-axis occur more often, and the resulting ambiguity between x2 and x3 causes a large drop in performance. I also tested with mosaic: 1.0, and it still converges quickly. The nc of my data is 4. I feel this ambiguity is very restrictive for the model. Mosaic augmentation is a very important method for improving YOLOv5 performance, and this parameter cannot be turned on without my fix.

I still have one more idea for improving this model. When the data is cropped after polygon_random_perspective, the coordinates of a target can extend beyond the image. This model does not handle that case; it simply keeps a target whenever its center is inside the image, which is clearly problematic. The overflow can be divided into three cases: after cropping, the visible part of the target may be a triangle, a quadrilateral, or a pentagon. I think it is possible to do some filtering and then clip the coordinates of the quad to the inside of the image, which should improve accuracy. I cannot think of a good solution for the pentagon case.
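A minimal sketch of the clipping idea, under my own assumptions (the function name `clip_polygon_boxes` and the (N, 8) layout are hypothetical, not the repo's API). It simply clamps every vertex to the image bounds, which handles the quadrilateral case exactly but only approximates the triangle and pentagon cases:

```python
import torch

def clip_polygon_boxes(boxes, width, height):
    """Clamp each vertex of (N, 8) quads [x1,y1,...,x4,y4] to the image.

    Note: the true intersection of an overflowing quad with the image can
    be a triangle or a pentagon; clamping only yields an approximate quad.
    """
    boxes = boxes.view(-1, 4, 2).clone()
    boxes[..., 0].clamp_(0, width)   # x coordinates
    boxes[..., 1].clamp_(0, height)  # y coordinates
    return boxes.view(-1, 8)

# A quad overflowing a 100x100 image on the right edge
quad = torch.tensor([[90., 10., 120., 10., 120., 40., 90., 40.]])
print(clip_polygon_boxes(quad, 100, 100))
```

This is only a starting point; a full solution would run a polygon-clipping step (e.g. Sutherland-Hodgman) and then decide how to represent the triangle and pentagon results as quads.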

Yes, I have made the modifications to def order_corners accordingly.

By the way, your concerns regarding polygon_random_perspective and mosaic also occurred to me before, but it was hard to find solutions. If you have further improvements, please let me know. Thanks!