ultralytics/yolov5

Can't 2nd-classifier predict more detailed class than detection model?

Closed this issue · 21 comments

Search before asking

  • I have searched the YOLOv5 issues and found no similar feature requests.

Description

Currently, the detection and 2nd-classification models must have the same number of classes.
However, if the classification model were allowed to have more classes than the detection model, it could give better results with less effort, for the following two reasons.

(1) A fine-grained classifier can achieve higher performance than a coarse-grained one.
A classification model can recognize objects better when trained on fine-grained labels (e.g. "dog", "cat", ...) than on a coarse-grained dataset (e.g. "animal").
Ref: Chen et al. (2018)

(2) In general, preparing a dataset for classification is easier than preparing one for object detection.
A detection dataset needs both the location and the class of each object, whereas a classification dataset only needs the class.

Therefore, it would be nice if the detection model could do the coarse labeling (like "animal") and the 2nd-classifier could do the detailed labeling (like "dog", "cat").

Use case

(1) for detailed (fine-grained) classification
(2) for increasing the performance of coarse-grained labeling

I suppose it would help with most object detection tasks.

Additional

The basic idea is:
(1) replace the detected class (pred_cls1) with the second model's prediction (pred_cls2) in utils.general.apply_classifier
(2) define the detailed class names in data/~.yaml
(3) load (2) and refer to it when producing outputs (when the second classifier is applied); a minimal sketch follows
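
For (2) and (3), a minimal sketch of how the fine-grained names could sit in the data YAML and be loaded next to the detector's coarse names (the key name "names2" and the file name are my own placeholders, not part of YOLOv5):

import yaml

# data/custom.yaml (hypothetical) could contain, for example:
#   names:  ['animal']              # coarse classes used by the detector
#   names2: ['dog', 'cat', 'bird']  # fine-grained classes for the 2nd classifier

with open('data/custom.yaml') as f:
    data = yaml.safe_load(f)

names = data['names']               # used when the 2nd classifier is off
names2 = data.get('names2', names)  # used when the 2nd classifier is applied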

I'll try, but it may take some time.

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

@KazuhideMimura yes that's a good experiment!

@KazuhideMimura Hi, I am very interested in this question. How's it going now?

Hi @WangRongsheng, thank you for the comment!
I made it possible for the 2nd model to predict more detailed classes by updating the function apply_classifier as follows.

import os

import cv2
import numpy as np
import torch

# (the imports above and the xyxy2xywh / xywh2xyxy / scale_coords helpers
#  are already available in utils/general.py, where this function lives)


# 2nd classifier (modified from utils.general.apply_classifier)
def apply_custom_classifier(x, model, img, im0, second_size, classes=None,
                            check_prediction=False):
    im0 = [im0] if isinstance(im0, np.ndarray) else im0
    device_ = x[0].device
    for i, d in enumerate(x):  # per image
        if d is not None and len(d):
            d = d.clone()

            # Reshape and pad cutouts
            b = xyxy2xywh(d[:, :4])  # boxes
            b[:, 2:] = b[:, 2:].max(1)[0].unsqueeze(1)  # rectangle to square
            b[:, 2:] = b[:, 2:] * 1.3 + 30  # pad
            d[:, :4] = xywh2xyxy(b).long()

            # Rescale boxes from img_size to im0 size
            scale_coords(img.shape[2:], d[:, :4], im0[i].shape)

            # Coarse classes predicted by the detector; kept for reference but not used,
            # since the 2nd model redefines the class
            pred_cls1 = d[:, 5].long()

            # Build a batch of cutouts for the 2nd model
            ims = []
            for j, a in enumerate(d):  # per item
                cutout = im0[i][int(a[1]):int(a[3]), int(a[0]):int(a[2])]
                im = cv2.resize(cutout, (second_size, second_size))  # BGR
                if check_prediction:
                    cv2.imwrite('example%i.jpg' % j, im)
                im = im[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, HWC to CHW
                im = np.ascontiguousarray(im, dtype=np.float32)  # uint8 to float32
                im /= 255  # 0 - 255 to 0.0 - 1.0
                ims.append(im)

            # 2nd-model prediction: class index and softmax confidence per cutout
            PRED_cls2 = model(torch.Tensor(ims).to(d.device))
            PRED_cls2 = torch.nn.Softmax(dim=1)(PRED_cls2)
            pred_conf2, pred_cls2 = PRED_cls2.max(1)

            if check_prediction:  # rename the saved cutouts with the predicted class
                for j, c in enumerate(pred_cls2):
                    prev_name = f"example{j}.jpg"
                    new_name = f"example{j}_{int(c)}.jpg"
                    os.rename(prev_name, new_name)

            # 2nd model determines the class (and its confidence)
            x[i][:, -1] = pred_cls2
            x[i][:, -2] = pred_conf2

            # Filter by classes (indices of the 2nd model's classes)
            if classes is not None:
                x[i] = x[i][(x[i][:, 5:6] == torch.tensor(classes, device=device_)).any(1)]

    return x

Also, I modified lines 126-131 of detect.py as follows.

# NMS
pred = non_max_suppression(pred, conf_thres, iou_thres, classes=None, agnostic=agnostic_nms, max_det=max_det)
# Second-stage classifier (filtering by class at this stage)
pred = apply_custom_classifier(pred, second_model, im, im0s, 224, classes)
dt[2] += time_sync() - t3

At the beginning of detect.py, you need to load the second model and define its list of class names.

# load second model
second_model_path = "path/to/2nd-model"  # define the path to your classifier weights
second_model = torch.load(second_model_path, map_location=torch.device('cpu'))['model'].float()
second_model.to("cuda:0").eval()  # eval() so BatchNorm/Dropout behave correctly at inference
names2 = ['cls-A1', 'cls-A2', 'cls-B1', 'cls-B2', ...]  # list of class names for the 2nd model

Finally, replace all occurrences of "names" in detect.py with "names2", as sketched below.
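
For example, the label-drawing lines inside the per-detection loop of detect.py end up roughly like this (paraphrased from memory; the variable names follow the upstream script and may differ between YOLOv5 versions):

# inside the "for *xyxy, conf, cls in reversed(det):" loop of detect.py
c = int(cls)  # class index, now coming from the 2nd classifier
label = None if hide_labels else (names2[c] if hide_conf else f'{names2[c]} {conf:.2f}')
annotator.box_label(xyxy, label, color=colors(c, True))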

It worked in my case.
I haven't compared the performance of a coarse classifier with a detailed one.
(And I'm not going to; I hope someone else can...)

@KazuhideMimura Thank you! You did a great job! I don't have a good trained model, and I hope to test the model quickly. Can you provide the complete project code and weights?

@KazuhideMimura If it's not convenient for you to provide it here, you can email me! My email is: wrs6@88.com

@WangRongsheng I don't have a trained weight I can share.
However, I can share the project code on my GitHub account. Please wait 2 or 3 days.

You may need to find your own training and test datasets. Sorry I couldn't contribute more to your purpose.

@WangRongsheng This is my project folder. I hope it helps:
https://github.com/KazuhideMimura/yolov5-ichthyolith

@KazuhideMimura Thank you for your work. I'm following it!

@glenn-jocher Thanks for the comment. It worked, although I haven't tested the performance yet.
https://github.com/KazuhideMimura/yolov5-ichthyolith

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

Hi @KazuhideMimura! Your work looks amazing and I am trying to do something similar! However, I am wondering whether you know of any way to evaluate the performance of the 1st + 2nd model together? For just the YOLO classifier, I can use val.py to generate performance metrics, but we can't do the same for detect.py...

Hi @anniehfwx , thank you for the question.
I also find it difficult to measure the total performance of the detect & classify system. In my research, I detected target objects in ~50 images using detect.py and counted TP, FP, and FN manually.
In addition to checking the performance of the 1st and 2nd models separately, which Glenn has mentioned in #11357, I suppose a total performance check (even a manual one) would make your evaluation more solid.
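
For reference, once TP, FP, and FN have been counted (manually or otherwise), the overall precision, recall, and F1 are only a few lines (a sketch; the counts below are placeholders):

# placeholder counts from a manual check of detect.py output
tp, fp, fn = 120, 15, 10

precision = tp / (tp + fp)  # fraction of detections that were correct
recall = tp / (tp + fn)     # fraction of targets that were found
f1 = 2 * precision * recall / (precision + recall)
print(f'precision={precision:.3f}  recall={recall:.3f}  f1={f1:.3f}')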

If you are good at programming, please consider editing val.py so that total performance can be evaluated. I tried, but couldn't do that...

Dear @KazuhideMimura, thank you for your reply! I will try both methods to evaluate the performance, and maybe have a go at updating val.py or detect.py for full performance evaluation. I will share the edits if there is any meaningful progress, cheers!

Great to hear, @anniehfwx! Please don't hesitate to ask if you need further assistance or have any other questions. Our community is always here to help. Good luck with your project, and we look forward to seeing your progress!

@anniehfwx Thank you. I'd love to see your progress!

This is great to hear that you're both working on similar projects, @anniehfwx and @KazuhideMimura! We encourage collaboration within the community, so feel free to share your progress and collaborate with each other as much as possible. Remember, the goal is to push the boundaries of object detection and advance the field of computer vision. Good luck to both of you, and don't hesitate to reach out if you need anything!

@glenn-jocher Thank you!! I think enabling total evaluation will be a big step forward for both YOLOv5 and YOLOv8 users.
I checked val.py again and I'm imagining the following approach.

  1. Load the classification model after loading the first model (L. 154)
  2. Insert apply_classifier after detection by the first model (L. 211); a rough sketch follows
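
A rough, untested sketch of what steps 1 and 2 could look like (apply_custom_classifier is the helper from earlier in this thread, assumed to live in utils/general.py; val.py's dataloader does not return the original images, so re-reading them from paths is my own assumption, and variable names may differ between versions):

import cv2  # not imported by default in val.py

# 1. after the detection model is loaded in val.py
second_model = torch.load('path/to/2nd-model', map_location=device)['model'].float().eval()
second_model.to(device)

# 2. inside the batch loop, right after non_max_suppression(...)
# (the NMS output is called "preds" or "out" depending on the val.py version)
im0s = [cv2.imread(str(p)) for p in paths]  # recover original BGR images
preds = apply_custom_classifier(preds, second_model, im, im0s, 224)
# the ground-truth labels would also need to use the fine-grained class indices
# for the computed metrics to be meaningful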

Do you think it would work? I was working on a previous version of YOLOv5, but the latest version seems to make it easier to connect the 2nd model.

It's great to hear you're working on enabling total evaluation for the community, @KazuhideMimura! Your approach to integrating the classification model with YOLOv5 appears reasonable. With the improvements in the latest YOLOv5 version, connecting the 2nd model might indeed be easier. Feel free to proceed with your plan, and don't hesitate to reach out if you encounter any challenges along the way. Good luck with your work, and we look forward to your progress!