ultralytics/yolov5

Extract feature vector from the bounding box predicted together with the coordinates and class output vector

Closed this issue · 6 comments

Search before asking

Question

Hi
I have seen some related questions (#385, #2904), but I don't think they solve my problem because they use the YOLOv5 classification model.

I would like to train a YOLOv5 model to detect objects, but I would also like to get the feature vector that describes the object within each bounding box. I want to cross-correlate this vector with the feature vectors of other bounding boxes to measure the similarity between them. Is there a way to get the feature vector for every bounding box YOLO produces? I cannot crop the bbox image and pass it through the YOLOv5 classification model due to performance limitations. I will probably need to create another output (containing these feature vectors) in addition to the three outputs YOLO already has (coordinates + class assigned for each anchor). Is that possible?

Thank you.

Additional

No response

Hey there!

Great question! 🌟 Extracting feature vectors directly from YOLOv5's bounding box predictions for similarity comparisons is an advanced use case and would require modifying the model architecture and potentially the post-processing code.

Essentially, you're looking to tap into the intermediate layers of the model: the final layers focus on class probabilities and bounding box coordinates, while the layers just before them contain rich feature representations of the detected objects.

Here's a high-level approach:

  1. Identify which layer's output you want to use as the feature vector. This often involves some experimentation. Look at the layers before the final detection layers.
  2. Modify the model's architecture to return the desired layer's output along with the standard detection outputs. This involves diving into the model's code and adding the necessary changes to forward the selected layer's output.
  3. Adjust post-processing to handle these additional outputs efficiently for your similarity comparison tasks.
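One way to sketch step 2 without editing the architecture at all is a PyTorch forward hook, which captures an intermediate layer's output alongside the normal predictions. The tiny CNN below is only a stand-in for YOLOv5; with the real model you would register the hook on whichever backbone/neck module you chose in step 1 (the index depends on your model, so check the printed module list).

```python
import torch
import torch.nn as nn

# Stand-in for a detection network: two "feature" layers and a "head".
net = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1),   # stand-in backbone layer
    nn.Conv2d(8, 16, 3, stride=2, padding=1),  # stand-in neck layer
    nn.Conv2d(16, 4, 1),                       # stand-in head layer
)

features = {}

def hook(module, inputs, output):
    # Store the layer's output so it can be read after the forward pass.
    features["neck"] = output.detach()

# Register the hook on the layer whose output you want as the feature vector.
handle = net[1].register_forward_hook(hook)

x = torch.randn(1, 3, 64, 64)
preds = net(x)                      # the usual detection-style output
vec = features["neck"].flatten(1)   # captured per-image feature vector
handle.remove()                     # clean up when done
```

Here `preds` has shape (1, 4, 16, 16) and `vec` has shape (1, 4096); on real YOLOv5 the hooked layer's spatial map would instead be pooled or indexed per detection before comparison.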

I can't provide a specific code example due to the complexity and customization involved; this approach requires a good understanding of PyTorch and the YOLOv5 architecture. I highly recommend delving into the model's code and experimenting with different layers to find which one offers the most useful features for your application.

This task is quite advanced and may involve trial and error to get right. Good luck! 🚀

Hi Glenn

Thanks for the answer, looks complicated but I will give it a try.

You're welcome! 😊 It sure sounds like a challenge, but diving into these complex tasks is how we grow. If you hit any bumps along the way or have more questions, feel free to reach out. Best of luck with your project! 🚀

Hello @JaviMota

We encountered the same question. Below is example code to extract features from the final layers of the neck of the YOLOv8 model. This yields a vector of 2304 features (256 x 3 x 3). You might adjust this to suit your needs, for example by changing which YOLO layer the features are taken from, or by resizing the pooling layers to shrink the feature vector.

from PIL import Image
from ultralytics import YOLO
import lightning as L
from torchvision import transforms
import torch.nn as nn

class YOLO_Features(L.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model
        # Pool the 256 x 20 x 20 feature map down to 256 x 3 x 3
        self.pool1 = nn.MaxPool2d(3)
        self.pool2 = nn.MaxPool2d(2)
        self.tf = transforms.Compose(
            [
                transforms.Resize((640, 640)),
                transforms.ToTensor(),
            ]
        )

    def forward(self, x):
        x = self.transform(x)
        # Run only the first 9 modules, stopping before the classification head
        x = self.model.model.model[:9](x)
        x = self.pool1(x)
        x = self.pool2(x)
        x = x.flatten()  # 256 x 3 x 3 -> 2304-dimensional feature vector
        return x

    def transform(self, x):
        # Resize to the model's input size and add a batch dimension
        return self.tf(x).unsqueeze(0)


path = None  # your image path
image = Image.open(path)
model = YOLO("yolov8n-cls.pt")
fts = YOLO_Features(model)(image)
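To address the original goal of comparing bounding boxes, two such feature vectors can be scored with cosine similarity. The random tensors below are only placeholders standing in for vectors produced by a feature extractor like the one above:

```python
import torch
import torch.nn.functional as F

fts_a = torch.randn(2304)  # placeholder feature vector for crop A
fts_b = torch.randn(2304)  # placeholder feature vector for crop B

# Cosine similarity lies in [-1, 1]; values near 1 indicate similar crops.
sim = F.cosine_similarity(fts_a.unsqueeze(0), fts_b.unsqueeze(0)).item()
print(sim)
```

A vector compared with itself scores 1.0, so a threshold on `sim` (to be tuned on your data) can decide whether two detections match.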

Hi @Poissonfish

Thank you for your answer! I have not had time to try it yet, but at first glance I see the example uses "yolov8n-cls.pt". Is it possible to do the same with "yolov8n.pt"? I have seen the approach of taking the YOLO output (bbox) and passing it to yolov8-cls.pt, but due to performance limitations I cannot use more than one model.

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐