gau-nernst/vision-toolbox

ImageNet pretrained


I've seen your instructions to train on ImageNet Object Localization.

How long does it take to train on such a big dataset?

Could you please share a trained checkpoint for YOLOv5 and any other architecture that you've tested?

It would be great for people with fewer resources, and it would save a little bit of energy ;-)

If that isn't possible for any reason, it's perfectly understandable. Thanks for sharing this project.

Hi @virilo,

On my setup with two RTX 3090 GPUs, most models take around one day to train. The trained weights for the YOLOv5 backbone are already available:

import torch
from vision_toolbox import backbones

model = backbones.darknet_yolov5m(pretrained=True)
inputs = torch.rand(1, 3, 224, 224)   # dummy input batch (N, C, H, W), values in [0, 1]

model(inputs)                       # last feature map, stride 32
model.forward_features(inputs)      # list of 4 feature maps, stride 4, 8, 16, 32
model.get_out_channels()            # channels of output feature maps

If you want to use the weights for object detection with the YOLOv5 repo, you will have to port the weights so that the state dict key names match.
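
Porting essentially means loading this backbone's state dict and renaming its keys to whatever the target repo expects. Below is a minimal sketch of the idea; the KEY_MAP here is purely hypothetical, and the real mapping has to be worked out by printing and comparing the key names from both repos.

import torch
from vision_toolbox import backbones

backbone = backbones.darknet_yolov5m(pretrained=True)
state_dict = backbone.state_dict()

# Hypothetical prefix mapping -- the actual old -> new names must be found by
# inspecting the state dict keys of both repos and matching them up manually.
KEY_MAP = {"stem.": "model.0.", "stages.0.": "model.1."}

ported = {}
for key, value in state_dict.items():
    new_key = key
    for old_prefix, new_prefix in KEY_MAP.items():
        if key.startswith(old_prefix):
            new_key = new_prefix + key[len(old_prefix):]
            break
    ported[new_key] = value

torch.save(ported, "yolov5m_backbone_ported.pth")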

Cheers

Thanks a lot, @gau-nernst!

I'm feeding it a batch containing a single 512x512 image normalized to [0, 1].

With this input, model(inputs) returns a tensor of size [1, 768, 16, 16].
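
A minimal sketch of that call, with a random tensor standing in for my normalized image:

import torch
from vision_toolbox import backbones

model = backbones.darknet_yolov5m(pretrained=True).eval()

# one 512x512 RGB image with values in [0, 1]
inputs = torch.rand(1, 3, 512, 512)

with torch.no_grad():
    out = model(inputs)

print(out.shape)  # torch.Size([1, 768, 16, 16]) -- the stride-32 feature map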

How could I get the inference bounding boxes from this output?

Cheers

Hello @virilo,

I only implemented the backbone part of YOLOv5. If you want to do object detection, you will have to follow the original YOLOv5 repo.

YOLOv5 does provide weights pre-trained on COCO. You can use them to fine-tune on your own dataset.
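
For reference, the original YOLOv5 repo can be loaded through torch.hub to get actual detections from the COCO-pretrained model. A minimal sketch; the image path is just a placeholder:

import torch

# Load the COCO-pretrained YOLOv5m detector from the Ultralytics repo via torch.hub
model = torch.hub.load("ultralytics/yolov5", "yolov5m", pretrained=True)

img = "path/to/image.jpg"           # file path, URL, PIL image or numpy array
results = model(img)

results.print()                     # summary of detections
boxes = results.xyxy[0]             # [x1, y1, x2, y2, confidence, class] per detection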