jrieke/shape-detection

Question about multiple classes in an image


Hello,

Thanks for uploading this. I have a question about having multiple classes in a training image. There can be many more than just 2 classes, and the bboxes must contain the coordinates of every class in each training image.

For example:
train_image_0.jpg: 4 classes
train_image_1.jpg: 1 class
train_image_2.jpg: 10 classes
...

But you defined bboxes to hold at most 2 classes per image:
bboxes = np.zeros((num_imgs, num_objects, 4))

How should I redefine the bboxes, then?

Thanks,

By classes, do you mean which shape the objects are (e.g. rectangle/triangle/circle)? In that case, you simply use a longer vector for each bounding box (the class and the color are encoded as one-hot vectors in each bounding box).
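A minimal sketch of such a per-box vector, assuming 3 shape classes, 3 colors, and coordinates normalized to [0, 1]. The helper `make_box_vector`, the class and color lists, and the exact layout are hypothetical illustrations, not the repository's code:

```python
import numpy as np

SHAPES = ["rectangle", "triangle", "circle"]   # assumed class list
COLORS = ["red", "green", "blue"]              # assumed color list

def make_box_vector(x, y, w, h, shape, color):
    """Hypothetical layout: [x, y, w, h, one-hot shape, one-hot color]."""
    shape_onehot = np.zeros(len(SHAPES))
    shape_onehot[SHAPES.index(shape)] = 1.0
    color_onehot = np.zeros(len(COLORS))
    color_onehot[COLORS.index(color)] = 1.0
    return np.concatenate([[x, y, w, h], shape_onehot, color_onehot])

num_imgs, num_objects = 1, 2
vec_len = 4 + len(SHAPES) + len(COLORS)        # 10 values per box
bboxes = np.zeros((num_imgs, num_objects, vec_len))
bboxes[0, 0] = make_box_vector(0.1, 0.2, 0.3, 0.3, "circle", "red")
bboxes[0, 1] = make_box_vector(0.5, 0.5, 0.2, 0.4, "triangle", "blue")
print(bboxes.shape)  # (1, 2, 10)
```

Adding more classes only makes the last axis longer; the number of boxes per image (num_objects) stays fixed.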

Or do you mean the number of objects on each image? That is very tricky to do. Part of the reason why the examples in here are so simple is because they contain a fixed number of objects per image. There are methods to recognize a variable number of objects per image, but they are way more sophisticated. See also the last chapter in my blogpost here.

Hello, I meant both the number of objects and the number of classes.

I have written some example code in Keras that handles up to 100 classes or objects per scene. Each object's class ID and ROI (x, y, w, h) are packed into a 1D vector in 'class number, x, y, w, h' order, so 5 values x 100 objects = 500, plus 12 dummy values = 512. All unused slots are filled with zeros.
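A sketch of the 512-value layout described above, written in NumPy rather than Keras. The helper names `pack_objects` and `unpack_objects` are hypothetical, and the unpacking assumes class IDs start at 1, so that an all-zero row can be treated as an empty slot:

```python
import numpy as np

MAX_OBJECTS = 100
VALUES_PER_OBJECT = 5   # class number, x, y, w, h
VECTOR_LEN = 512        # 100 * 5 = 500, plus 12 dummy zeros

def pack_objects(objects):
    """objects: list of (class_id, x, y, w, h) tuples, at most MAX_OBJECTS."""
    if len(objects) > MAX_OBJECTS:
        raise ValueError("too many objects for the fixed-size target")
    target = np.zeros(VECTOR_LEN, dtype=np.float32)
    for i, obj in enumerate(objects):
        start = i * VALUES_PER_OBJECT
        target[start:start + VALUES_PER_OBJECT] = obj
    return target

def unpack_objects(target):
    """Reshapes the first 500 values into rows and drops empty (all-zero) slots."""
    used = target[:MAX_OBJECTS * VALUES_PER_OBJECT]
    rows = used.reshape(MAX_OBJECTS, VALUES_PER_OBJECT)
    # Assumes class IDs start at 1, so an all-zero row marks an empty slot.
    return rows[np.any(rows != 0, axis=1)]

example = [(3, 0.10, 0.20, 0.30, 0.40), (15, 0.50, 0.50, 0.25, 0.25)]
t = pack_objects(example)
print(t.shape)                  # (512,)
print(unpack_objects(t).shape)  # (2, 5)
```

If class ID 0 were a real class, an empty slot and a real object could not be told apart in this layout, which is one reason the zero-filled padding needs care.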

However, accuracy is only around 67% on VOC data resized to 32x32, and it drops after around 14,000 epochs. Maybe I need to increase the Conv2D input size to 224x224 or add more Conv2D layers; currently I use only 4 Conv2D layers.

Also, I haven't used one-hot encoding for the classes yet. I may try that as well.

Thanks,

With up to 100 objects, I think it will get very hard to get good results with the methods presented here. You might want to have a closer look at the papers I mention in the blogpost linked above (and there are also some more recent approaches).

Using one-hot encoding should definitely improve the situation for the object classification though.
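A sketch of the 100-object layout with the class number replaced by a one-hot vector, assuming the 20 PASCAL VOC object classes with hypothetical IDs 1 to 20. The helpers `pack_onehot` and `decode_classes` are illustrations only. With a single class number, the network has to regress an integer, which implies an ordering between classes that does not exist; with one-hot slots, the class is read back as an argmax over per-class scores:

```python
import numpy as np

NUM_CLASSES = 20                 # PASCAL VOC object classes, IDs 1..20 assumed
MAX_OBJECTS = 100
PER_OBJECT = 4 + NUM_CLASSES     # x, y, w, h + one-hot class

def pack_onehot(objects):
    """objects: list of (class_id, x, y, w, h) with class_id in 1..NUM_CLASSES."""
    target = np.zeros((MAX_OBJECTS, PER_OBJECT), dtype=np.float32)
    for i, (class_id, x, y, w, h) in enumerate(objects):
        target[i, :4] = (x, y, w, h)
        target[i, 4 + class_id - 1] = 1.0
    return target.ravel()        # 100 * 24 = 2400 values

def decode_classes(prediction, num_objects):
    """Argmax over each object's class slice, mapped back to IDs 1..NUM_CLASSES."""
    rows = prediction.reshape(MAX_OBJECTS, PER_OBJECT)[:num_objects]
    return np.argmax(rows[:, 4:], axis=1) + 1

t = pack_onehot([(3, 0.1, 0.2, 0.3, 0.4), (15, 0.5, 0.5, 0.25, 0.25)])
print(t.shape)                 # (2400,)
print(decode_classes(t, 2))    # [ 3 15]
```

Note that `decode_classes` needs the object count: argmax always returns some class, even for an empty slot, so this encoding still assumes a fixed or known number of objects per image.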

Oh and just to be clear: There's no possibility to have a variable number of objects per image within the framework presented here (again, see the blogpost).