AIWintermuteAI/aXeleRate

Training with our dataset

Closed this issue · 3 comments

Hey, thank you for the work.

I've been using your framework for a while, and I was wondering how a dataset should be put together to actually be good for training YOLO with mbnet0.5 or 0.75 as the backend.
I am training a person detector.
I parsed PASCAL-VOC to remove all the labels that are not 'person', but using the whole dataset for training didn't give good results.
I have also used the INRIA dataset, which, as far as I can see, is the one you have partially provided in the Colab notebook, and got better results.
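
For reference, the label filtering described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, assuming the usual PASCAL-VOC XML layout (`<object>` elements with a `<name>` child directly under `<annotation>`); `keep_only_class` and the sample annotation are hypothetical, not part of aXeleRate.

```python
import xml.etree.ElementTree as ET

def keep_only_class(xml_text, wanted="person"):
    """Drop every <object> whose <name> differs from `wanted`.

    Returns (filtered_xml_string, number_of_objects_kept).
    """
    root = ET.fromstring(xml_text)
    kept = 0
    # findall() returns a list, so removing children while looping is safe
    for obj in root.findall("object"):
        if obj.findtext("name") == wanted:
            kept += 1
        else:
            root.remove(obj)
    return ET.tostring(root, encoding="unicode"), kept

# Made-up annotation, for illustration only
sample = """<annotation>
  <filename>example.jpg</filename>
  <object><name>person</name><bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox></object>
  <object><name>dog</name><bndbox><xmin>5</xmin><ymin>5</ymin><xmax>50</xmax><ymax>50</ymax></bndbox></object>
</annotation>"""

filtered, kept = keep_only_class(sample)
print(kept)  # 1
```

Images for which `kept` is 0 end up with no labelled object at all, which is exactly the kind of image the question below is about.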

My question is this:
is there a proportion to respect between the number of images containing the objects we want to detect and the number of images not containing them?
Thanks!

Thank you for the kind words!
While I have not run tests to study this problem, in theory, because of the way YOLO detection works, you do not need any images without the object to train a good model: every picture can contain the object and it will still train well.
YOLO v2 has a number of outputs for each box in a grid cell:

  • parameters for the bounding box (x, y, w, h)
  • box confidence score (objectness)
  • class probabilities

During training, the model learns to make the box confidence score low for the boxes in grid cells that DO NOT contain objects. For example:

[image: yolo-object-detection]

In this picture, predictions are still made for the grid cells that have the tree and the floor at their center, but their box confidence scores will be very low.

[image: Scores@2x]

So the model still learns negative examples from these pictures, even though an instance of the object is present.
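
To make the point about negatives concrete, here is a simplified sketch in plain Python. It assumes a 7x7 grid and 5 anchor boxes purely for illustration, and it marks a cell as positive only when a box center falls inside it; real YOLO v2 training also assigns targets per anchor box, so treat this as an approximation of the idea and not as the loss used in aXeleRate. `objectness_targets` and `outputs_per_cell` are hypothetical helper names.

```python
def objectness_targets(boxes, grid_size=7):
    """Return a grid_size x grid_size grid of 0/1 objectness targets.

    boxes: list of (x_center, y_center, w, h), normalised to [0, 1].
    A cell gets target 1 only if a box center falls inside it.
    """
    grid = [[0] * grid_size for _ in range(grid_size)]
    for x_c, y_c, _w, _h in boxes:
        col = min(int(x_c * grid_size), grid_size - 1)
        row = min(int(y_c * grid_size), grid_size - 1)
        grid[row][col] = 1
    return grid

def outputs_per_cell(num_anchors, num_classes):
    # 4 box parameters + 1 objectness score + class probabilities, per box
    return num_anchors * (4 + 1 + num_classes)

# One person roughly in the middle of the image
targets = objectness_targets([(0.5, 0.5, 0.2, 0.6)])
positives = sum(map(sum, targets))
negatives = 7 * 7 - positives
print(positives, negatives)                            # 1 48
print(outputs_per_cell(num_anchors=5, num_classes=1))  # 30
```

Even with a person in the picture, 48 of the 49 cells in this toy grid act as negative examples for the objectness score.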

Oh, and as a side note: INRIA can nowadays be considered a "toy dataset" for pedestrian detection. The number of positive samples is quite small by modern standards, and in some images not all of the people present are annotated; you can see this if you browse through the dataset with a labeling tool. This worsens the performance of the model: a prediction made on a person who is present in the image but NOT included in the annotations is considered a false positive and is penalized by the loss function.
I did want to make a BetterINRIA dataset, with all people annotated and more images added from PASCAL-VOC, but sadly I have no time for this now.
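
The missing-annotation problem can be illustrated with a small IoU check. This is only a sketch of why an unlabelled person looks like a false positive: the 0.5 IoU threshold and the box coordinates are assumptions chosen for the example, and the actual YOLO loss penalizes such predictions through the objectness term instead of through an explicit matching step like this one.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def is_false_positive(pred, annotations, threshold=0.5):
    # A prediction that overlaps no annotated box well enough counts as wrong
    return all(iou(pred, gt) < threshold for gt in annotations)

annotated = [(10, 10, 60, 160)]        # the one person that was labelled
pred_labelled = (12, 12, 60, 158)      # prediction on the labelled person
pred_unlabelled = (200, 20, 250, 170)  # prediction on a person nobody labelled

print(is_false_positive(pred_labelled, annotated))    # False
print(is_false_positive(pred_unlabelled, annotated))  # True
```

The second prediction is correct in reality, but with the annotation missing it is treated as an error.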

Thank you very much for the explanation.
I also thought that images without a person were not needed, but since performance didn't improve the way it did with your dataset, I had this doubt, because I saw that you had provided some images without a person.
As for the dataset, I'm working on it; if I get good results, maybe I can share it with you.
Thank you again, I'll close the issue!