nqanh/affordance-net

train my own dataset


Would you mind telling me how to produce the "instance_png" images?
Are there any requirements for the instance_png files? For example, does each object need a different colour?

Thanks

nqanh commented

There are two kinds of ground truths in our system:

  • The object bounding box: defined as a rectangle with top-left (xmin, ymin) and bottom-right (xmax, ymax) coordinates.
  • The multiclass affordance mask (of each object): each affordance has a unique ID. There are many tools that help you label the mask (e.g., LabelMe). A minimal sketch of these two ground truths is shown below.
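For concreteness, here is a minimal sketch of how these two ground truths can be represented; all field names, sizes, and IDs below are illustrative, not taken from the repo:

```python
import numpy as np

# Bounding-box ground truth for one object: a rectangle given by its
# top-left and bottom-right corners (illustrative coordinates).
bbox = {"xmin": 34, "ymin": 50, "xmax": 210, "ymax": 300}

# Multiclass affordance mask for the same object: a 2D array where each
# pixel stores the ID of its affordance class (0 = background).
mask = np.zeros((300, 250), dtype=np.uint8)
mask[60:120, 40:200] = 1   # illustrative affordance ID 1, e.g. "grasp"
mask[140:280, 40:200] = 2  # illustrative affordance ID 2, e.g. "contain"
```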

Thanks a lot!
I have some new problems, please help me :(

  1. I have used LabelMe, but it just made a .json file for an image; I don't know how to convert the .json into a .png file.
  2. I noticed that there are 11 classes in the IIT-AFF dataset. I believe each class should have a colour to represent it in the mask PNG, but I cannot find where the colours are defined. I found "parse_pascal_instance.py", but it is just for the VOC dataset.
  3. Can you please tell me what I should do if I want to add a new class to the IIT-AFF dataset?
    Looking forward to your reply!
nqanh commented

Hi,

  1. In general, the results of LabelMe (or any similar tool) are saved in a particular format. You should check the Readme of each tool to understand the way it saves the data. There are many tools available now; you can choose or modify them as you need.
  2. The colour doesn't matter during training/testing; it's just for visualization. What you'll need is the ID of each class. If you want to know how to visualize the affordance map, take a look at tool/demo_img.py, line 228 (a minimal sketch is also shown after this list).
  3. First, you need to define your object + affordance classes. If they overlap with our IIT-AFF dataset, then you should use the same IDs. Then collect the images and label them with your favourite tool. During post-processing, remember to assign the correct IDs. The same affordance should have the same ID (for example, the contain part of a cup and of a bowl should have the same ID).
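For item 2, a minimal visualization sketch; the ID-to-colour table here is made up for illustration (the repo's actual mapping is in tool/demo_img.py):

```python
import numpy as np

# Illustrative ID -> RGB colour table; the real mapping lives in the demo script.
ID_TO_COLOR = {
    0: (0, 0, 0),      # background
    1: (0, 255, 0),    # e.g. "grasp"
    2: (255, 0, 0),    # e.g. "contain"
}

def colorize_affordance_map(id_mask):
    """Turn a 2D mask of affordance IDs into an RGB image for display."""
    color_img = np.zeros(id_mask.shape + (3,), dtype=np.uint8)
    for aff_id, color in ID_TO_COLOR.items():
        color_img[id_mask == aff_id] = color
    return color_img
```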

The utils folder also provides more information about how to prepare the data.

Thanks a lot for your help, my own dataset works now!!! Cheers!!!

nqanh commented

Great to hear that!

Hi nqanh, I found an interesting question.
In models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt:
at line 13, num_classes is 11, and your comment is "10 obj categories + 1 background";
at line 890, num_output is 10, and your comment is "9 affordance classes + 1 background".
May I ask why the number of object classes is different from the number of affordance classes?
Thanks a lot!!!!

nqanh commented

The object classes and the affordance classes are not the same since they're defined separately.

The object categories refer to the whole object (e.g., pan, bottle, hammer), while the affordance classes are the object parts (e.g., the pan has two affordances: contain and grasp).
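To make that concrete, a small sketch of the two independent label sets; the name/ID pairings below are illustrative, not the exact IIT-AFF IDs (those are in the dataset's ReadMe):

```python
# Two separate label sets, each with its own background class.
# In train.prototxt this gives num_classes = 11 (objects) and
# num_output = 10 (affordances). IDs below are illustrative only.
OBJECT_CLASSES = {0: "background", 1: "bowl", 2: "pan", 3: "hammer"}  # ... 10 objects + 1 bg
AFFORDANCE_CLASSES = {0: "background", 1: "grasp", 2: "contain"}      # ... 9 affordances + 1 bg

# One object gets a single object label for its bounding box, plus a
# per-pixel mask whose values come from AFFORDANCE_CLASSES: for a pan,
# the handle pixels are "grasp" and the body pixels are "contain".
```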

Hi nqanh,
I met a serious problem: I tested on my own object, but I can't get any mask. I wonder if I used the wrong label name in LabelMe?
Here are my steps:

  1. Using LabelMe, I used "bowl" as the label when I tagged images and made the .json files; after that, I used the "labelme_json_to_dataset" function to get the PNG files (they look all black to me).
  2. Copy all PNG files into affordance-net/utils/instance_png, and use convert_instance_png to convert the PNG files into .sm files.
  3. Add my original image files into data/VOCdevkit2012/VOC2012/JPEGImages, and use "labelImg" to make the .xml files (they are located in the Annotations folder).
  4. Change the train.txt in the ImageSets/Main folder.
  5. Train the caffemodel.
    Are there any problems in my process?
    Thanks a lot! :(
nqanh commented

The process looks OK to me, but maybe something is wrong in one of your steps.

  1. Make sure you have the correct label IDs in the PNG file (you can check with Matlab, or visualize the "black" PNG to see them; a short script for this is sketched after this list). Also, do not confuse the object ID with the affordance ID; they're different.
  2. OK (if step 1 is good).
  3. Make sure you have the same .xml format as in the IIT-AFF dataset (name of the nodes, etc.).
  4. OK.
  5. If you don't change the number of classes (object classes and/or affordance classes), then you can use the default prototxt file. Otherwise, you'll need to change the prototxt to match the new number of classes.
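For step 1, a minimal Python sketch (instead of Matlab) for checking which label IDs a mask PNG actually contains; a mask that looks all black may still hold small nonzero IDs that are just too dark to see:

```python
import numpy as np
from PIL import Image

# Load a mask PNG and list the label IDs it contains.
mask = np.array(Image.open("utils/instance_png/0_1.png"))  # adapt the path to your file
print("dtype:", mask.dtype)
print("unique label IDs:", np.unique(mask))
```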

Thanks,
I will follow your steps.
Would you mind telling me where I can find the specific affordance IDs for the IIT-AFF dataset?

nqanh commented

You can download the original version of the IIT-AFF dataset. The IDs are in the ReadMe file.

Hi nqanh,
There is a problem for me.
When I used the "labelme_json_to_dataset" function to make my own dataset, the output PNG file is a "uint16" .png image. But the images in your example (e.g., /utils/instance_png/0_1.png) are "uint8". Can I just convert the type using Matlab code?

nqanh commented

Yes, you can do it if you want. I think "uint16" will work fine too.
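If you do decide to convert, here is a minimal Python sketch (instead of Matlab); this is lossless as long as all label IDs fit in 0..255, which holds for the small affordance IDs:

```python
import numpy as np
from PIL import Image

# Convert a uint16 label PNG (as produced by labelme_json_to_dataset)
# to uint8, keeping the label IDs unchanged.
mask16 = np.array(Image.open("label.png"))  # adapt the path to your file
assert mask16.max() <= 255, "label IDs too large for uint8"
Image.fromarray(mask16.astype(np.uint8)).save("label_uint8.png")
```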

Hi nqanh,
me again: I met a problem when training on my own dataset.
The error message is:

gt_mask = mask_flipped_ims[gt_mask_ind]
IndexError: list index out of range

I found the code in rpn/proposal_target_layer.py.

How can I solve this problem?

thanks a lot

nqanh commented

Hi, it seems something is missing in your dataset.
Please check Issue 8 for a recent discussion.

Basically, by looking at Issue 8, I found that there was only one .sm file per picture in my dataset, no matter how many objects there were in the picture. Do you think I have found the key to the problem?

nqanh commented

No, that's not correct. For each object in the .xml file, you need a mask ground truth.

In general, each object in the .xml file must have a ground-truth mask file. The first object in the .xml file should have the "IMGID_1_segmask.sm" mask, the second object in the .xml should have "IMGID_2_segmask.sm", etc.
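For example, this minimal sketch enumerates the objects of one annotation and prints the mask file name expected for each (the image ID and paths are hypothetical; the <object> node name follows the PASCAL VOC .xml format):

```python
import xml.etree.ElementTree as ET

# For image IMGID, the k-th <object> in IMGID.xml (1-based)
# must have a matching "IMGID_k_segmask.sm" mask file.
img_id = "0"  # hypothetical image ID
tree = ET.parse("Annotations/%s.xml" % img_id)  # hypothetical path
for k, obj in enumerate(tree.findall("object"), start=1):
    name = obj.find("name").text
    print("object %d (%s) -> %s_%d_segmask.sm" % (k, name, img_id, k))
```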

According to your reply, the number of .sm files must equal the number of objects in the .xml file; otherwise, training cannot find the ground-truth masks. Is my understanding correct?

nqanh commented

Yes, the number of .sm files must equal the number of objects in the .xml file.
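A minimal sanity-check sketch under that rule (the directory names below are hypothetical; adapt them to your layout):

```python
import glob
import os
import xml.etree.ElementTree as ET

# For every annotation, compare the number of <object> nodes in
# "IMGID.xml" with the number of "IMGID_k_segmask.sm" mask files.
for xml_path in glob.glob("Annotations/*.xml"):                        # hypothetical dir
    img_id = os.path.splitext(os.path.basename(xml_path))[0]
    n_objects = len(ET.parse(xml_path).findall("object"))
    n_masks = len(glob.glob("instance_sm/%s_*_segmask.sm" % img_id))   # hypothetical dir
    if n_objects != n_masks:
        print("%s: %d objects but %d .sm masks" % (img_id, n_objects, n_masks))
```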

Thank you for your help, and I have another problem here:
What I am working on is segmenting ammeter scales, and I found that the segmentation of small targets is not very good. I exported the intermediate models and found that there was no mask at all for the first 60,000 iterations; after 70,000 iterations, there are some mask results (but they are far from the ground-truth masks). Do you think my model hasn't trained for enough iterations?