microsoft/ORBIT-Dataset

Some confusing phenomena observed in the ORBIT dataset

chi-chi-zx opened this issue · 1 comment

Hi,

Thank you for your contribution in developing this real-world dataset. Our team is developing algorithms to tackle the real-world problems raised in the ORBIT dataset paper, and we aim to improve few-shot evaluation on the ORBIT dataset. However, based on our investigation, we observed some problems in the dataset that hinder algorithm development and fair evaluation. Please see the following:

1. Multiple support items appear in the same query frames.

[image: support objects (top row) and query frames (bottom row) from test user P177]

The above image contains frames from user P177 in the test set. The top row lists the support objects and the bottom row shows the query frames. A portion of the query videos contain multiple support objects. For example, the phone appears in several query video frames whose GT labels are different. I annotated the GT label (and GT bbox) as well as the other support objects appearing in each frame. In some frames, a support object other than the GT object covers a larger area of the frame and is therefore more likely to confuse the model. Such cases are not rare; they are observed quite often. We are wondering: for the per-frame-accuracy metric, wouldn't it be more reasonable to allow multiple labels for those cases?
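For concreteness, here is a minimal sketch of the multi-label variant of per-frame accuracy we have in mind. The function name and data layout are hypothetical and not part of the ORBIT evaluation code; a prediction is counted correct if it matches any of the acceptable labels for that frame.

```python
def per_frame_accuracy_multilabel(pred_labels, gt_label_sets):
    """Hypothetical multi-label per-frame accuracy.

    pred_labels   : list of predicted object labels, one per query frame
    gt_label_sets : list of sets of acceptable labels, one per query frame
                    (frames showing several support objects get more than one)
    """
    correct = sum(pred in gts for pred, gts in zip(pred_labels, gt_label_sets))
    return correct / len(pred_labels)

# Example: the second frame shows both the phone and the keys,
# so predicting either label is counted as correct.
preds = ["phone", "keys", "mug"]
gts = [{"phone"}, {"phone", "keys"}, {"remote"}]
print(per_frame_accuracy_multilabel(preds, gts))  # 0.666...
```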

2. Frames without any objects at the beginning of the videos.

We also observed that most people repeatedly move the camera closer to and further away from the target object. Some videos begin with frames that contain no object at all (background only). Since we are not allowed to look at future frames, those frames can only be guessed at random. We are wondering if those frames could be removed during evaluation to make it fair; a sketch of what we mean follows below.
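As a rough illustration of the evaluation change we are suggesting, assuming a per-frame object-present mask were available for query frames (the mask and helper below are hypothetical, not existing annotations):

```python
def per_frame_accuracy_object_present(pred_labels, gt_labels, object_present):
    """Per-frame accuracy computed only over query frames annotated as showing
    the target object. `object_present` is a hypothetical boolean mask
    (True = object visible in the frame)."""
    scored = [(p, g) for p, g, ok in zip(pred_labels, gt_labels, object_present) if ok]
    if not scored:
        return None  # no scorable frames in this video
    return sum(p == g for p, g in scored) / len(scored)
```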

Thank you.

Hi @chi-chi-zx

Thanks for raising these phenomena.

Re issue 1. We are aware that some clutter videos show the target object as well as other objects from the user's support set. Data collectors were instructed not to do this; however, in some cases they forgot or did not understand the instructions. We agree that this may skew the results in some cases, so we are looking into annotating multiple object labels for clutter frames in the future. We will not, however, be able to provide these before the competition deadline, so when evaluating entries we may discount performance on users like P177 so that results better reflect the algorithm.

Re issue 2. Data collectors were asked to record all their clean videos using this specific technique - they were asked to draw their phone away from the object multiple times. Because the data collectors were blind/low-vision, this was done to help them capture multiple sides of the objects while also increasing their chances of capturing frames with the object present. In some cases, however, they were not able to keep the object in frame for all frames. To tackle this issue, we annotated "object_not_present_issue" labels for all clean frames in the dataset. You can find more details about these (and other annotations) here. Note, however, that you can only use these annotations during meta-training and NOT during meta-testing (since at meta-test time, a real-world user would not have provided these labels).
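As a rough illustration, filtering clean frames at meta-train time could look something like the sketch below. The annotation loading, filename, and dict layout are assumptions for illustration only; the actual file format is described in the annotation docs linked above.

```python
import json
from pathlib import Path

def drop_object_not_present_frames(frame_paths, annotations):
    """Keep only clean frames NOT flagged with "object_not_present_issue".

    `annotations` is assumed to map frame filename -> dict of issue flags,
    loaded from the per-frame annotation JSONs. Use at meta-train time only,
    never at meta-test time.
    """
    return [
        f for f in frame_paths
        if not annotations.get(Path(f).name, {}).get("object_not_present_issue", False)
    ]

# e.g. annotations = json.load(open("clean_frame_annotations.json"))  # hypothetical filename
```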