Box shifting: some boxes may appear as background after tracking (when using dataloader_vidor.py)
Tips from @Dawn-LX :
This problem originates from `VidSGG-BIG/dataloaders/dataloader_vidor.py`, lines 488 to 508 in eaf7578.
Here, the tracking result for each box at a given frame is either a 6-dim vector or a (12+dim_boxfeature)-dim vector.
- If the 6-dim vector appears, the corresponding box is treated as background.
- Otherwise, the first 12 dimensions of `box_info`, which consist of `frame_id`, `tracklet_id`, 4-dim bbox coordinates, `confidence`, `category_id`, and a second set of 4-dim bbox coordinates, are used to determine the final location of the bbox.
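The layout described above can be sketched as a small parser. This is only an illustration, not the code in `dataloader_vidor.py`: the names `BoxRecord` and `parse_box_info` are hypothetical, and the sketch checks only the vector length because the contents of the 6-dim vector are not specified here.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Box = Tuple[float, ...]


class BoxRecord(NamedTuple):
    """First 12 dims of box_info (hypothetical container, for illustration)."""
    frame_id: int
    tracklet_id: int
    first_bbox: Box    # box_info[2:6]
    confidence: float  # box_info[6]
    category_id: int   # box_info[7]
    second_bbox: Box   # box_info[8:12]


def parse_box_info(box_info: Sequence[float]) -> Optional[BoxRecord]:
    """Return None for a 6-dim entry (treated as background),
    otherwise unpack the first 12 dims; trailing box features are ignored."""
    if len(box_info) == 6:
        return None
    if len(box_info) < 12:
        raise ValueError(f"unexpected box_info length: {len(box_info)}")
    return BoxRecord(
        frame_id=int(box_info[0]),
        tracklet_id=int(box_info[1]),
        first_bbox=tuple(float(v) for v in box_info[2:6]),
        confidence=float(box_info[6]),
        category_id=int(box_info[7]),
        second_bbox=tuple(float(v) for v in box_info[8:12]),
    )
```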
The first 4-dim bbox coordinates (`box_info[2:6]`) are generated by the tracker, and the second (`box_info[8:12]`) are generated by our video object detector. Boxes shift because we compute the final bbox coordinates as the average of these two. The detected object location may be inconsistent with the current tracklet, while the tracker-generated one is more precise, so averaging can merge the two boxes into one that covers only background.
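A small numeric example may make the shift concrete. It assumes an equal-weight, element-wise mean and `[x1, y1, x2, y2]` coordinates; neither detail is confirmed here, and the coordinates are made up for illustration.

```python
def average_boxes(tracker_box, detector_box):
    """Element-wise mean of two boxes (assumed form of the averaging)."""
    return [(t + d) / 2.0 for t, d in zip(tracker_box, detector_box)]


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Tracker box follows the tracklet; the detector box belongs to a
# different object that was wrongly linked to this tracklet.
tracker_box = [100.0, 100.0, 200.0, 200.0]
detector_box = [300.0, 100.0, 400.0, 200.0]

averaged = average_boxes(tracker_box, detector_box)
# The averaged box sits between the two objects and overlaps neither.
overlap_with_tracker = iou(averaged, tracker_box)
overlap_with_detector = iou(averaged, detector_box)
```

In this made-up case the averaged box is `[200, 100, 300, 200]`, which has zero overlap with both inputs, so it lands on whatever lies between the two objects.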
Specifically, the box generated by the tracker is much more precise, since it considers the boxes in previous frames, the currently detected box, and visual similarity. The box from the video object detector, however, may be wrongly linked to the current tracklet (which does not mean it is a background box itself). Averaging is therefore not strictly correct in these cases, which is why we use only the tracker-generated box (`box_info[2:6]`) in `VidSGG-BIG/dataloaders/dataloader_vidor_v3.py`, lines 414 to 421 in eaf7578.
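The difference between the two loaders can be sketched as a single switch. This is a hedged illustration, not the repository code: `final_box` and its `mode` argument are hypothetical, and the averaging branch assumes an equal-weight mean.

```python
def final_box(box_info, mode="tracker"):
    """Pick the final bbox from a (12+dim_boxfeature)-dim box_info.

    mode="tracker": use only the tracker-generated box, box_info[2:6].
    mode="average": mean of tracker box and detector box, box_info[8:12]
                    (assumed equal weights).
    """
    tracker_box = [float(v) for v in box_info[2:6]]
    if mode == "tracker":
        return tracker_box
    if mode == "average":
        detector_box = [float(v) for v in box_info[8:12]]
        return [(t + d) / 2.0 for t, d in zip(tracker_box, detector_box)]
    raise ValueError(f"unknown mode: {mode!r}")


# Example entry: frame 3, tracklet 7, tracker box, confidence, category,
# detector box (values invented for illustration, no box features appended).
example = [3, 7, 100, 100, 200, 200, 0.9, 5, 300, 100, 400, 200]
tracker_only = final_box(example, mode="tracker")
averaged = final_box(example, mode="average")
```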
However, tracklet_mAP does not improve when switching from the averaging manner to the unique manner (tracker-generated box only). The reasons may be:
- Cases of box shifting are rare, so the final performance benefits little from this fix.
- Averaging may be a more precise way to combine these two kinds of boxes in most cases, so the unique manner may lose some accuracy.