antoyang/TubeDETR

Incorrect viou metric calculation

zanglam opened this issue · 2 comments

Hi,

I found a bug in the viou metric calculation.

Here, max_end is actually computed as the minimum of the two end frames, not the maximum:

max_end = min(gt_sted[1], pred_sted[1])

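As a result, union_predgt stops at the earlier of the two end frames, so it is shorter than the true temporal union of the prediction and the ground truth: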

union_predgt = [
    frame_id
    for frame_id in frame_ids
    if min_start <= frame_id < max_end
]

Since viou is normalized by this shorter length, the calculated viou is much higher than the correct one:

viou = viou / max(len(union_predgt), 1)
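To make the effect concrete, here is a minimal sketch comparing the two normalizations. It assumes the usual vIoU definition (per-frame box IoU summed over the temporal intersection, divided by the length of the temporal union). The helper names `temporal_union_frames` and `compute_viou`, the `buggy` flag, and the toy inputs are hypothetical and not taken from the repository.

```python
def temporal_union_frames(frame_ids, gt_sted, pred_sted, buggy=False):
    """Return the frame ids lying in the temporal union of GT and prediction."""
    min_start = min(gt_sted[0], pred_sted[0])
    if buggy:
        # Behaviour reported in this issue: the end of the union uses min().
        max_end = min(gt_sted[1], pred_sted[1])
    else:
        # Suggested fix: the union must extend to the later end frame.
        max_end = max(gt_sted[1], pred_sted[1])
    return [f for f in frame_ids if min_start <= f < max_end]


def compute_viou(per_frame_iou, frame_ids, gt_sted, pred_sted, buggy=False):
    """Sum per-frame IoU over the temporal intersection, normalize by the union."""
    inter_start = max(gt_sted[0], pred_sted[0])
    inter_end = min(gt_sted[1], pred_sted[1])
    total = sum(
        per_frame_iou.get(f, 0.0)
        for f in frame_ids
        if inter_start <= f < inter_end
    )
    union_predgt = temporal_union_frames(frame_ids, gt_sted, pred_sted, buggy)
    return total / max(len(union_predgt), 1)


# Toy example (assumed values): GT spans frames [0, 10), prediction spans [5, 20),
# and the boxes overlap perfectly on the 5 shared frames.
frame_ids = list(range(20))
gt_sted, pred_sted = (0, 10), (5, 20)
per_frame_iou = {f: 1.0 for f in range(5, 10)}

viou_fixed = compute_viou(per_frame_iou, frame_ids, gt_sted, pred_sted)
viou_buggy = compute_viou(per_frame_iou, frame_ids, gt_sted, pred_sted, buggy=True)
print(viou_fixed, viou_buggy)
```

In this toy case the union has 20 frames with max() but only 10 frames with min(), so the buggy normalization doubles the score.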

Thank you for catching this! I will soon re-evaluate the checkpoints and update the arXiv / repo once done.

Update: The corrected performance of our SoTA model on VidSTG is, for declarative sentences, m_tIoU=48.1, m_vIoU=30.4, vIoU@0.3=42.5, vIoU@0.5=28.2, and for interrogative sentences, m_tIoU=46.9, m_vIoU=25.7, vIoU@0.3=35.7, vIoU@0.5=23.2. For HC-STVG1.0, it is m_vIoU=32.4, vIoU@0.3=49.8, vIoU@0.5=23.5. The arXiv paper and repo will be updated by mid-June.