About m_sIoU
zanglam opened this issue
zanglam commented
Hi, thank you for your excellent work! I have a question about the m_sIoU reported in your paper.
We can roughly estimate the spatial grounding accuracy inside the predicted time span (t_s, t_e), more precisely on the frames where the predicted and ground-truth spans overlap, by computing m_vIoU / m_tIoU. This works because vIoU only accumulates box IoU over those overlapping frames while normalizing by the temporal union. However, I observed that for your model m_sIoU is much lower than m_vIoU / m_tIoU. For example, on HC-STVG2.0 with resolution 352 and temporal stride 4, m_sIoU = 0.649, whereas m_vIoU / m_tIoU = 0.467 / 0.539 ≈ 0.866.

This suggests that on the ground-truth frames that fall outside the predicted time span (t_s, t_e), the IoU between the predicted and ground-truth bounding boxes is very low. I find this quite interesting. Could you provide some analysis or an explanation?
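The per-video sketch below illustrates the reasoning. It assumes the commonly used HC-STVG definitions: tIoU = |S_i| / |S_u| and vIoU = (1 / |S_u|) * Σ_{t∈S_i} IoU(b_t, b̂_t), where S_i and S_u are the intersection and union of the predicted and ground-truth frame sets. It also assumes that sIoU is the mean box IoU over all ground-truth frames. The helper names `box_iou` and `stvg_metrics` and the toy boxes are hypothetical. Note that the ratio of dataset-level means (m_vIoU / m_tIoU) only approximates the mean of per-video ratios.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed box format


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def stvg_metrics(pred_boxes: Dict[int, Box], pred_span: Tuple[int, int],
                 gt_boxes: Dict[int, Box]) -> Dict[str, float]:
    """Hypothetical per-video tIoU, vIoU, sIoU and vIoU/tIoU.

    pred_boxes: a predicted box for every frame (assumed to cover all GT frames).
    pred_span:  predicted (t_s, t_e), inclusive frame indices.
    gt_boxes:   ground-truth box for each frame of the ground-truth span.
    """
    pred_frames = set(range(pred_span[0], pred_span[1] + 1))
    gt_frames = set(gt_boxes)
    s_i = pred_frames & gt_frames  # frames inside both spans
    s_u = pred_frames | gt_frames  # frames inside either span

    t_iou = len(s_i) / len(s_u)
    # vIoU credits box IoU only on S_i but divides by |S_u|.
    v_iou = sum(box_iou(pred_boxes[t], gt_boxes[t]) for t in s_i) / len(s_u)
    # sIoU (assumed definition): mean box IoU over every ground-truth frame.
    s_iou = sum(box_iou(pred_boxes[t], gt_boxes[t]) for t in gt_frames) / len(gt_frames)
    # vIoU / tIoU equals the mean box IoU restricted to S_i.
    inside = v_iou / t_iou if t_iou > 0 else 0.0
    return {"tIoU": t_iou, "vIoU": v_iou, "sIoU": s_iou, "vIoU/tIoU": inside}


# Toy video: ground truth on frames 0..9, predicted span (2, 9).
gt = {t: (0.0, 0.0, 10.0, 10.0) for t in range(10)}
pred = {t: (0.0, 0.0, 10.0, 10.0) for t in range(2, 10)}  # perfect inside the span
pred.update({0: (20.0, 20.0, 30.0, 30.0), 1: (20.0, 20.0, 30.0, 30.0)})  # misses outside it
m = stvg_metrics(pred, (2, 9), gt)
print(m)  # {'tIoU': 0.8, 'vIoU': 0.8, 'sIoU': 0.8, 'vIoU/tIoU': 1.0}

# Dataset-level ratio quoted above (HC-STVG2.0, resolution 352, stride 4).
print(round(0.467 / 0.539, 3))  # 0.866
```

In the toy case, boxes are perfect inside the predicted span and wrong on the two ground-truth frames outside it. So vIoU / tIoU is 1.0 while sIoU drops to 0.8, which is the same direction of gap observed in the reported numbers.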