Questions about comparison experiment results and dataset access
Hi, I'm very interested in your work!
I noticed that your paper cites the CVPR 2023 paper "Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network" (reference [65]), but its results are not included in the actual comparison experiments. How does your work compare with theirs?
Also, the accuracy reported in the VAANet paper (reference [66]) on the ve8 and ek6 datasets is more than 5 points higher than the results listed in your table. Is this gap because your experimental method differs from the one used in the VAANet paper? If so, where does the difference lie?
Finally, I'm very interested in your eMotions dataset. What permissions are required to access it? I'm willing to provide whatever materials are needed.
Thanks for taking the time to read this!
Thank you for your interest in our work.
1. We cite [65] in the paper to indicate that AV-CPNet differs from them in the visual backbone it deploys. In addition, AV-CPNet is intended to provide benchmark results for eMotions rather than to carry out absolute performance comparisons with other SOTA methods. We will provide comparison results with [65] in the future.
2. The implementation details are given in the appendix. Following Video Swin-T [1], we use AdamW with a weight decay of 0.2, which differs from the optimization strategy described in VAANet [2]. In addition, we also apply the optimization strategy of VAANet [2] in the appendix to compare the performance of the proposed AV-CPNet against VAANet [2]; a sketch of our AdamW setup is given after this list.
3. eMotions will be released once the final review is complete and the relevant access rules have been formulated.
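
For concreteness, here is a minimal sketch (PyTorch assumed) of the AdamW setup mentioned in point 2. The model and learning rate are illustrative placeholders; only the weight decay value reflects our configuration.

```python
import torch

# Placeholder module standing in for AV-CPNet.
model = torch.nn.Linear(768, 8)

# AdamW with weight decay = 0.2, following Video Swin-T [1].
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,           # placeholder learning rate, not the exact training value
    weight_decay=0.2,
)
```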
References:
[1] Liu Z, Ning J, Cao Y, et al. Video Swin Transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 3202-3211.
[2] Zhao S, Ma Y, Gu Y, et al. An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2020, 34(01): 303-311.
[65] Zhang Z, Wang L, Yang J. Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 18888-18897.