KAIST-AILab/DSTC10-SIMMC

Question about evaluation

Closed this issue · 1 comments

Hello,

Could you please share the details of the evaluation on each sub-task after deriving the prediction file? I ran evaluate.py and response_evaluation.py, and the act F1, slot F1, and BLEU scores are much lower than those reported in the paper.

Is there any postprocessing step after deriving the prediction file beyond what is described in the README, or are the training hyper-parameters different?

Thanks!

Hi,

For now, this repo accompanies the paper http://ailab.kaist.ac.kr/papers/LKC2022TACKLING, which we submitted to the DSTC10 track workshop at AAAI 2022. Please check Table 2 of that paper for evaluation results on the devtest split for each subtask.

If you mean the results in the NAACL Findings paper: yes, the results from this repo should be below the numbers reported there, because this repo does not use visual tokens.

The major difference between the AAAI workshop paper and the NAACL Findings paper is that we upgraded the model to use vision tokens, which brings a performance boost.

I will upload the vision-token part of the code in a few weeks. Sorry for any inconvenience.