Cannot reproduce results in the paper

Question

Opened this issue 5 months ago · 2 comments

I followed your instructions to run the openbookqa task and got the following results:
full cache / dense:
"openbookqa": { "acc": 0.414, "acc_stderr": 0.02204949796982787, "acc_norm": 0.458, "acc_norm_stderr": 0.022303966774269938 }

streamingllm:
"openbookqa": { "acc": 0.256, "acc_stderr": 0.019536923574747588, "acc_norm": 0.342, "acc_norm_stderr": 0.02123614719989926 }

h2o:
"openbookqa": { "acc": 0.264, "acc_stderr": 0.01973288558592208, "acc_norm": 0.348, "acc_norm_stderr": 0.0213237286328075 }

cam:
"openbookqa": { "acc": 0.31, "acc_stderr": 0.020704041021724795, "acc_norm": 0.352, "acc_norm_stderr": 0.021380042385946055 }

I don't think this is a problem with my experiment environment: I ran the official H2O repo and got almost the same 5-shot evaluation scores as reported in their paper.

Answer 1 · 2024-06-21T03:03:25.000Z

What ratio did you set? The openbookqa dataset provides 4 options for the model to choose from, so the chance-level accuracy is 25% even without any cache.
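To make the 25% baseline concrete, here is a minimal sketch that measures how far each reported `acc` sits above chance, in units of its own `acc_stderr`. The numbers are copied from the question; the helper name `stderrs_above_chance` is made up for illustration, and this is only a rough sanity check, not a formal significance test.

```python
# Reported 0-shot openbookqa results from this issue: (acc, acc_stderr).
results = {
    "dense": (0.414, 0.02204949796982787),
    "streamingllm": (0.256, 0.019536923574747588),
    "h2o": (0.264, 0.01973288558592208),
    "cam": (0.31, 0.020704041021724795),
}

NUM_OPTIONS = 4                 # openbookqa is 4-way multiple choice
CHANCE_ACC = 1.0 / NUM_OPTIONS  # 0.25, the accuracy of random guessing


def stderrs_above_chance(acc, stderr, chance=CHANCE_ACC):
    """Distance between accuracy and chance, in standard errors (hypothetical helper)."""
    return (acc - chance) / stderr


gaps = {name: stderrs_above_chance(acc, se) for name, (acc, se) in results.items()}

for name, gap in gaps.items():
    acc = results[name][0]
    print(f"{name:>12}: acc={acc:.3f}, {gap:.2f} standard errors above chance")
```

With these numbers, streamingllm and h2o land within one standard error of 25%, cam is roughly 2.9 standard errors above it, and the dense run is more than 7 above.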

Answer 2 · 2024-06-21T03:10:03.000Z

Both the start-ratio and recent-ratio are 0.1, and this is in the 0-shot setting.
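For reference, a rough sketch of what these two ratios would mean for the cache budget, assuming each ratio is simply a fraction of the sequence length (rounded down) and that tokens are kept purely by position, StreamingLLM-style. The function `kept_positions` is hypothetical and not taken from the repo; methods such as h2o pick part of their budget by attention score instead, so treat this only as an illustration of the budget size.

```python
def kept_positions(seq_len, start_ratio=0.1, recent_ratio=0.1):
    """Return the token positions kept in the KV cache (hypothetical helper).

    Assumption: each ratio is a fraction of the sequence length, rounded down,
    and tokens are kept by position only (first tokens plus most recent ones).
    """
    start_size = int(seq_len * start_ratio)
    recent_size = int(seq_len * recent_ratio)
    start = list(range(start_size))
    # Guard against overlap between the two windows on very short sequences.
    recent_begin = max(start_size, seq_len - recent_size)
    recent = list(range(recent_begin, seq_len))
    return start + recent


kept = kept_positions(100)        # a hypothetical 100-token sequence
short_kept = kept_positions(40)   # a hypothetical 40-token prompt

print(f"100 tokens -> keep {len(kept)}: {kept[:3]}...{kept[-3:]}")
print(f" 40 tokens -> keep {len(short_kept)}: {short_kept}")
```

Under these assumptions, 0.1 + 0.1 keeps 20% of the tokens, which for a hypothetical 40-token prompt is only 8 positions.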