
Datasets for the Similarity and Category Splits, and Results for Method- and Step-Level Inference?


Hi, thank you for your work! I want to ask about two things:

  1. Would it be possible to release the similarity and category test sets? I only see the random-strategy test set in the Google Drive folder.
  2. Could you report the exact method-level and step-level accuracies that you briefly mention in Figure 3? The results for the best-performing models (the triangles) and for human performance (the circles) would be great!

Thank you in advance.


  1. I just uploaded the test set to Google Drive.
  2. Accuracy, in the order random, similarity, category:
  • model: method: 0.6972, 0.7431, 0.5316; step: 0.7848, 0.7465, 0.6607
  • human: method: 0.905, 0.727, 0.74; step: 0.92, 0.8920, 0.86
    Let me know if you have more questions, thank you!
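Because the split order is given after the numbers, it is easy to misread which score belongs to which test set. The minimal Python sketch below pairs each reported accuracy with its split, assuming the stated order (random, similarity, category), and computes the human-minus-model gap. The names `SPLITS`, `REPORTED`, `by_split`, and `human_minus_model` are hypothetical and only for illustration; the numbers are copied from the reply above.

```python
# Reported accuracies from the reply above. Each list follows the stated
# split order: random, similarity, category.
SPLITS = ("random", "similarity", "category")

REPORTED = {
    "model": {
        "method": [0.6972, 0.7431, 0.5316],
        "step": [0.7848, 0.7465, 0.6607],
    },
    "human": {
        "method": [0.905, 0.727, 0.74],
        "step": [0.92, 0.8920, 0.86],
    },
}


def by_split(scores):
    """Map a list of three scores onto the split names, in the stated order."""
    return dict(zip(SPLITS, scores))


def human_minus_model(level):
    """Human accuracy minus model accuracy per split for one inference level."""
    human = by_split(REPORTED["human"][level])
    model = by_split(REPORTED["model"][level])
    # Round to 4 decimals to hide floating-point noise from the subtraction.
    return {split: round(human[split] - model[split], 4) for split in SPLITS}


if __name__ == "__main__":
    for level in ("method", "step"):
        print(level, "model:", by_split(REPORTED["model"][level]))
        print(level, "human - model:", human_minus_model(level))
```

With this pairing, the similarity split is the one case where the reported model method-level accuracy (0.7431) is slightly above the human figure (0.727).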

Thank you for your response! Just one more thing I forgot to ask: what is the exact accuracy for the goal (on random, similarity, and category)?

Please see the results in Table 2 of the paper.

Oh, I missed that. Thank you!