About the benchmark list credibility

Question

atztao opened this issue 6 years ago · 1 comments

As you know, using different batch sizes for testing and training can give different accuracy. The high-accuracy models in the benchmark list use a batch size of just 200, so the benchmark results should follow a uniform standard: accuracy reported over the whole test and training sets.
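To illustrate the concern, here is a minimal Python sketch. It is not taken from any repository in the benchmark list: the toy data, the `predict` function, and both helper functions are hypothetical. It shows that accuracy accumulated over the whole evaluation set does not depend on the batch size, whereas accuracy measured on a single batch of 200 examples can vary from sample to sample.

```python
import random


def full_set_accuracy(predict, examples, batch_size):
    """Accuracy over every example, accumulated batch by batch."""
    correct = 0
    for start in range(0, len(examples), batch_size):
        batch = examples[start:start + batch_size]
        correct += sum(1 for x, y in batch if predict(x) == y)
    # Divide by the total number of examples, not the number of batches,
    # so a smaller final batch is not over-weighted.
    return correct / len(examples)


def single_batch_accuracy(predict, examples, batch_size, rng):
    """Accuracy on one randomly drawn batch only."""
    batch = rng.sample(examples, batch_size)
    return sum(1 for x, y in batch if predict(x) == y) / batch_size


# Hypothetical toy data: the label is x % 2, with roughly 10% of labels flipped.
rng = random.Random(0)
examples = []
for x in range(10_000):
    label = x % 2
    if rng.random() < 0.1:
        label = 1 - label
    examples.append((x, label))


def predict(x):
    # Hypothetical stand-in for a trained model.
    return x % 2


acc_200 = full_set_accuracy(predict, examples, batch_size=200)
acc_64 = full_set_accuracy(predict, examples, batch_size=64)
single_batches = [
    single_batch_accuracy(predict, examples, 200, rng) for _ in range(5)
]

print("whole set, batch size 200:", acc_200)
print("whole set, batch size 64: ", acc_64)
print("single batches of 200:    ", single_batches)
```

The two whole-set numbers are identical because they count the same correct predictions over the same examples. The single-batch numbers are estimates from 200 examples each, which is why a result reported on one batch is harder to compare across submissions.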

Answer 1 · 2018-11-07T05:40:17.000Z

Good point, but please note that building a perfect benchmark table is not part of the goal. We collect PRs that claim a new result and come with code and a link, leaving validation and reproducibility testing to users, as written in README.md.

Nonetheless, we do run basic sanity checks based on our own experience (e.g. #129, #119, #110, #47, #36) to prevent bad PRs from being merged.