About the benchmark list credibility
atztao opened this issue · 1 comment
atztao commented
As you know, using different batch sizes for testing and training can give different accuracy. The high-accuracy models in the benchmark list just use a batch size of 200, so the benchmark results should follow a uniform standard, one that covers the whole test and train collections.
hanxiao commented
Good point, but please note that building a perfect benchmark table is not part of the goal. We collect PRs with code and a link that claim a new result, leaving validation and reproducibility testing to users, as written in README.md.
Nonetheless, we do run basic sanity checks based on our own experience, e.g. #129, #119, #110, #47, #36, to prevent bad PRs from being merged.
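For users who want to validate a claimed result themselves, one way to take the evaluation batch size out of the picture is to accumulate correct predictions over the whole test set rather than averaging per-batch accuracies. The snippet below is only a minimal sketch under stated assumptions: `full_test_accuracy`, `toy_predict`, and the toy data are hypothetical names made up for illustration and are not part of this repository or of any submitted PR.

```python
def full_test_accuracy(predict, inputs, labels, batch_size):
    """Accuracy over the whole test set, accumulated batch by batch.

    `predict` is assumed to map a list of inputs to a list of predicted
    labels. `inputs` is assumed to be non-empty.
    """
    correct = 0
    for start in range(0, len(inputs), batch_size):
        batch_x = inputs[start:start + batch_size]
        batch_y = labels[start:start + batch_size]
        preds = predict(batch_x)
        # Count hits instead of averaging per-batch accuracy, so a smaller
        # final batch does not get more weight than it deserves.
        correct += sum(int(p == y) for p, y in zip(preds, batch_y))
    return correct / len(inputs)


# Hypothetical stand-in for a trained model: predicts the parity of each input.
def toy_predict(batch):
    return [x % 2 for x in batch]


# Toy test set: every tenth label is flipped, so the toy model is wrong on 10%.
inputs = list(range(1000))
labels = [x % 2 if x % 10 else 1 - (x % 2) for x in inputs]

acc_200 = full_test_accuracy(toy_predict, inputs, labels, batch_size=200)
acc_64 = full_test_accuracy(toy_predict, inputs, labels, batch_size=64)
```

With a fixed model, the two values agree because every test example is counted exactly once whatever the batch size is. The training batch size is a different matter: it is a hyperparameter that can change the trained model, so it is worth reporting alongside any claimed accuracy.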