zalandoresearch/fashion-mnist

About the benchmark list credibility

atztao opened this issue · 1 comment

Different batch sizes for testing and training can give different accuracy. In the benchmark list, the top-scoring model uses a batch size of 200, so the benchmark results should follow a uniform standard, such as reporting accuracy over the entire test and training sets.
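One evaluation-side source of batch-size dependence can be shown concretely. If a script averages per-batch accuracies, a short final batch gets the same weight as a full one, so the reported number changes with the batch size. Counting correct predictions over the whole test set avoids this. The sketch below is illustrative only: the function names `full_set_accuracy` and `batched_accuracy` are hypothetical, not taken from any benchmark script. It only covers how accuracy is aggregated; the training batch size can still legitimately change the learned model.

```python
from typing import Sequence


def full_set_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Correct predictions over the whole set, divided once (batch-size independent)."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def batched_accuracy(preds: Sequence[int], labels: Sequence[int],
                     batch_size: int, weighted: bool) -> float:
    """Walk the set in chunks of batch_size, as a hypothetical eval loop would."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")
    correct_total = 0
    batch_accs = []
    for start in range(0, len(labels), batch_size):
        p = preds[start:start + batch_size]
        y = labels[start:start + batch_size]
        correct = sum(a == b for a, b in zip(p, y))
        correct_total += correct
        batch_accs.append(correct / len(y))
    if weighted:
        # Each sample counts once: same result as full_set_accuracy.
        return correct_total / len(labels)
    # Unweighted mean of batch accuracies: a short last batch is over-weighted.
    return sum(batch_accs) / len(batch_accs)


if __name__ == "__main__":
    labels = [0] * 10
    preds = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1]  # 6 of 10 correct
    print("full set:", full_set_accuracy(preds, labels))
    for bs in (3, 5, 10):
        print(bs,
              batched_accuracy(preds, labels, bs, weighted=True),
              batched_accuracy(preds, labels, bs, weighted=False))
```

With 10 samples and `batch_size=3`, the last batch holds a single sample, so the unweighted mean drops to 0.5 while the full-set accuracy stays at 0.6.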

Good point, but please note that building a perfect benchmark table is not part of the goal. We collect PRs with code and links that claim new results, leaving validation and reproducibility testing to users, as stated in README.md.

Nonetheless, we do perform basic sanity checks based on our own experience (e.g., #129, #119, #110, #47, #36) to prevent bad PRs from being merged.