CAMELYON BENCHMARK: better datasets for MIL methods
INTRODUCTION
Why do we do this work?
Multiple Instance Learning (MIL) methods are mainstream approaches for pathological image classification and analysis.
The CAMELYON-16/17 datasets are commonly used to evaluate MIL methods.
However, they have the following issues:
The CAMELYON-16/17 datasets contain some problematic slides.
The pixel-level annotations of the CAMELYON-16/17 test sets are not accurate enough.
Different MIL methods do not use a unified dataset split or unified evaluation metrics on the CAMELYON datasets.
In short, there is no unified benchmark for MIL methods.
What do we do in this work?
We do the following to establish the CAMELYON-BENCHMARK:
Remove some problematic slides.
Correct problematic annotations.
Merge the CAMELYON-16/17 datasets and add new slides to form the larger, more balanced CAMELYON-NEW dataset.
Evaluate mainstream MIL methods on the CAMELYON-NEW dataset.
Evaluate mainstream feature extractors on the CAMELYON-NEW dataset.
Use more comprehensive evaluation metrics to assess the different methods (a minimal metric sketch follows below).
In summary, we establish a new CAMELYON-BENCHMARK.
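For illustration, the snippet below is a minimal sketch of how slide-level metrics such as AUC, accuracy, and F1 could be computed in a binary (normal vs. tumor) setting. The function name evaluate_slides, the prediction format, and the dummy values are assumptions made for this example only, not the released benchmark code.

```python
# Minimal sketch: slide-level metric computation for MIL evaluation.
# y_true holds ground-truth slide labels (0 = normal, 1 = tumor) and
# y_prob holds model-predicted tumor probabilities; replace the dummy
# values below with real model outputs.
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score, f1_score


def evaluate_slides(y_true, y_prob, threshold=0.5):
    """Return AUC, accuracy, and F1 for binary slide-level predictions."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return {
        "auc": roc_auc_score(y_true, y_prob),
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }


# Example with dummy predictions (for illustration only):
metrics = evaluate_slides(y_true=[0, 1, 1, 0], y_prob=[0.2, 0.9, 0.6, 0.4])
print(metrics)  # {'auc': 1.0, 'accuracy': 1.0, 'f1': 1.0}
```

Reporting several metrics together (rather than accuracy alone) is what "more comprehensive evaluation" refers to; the actual metric set and thresholds used by the benchmark are defined in the repository itself.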