Understanding evaluation on Chest X-Ray Datasets
dhudsmith commented
Hello, I am trying to understand the evaluation procedure for the chest X-ray datasets. The appendix clearly states that the 5 competition categories from CheXpert were used.
However, the code appears to use all 14 classes from the original CheXpert label set. For example, for VinDr CXR:
BenchMD/src/datasets/chest_xray/vindr_cxr.py
Lines 47 to 57 in 2d1fe92
BenchMD/src/datasets/chest_xray/vindr_cxr.py
Line 172 in 2d1fe92
Can you clarify how the reported VinDr CXR, MIMIC, and CheXpert numbers were calculated? Was the model trained on all 14 classes but evaluated only on the 5 classes of interest?
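
To make the question concrete, here is a minimal sketch of what I mean by "trained on 14, evaluated on 5". It is not taken from the BenchMD code: the label order in `CHEXPERT_14`, the helper names, and the choice of AUROC as the metric are all assumptions on my part. It keeps the 14-way model outputs and scores only the columns for the 5 competition categories.

```python
# Sketch only: label order, helper names, and the metric are assumptions,
# not the BenchMD implementation.

# The 14 CheXpert observations (order here is hypothetical).
CHEXPERT_14 = [
    "No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly",
    "Lung Opacity", "Lung Lesion", "Edema", "Consolidation",
    "Pneumonia", "Atelectasis", "Pneumothorax", "Pleural Effusion",
    "Pleural Other", "Fracture", "Support Devices",
]

# The 5 CheXpert competition categories.
COMPETITION_5 = [
    "Atelectasis", "Cardiomegaly", "Consolidation",
    "Edema", "Pleural Effusion",
]


def select_columns(rows, all_labels, keep_labels):
    """Keep only the columns of `rows` that correspond to `keep_labels`."""
    idx = [all_labels.index(name) for name in keep_labels]
    return [[row[i] for i in idx] for row in rows]


def auroc(scores, targets):
    """Pairwise AUROC for one class; returns None if a class is missing."""
    pos = [s for s, t in zip(scores, targets) if t == 1]
    neg = [s for s, t in zip(scores, targets) if t == 0]
    if not pos or not neg:
        return None
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auroc_on_subset(scores, targets, all_labels, keep_labels):
    """Average per-class AUROC over `keep_labels` only."""
    s = select_columns(scores, all_labels, keep_labels)
    t = select_columns(targets, all_labels, keep_labels)
    per_class = [
        auroc([row[j] for row in s], [row[j] for row in t])
        for j in range(len(keep_labels))
    ]
    valid = [a for a in per_class if a is not None]
    return sum(valid) / len(valid) if valid else None
```

If this is roughly what was done, calling `mean_auroc_on_subset(scores, targets, CHEXPERT_14, COMPETITION_5)` on the 14-column outputs would give the 5-class number, while passing `CHEXPERT_14` as both label lists would give the 14-class number. Knowing which of the two matches the reported results would answer my question.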