rajpurkarlab/BenchMD

Understanding evaluation on Chest X-Ray Datasets


Hello, I am trying to understand the evaluation procedure for the chest X-ray datasets. The appendix clearly states that the five CheXpert competition categories (Atelectasis, Cardiomegaly, Consolidation, Edema, and Pleural Effusion) were used.


However, the code appears to use all 14 observation labels from the original CheXpert dataset. For example, the VinDr-CXR dataset code does the following:

```python
CHEXPERT_LABELS_IDX = np.array(
    [
        CHEXPERT_LABELS['Atelectasis'], CHEXPERT_LABELS['Enlarged Cardiomediastinum'], CHEXPERT_LABELS['Cardiomegaly'],
        CHEXPERT_LABELS['Lung Opacity'], CHEXPERT_LABELS['Lung Lesion'], CHEXPERT_LABELS['Edema'],
        CHEXPERT_LABELS['Consolidation'], CHEXPERT_LABELS['Pneumonia'], CHEXPERT_LABELS['Atelectasis'],
        CHEXPERT_LABELS['Pneumothorax'], CHEXPERT_LABELS['Pleural Effusion'], CHEXPERT_LABELS['Pleural Other'],
        CHEXPERT_LABELS['Fracture'], CHEXPERT_LABELS['Support Devices']
    ],
    dtype=np.int32
)
NUM_CLASSES = 14  # 14 total: len(self.CHEXPERT_LABELS_IDX)

label = torch.tensor(self.labels[index][self.CHEXPERT_LABELS_IDX]).long()
```
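For comparison, if only the competition tasks were meant to be evaluated, I would expect the 14-column label (and prediction) arrays to be narrowed to those five columns at some point. Below is a minimal sketch of that step, assuming the standard CheXpert column order; `CHEXPERT_ORDER`, `COMPETITION_IDX`, and `select_competition_columns` are my own hypothetical names, not taken from the repo. Separately, the quoted `CHEXPERT_LABELS_IDX` lists `'Atelectasis'` twice (first and ninth) and never lists `'No Finding'`, so unless I am misreading how `CHEXPERT_LABELS` is defined, it selects only 13 distinct labels despite `NUM_CLASSES = 14`.

```python
import numpy as np

# Assumed: the standard order of the 14 CheXpert observations (CheXpert CSV columns).
CHEXPERT_ORDER = [
    "No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly",
    "Lung Opacity", "Lung Lesion", "Edema", "Consolidation",
    "Pneumonia", "Atelectasis", "Pneumothorax", "Pleural Effusion",
    "Pleural Other", "Fracture", "Support Devices",
]
CHEXPERT_LABELS = {name: i for i, name in enumerate(CHEXPERT_ORDER)}

# The five CheXpert competition tasks, mapped to their column indices.
COMPETITION_TASKS = ["Atelectasis", "Cardiomegaly", "Consolidation",
                     "Edema", "Pleural Effusion"]
COMPETITION_IDX = np.array([CHEXPERT_LABELS[t] for t in COMPETITION_TASKS],
                           dtype=np.int32)


def select_competition_columns(arr):
    """Keep only the five competition columns of a (..., 14) array (hypothetical helper)."""
    arr = np.asarray(arr)
    if arr.shape[-1] != len(CHEXPERT_ORDER):
        raise ValueError(f"expected last dim 14, got {arr.shape[-1]}")
    return arr[..., COMPETITION_IDX]


# Two example studies: one positive for Edema, one positive for Cardiomegaly.
labels_14 = np.zeros((2, 14), dtype=np.int64)
labels_14[0, CHEXPERT_LABELS["Edema"]] = 1
labels_14[1, CHEXPERT_LABELS["Cardiomegaly"]] = 1
print(COMPETITION_IDX.tolist())  # [8, 2, 6, 5, 10]
print(select_competition_columns(labels_14).tolist())  # [[0, 0, 0, 1, 0], [0, 1, 0, 0, 0]]
```

The same indexing would apply to the model's 14 output scores, so training could still use all 14 heads while only these five columns are scored.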

Could you clarify how the reported VinDr-CXR, MIMIC-CXR, and CheXpert results were calculated? Was the model trained on all 14 classes but evaluated only on the 5 classes of interest?
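To make the question concrete, this is the kind of evaluation I would expect under "train on 14, evaluate on 5": compute AUROC per task on the selected columns, then average. It is only a sketch, not the BenchMD code; the function names, the column indices (which assume the standard CheXpert order), and the choice of mean AUROC as the metric are my assumptions. I compute AUROC with the Mann-Whitney formulation to avoid dependencies; `sklearn.metrics.roc_auc_score` would give the same value.

```python
import numpy as np


def auroc(y_true, y_score):
    """Binary AUROC via the Mann-Whitney U statistic (tied pairs count as 0.5)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUROC needs both positive and negative examples")
    # Compare every positive score with every negative score (O(n_pos * n_neg) memory).
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def mean_auroc_on_subset(y_true_14, y_score_14, columns):
    """Per-task AUROC on the chosen columns of (N, 14) arrays, plus their mean."""
    y_true_14 = np.asarray(y_true_14)
    y_score_14 = np.asarray(y_score_14)
    per_task = [auroc(y_true_14[:, c], y_score_14[:, c]) for c in columns]
    return per_task, float(np.mean(per_task))


# Hypothetical example: 4 studies, 14 model outputs, only 5 columns scored.
COMPETITION_IDX = [8, 2, 6, 5, 10]  # assumes standard CheXpert column order
y_true = np.zeros((4, 14), dtype=int)
y_true[:, COMPETITION_IDX] = np.array([0, 0, 1, 1])[:, None]
y_score = np.tile(np.array([[0.1], [0.4], [0.35], [0.8]]), (1, 14))
per_task, mean = mean_auroc_on_subset(y_true, y_score, COMPETITION_IDX)
print(per_task, mean)  # each task 0.75, mean 0.75
```

If the reported numbers instead average over all 14 columns, the result could differ noticeably, which is why I am asking which convention was used.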