This repository is a refactoring of the code for the blog post "Using Microsoft AI to Build a Lung-Disease Prediction Model using Chest X-Ray Images". These instructions aim to help newcomers set up the system quickly.
- Clone this repository:
  git clone https://github.com/svishwa/crowdcount-mcnn.git
  We'll refer to the cloned PredictLungDisease directory as ROOT.
- Install all required dependencies: tqdm, cv2, numpy, pandas, sklearn, keras, tensorflow, and keras_contrib. The scripts also use pickle, random, re, and collections.Counter, which are part of the Python standard library and need no installation.
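Before running any of the scripts, it can help to confirm that the third-party packages can be found. The following is a minimal sketch and not part of the repository: the helper name `missing_modules` is hypothetical, and the module list simply mirrors the dependencies named above.

```python
import importlib.util

# Third-party modules named in the dependency list above.
REQUIRED = ["tqdm", "cv2", "numpy", "pandas", "sklearn",
            "keras", "tensorflow", "keras_contrib"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates each module without importing it, so it stays fast even when tensorflow is installed.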
- Download the NIH Chest X-ray Dataset from https://nihcc.app.box.com/v/ChestXray-NIHCC. You need all the image files (everything under the images folder in the NIH dataset), the Data_Entry_2017.csv file, and the bounding-box data in BBox_List_2017.csv.
- Create the data directories (the -p flag also creates the missing parent directories):
  mkdir -p ROOT/azure-share/chestxray/data/ChestX-ray8/ChestXray-NIHCC
  mkdir -p ROOT/azure-share/chestxray/data/ChestX-ray8/ChestXray-NIHCC_other
- Save all images under ROOT/azure-share/chestxray/data/ChestX-ray8/ChestXray-NIHCC
- Save Data_Entry_2017.csv and BBox_List_2017.csv under ROOT/azure-share/chestxray/data/ChestX-ray8/ChestXray-NIHCC_other
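A quick way to confirm that the files ended up in the right place is to check the layout programmatically. This is a sketch under stated assumptions: the helper `check_layout` is hypothetical, the paths are the ones given in the steps above, and the images are assumed to be PNG files.

```python
from pathlib import Path

# Sub-path and file names taken from the steps above.
DATA_SUBDIR = Path("azure-share/chestxray/data/ChestX-ray8")
CSV_FILES = ["Data_Entry_2017.csv", "BBox_List_2017.csv"]


def check_layout(root):
    """Return a list of problems found; an empty list means the layout looks right."""
    base = Path(root) / DATA_SUBDIR
    image_dir = base / "ChestXray-NIHCC"
    other_dir = base / "ChestXray-NIHCC_other"
    problems = []
    # Assumption: the NIH images are stored as .png files.
    if not image_dir.is_dir():
        problems.append(f"missing directory: {image_dir}")
    elif not any(image_dir.glob("*.png")):
        problems.append(f"no .png images found in {image_dir}")
    for name in CSV_FILES:
        if not (other_dir / name).is_file():
            problems.append(f"missing file: {other_dir / name}")
    return problems
```

Calling `check_layout("ROOT")` with the real path of your clone returns an empty list once the images and both CSV files are in place.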
- Process the data:
  mkdir -p ROOT/azure-share/chestxray/output/data_partitions
  Run 000_preprocess.py to create *.pickle files under this directory.
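To inspect what the preprocessing step produced, the pickle files can be loaded back with the standard library. The sketch below is illustrative only: the contents of the files are not described here, so it reports only each object's type, and the helper names `load_partitions` and `summarize` are hypothetical. Only unpickle files that you generated yourself or otherwise trust.

```python
import pickle
from pathlib import Path


def load_partitions(partition_dir):
    """Load every *.pickle file in partition_dir into a dict keyed by file stem."""
    loaded = {}
    for path in sorted(Path(partition_dir).glob("*.pickle")):
        with path.open("rb") as handle:
            loaded[path.stem] = pickle.load(handle)
    return loaded


def summarize(partitions):
    """Map each partition name to the type name of the object stored in it."""
    return {name: type(obj).__name__ for name, obj in partitions.items()}
```

For example, `summarize(load_partitions("ROOT/azure-share/chestxray/output/data_partitions"))` lists each file together with the kind of object it holds, with ROOT replaced by the path of your clone.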
- We have provided the pretrained model azure_chest_xray_14_weights_712split_epoch_054_val_loss_191.2588.hdf5 under ROOT/azure-share/chestxray/output/fully_trained_models. You can also download it separately from here.
- Run 020_evaluate.py. It creates weights_only_azure_chest_xray_14_weights_712split_epoch_054_val_loss_191.2588.hdf5, which stores the weights of the pretrained model, in the same directory.
- The table below shows the AUC scores for all 14 diseases. Delta is the Stanford score minus our score, so a negative value means our model scores higher.

| Disease | Our AUC Score | Stanford AUC Score | Delta (Stanford - Ours) |
|---|---|---|---|
| Atelectasis | 0.822334 | 0.8094 | -0.012934 |
| Cardiomegaly | 0.933610 | 0.9248 | -0.008810 |
| Effusion | 0.882471 | 0.8638 | -0.018671 |
| Infiltration | 0.744504 | 0.7345 | -0.010004 |
| Mass | 0.858467 | 0.8676 | 0.009133 |
| Nodule | 0.784230 | 0.7802 | -0.004030 |
| Pneumonia | 0.800054 | 0.7680 | -0.032054 |
| Pneumothorax | 0.829764 | 0.8887 | 0.058936 |
| Consolidation | 0.811969 | 0.7901 | -0.021869 |
| Edema | 0.894102 | 0.8878 | -0.006302 |
| Emphysema | 0.847477 | 0.9371 | 0.089623 |
| Fibrosis | 0.882602 | 0.8047 | -0.077902 |
| Pleural Thickening | 1.000000 | 0.8062 | -0.193800 |
| Hernia | 0.916610 | 0.9164 | -0.000210 |
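For readers unfamiliar with the metric, AUC for one disease can be read as the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition in plain Python. It is an illustration, not the evaluation code used by this repository, and `pairwise_auc` is a hypothetical name.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ordered correctly.
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The double loop is quadratic in the number of images, so for a full test set a library routine such as `sklearn.metrics.roc_auc_score` is the practical choice.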
- Create a folder for test images:
  mkdir -p ROOT/azure-share/chestxray/data/ChestX-ray8/test_images
  Copy any number of images from ChestXray-NIHCC to test_images and resize them to 224x224 pixels.
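The copy-and-resize step can be scripted. The following is a minimal sketch, assuming the images are PNG files and that OpenCV (cv2, already in the dependency list) is acceptable for resizing. The function names, the random sampling, and the choice of `cv2.INTER_AREA` interpolation are illustrative and not taken from the repository.

```python
import random
from pathlib import Path


def sample_image_paths(src_dir, count, seed=0):
    """Pick up to `count` PNG files from src_dir, reproducibly for a given seed."""
    paths = sorted(Path(src_dir).glob("*.png"))
    rng = random.Random(seed)
    return sorted(rng.sample(paths, min(count, len(paths))))


def copy_and_resize(src_dir, dst_dir, count, size=(224, 224)):
    """Write resized copies of the sampled images into dst_dir.

    `size` is (width, height), as expected by cv2.resize.
    """
    import cv2  # imported lazily so the sampling helper works without OpenCV

    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sample_image_paths(src_dir, count):
        image = cv2.imread(str(path))
        if image is None:  # unreadable file or not an image
            continue
        resized = cv2.resize(image, size, interpolation=cv2.INTER_AREA)
        cv2.imwrite(str(dst / path.name), resized)
```

With the paths used above, `copy_and_resize(".../ChestXray-NIHCC", ".../test_images", 10)` would place ten resized copies in the test folder and leave the originals untouched.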
- Run 004_cam_simple.py to output a Class Activation Map (CAM). The CAM shows which regions of the image were relevant to the predicted class.
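As described in the image localization paper listed below, a CAM for a class is the weighted sum of the last convolutional layer's feature maps, where the weights are those connecting each globally average-pooled feature map to that class. The sketch below shows this computation on toy data in plain Python. It is not the code in 004_cam_simple.py, and the function name and toy values are made up for illustration.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps: cam[y][x] = sum_k w[k] * f[k][y][x]."""
    if len(feature_maps) != len(class_weights):
        raise ValueError("need exactly one weight per feature map")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Two toy 2x2 feature maps and the target class's weight for each map.
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]]]
weights = [0.5, 1.5]
print(class_activation_map(maps, weights))  # [[0.5, 0.0], [0.0, 3.0]]
```

In practice the resulting map is upsampled to the input resolution (224x224 here) and overlaid on the X-ray as a heat map. With real feature maps, a numpy expression would replace the explicit loops.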
- Baseline result: https://arxiv.org/abs/1705.02315
- Image Localization: http://arxiv.org/abs/1512.04150
- The original CheXNet work, as described on the Stanford ML Group website and in their paper.
- http://cs231n.stanford.edu/reports/2017/pdfs/527.pdf for pre-processing the data
- https://arxiv.org/abs/1711.08760 for some other thoughts on the model architecture and the relationship between different diseases
Please contact yanhaotian@bupt.edu.cn if you have any problems.