With an estimated 160,000 deaths in 2018, lung cancer is the most common cause of cancer death in the United States (Ardila et al. 2019).
Lung cancer is one of the most prevalent cancers worldwide, causing 1.76 million deaths per year (Yu et al. 2020).
Clinical decision support systems have been developed to enable early diagnosis of lung cancer from CT images. However, most of these tools are limited to lung or nodule segmentation and leave the classification of nodules to the radiologist. Early research shows that deep learning models can support this task as well, and integrating these research efforts into clinical applications is an active area of development. See the Arterys Marketplace for examples of lung cancer detection models, some of which are currently under review for FDA or CE approval. This project is a design study of what a deep learning-based lung cancer detection app could look like.
The project uses the LIDC-IDRI dataset, which contains 1010 chest CT scans (in DICOM format) with 2625 nodules. Radiologists annotated each nodule with a malignancy rating, measurements, and additional characteristics (e.g., calcification, spiculation).
The lung_cancer_detection package contains modules for reading and preprocessing images.
The raw data can be downloaded from the Cancer Imaging Archive.
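A common first preprocessing step for CT slices is converting stored pixel values to Hounsfield units (HU) and clipping them to a window of interest. The following is a minimal sketch of that idea, not the code of the lung_cancer_detection package. The function names, the toy input, and the window of -1000 to 400 HU are assumptions chosen for illustration. In real data, the slope and intercept would come from the DICOM attributes RescaleSlope and RescaleIntercept.

```python
import numpy as np


def to_hounsfield(raw, slope, intercept):
    """Map stored pixel values to Hounsfield units: HU = raw * slope + intercept."""
    return raw.astype(np.float32) * slope + intercept


def window_and_scale(hu, lower=-1000.0, upper=400.0):
    """Clip HU values to [lower, upper] and rescale them linearly to [0, 1].

    The default window is an assumption for illustration, not a project setting.
    """
    clipped = np.clip(hu, lower, upper)
    return (clipped - lower) / (upper - lower)


# Toy 2x2 "slice" of stored values; a real slice would come from a DICOM reader.
raw_slice = np.array([[0, 1024], [1424, 3000]], dtype=np.int16)

# Hypothetical rescale parameters; read them from the DICOM header in practice.
hu_slice = to_hounsfield(raw_slice, slope=1.0, intercept=-1024.0)
scaled = window_and_scale(hu_slice)

print(hu_slice)
print(scaled)
```

Values below the window map to 0 and values above it map to 1, so the network input stays in a fixed range regardless of scanner-specific outliers.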
Model 1: Malignancy classification from tabular data
- Code: nbs/14_Nodule_Classification_Tabular.ipynb
- Data:
  - Source: Nodule metadata
  - Filters: minimum three annotations, malignancy annotation is not "Indeterminate"
  - Target: Binary classification (benign => labels 1, 2; malignant => labels 4, 5)
  - Features: 11 in total (measurements and additional annotations)
  - Split: 672 training examples, 75 test examples
- Model:
  - Type: Random Forest (scikit-learn defaults)
  - Performance: 94.67% accuracy, 0.9895 AUC score (on test data)
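The setup above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's code: the data is a synthetic stand-in generated with make_classification, sized to match the split above (747 rows, 11 features), and the stratified split and fixed random seeds are assumptions that the summary does not mention.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the nodule table: 747 rows, 11 features, binary target.
X, y = make_classification(n_samples=747, n_features=11, random_state=0)

# An integer test_size holds out exactly that many rows (75 test, 672 train).
# Stratifying on the target is an assumption, not taken from the notebook.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=75, stratify=y, random_state=0
)

# Default hyperparameters; only the seed is fixed for repeatability.
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

# Accuracy uses hard predictions; AUC uses the predicted probability of class 1.
accuracy = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"accuracy={accuracy:.4f}, auc={auc:.4f}")
```

Because the data here is synthetic, the printed metrics say nothing about the reported results. For reference, with 75 test examples an accuracy of 94.67% corresponds to 71 correct predictions (71/75 ≈ 0.9467).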
Nodule detection model:
- Preprocess LIDC dataset
- Train baseline model
Random ideas:
- Apply TCAV algorithm to trained model, use additional annotations as concepts
Basics of CT images:
PyTorch Lightning:
MONAI (PyTorch-based library for medical imaging):
- Docs
- MONAI Bootcamp YT playlist
- Feature highlights
- 3D segmentation examples
- DICOM loading example
- Dataset types
- Basis for definition of image metadata
Preprocessing of DICOM images:
- DICOM Standard Browser
  - Useful for looking up metadata specifications
- Preprocessing code for LIDC dataset
  - Well documented
  - Creates masks from raw DICOM files
  - Based on this repo
- Notebook series by Jeremy Howard
- Preprocessing of DSB 2017 dataset
Lung cancer detection datasets:
- LUNA 2016
- LIDC-IDRI
- Data Science Bowl 2017 (data can be found at academictorrents.com)
- NLST (project proposal required)
Ardila, D., Kiraly, A. P., Bharadwaj, S., Choi, B., Reicher, J. J., Peng, L., Tse, D., Etemadi, M., Ye, W., Corrado, G., Naidich, D. P., and Shetty, S. 2019. “End-to-End Lung Cancer Screening with Three-Dimensional Deep Learning on Low-Dose Chest Computed Tomography,” Nature Medicine (25:6), Springer US, pp. 954–961. (https://doi.org/10.1038/s41591-019-0447-x).
Setio, A. A. A., Traverso, A., de Bel, T., Berens, M. S. N., Bogaard, C. van den, Cerello, P., Chen, H., Dou, Q., Fantacci, M. E., Geurts, B., Gugten, R. van der, Heng, P. A., Jansen, B., de Kaste, M. M. J., Kotov, V., Lin, J. Y. H., Manders, J. T. M. C., Sóñora-Mengana, A., García-Naranjo, J. C., Papavasileiou, E., Prokop, M., Saletta, M., Schaefer-Prokop, C. M., Scholten, E. T., Scholten, L., Snoeren, M. M., Torres, E. L., Vandemeulebroucke, J., Walasek, N., Zuidhof, G. C. A., Ginneken, B. van, and Jacobs, C. 2017. “Validation, Comparison, and Combination of Algorithms for Automatic Detection of Pulmonary Nodules in Computed Tomography Images: The LUNA16 Challenge,” Medical Image Analysis. (https://doi.org/10.1016/j.media.2017.06.015).
Svoboda, E. 2020. “Artificial Intelligence Is Improving the Detection of Lung Cancer,” Nature (587:7834), pp. S20–S22. (https://doi.org/10.1038/d41586-020-03157-9).
Yu, K. H., Lee, T. L. M., Yen, M. H., Kou, S. C., Rosen, B., Chiang, J. H., and Kohane, I. S. 2020. “Reproducible Machine Learning Methods for Lung Cancer Detection Using Computed Tomography Images: Algorithm Development and Validation,” Journal of Medical Internet Research (22:8), pp. 1–11. (https://doi.org/10.2196/16709).
Zhu, W., Liu, C., Fan, W., and Xie, X. 2018. “DeepLung: Deep 3D Dual Path Nets for Automated Pulmonary Nodule Detection and Classification,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, pp. 673–681.