This project leverages deep learning techniques, specifically computer vision (CV) and natural language processing (NLP), to predict medical report text from chest X-ray images. The application of deep learning in healthcare, particularly in disease prediction from medical images, has shown significant advancements in recent years.
In the medical field, physicians rely on detailed medical reports for patient examinations. However, the shortage of specialist physicians, especially in resource-limited countries, poses a significant challenge. This project aims to address this gap by employing artificial intelligence to generate accurate and timely medical reports.
We use the publicly available Indiana University (IU) dataset, focusing on chest X-ray images. The dataset includes both frontal and lateral views along with corresponding XML report files. These XML files contain the report text, organized under four tags:
- Comparison: Provides information on serial follow-up procedures.
- Indication: Includes patient medical information.
- Findings: Presents the in-depth observations recorded in the medical report.
- Impression: Generated by combining information from comparison and findings.
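As a rough illustration of how these four sections could be read from a report file, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample XML, the `AbstractText` element name, and its `Label` attribute are assumptions made for this example; check them against the actual dataset files before relying on them. `extract_sections` is a hypothetical helper, not part of this repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened report used only for illustration.
# The element and attribute names are assumptions about the file layout.
SAMPLE_XML = """
<report>
  <AbstractText Label="COMPARISON">None.</AbstractText>
  <AbstractText Label="INDICATION">Chest pain.</AbstractText>
  <AbstractText Label="FINDINGS">The lungs are clear.</AbstractText>
  <AbstractText Label="IMPRESSION">No acute disease.</AbstractText>
</report>
"""

SECTIONS = ("COMPARISON", "INDICATION", "FINDINGS", "IMPRESSION")


def extract_sections(xml_text):
    """Return a dict mapping each lowercase section name to its text."""
    root = ET.fromstring(xml_text.strip())
    # Start with empty strings so missing sections are still present as keys.
    sections = {name.lower(): "" for name in SECTIONS}
    for node in root.iter("AbstractText"):
        label = (node.get("Label") or "").upper()
        if label in SECTIONS:
            sections[label.lower()] = (node.text or "").strip()
    return sections


report = extract_sections(SAMPLE_XML)
print(report["findings"])
```

Starting from empty strings keeps the output shape stable when a report omits a section, which tends to simplify later preprocessing.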
This task integrates computer vision and natural language processing: given one or more chest X-ray images, the goal is to generate a text report resembling one written by a radiologist. The models combine CNNs with recurrent networks (plain RNNs, LSTMs, and GRUs) and attention mechanisms. We chose a GRU with attention because attention lets the decoder focus on the most important words, which improves the predicted report text.
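To make the idea of attention concrete, the sketch below computes attention weights and a context vector in plain Python. It uses dot-product scoring purely for brevity; that choice is an assumption, and the notebooks in this project may use a different scoring function. The names `attend`, `decoder_state`, and `encoder_features` are hypothetical, and the numbers are toy values.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_features):
    """Dot-product attention over a list of encoder feature vectors.

    Returns the attention weights and the weighted-sum context vector.
    """
    scores = [
        sum(d * f for d, f in zip(decoder_state, feature))
        for feature in encoder_features
    ]
    weights = softmax(scores)
    dim = len(encoder_features[0])
    context = [
        sum(w * feature[i] for w, feature in zip(weights, encoder_features))
        for i in range(dim)
    ]
    return weights, context


# Toy inputs: three encoder feature vectors and one decoder hidden state.
encoder_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_features)
print(weights)
print(context)
```

In this toy case the first and third feature vectors align best with the decoder state, so they receive the largest weights; the decoder would then use the context vector when predicting the next word.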
- EDA.ipynb: Explores the characteristics of chest X-ray images and provides insights into the dataset.
- Basic_Model.ipynb: Implements encoder-decoder models using CNNs, RNNs (LSTM, GRU), and their combinations.
- Attention_Model.ipynb: Implements and explains attention mechanism-based models for improved prediction.
- Error Analysis.ipynb: Analyzes prediction errors and suggests potential improvements.
The `end_to_end_pipeline.py` script provides a comprehensive solution for predicting medical report text from chest X-ray images. Users can choose different models (CNN, RNN, LSTM, GRU with attention) via command-line arguments.
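The exact command-line arguments are not documented here, so the following is only a sketch of how model selection could be wired up with `argparse`. The `--model` flag, its choice names, and the default are all hypothetical; consult the script itself for the real options.

```python
import argparse

# Hypothetical model identifiers; the actual script may use different names.
MODEL_CHOICES = ("cnn", "rnn", "lstm", "gru_attention")


def build_parser():
    """Build a parser with a single, hypothetical --model option."""
    parser = argparse.ArgumentParser(
        description="Predict medical report text from chest X-ray images."
    )
    parser.add_argument(
        "--model",
        choices=MODEL_CHOICES,
        default="gru_attention",
        help="Which model to use for prediction (hypothetical flag).",
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--model", "lstm"])
print(args.model)
```

Restricting `--model` with `choices` means an unsupported name fails immediately with a clear usage message instead of surfacing later in the pipeline.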
- Install the dependencies listed in `requirements.txt`.
- Execute the end-to-end pipeline script: `python end_to_end_pipeline.py`.
For a detailed understanding of the project's motivation, challenges, and potential impact, check out our blog post.