Pneumonia is an infection that inflames the air sacs in one or both lungs, causing cough with phlegm or pus, fever, chills, and difficulty breathing. A variety of organisms, including bacteria, viruses and fungi, can cause pneumonia.
Key facts provided by the WHO
- Pneumonia accounts for 14% of all deaths of children under 5 years old, killing 740,180 children in 2019.
- Pneumonia can be caused by viruses, bacteria or fungi.
- Pneumonia can be prevented by immunization, adequate nutrition, and by addressing environmental factors.
- Pneumonia caused by bacteria can be treated with antibiotics, but only one third of children with pneumonia receive the antibiotics they need.
Accurately detecting pneumonia in pediatric patients can be a matter of life and death, and acting quickly can improve the chances of survival.
In this project, we detect pneumonia in pediatric patients with a deep-learning image classification model built with TensorFlow.
Warning: A common mistake is to run the model on an adult X-ray image, which leads to wrong results. We show this in the demonstration section.
The data were obtained from the Kaggle dataset Chest X-Ray Images (Pneumonia). The folder is about 2 GB, which is too large to upload to GitHub, but it can be downloaded using the Download button in the top right corner of the dataset page, or by following the step-by-step guide provided here to download the data using Kaggle keys, provided in this link.
The following description of the data is taken from the Kaggle dataset page:
The data contains three folders (Train, test, val) containing subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).
Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients of one to five years old from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.
For the analysis of chest x-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert.
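As a quick sanity check after downloading, the folder layout described above can be verified by counting images per split and category. The following is a minimal sketch, not part of the repository: the class folder names `NORMAL` and `PNEUMONIA`, the `.jpeg` extension, and the `chest_xray` root folder in the usage note are assumptions that may need adjusting to the local copy.

```python
from pathlib import Path

SPLITS = ("train", "test", "val")
CLASSES = ("NORMAL", "PNEUMONIA")  # assumed subfolder names


def count_images(data_dir):
    """Return {split: {class: number_of_jpeg_files}} for the given root folder."""
    counts = {}
    for split in SPLITS:
        counts[split] = {}
        for label in CLASSES:
            folder = Path(data_dir) / split / label
            # A missing folder is reported as zero images instead of raising.
            n = sum(1 for _ in folder.glob("*.jpeg")) if folder.is_dir() else 0
            counts[split][label] = n
    return counts
```

Calling `count_images("chest_xray")` would return a nested dictionary that makes any class imbalance between the two categories easy to spot before training.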
The prediction model outputs the probability that a pediatric patient's X-ray shows signs of bacterial or viral pneumonia. The decision threshold was set at 0.8 to have more confidence in positive cases. This may not be the ideal value, but it helps to demonstrate some aspects of the prediction.
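The effect of the 0.8 threshold can be illustrated with a few lines of Python. This is only a sketch: the function name, the label strings, and the choice of `>=` rather than `>` at the boundary are assumptions, not taken from the project code.

```python
THRESHOLD = 0.8  # decision threshold stated in the text


def classify(probability, threshold=THRESHOLD):
    """Map a pneumonia probability in [0, 1] to a label (hypothetical helper)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Assumption: a probability exactly at the threshold counts as positive.
    return "PNEUMONIA" if probability >= threshold else "NORMAL"
```

With this rule, a probability of 0.93 is reported as pneumonia, while 0.65 is reported as normal even though a 0.5 threshold would flag it. That is the trade-off of a stricter threshold: fewer false positives, at the cost of possibly missing some true cases.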
- images: Example images for the classification model (you can use the URL to run the prediction service).
- Demonstration_notebook.ipynb: Notebook that runs the image classification model using the prediction service.
- Dockerfile: For deploying the model to AWS as a Lambda function.
- Notebook.ipynb: Notebook for exploratory data analysis and for creating and exporting the model.
- Pipfile and Pipfile.lock: Contain the dependencies needed to run the project.
- pneumonia-class.tflite: The model in TensorFlow Lite format.
- process_data.py: Python script that processes an image URL and returns a prediction.
- test.py: Python script to test the prediction service using AWS.
- Clone the repo
- Download the data from Kaggle
- Install the dependencies
pipenv install
- Activate the virtual environment
pipenv shell
Run the train.py file to obtain the best model for the training parameters as a .h5 file and convert it to a tflite file.
To make the training file easier to run, you can use this Kaggle notebook, which replicates the train.py file, so you don't need to download the data.
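The conversion from a .h5 file to a tflite file can be sketched with the standard TensorFlow Lite converter. This is an illustration under assumptions, not the contents of train.py: the function names and the `pneumonia-class.h5` file name in the usage note are hypothetical, and the project may apply converter options not shown here.

```python
from pathlib import Path


def tflite_path_for(h5_path):
    """Return the output path: same file name with a .tflite extension."""
    return str(Path(h5_path).with_suffix(".tflite"))


def convert_h5_to_tflite(h5_path):
    """Load a Keras .h5 model and write a TensorFlow Lite copy next to it."""
    # Imported here so the path helper above works without TensorFlow installed.
    import tensorflow as tf

    model = tf.keras.models.load_model(h5_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()  # serialized model as bytes
    out_path = tflite_path_for(h5_path)
    Path(out_path).write_bytes(tflite_model)
    return out_path
```

Calling `convert_h5_to_tflite("pneumonia-class.h5")` would write `pneumonia-class.tflite` in the same folder. The smaller tflite file is what makes it practical to ship the model inside a container without the full TensorFlow package.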
Run the Docker container:
First build the image:
docker build -t pneumonia-model .
Then run the Docker image:
docker run -it --rm -p 8080:8080 pneumonia-model:latest
Run the prediction service: open a new command line (make sure the Docker container is still running) and run
python test.py
The test.py script already contains a link to an X-ray image and returns a prediction for it.
You can change the link to make a different prediction. Sometimes using an image link directly does not work; in that case you can take a screenshot of the image and upload it to GitHub.
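For readers who want to see what such a test request might look like, here is a minimal sketch using only the standard library. It assumes the container is built on an AWS Lambda base image, whose runtime interface emulator listens on port 8080 at the path shown below, and that the function expects a JSON body with a `url` key. Both the key name and the helper names are assumptions and may differ from the actual test.py.

```python
import json
import urllib.request

# Invocation path served by the Lambda runtime interface emulator.
LOCAL_URL = "http://localhost:8080/2015-03-31/functions/function/invocations"


def build_request(image_url, endpoint=LOCAL_URL):
    """Build a POST request whose JSON body carries the image link."""
    body = json.dumps({"url": image_url}).encode("utf-8")  # "url" key is assumed
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(image_url, endpoint=LOCAL_URL, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_request(image_url, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the endpoint as a parameter means the same helper could later be pointed at a deployed URL instead of the local container.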
AWS
Prerequisite: you need the AWS CLI installed, which is the command-line tool for interacting with AWS (I use Windows with WSL, so I installed the CLI using the Linux instructions).
You need a place to store your container image: create a repository in ECR and open View push commands.
Go to security credentials and find the access key to configure your AWS CLI.
Run in your command line: aws configure
and enter the credentials from the step above.
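Then run the following steps.
Create the repo to store the image:
aws ecr create-repository --repository-name pneumonia-class-images
Obtain the URI of the repository:
xxxxxx2.dkr.ecr.us-west-2.amazonaws.com/pneumonia-class-images
Log in to the registry from the command line (this is the AWS CLI version 1 syntax; version 2 replaces get-login with get-login-password piped into docker login), then set the variables:
$(aws ecr get-login --no-include-email)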
ACCOUNT=xxxxxxx
REGION=us-west-2
REGISTRY=pneumonia-class-images
PREFIX=${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/${REGISTRY}
TAG=pneumonia-class-model-v1-001
REMOTE_URI=${PREFIX}:${TAG}
Push the docker image to AWS
docker tag pneumonia-model:latest ${REMOTE_URI}
docker push ${REMOTE_URI}
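The same tag-and-push steps can be scripted from Python, which helps avoid typos in the image name or URI. This is a hedged sketch: the helper names are hypothetical, the account ID in the usage note is a placeholder, and it assumes Docker is installed and already logged in to the registry.

```python
import subprocess


def remote_uri(account, region, repository, tag):
    """Compose the ECR image URI the same way the shell variables do."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def push_commands(local_image, uri):
    """Return the docker commands as argument lists, without running them."""
    return [
        ["docker", "tag", local_image, uri],
        ["docker", "push", uri],
    ]


def push(local_image, uri):
    """Run the tag and push commands, stopping at the first failure."""
    for command in push_commands(local_image, uri):
        subprocess.run(command, check=True)
```

For example, `remote_uri("123456789012", "us-west-2", "pneumonia-class-images", "pneumonia-class-model-v1-001")` builds the URI for a placeholder account, and `push("pneumonia-model:latest", uri)` would then tag and upload the local image.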
Create the Lambda function
Browse for the image pushed in the previous step.
For deep learning tasks, we need to increase the function's timeout and the memory allocated to it.
Go to Configuration -> General configuration and change the timeout to 30 seconds and the memory to 1024 MB.
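The same two settings can also be applied programmatically instead of through the console. The sketch below uses boto3's `update_function_configuration` call; the helper names and the function name in the usage note are hypothetical, and it assumes boto3 is installed and AWS credentials are already configured.

```python
def lambda_settings(function_name, timeout_s=30, memory_mb=1024):
    """Build the keyword arguments for the configuration update."""
    return {
        "FunctionName": function_name,
        "Timeout": timeout_s,      # seconds
        "MemorySize": memory_mb,   # megabytes
    }


def apply_settings(function_name, region="us-west-2"):
    """Apply the timeout and memory settings to an existing Lambda function."""
    # Imported here so the helper above can be used without boto3 installed.
    import boto3

    client = boto3.client("lambda", region_name=region)
    return client.update_function_configuration(**lambda_settings(function_name))
```

Calling `apply_settings("pneumonia-class")` would update a function with that (assumed) name. The higher limits matter because loading the model and downloading the image on a cold start can take longer than the default settings allow.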
- Use API Gateway
- Select the POST method
- Integration type: Lambda
- Select the Lambda function
Go to Actions and click Deploy.
Now we just need to obtain the URL and add predict at the end:
Let's use some examples to demonstrate how the AWS Lambda function service works:
The Demonstration_notebook.ipynb notebook shows how to run the prediction service.
The demonstration uses images from independent articles that were not part of the training or testing data, but whose patients have similar characteristics to those used in training.