The Respiratory Health Classifier is a telemedical online lung auscultation system that automatically returns the probability of having healthy airways.
- Motivation
- Project status
- Training and validation data
- Programming
- Server/cloud
- Confidentiality
- Demo
- Local editing
- Authors
- Links
- Acknowledgements
Seven percent of humanity suffers from chronic respiratory diseases, mainly COPD. In 2017, 3.9 million people died from them, and even more became disabled. Before COVID-19, the one-year global incidence of acute respiratory diseases was already close to 100%. COPD is the third and lower respiratory infections are the fourth leading cause of death worldwide.
Telehealth reduces the burden on medical resources, saves patients' time and money, and is easily accessible. Telemedical care increased during the COVID-19 pandemic, and advances in convolutional neural networks contribute further to this growth.
Training and validation data | |
---|---|
ICBHI data | ✓ |
Steth data | ✓ |
HF_Lung_V1 data | |
Microphone data | |
Training data from the apps | |
Network programming | | |
---|---|---|
Model1: | - distinction between healthy and suspicious | ✓ |
 | - area under the receiver operating characteristic curve | |
Model2: | - is acute or chronic more probable? | |
Model3: | - most probable diagnosis | |
Performance of model1 given a recall of 95% for the respiratory ill | |
---|---|
Threshold for the predicted probability | 52% |
Recall for the respiratory healthy | 67% |
Accuracy level reached (0 to 4) | 3 |
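
The threshold in the table above results from fixing the recall for the ill at 95% and then reading off the recall for the healthy. The following is a minimal sketch of that calculation, assuming the model outputs the probability of healthy airways and a recording is flagged as ill when its score is at or below the threshold. The function name and the toy scores are hypothetical and not taken from the project.

```python
import math


def threshold_at_recall(scores_ill, scores_healthy, target_recall=0.95):
    """Return (threshold, recall_ill, recall_healthy).

    Scores are predicted probabilities of healthy airways. A recording is
    flagged as ill when its score is at or below the threshold.
    """
    ill = sorted(scores_ill)
    # Number of ill recordings that must be flagged to reach the target recall
    # (the small epsilon guards against floating-point round-up).
    needed = max(1, math.ceil(target_recall * len(ill) - 1e-9))
    threshold = ill[needed - 1]
    recall_ill = sum(s <= threshold for s in scores_ill) / len(scores_ill)
    recall_healthy = sum(s > threshold for s in scores_healthy) / len(scores_healthy)
    return threshold, recall_ill, recall_healthy


# Toy example with made-up scores (assumption, not project data).
ill_scores = [0.05, 0.10, 0.20, 0.30, 0.60]
healthy_scores = [0.25, 0.55, 0.70, 0.90]
result = threshold_at_recall(ill_scores, healthy_scores, target_recall=0.8)
```

Raising the target recall for the ill moves the threshold up and therefore lowers the recall for the healthy, which is the trade-off the table reports for model1.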
Client app programming | App1 | App2 | App3 |
---|---|---|---|
Main functionality | ✓ | ✓ | |
Full functionality | ✓ | | |
Table for further improvement | ✓ | | |
Communication with the model | ✓ | | |
More factors than respiration and smoking | | | |
The ICBHI and Steth datasets include a total of 238 participants (37% girls/women), ranging from infancy to over 90 years of age. Of these, 177 suffered from respiratory diseases, 124 of them chronic, predominantly COPD (75 cases) and asthma (34 cases). The participants provided 25618 seconds of respiratory auscultation: 2570 from the respiratory healthy and 23048 from the respiratory ill. Averaged over the individual means in the ICBHI set, a breath cycle lasted 2.4 seconds in the healthy, 2.2 in the acutely ill, and 3.0 in the chronically ill.
The two main programming components are a deep convolutional neural network and the client apps.
The network is trained on equally sized images obtained from the WAV audio files as follows:
- slicing into eight-second audio chunks with 90% overlap for sounds from respiratory healthy persons and 10% overlap for sounds from respiratory ill persons; i.e. data from the healthy is augmented more than data from the ill, raising its share to nearly 50%
- zero-padding at the start of residual pieces shorter than eight seconds, decomposition of the sounds into their frequency components by Fourier transformation, and conversion into spectrograms
- further augmentation with random volume reduction (darkening), random frequency masking (horizontal bars of zeros), and random time frame masking (vertical bars of zeros)
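
The steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the function names, the frame length and hop of the short-time Fourier transform, the darkening range, and the maximum bar width are all hypothetical, and the "residual piece" is taken to mean the samples not covered by any full chunk.

```python
import numpy as np


def slice_with_overlap(samples, sample_rate, chunk_seconds=8.0, overlap=0.9):
    """Cut a non-empty 1-D signal into fixed-length chunks.

    Samples not covered by a full chunk are zero-padded at the front.
    """
    chunk_len = int(round(chunk_seconds * sample_rate))
    hop = max(1, int(round(chunk_len * (1.0 - overlap))))
    chunks = [samples[s:s + chunk_len]
              for s in range(0, len(samples) - chunk_len + 1, hop)]
    covered = (len(chunks) - 1) * hop + chunk_len if chunks else 0
    residual = samples[covered:]
    if len(residual) > 0:
        padded = np.zeros(chunk_len, dtype=samples.dtype)
        padded[chunk_len - len(residual):] = residual  # zero pre-padding
        chunks.append(padded)
    return np.stack(chunks)


def spectrogram(chunk, frame_len=256, hop=128):
    """Magnitude spectrogram in dB (rows: frequency bins, columns: time frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(chunk) - frame_len) // hop
    frames = np.stack([chunk[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return 20.0 * np.log10(magnitude + 1e-10)


def to_image(spec_db):
    """Scale a spectrogram to the range [0, 1]."""
    lo, hi = spec_db.min(), spec_db.max()
    return (spec_db - lo) / (hi - lo + 1e-10)


def augment(image, rng, max_bar=8):
    """Random darkening plus one frequency bar and one time bar of zeros."""
    out = image * rng.uniform(0.5, 1.0)                  # volume reduction
    f0 = rng.integers(0, out.shape[0] - max_bar)
    out[f0:f0 + rng.integers(1, max_bar + 1), :] = 0.0   # horizontal bar
    t0 = rng.integers(0, out.shape[1] - max_bar)
    out[:, t0:t0 + rng.integers(1, max_bar + 1)] = 0.0   # vertical bar
    return out
```

With eight-second chunks, 90% overlap corresponds to a hop of 0.8 seconds and 10% overlap to a hop of 7.2 seconds. Ignoring file boundaries, the 2570 seconds of healthy audio and the 23048 seconds of ill audio then yield roughly 2570 / 0.8 ≈ 3200 and 23048 / 7.2 ≈ 3200 chunks respectively, which is consistent with the near 50% share mentioned above.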
Transfer learning uses ResNet50V2 pre-trained on the ImageNet classification task:
- TensorFlow Python module
- original architecture: 47 convolutional layers, a max pooling layer, an average pooling layer, and a fully connected output layer with 1000 nodes and softmax activation function
- replacement of the original output layer by a densely connected output layer with 2048 weights, sigmoid activation function, and one final output
- first freezing of the original layers and training of the new output layer, then retraining of the entire network with a 40 times smaller initial learning rate
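
A minimal Keras sketch of this two-phase procedure is shown below. It is an illustration, not the project's training script: the input size, the Adam optimizer, the initial learning rate of 1e-3, the binary cross-entropy loss, and the helper names `build_classifier` and `compile_for_phase` are assumptions.

```python
import tensorflow as tf


def build_classifier(input_shape=(224, 224, 3), weights="imagenet"):
    """ResNet50V2 without its 1000-class head, plus one sigmoid output unit."""
    base = tf.keras.applications.ResNet50V2(
        include_top=False,      # drop the original softmax output layer
        weights=weights,
        input_shape=input_shape,
        pooling="avg",          # global average pooling -> 2048 features
    )
    base.trainable = False      # phase 1 starts with frozen original layers
    inputs = tf.keras.Input(shape=input_shape)
    # training=False keeps batch normalization in inference mode (assumption).
    features = base(inputs, training=False)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(features)
    return tf.keras.Model(inputs, outputs), base


def compile_for_phase(model, base, phase, initial_lr=1e-3):
    """Phase 1: train only the new head. Phase 2: retrain everything at lr / 40."""
    base.trainable = phase != 1
    lr = initial_lr if phase == 1 else initial_lr / 40.0
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.AUC(name="auc")],
    )
    return lr
```

In use, one would call `compile_for_phase(model, base, 1)` and fit, then `compile_for_phase(model, base, 2)` and fit again. Recompiling after changing `trainable` is needed for the change to take effect. Input images would typically be scaled with `tf.keras.applications.resnet_v2.preprocess_input` before being passed to the network.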
App1 is built with HTML, CSS, and JavaScript, and app2 with Streamlit; app3 will run locally on mobile phones. They include:
- during the initial state:
  - a train button with instructions for improving the recording quality
  - a more button linking to this readme file
  - a contact button for reaching the members of this project
  - a scroll menu to collect more heterogeneous training data
  - a button to start the 16-second recording phase
- during the return state:
  - the probability of having healthy airways with and without smoking
  - thresholds below which consulting a physician is recommended or urgent (traffic light system)
  - a copy of the recording for the user
  - a table with possibilities to further improve the recording quality
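
The traffic light logic of the return state could look like the sketch below. The function name and both threshold values are assumptions: 0.52 is borrowed from the model1 performance table purely for illustration, and 0.25 is a made-up placeholder for the "urgent" level.

```python
def traffic_light(p_healthy, recommend_below=0.52, urgent_below=0.25):
    """Map the predicted probability of healthy airways to a traffic light colour.

    Both thresholds are hypothetical placeholders, not the values used by the apps.
    """
    if not 0.0 <= p_healthy <= 1.0:
        raise ValueError("p_healthy must be a probability between 0 and 1")
    if p_healthy < urgent_below:
        return "red"      # consulting a physician is urgent
    if p_healthy < recommend_below:
        return "yellow"   # consulting a physician is recommended
    return "green"        # no recommendation triggered
```

Keeping the thresholds as parameters makes it straightforward to adjust them once the model is retrained or additional factors such as smoking are taken into account.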
The server and cloud side currently runs on Streamlit and GitHub.
The additional training data collected within this project is anonymous. Audio files with content other than breath cycles, such as voices, are deleted immediately. Users can indicate whether they
- suspect an airway illness.
- have sound airways.
- have sick airways.
In the latter case, they can further check one of these options:
- any respiratory illness.
- a diagnosis from a list of the most common acute and chronic respiratory diseases.
That's all. The function to collect data will be implemented later.
Clone the project
git clone git@github.com:loukra/Respiratory_Disease_Classification.git
Go to the project directory
cd Respiratory_Disease_Classification
Create a virtual environment
pyenv local 3.9.8
python -m venv .venv
source .venv/bin/activate
Install dependencies
pip install --upgrade pip
pip install -r requirements.txt