Health_Care_App

BACKGROUND

Machine learning is a subset of artificial intelligence that uses mathematical and statistical methods to identify patterns in data automatically. Many aspects of clinical practice lend themselves to computational tools, for example to assess disease pathology, identify anomalies, or triage critical patients. This article is limited to supervised learning, both to constrain the discussion around concrete examples and because supervised learning represents the majority of clinical machine learning research.

In the context of supervised machine learning, models are fit to data, thereby learning relationships between input features and output targets. Input data represent digital encodings of, for example, X-rays, lab tests, electrocardiograms, or various other clinical data streams. The output could be a diagnostic label, a region of interest, length of stay, etc. For pedagogical ease, throughout this article, the classification of lung nodules will be used as a reference example.

The inputs to this nodule classifier are computed tomography (CT) images, but other modalities could have been used (e.g., X-ray or ultrasound). Each input image is associated with a binary label (0 or 1, indicating the absence or presence of calcified nodules, respectively). There is nothing special about the binary label; in other clinical applications, the label could represent several discrete classes (e.g., different types of lung nodules or disease stages) or be a continuous output as in regression (e.g., length of hospital stay, lab tests with continuous ranges).
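As a minimal sketch of what "fitting a model to binary labels" means, the example below trains a logistic regression classifier by gradient descent. It is an illustration only: the two-dimensional features are synthetic stand-ins for image-derived features, and the function names, learning rate, and step count are assumptions rather than anything used in this project.

```python
import numpy as np


def fit_logistic_regression(X, y, lr=0.1, n_steps=500):
    """Fit weights and bias by gradient descent on the mean log-loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(label = 1)
        w -= lr * (X.T @ (p - y)) / n_samples   # gradient w.r.t. weights
        b -= lr * np.mean(p - y)                # gradient w.r.t. bias
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return hard 0/1 predictions from the fitted model."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return (p >= threshold).astype(int)


# Synthetic features (NOT real CT data): two well-separated clusters
rng = np.random.default_rng(0)
X_neg = rng.normal(loc=-1.5, scale=1.0, size=(200, 2))  # label 0
X_pos = rng.normal(loc=+1.5, scale=1.0, size=(200, 2))  # label 1
X = np.vstack([X_neg, X_pos])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = fit_logistic_regression(X, y)
train_accuracy = np.mean(predict(X, w, b) == y)
print(f"training accuracy: {train_accuracy:.3f}")
```

The accuracy printed here is measured on the same data the model was fit to, so it says nothing yet about how the model behaves on unseen data.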

Once CT images and associated labels are sourced and validated, a model is trained to learn relations between image features (e.g., edges and contours) and the binary class (i.e., a positive or negative finding). However, the trained model may also have learned idiosyncratic features specific to the provided image and label pairs that do not hold in other data from the same modality (in this case, CT images). This generalization brittleness has many causes, including equipment with different noise sources (across manufacturers), out-of-calibration effects, selection bias, and population differences. Building generalizable models is paramount to clinical research. A radiologist who developed the training data and labels can go to another hospital and provide the same expertise, whereas a model that works at one medical center can fail at another. It is therefore key to understand the issues that might arise during model training, validation, and testing.
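One way to probe this kind of brittleness is to hold out every record from one site and compare accuracy per site. The sketch below uses made-up data: the site names, the `intensity` feature, the scanner offsets, and the threshold classifier are all hypothetical and exist only to show the evaluation pattern.

```python
import random
from collections import defaultdict
from statistics import mean


def split_by_site(records, test_sites):
    """Split records so that every record from a test site is held out."""
    train, test = [], []
    for record in records:
        (test if record["site"] in test_sites else train).append(record)
    return train, test


def accuracy_per_site(records, predict_fn):
    """Accuracy of predict_fn computed separately for each site."""
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["site"]] += 1
        correct[r["site"]] += int(predict_fn(r) == r["label"])
    return {site: correct[site] / total[site] for site in total}


# Synthetic data: hospital_C has a large scanner offset (assumed for illustration)
rng = random.Random(0)
site_offsets = {"hospital_A": 0.0, "hospital_B": 0.1, "hospital_C": 1.5}
records = []
for site, offset in site_offsets.items():
    for _ in range(100):
        label = rng.randint(0, 1)
        intensity = label + offset + rng.gauss(0.0, 0.1)
        records.append({"site": site, "label": label, "intensity": intensity})

train, test = split_by_site(records, test_sites={"hospital_C"})

# "Train" a threshold classifier: midpoint between the two class means
mean_neg = mean(r["intensity"] for r in train if r["label"] == 0)
mean_pos = mean(r["intensity"] for r in train if r["label"] == 1)
threshold = (mean_neg + mean_pos) / 2


def predict_fn(record):
    return int(record["intensity"] >= threshold)


internal = accuracy_per_site(train, predict_fn)  # sites seen during training
external = accuracy_per_site(test, predict_fn)   # held-out site
print(internal, external)
```

In this toy setup the classifier is nearly perfect on the two training sites and close to chance on the held-out one, because the offset pushes every held-out scan above the learned threshold.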


OBJECTIVE

  • To build disease classification models using deep neural networks and Random Forest classifiers
  • To preprocess images using OpenCV (cv2) and improve model performance
  • To integrate the trained models and create an app using Flask

TOOLS

| Task | Technique | Tools/Packages Used |
| --- | --- | --- |
| Data Pre-processing and EDA | Image normalization, noise removal, data creation for COVID | cv2, shutil, sklearn, pandas, numpy |
| Model Development | Feature selection, model selection, model construction, optimization, neural network tuning, performance evaluation | TensorFlow, xgboost, sklearn |
| Data Visualization | Multi-attribute plots, heatmaps, correlation plots | matplotlib, seaborn |
| Environments & Platforms | | MS Excel, Jupyter Notebook, TensorFlow, PyCharm |
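The image normalization and noise removal steps listed above can be illustrated with a small NumPy sketch. This is a stand-in, not the project's code: the project lists cv2 for these tasks, and OpenCV offers comparable built-ins such as `cv2.medianBlur` and `cv2.normalize`. The helper names and the synthetic 8x8 image below are assumptions made for the example.

```python
import numpy as np


def min_max_normalize(image):
    """Rescale pixel intensities to the range [0, 1]."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


def median_filter_3x3(image):
    """Replace each pixel with the median of its 3x3 neighbourhood."""
    padded = np.pad(image, 1, mode="edge")  # repeat border pixels
    h, w = image.shape
    # Stack the nine shifted views of the image, then take the median
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    return np.median(windows, axis=0)


# Synthetic 8-bit grayscale image with a single bright "salt" pixel
image = np.full((8, 8), 100, dtype=np.uint8)
image[4, 4] = 255

denoised = median_filter_3x3(image)    # the isolated outlier is removed
normalized = min_max_normalize(image)  # intensities mapped to [0, 1]
```

A median filter is a common choice for impulse ("salt and pepper") noise because a single outlier cannot shift the median of its neighbourhood; whether it suits a given imaging modality is a separate question.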


DATA VISUALIZATION

Brain Tumour

[Figure: Preprocessing Augmented Images]
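Augmented images of the kind referenced in this section are typically produced by applying random, label-preserving transforms to each training image. The sketch below shows one simple possibility with NumPy (random horizontal flips and 90-degree rotations); the source does not say which transforms were actually used, so the `augment` helper and the placeholder image are assumptions.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly flipped and rotated copy of a 2-D image."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)     # horizontal flip
    k = int(rng.integers(0, 4))  # number of 90-degree rotations (0 to 3)
    out = np.rot90(out, k)
    return out.copy()


rng = np.random.default_rng(42)
image = np.arange(16, dtype=np.uint8).reshape(4, 4)  # placeholder image
augmented_batch = [augment(image, rng) for _ in range(8)]
```

These transforms only rearrange pixels, so each augmented copy contains the same intensity values as the original. Whether a flip or rotation preserves the label depends on the anatomy and the task, and should be checked for each dataset.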

Covid Model Performance


Vaccine Conversation Trends


Application with Flask

[Figure: Preprocessing Augmented Images]

[Figure: Augmented Images]

  Output

[Figures: Augmented Images (two images)]


RESULTS

| Disease | Classifier Type | Accuracy |
| --- | --- | --- |
| Pneumonia | CNN | 83.17% |
| Heart Disease | XGBoost | 86.96% |
| Diabetes | Random Forest | 89.8% |
| Alzheimer | CNN | 83.54% |
| Breast Cancer | Random Forest | 91.81% |
| Brain Tumor | CNN, VGG16 | 96.5% |
| COVID-19 | CNN | 93.5% |
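Accuracy is the fraction of predictions that match the true label. For clinical classifiers it is often read alongside sensitivity and specificity, because accuracy alone can look high on imbalanced data. The sketch below computes all three from made-up labels; the helper names and the data are illustrative and are not taken from the models above.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up, imbalanced example: 2 positive cases and 8 negative cases
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = summarize(y_true, y_pred)
print(metrics)
```

In this example the accuracy is 0.9 even though the classifier misses one of the two positive cases (sensitivity 0.5), which is why a single accuracy figure is best interpreted together with the class balance of the test set.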

CONCLUSION

Created seven disease classification models with TensorFlow, Random Forest, and XGBoost to analyse patients' medical records and images. Accuracy ranged from 83.17% (pneumonia) to 96.5% (brain tumor), with three models (breast cancer, brain tumor, and COVID-19) exceeding 90%. Image data augmentation and transfer learning improved the accuracy of the deep neural networks by 30%.

CHALLENGES AND FUTURE WORK

    Challenges: Identifying a package for tweet scraping and working within its extraction limits; long execution times and runtime errors caused by memory limitations during parts of the data modeling. Medical data are also difficult to obtain; making more databases public would give researchers access to additional information.

    Future Work

  • Explore new models to detect rare diseases
  • Number of active COVID cases, recoveries and deaths for the three months