Covid-Prediction-Lung-CT

A simple framework for detecting Covid-19 by analyzing lung CT scans.

Primary language: Jupyter Notebook. License: MIT.

Problem Description

The goal of this research is to train a classifier that recognizes Covid-19 positive patients from their lung CT scans, in order to support the physician's decision process with a quantitative approach.

Dataset

Pipeline

The pipeline consists of the following steps:

  1. Slices Selection
  2. Mask Generation
  3. Mask Fill
  4. Histogram Equalization and Filtering
  5. Haralick Features Extraction
  6. Feature Reshaping
  7. Feature Reduction through PCA
  8. Feeding to the Classifier
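
As a rough illustration of steps 4 and 5, the sketch below equalizes an 8-bit grayscale slice and builds a gray-level co-occurrence matrix (GLCM) in four directions, which is the basis of Haralick texture features. It is not the repository's code: all function names are hypothetical, the filtering step is omitted, and only three of the 13 Haralick features are computed (angular second moment, contrast, entropy). A library such as mahotas could supply the full set.

```python
import numpy as np

# Pixel offsets (row, col) for the 0, 45, 90 and 135 degree directions.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def equalize_histogram(image):
    """Histogram-equalize an unsigned 8-bit grayscale image."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    if cdf_min == image.size:  # constant image: nothing to spread out
        return image.copy()
    lut = np.round((cdf - cdf_min) / (image.size - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]


def glcm(image, offset, levels):
    """Symmetric, normalized co-occurrence matrix for one pixel offset."""
    dr, dc = offset
    rows, cols = image.shape
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    src = image[r0:r1, c0:c1]                      # reference pixels
    dst = image[r0 + dr:r1 + dr, c0 + dc:c1 + dc]  # their neighbours
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (src.ravel(), dst.ravel()), 1)
    counts += counts.T  # count each pair in both directions
    return counts / counts.sum()


def texture_features(p):
    """Angular second moment, contrast and entropy of a normalized GLCM."""
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)
    contrast = np.sum(p * (i - j) ** 2)
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    return np.array([asm, contrast, entropy])


def haralick_subset(image, levels=256):
    """One row of features per direction: shape (4, 3) in this sketch."""
    return np.stack([texture_features(glcm(image, o, levels)) for o in OFFSETS])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ct_slice = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # synthetic stand-in
    print(haralick_subset(equalize_histogram(ct_slice)).shape)
```

With all 13 features computed per direction, the same layout would give a 4x13 matrix per image.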

Pipeline image

Features extraction and PCA

Feature extraction yields three 4x13 matrices, which we reshape into a single vector of 156 features (3 x 4 x 13). We first attempted feature selection by eliminating the least contributing features, but the resulting loss of information was excessive. Given the high dimensionality, we instead opted for a feature synthesis approach: we apply PCA and retain only the first 2 principal components, which together explain ~89% of the total variance.
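
The reshaping and reduction can be sketched as follows. This is a minimal illustration rather than the repository's implementation: the helper names are hypothetical, the data is random (so the explained variance will not match the ~89% reported here), and PCA is computed directly with an SVD in NumPy.

```python
import numpy as np


def flatten_features(matrices):
    """Stack three 4x13 matrices and flatten them into a 156-element vector."""
    stacked = np.stack(matrices)  # shape (3, 4, 13)
    return stacked.reshape(-1)    # shape (156,)


def pca_reduce(X, n_components=2):
    """Project X onto its first principal components.

    Returns the projected samples and the explained variance ratio of
    each retained component.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    projected = X_centered @ Vt[:n_components].T
    return projected, explained[:n_components]


# Synthetic stand-in: 40 samples, each described by three 4x13 matrices.
rng = np.random.default_rng(0)
samples = [[rng.normal(size=(4, 13)) for _ in range(3)] for _ in range(40)]
X = np.array([flatten_features(m) for m in samples])

projected, ratio = pca_reduce(X, n_components=2)
print(X.shape, projected.shape)
print("explained variance retained:", ratio.sum())
```

The sum of the retained ratios is the quantity to inspect when deciding how many components to keep.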

Model

Finally, the processed data is fed into several classifiers (SVM, Logistic Regression, Random Forest, and ensemble methods). All methods were evaluated with 5-fold cross-validation and an 80/20 train/test split stratified on the labels. An SVM with a linear kernel obtained the best results in terms of AUC (85%), accuracy (81%), precision (81%), and recall (80%). The confusion matrix and the AUC plot are reported below:
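
A minimal sketch of this evaluation protocol with scikit-learn is shown below for the linear SVM only. Several details are assumptions, since the text does not specify them: the data is synthetic (two features standing in for the two principal components), cross-validation is run on the training portion, and the final metrics are computed on the held-out 20%. The numbers it prints are unrelated to the results reported above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.svm import SVC

# Synthetic two-feature binary problem (stand-in for the PCA output).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# 80/20 split, stratified on the labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = SVC(kernel="linear")

# 5-fold cross-validated AUC on the training portion (assumed protocol).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(clf, X_train, y_train, cv=cv, scoring="roc_auc")

# Fit on the full training portion and evaluate on the held-out test set.
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
decision = clf.decision_function(X_test)  # continuous scores for the AUC

metrics = {
    "auc": roc_auc_score(y_test, decision),
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
cm = confusion_matrix(y_test, y_pred)

print("cross-validated AUC per fold:", cv_auc)
print(metrics)
print(cm)
```

Swapping `SVC(kernel="linear")` for another estimator, such as `LogisticRegression()` or `RandomForestClassifier()`, keeps the rest of the protocol unchanged.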

Confusion matrix

AUC