Covid-Prediction-Lung-CT

A simple framework for detecting Covid-19 by analyzing lung CT scans.

Problem Description

The goal of this research is to train a classifier that recognizes Covid-19 positive patients from their lung CT scans, in order to support the physician's decision process with a quantitative approach.

Dataset

Original positive data: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=80969742
Original negative data: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=80969771#80969771bcab02c187174a288dbcbf95d26179e8
Link to our custom dataset: https://drive.google.com/file/d/112rnDTasnEIYrHm6E2sd-EgNbi-UKuKV/view?usp=sharing

Pipeline

The pipeline consists of the following steps:

Slices Selection
Mask Generation
Mask Fill
Histogram Equalization and Filtering
Haralick Features Extraction
Feature Reshaping
Feature Reduction through PCA
Feeding to the Classifier
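
As an illustration of the "Histogram Equalization" step, here is a minimal sketch that equalizes only the pixels inside the lung mask. It assumes an 8-bit grayscale slice and a boolean mask of the same shape. The function name `equalize_masked` and the choice of plain global equalization restricted to the mask are assumptions made for this example, not necessarily what the project code does.

```python
import numpy as np


def equalize_masked(slice_8bit: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Histogram-equalize the pixels inside `mask`; pixels outside are set to 0.

    Hypothetical helper: assumes `slice_8bit` has dtype uint8 and `mask` is boolean.
    """
    out = np.zeros_like(slice_8bit)
    pixels = slice_8bit[mask]
    if pixels.size == 0:
        return out  # empty mask: nothing to equalize

    # Histogram and cumulative distribution of the masked region only
    hist = np.bincount(pixels.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:
        out[mask] = pixels  # constant region: leave intensities unchanged
        return out

    # Lookup table that stretches the masked intensities over 0..255
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    out[mask] = lut[pixels]
    return out


# Tiny usage example on a synthetic 2x2 "slice"
demo = np.array([[100, 101], [102, 103]], dtype=np.uint8)
demo_mask = np.array([[True, True], [True, False]])
demo_eq = equalize_masked(demo, demo_mask)
```

Restricting the histogram to the mask keeps the background from dominating the intensity distribution, which is one reason to generate and fill the mask before equalizing.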

Features extraction and PCA

We obtain three 4x13 matrices of Haralick features, which we reshape into a single vector of 156 features (3 x 4 x 13). We first attempted feature selection by eliminating the least contributing features, but the loss of information was excessive. Given the high dimensionality, we instead opted for a feature synthesis approach: we apply PCA and retain only the first 2 principal components, which alone explain ~89% of the total variability.
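
The reshaping and PCA steps can be sketched as follows. This is a rough illustration on random data, so the explained variance it prints will not match the ~89% reported above. The sample count, the standardization step, and all variable names are assumptions made for the example. PCA is computed here with a plain SVD in NumPy rather than with any particular library class.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 50  # hypothetical number of samples

# Synthetic stand-in for the extracted features: three 4x13 matrices per sample
haralick = rng.normal(size=(n_samples, 3, 4, 13))

# Feature reshaping: flatten 3 x 4 x 13 into one vector of 156 features
X = haralick.reshape(n_samples, -1)

# Standardize each feature (an assumption; centering is required for PCA,
# scaling is a common choice when features have different ranges)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD of the centered data: rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)

# Keep only the first 2 principal components
X_2d = X_std @ Vt[:2].T

print("feature matrix:", X.shape)
print("reduced matrix:", X_2d.shape)
print("variance explained by 2 components:", explained_ratio[:2].sum())
```

On real data, the cumulative sum of `explained_ratio` is what tells you how many components are worth keeping.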

Model

Finally, the processed data is fed into several classifiers (SVM, Logistic Regression, Random Forest, Ensemble methods). All methods were evaluated with 5-fold cross-validation and an 80/20 train/test split stratified on the labels. SVM with a linear kernel obtained the best results in terms of AUC (85%), accuracy (81%), precision (81%) and recall (80%). The confusion matrix and the AUC plot are reported below.
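
For readers who want to reproduce a similar evaluation protocol, here is a minimal sketch using scikit-learn. It runs on synthetic 2-feature data standing in for the two principal components, so its scores are unrelated to the results above. How the 5-fold cross-validation and the 80/20 split were combined is not stated here; this sketch assumes cross-validation on the training portion followed by a final check on the held-out 20%. Hyperparameters and random seeds are also assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.svm import SVC

# Synthetic binary dataset with 2 features (stand-in for the 2 principal components)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# 80/20 train/test split, stratified on the labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Linear-kernel SVM evaluated with stratified 5-fold cross-validation
clf = SVC(kernel="linear")
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(clf, X_train, y_train, cv=cv, scoring="roc_auc")

# Fit on the full training set and score the held-out test set
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_score = clf.decision_function(X_test)  # signed margin, used for the AUC

metrics = {
    "auc": roc_auc_score(y_test, y_score),
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
cm = confusion_matrix(y_test, y_pred)

print("5-fold CV AUC:", cv_auc.mean())
print(metrics)
print(cm)
```

The other classifiers can be compared under the same protocol by swapping `SVC` for another estimator such as `LogisticRegression` or `RandomForestClassifier`.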