These were categorized according to their application purposes as following :-
- Model Architectures and Deep Learning based Libraries
- tensorflow
- keras
- Evaluation Metrics and Statistical Analysis based Libraries
- sklearn
- scipy
- Image Processing and Data Visualization based Libraries
- opencv (cv2)
- pillow (PIL)
- matplotlib
- Matrices and Data Engineering based Libraries
- numpy
- pandas
- System and Directory Management based Library
- os
- tqdm
For the ease of Training and Implementation flow, following structure would be adviced :-
|--Dataset Folder/
|--X_Ray_Train/
|--COVID/
|--x_c1_image_train
|--x_c2_image_train
...
|--Non_COVID/
|--x_nc1_image_train
|--x_nc2_image_train
...
|--X_Ray_Test/
|--COVID/
|--x_c1_image_test
|--x_c2_image_test
...
|--Non_COVID/
|--x_nc1_image_test
|--x_nc2_image_test
...
|--CT_Scan_Train/
|--COVID/
|--ct_c1_image_train
|--ct_c2_image_train
...
|--Non_COVID/
|--ct_nc1_image_train
|--ct_nc2_image_train
...
|--CT_Scan_Test/
|--COVID/
|--ct_c1_image_test
|--ct_c2_image_test
...
|--Non_COVID/
|--ct_nc1_image_test
|--ct_nc2_image_test
...
-
Arrange the Dataset according to the adviced structure, mentioned above.
-
Run
Dataset_PreProcessing.py
, which consist of 2 functions, namingSample_Processing
andDataset_Processing
, with the following details :--
Sample_Processing(x_ray_directory,ct_scan_directory) :
x_ray_directory
--> A String of the X-Ray Image File Path with the name of the file and extension of it.ct_scan_directory
--> A String of the CT-Scan Image File Path with the name of the file and extension of it.
-
Dataset_Processing(x_base_directory,ct_base_directory) :
x_base_directory
--> A String of the X-Ray Folder Path (till COVID or Non-COVID Folder).ct_base_directory
--> A String of the CT-Scan Folder Path (till COVID or Non-COVID Folder).
-
-
Run
Embeddings_Generator.py
, which consist of 2 functions, namingEmbedding_Model
andEmbedding_Save
, with the following details (Replace Model_Name with keras-based Model Classes like keras.applications.VGG16, keras.applications.InceptionV3, etc.) :--
Embedding_Model(Model_save_directory,Model_Class,Dataset_train_Directory,Dataset_test_Directory) :
Model_save_directory
--> A String of the Transfer Learning Model Path, to be saved in future, with the name of file and extension (.h5 or .hdf5) of it.Model_Class
--> A Class Name of the Transfer Learning Model, like keras.applications.VGG16 (To be changed by the user in Function File)Dataset_train_Directory
--> A String of the Train Dataset Directory Path (till x_ray_train or ct_scan_train folder).Dataset_test_Directory
--> A String of the Train Dataset Directory Path (till x_ray_test or ct_scan_test folder).
-
Embedding_Save(Embed_save_Directory,Trained_Model_Class,Dataset_Directory,Image_Size) :
Embed_save_Directory
--> A String of the Directory Path (with the name of file and extension (.npy) of it) where the Embeddings would be saved.Trained_Model_Class
--> The Class of the Trained Transfer Learning Model, which would be loaded from the user side using keras.models.load_model(<--Model_Path-->).Image_Size
--> The Size of images to be taken as input (331 for NasNetLarge and 224 for others) during embedding generation.Dataset_Directory
--> A String of the Dataset Directory Path (till x_ray_train, ct_scan_train, x_ray_test or ct_scan_test folder).
-
-
Run
Task_Specific_Feature_Extractor.py
, which consist of 2 functions, namingx_ray_task_specific
andct_scan_task_specific
, with the following details :--
x_ray_task_specific(Model_save_directory,train_embed,test_embed,train_labels,test_labels) :
Model_save_directory
--> A String of the X-Ray Task Specific Feature Extractor Path, to be saved in future, with the name of file and extension (.h5 or .hdf5) of it.train_embed
--> A numpy array of embeddings from training dataset images.test_embed
--> A numpy array of embeddings from testing dataset images.train_labels
--> A numpy array of task-specific labels (whether COVID or Non-COVID) from training dataset images.test_labels
--> A numpy array of task-specific labels (whether COVID or Non-COVID) from testing dataset images.
-
ct_scan_task_specific(Model_save_directory,train_embed,test_embed,train_labels,test_labels) :
Model_save_directory
--> A String of the CT-Scan Task Specific Feature Extractor Path, to be saved in future, with the name of file and extension (.h5 or .hdf5) of it.train_embed
--> A numpy array of embeddings from training dataset images.test_embed
--> A numpy array of embeddings from testing dataset images.train_labels
--> A numpy array of task-specific labels (whether COVID or Non-COVID) from training dataset images.test_labels
--> A numpy array of task-specific labels (whether COVID or Non-COVID) from testing dataset images.
-
-
Decide whether to make Shared Features Extractors with or without Adversarial Training. Depending upon the choice, following would be the implementation :-
-
Without Adversarial Training, Run
Shared_Features_Without_Adversarial.py
, which consist of 2 functions, namingmlp_model
andmlp_train
, with the following details :-mlp_model()
--> It would make the MLP model with pre-defined parameters and architecture.mlp_train(Model_save_Directory,x_train_embed,x_test_embed,ct_train_embed,ct_test_embed,train_labels,test_labels)
--> A function to Train the pre-defined MLP Model with the help of training and testing embeddings of X-Ray and CT-Scan with their corresponding Task-Specific Labels (Whether COVID or Non-COVID), all being of the form of numpy array.
-
With Adversarial Training, Run
Shared_Features_Extractor.py
, which consist of some functions, with their following details :-gen_model()
anddisc_model()
--> These Functions would produce the MLP Architecture with pre-defined parameters.disc_loss_func(actual_embed,gen_embed,train_labels,invert_labels)
--> Discriminator Loss Function where the actual input and generator's output go alongside Actual training labels and inverted labels (interchanging classes --> Whether X-Rays or CT-Scans), all of the form of numpy arrays.gen_loss_func(gen_embed,invert_labels)
--> Generator Loss Function where the Generated Output and inverted class labels (Whether X-Rays or CT-Scans) go, all of the form of numpy arrays.train_step(actual_embed,old_gen_loss,old_disc_loss,train_labels,invert_labels)
--> Adversarial Training happens here, alongside Actual Input, Old Values of Generator and Discriminator Loss, with Training Labels and Inverted Class Labels, all of the form of numpy arrays.fine_tune(Model_save_Directory,model_gen,x_train_embed,x_test_embed,ct_train_embed,ct_test_embed)
--> Here, the Shared Feature Extractor Module would be fine tuned and best model iteration would be saved according to "Model_save_Directory", with Class Labels (Whether X-Rays or CT-Scans).
-
-
Run
Classifier_Head.py
, which consist of 2 functions, namingClassifier_Head
andFinal_Embeddings
, with the following details :--
Classifier_Head(Model_save_Directory,train_embed,test_embed,train_labels,test_labels) :
Model_save_Directory
--> A String of the Classifier Head Path, to be saved in future, with the name of file and extension (.h5 or .hdf5) of it.train_embed
--> A numpy array of embeddings from training dataset images, after passing through Task Specific and Shared Features Extractor Modules.test_embed
--> A numpy array of embeddings from testing dataset images, after passing through Task Specific and Shared Features Extractor Modules.train_labels
--> A numpy array of Labels (Whether COVID or Non-COVID) for training dataset images.test_labels
--> A numpy array of Labels (Whether COVID or Non-COVID) for testing dataset images.
-
Final_Embeddings(<----Arguments---->) :
shared_model
,x_task_model
,ct_task_model
--> Model Classes of Shared Features Extractor and Task-Specific X-Ray and CT-Scans Feature Extractors, loaded using keras.models.load_model(<--Model_Path-->).shared_x_ray_train_input
,shared_x_ray_test_input
,shared_ct_train_input
,shared_ct_test_input
--> Numpy arrays of Embeddings Input for Shared Features Extractor Modules.task_x_ray_train_input
,task_x_ray_test_input
,task_ct_train_input
,task_ct_test_input
--> Numpy arrays of Embeddings Input for Task-Specific Features Extractor Modules.
-
-
Run
User_Customized_Pipeline.py
, which consist of some functions, with the following details :-- Pre_Process(prompt,x,ct) :
prompt
--> A Parameter which define the input conditions.x
--> Greyscaled X-Ray Image Matrix, loaded using cv2.imread(<---Image Path---> , 0)ct
--> Greyscaled CT-Scan Image Matrix, loaded using cv2.imread(<---Image Path---> , 0)
- Task_Specific_Embed(Save_Folder_Directory,x,ct) :
Save_Folder_Directory
--> A String of the Container Folder Path where each component of the Pipeline was saved.x
--> Pre-Processed X-Ray Image Matrix.ct
--> Pre-Processed CT-Scan Image Matrix.
- Shared_Feature_Embed(Save_Folder_Directory,x,ct) :
Save_Folder_Directory
--> A String of the Container Folder Path where each component of the Pipeline was saved.x
--> Pre-Processed X-Ray Image Matrix.ct
--> Pre-Processed CT-Scan Image Matrix.
- Classification(Save_Folder_Directory,x,ct) :
Save_Folder_Directory
--> A String of the Container Folder Path where each component of the Pipeline was saved.x
--> Pre-Processed X-Ray Image Matrix.ct
--> Pre-Processed CT-Scan Image Matrix.
- Test_Run(Save_Folder_Directory,prompt,X_Ray_Grey_Image,CT_Scan_Grey_Image) :
Save_Folder_Directory
--> A String of the Container Folder Path where each component of the Pipeline was saved.X_Ray_Grey_Image
--> Greyscaled X-Ray Image Matrix, loaded using cv2.imread(<---Image Path---> , 0)CT_Scan_Grey_Image
--> Greyscaled CT-Scan Image Matrix, loaded using cv2.imread(<---Image Path---> , 0)prompt
--> A Parameter which define the input conditions.
- Pre_Process(prompt,x,ct) :
-
For Evaluating and Analysing the Generalization of Implementation, consider using
Evaluation_and_Significance_Test.py
, which consist of 2 functions, namingEvaluation_metrics
andSignificance_Test
, with the following details :--
Evaluation_metrics(prob_list,labels,threshold=0.5) :
prob_list
--> List of Output Probabilities after implementing the whole pipeline.labels
--> Output Classes vector.threshold
--> Confidence Probability (generally 0.5 for ideal sigmoid scenario).
-
Significance_Test(Save_Directory,task_train_embed,shared_train_embed,task_test_embed,shared_test_embed,train_labels,test_labels) :
Save_Directory
--> A String of Sample Run Iterations, with the name and extension (.h5 or .hdf5) on it.task_train_embed
,shared_train_embed
--> Numpy arrays of Task Specific and Shared Features Embeddings from Training Data.task_test_embed
,shared_test_embed
--> Numpy arrays of Task Specific and Shared Features Embeddings from Testing Data.train_labels
,test_labels
--> Numpy arrays of Training and Testing Labels (Whether COVID or Non-COVID)
-
In order to start with the suggested Dataset from the Research Paper, access this link and use it accordingly, with Label Annotations as train.csv
and test.csv
. As it was already pre-processed, Running Dataset_PreProcessing.py
won't be a requirement.
https://drive.google.com/drive/folders/1dNGICFvzV1gF66QIz8RVizMeyaON6APp?usp=sharing
Run Pre_Trained_Pipeline.py
with the Directory Paths mentioned accordingly.
Pre-Trained Model Weights ---> https://drive.google.com/file/d/1tHIDYe83hk4k0y511zx8ELh7UlQ2KHWP/view?usp=sharing
- https://keras.io/api/applications/
- https://docs.opencv.org/4.5.1/d2/d96/tutorial_py_table_of_contents_imgproc.html
- https://www.youtube.com/watch?v=tX-6CMNnT64&list=LL&index=102
- https://docs.scipy.org/doc/scipy/reference/stats.html
- https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
- https://towardsdatascience.com/hypothesis-testing-in-machine-learning-using-python-a0dc89e169ce