Performed PCA and Randomized PCA, fine-tuned ResNet50 to achieve 94% accuracy in identifying tumor type in MRI images, and showcased human-interpretable feature reduction and mapping with Meta's DINOv2 model. See main.ipynb for the full implementation.
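The difference between PCA and Randomized PCA can be illustrated with a minimal NumPy sketch. It assumes the MRI images have already been resized and flattened into the rows of an array `X`. The function names `pca_exact` and `pca_randomized`, and the oversampling and power-iteration defaults, are illustrative choices and are not taken from main.ipynb. Exact PCA takes a full SVD of the centered data. The randomized variant first projects the data onto a small random subspace and decomposes only that projection, which is much cheaper when only a few components of high-dimensional image vectors are kept.

```python
import numpy as np


def pca_exact(X, k):
    """Exact PCA: full thin SVD of the mean-centered data matrix X (n_samples, n_features)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    return Vt[:k], S[:k] ** 2 / (X.shape[0] - 1)


def pca_randomized(X, k, n_oversamples=10, n_iter=4, seed=0):
    """Randomized PCA via a randomized range finder (Halko, Martinsson and Tropp style)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Sample the column space of Xc with a Gaussian test matrix slightly wider than k.
    Y = Xc @ rng.standard_normal((Xc.shape[1], k + n_oversamples))
    # Power iterations emphasize the top singular directions; QR keeps them well conditioned.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Y)
        Y = Xc @ (Xc.T @ Q)
    Q, _ = np.linalg.qr(Y)
    # Decompose only the small (k + n_oversamples) x n_features projection B = Q^T Xc.
    _, S, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    return Vt[:k], S[:k] ** 2 / (X.shape[0] - 1)


# Synthetic stand-in for 200 flattened 32x32 images with rank-5 structure plus small noise.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 1024))
X += 0.01 * rng.standard_normal(X.shape)

V_exact, var_exact = pca_exact(X, 5)
V_rand, var_rand = pca_randomized(X, 5)
print(np.allclose(var_exact, var_rand, rtol=1e-3))  # True
```

In scikit-learn, the same trade-off is exposed through `PCA(n_components=k, svd_solver="randomized")`.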
The Brain Tumor MRI Dataset on Kaggle provides a comprehensive collection of human brain MRI images intended to support the accurate detection and classification of brain tumors. It consists of 7,023 images combined from three source datasets (figshare, SARTAJ, and Br35H) and sorts the scans into four categories: glioma, meningioma, no tumor, and pituitary. According to its Kaggle description, the dataset has been revised several times. For example, some glioma images from the SARTAJ dataset were replaced because of inaccuracies, which shows that the dataset is still being refined for reliability and data quality. The no tumor class is drawn from the Br35H dataset. Because images across the dataset come in varied sizes, pre-processing and resizing are needed to achieve consistent analysis and improved model accuracy.
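The source images come in different sizes, so they need to be brought to a common shape. The sketch below shows one way to do this. Each grayscale slice is first padded to a square so that resizing does not distort its aspect ratio. It is then resized with nearest-neighbour sampling, and its intensities are min-max scaled to [0, 1]. The 224x224 target (a common ResNet50 input size), the zero padding, and the helper names are assumptions for illustration. In practice, a library such as Pillow or OpenCV would usually handle the interpolation, often with bilinear filtering.

```python
import numpy as np


def pad_to_square(img):
    """Zero-pad a 2-D grayscale image so height == width, keeping the content centered."""
    h, w = img.shape
    size = max(h, w)
    top, left = (size - h) // 2, (size - w) // 2
    out = np.zeros((size, size), dtype=img.dtype)
    out[top:top + h, left:left + w] = img
    return out


def resize_nearest(img, size):
    """Nearest-neighbour resize of a square 2-D image to (size, size)."""
    n = img.shape[0]
    # Map the center of each output pixel back to the source pixel that contains it.
    idx = np.minimum(((np.arange(size) + 0.5) * n / size).astype(int), n - 1)
    return img[np.ix_(idx, idx)]


def preprocess(img, size=224):
    """Pad, resize, and min-max scale one MRI slice to float32 values in [0, 1]."""
    sq = resize_nearest(pad_to_square(img), size).astype(np.float32)
    lo, hi = sq.min(), sq.max()
    # Guard against blank images, where max == min would divide by zero.
    return (sq - lo) / (hi - lo) if hi > lo else np.zeros_like(sq)


# Three synthetic 8-bit "scans" with different shapes, standing in for real MRI files.
rng = np.random.default_rng(0)
scans = [rng.integers(0, 256, size=s).astype(np.uint8) for s in [(512, 512), (630, 630), (300, 240)]]
batch = np.stack([preprocess(s) for s in scans])
print(batch.shape)  # (3, 224, 224)
```

Once every image shares one shape, the batch can be flattened for PCA or passed to a CNN.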
Brain tumors, whether malignant or benign, pose severe risks given the confined space of the skull. Their growth can lead to brain damage and life-threatening situations. Timely detection and precise classification are essential for guiding early medical diagnosis before a tumor significantly affects or harms a patient. Since MRI is a predominant imaging technique in this area, there is a pressing need for diagnostic models that can detect tumors, classify them by type, and pinpoint their locations effectively. This dataset, assembled from various sources and continuously refined, aims to provide a rich resource for researchers and data scientists developing machine learning models to aid in these critical diagnostic tasks.
This Kaggle MRI dataset was created specifically to facilitate the development of models that detect the presence of a brain tumor in an MRI scan and classify it by type. Medical practitioners and technicians can then use such models as an advisory tool to make more precise diagnoses, leading to more targeted treatment options. Accurate labels are therefore critical. Potential inaccuracies, such as those noted in the SARTAJ dataset, must be acknowledged and addressed so that machine learning models are trained on the most reliable data available.
For a machine learning approach to medical diagnosis to be clinically valuable, the algorithm must capture intricate details, variations, and nuances in MRI images. Because this Kaggle dataset combines data from multiple sources, a successful algorithm should be adaptable, handling diverse imaging techniques and varying image sizes efficiently. Specifically, it should perform well at tumor detection and type classification, ideally exceeding the accuracy and reliability of current diagnostic practice. The emphasis on pre-processing, such as resizing images for uniformity, also underscores the need for meticulous data preparation to enhance model performance. Ultimately, an ideal algorithm would bolster medical practitioners' confidence in their diagnostic decisions, enabling timely and effective interventions for patients.
Additionally, tumor labels should ideally not come from initial diagnoses made by medical practitioners. Instead, each label should be assigned a considerable time after the MRI scan. That way it reflects the ground truth about whether a patient has a tumor and of what type, rather than a preliminary assessment that may carry bias. A model trained on preliminary labels would learn exactly the same medical biases. This delayed-labeling approach keeps the training data as close to the eventual outcome as possible. It improves the model's predictive accuracy and makes the model more valuable as support for medical practitioners in confirming true diagnoses.
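The delayed-labeling rule can be sketched as a filter over diagnosis records. This sketch assumes that per-patient diagnosis records with dates are available. The `Diagnosis` record layout, the `delayed_label` helper, and the 180-day minimum delay are all hypothetical. They only illustrate how such a rule could be applied and are not part of this project's pipeline.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Diagnosis:
    """Hypothetical record of one diagnosis made for a patient."""
    patient_id: str
    recorded_on: date
    label: str  # one of the four classes, e.g. "glioma"


def delayed_label(patient_id: str, scan_date: date, diagnoses: List[Diagnosis],
                  min_delay: timedelta = timedelta(days=180)) -> Optional[str]:
    """Label a scan with the latest diagnosis recorded at least `min_delay` after it.

    Returns None when no sufficiently delayed diagnosis exists, so the scan can be
    excluded from training instead of falling back to a preliminary assessment.
    """
    eligible = [d for d in diagnoses
                if d.patient_id == patient_id and d.recorded_on - scan_date >= min_delay]
    if not eligible:
        return None
    return max(eligible, key=lambda d: d.recorded_on).label


records = [
    Diagnosis("p1", date(2020, 1, 12), "meningioma"),  # preliminary read, 2 days after scan
    Diagnosis("p1", date(2020, 9, 1), "glioma"),        # later confirmed outcome
    Diagnosis("p2", date(2021, 3, 5), "pituitary"),     # only a preliminary read exists
]
print(delayed_label("p1", date(2020, 1, 10), records))  # glioma
print(delayed_label("p2", date(2021, 3, 1), records))   # None
```

Dropping scans that have no delayed label trades dataset size for label reliability. This is the trade-off the paragraph above argues for.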