A machine learning project that predicts skin cancer risk by analyzing lesion images and patient metadata. The system combines image and structured data processing to enhance diagnostic accuracy.
This project integrates a convolutional neural network (CNN) with a gradient boosting model to evaluate both visual and metadata features for skin cancer detection. The CNN analyzes lesion images, while the gradient boosting model interprets metadata such as patient age and lesion characteristics. The two models' predictions are combined to produce the final risk estimate.
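As an illustration of the combination step, a weighted blend of the two probability streams might look like the sketch below; the `blend_predictions` helper and the `alpha` weight are assumptions for illustration, not the project's actual code.

```python
import numpy as np

def blend_predictions(cnn_prob: np.ndarray, gbm_prob: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Weighted average of the image-model and metadata-model malignancy probabilities.

    alpha is an assumed blending weight; in practice it would be tuned on a
    validation fold rather than fixed up front.
    """
    return alpha * cnn_prob + (1.0 - alpha) * gbm_prob

# Example: blend per-lesion probabilities from the two models
cnn_prob = np.array([0.82, 0.10, 0.47])   # CNN output on lesion images
gbm_prob = np.array([0.76, 0.05, 0.55])   # gradient boosting output on metadata
final_prob = blend_predictions(cnn_prob, gbm_prob, alpha=0.6)
```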
- Models: Stores trained model weights and model scripts used for prediction.
- aws_integration: Contains AWS setup scripts for cloud deployment.
- notebooks: Jupyter notebooks detailing the data exploration, model training, and evaluation.
- src: Core source code for data processing, model training, and plotting.
The project uses the ISIC 2024 dataset, with over 401,000 annotated lesion images and associated metadata, such as patient age, lesion location, and lesion properties.
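A hedged sketch of inspecting the tabular side of the dataset; the file path and column names are assumptions about the ISIC 2024 release rather than verified contents of this repository.

```python
import pandas as pd

# Path and column names below are assumptions about the ISIC 2024 release,
# not verified against this repository's layout.
meta = pd.read_csv("data/train-metadata.csv")

# Typical fields: patient age, lesion location, and the malignancy label
print(meta[["isic_id", "age_approx", "anatom_site_general", "target"]].head())
print(f"{len(meta):,} lesions, {meta['target'].mean():.3%} labelled malignant")
```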
- Image Augmentation: Applies transformations such as flipping, brightness adjustment, and noise so the model generalizes to real-world variations (see the augmentation sketch after this list).
- Metadata Engineering: Creates features that capture lesion shape, color contrast, and other key indicators.
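The augmentation sketch referenced above could use a pipeline along these lines; the library choice (albumentations), the probabilities, and the 224x224 size are illustrative assumptions rather than the project's configuration.

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Flipping, brightness adjustment, and noise, mirroring the augmentations
# described above; probabilities and image size are assumed values.
train_transform = A.Compose([
    A.Resize(224, 224),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.5),
    A.GaussNoise(p=0.3),
    A.Normalize(),   # ImageNet mean/std by default
    ToTensorV2(),
])

# Applied per image: augmented = train_transform(image=np_image)["image"]
```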
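Similarly, the metadata engineering step might derive shape and color-contrast indicators like the ones below; the source column names are hypothetical stand-ins for whatever lesion measurements the dataset actually provides.

```python
import pandas as pd

def engineer_features(meta: pd.DataFrame) -> pd.DataFrame:
    """Derive shape- and color-related indicators from raw lesion metadata.

    Column names on the right-hand side are hypothetical placeholders for the
    diameter/axis/color statistics available in the dataset.
    """
    out = meta.copy()
    # Shape: how elongated the lesion is (1.0 = perfectly round)
    out["aspect_ratio"] = out["lesion_long_axis_mm"] / out["lesion_short_axis_mm"]
    # Color contrast: spread between the lightest and darkest regions
    out["color_contrast"] = out["lightness_max"] - out["lightness_min"]
    # Size relative to the patient's other lesions
    out["size_vs_patient_mean"] = out["lesion_long_axis_mm"] / out.groupby(
        "patient_id")["lesion_long_axis_mm"].transform("mean")
    return out
```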
- CNN: EfficientNet is used to extract visual features from lesion images (see the sketch after this list).
- Gradient Boosting: A CatBoost classifier processes patient metadata, using a stratified K-fold approach to handle imbalanced data and reduce overfitting.
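For the CNN branch, a minimal sketch of an EfficientNet backbone with a single-logit malignancy head via `timm`; the `efficientnet_b0` variant and the head size are assumed choices, not necessarily those used here.

```python
import timm
import torch

# "efficientnet_b0" and the single-logit head are assumptions; the project
# may use a different EfficientNet variant or output layout.
model = timm.create_model("efficientnet_b0", pretrained=True, num_classes=1)
model.eval()

@torch.no_grad()
def predict_malignancy(batch: torch.Tensor) -> torch.Tensor:
    """batch: (N, 3, 224, 224) normalized lesion images -> (N,) probabilities."""
    logits = model(batch).squeeze(1)
    return torch.sigmoid(logits)

probs = predict_malignancy(torch.randn(4, 3, 224, 224))  # dummy input for a shape check
```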
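For the gradient boosting branch, a stratified K-fold CatBoost loop producing out-of-fold probabilities might look like this; the hyperparameters are placeholders, not the project's tuned values.

```python
import numpy as np
from catboost import CatBoostClassifier
from sklearn.model_selection import StratifiedKFold

def train_metadata_model(X: np.ndarray, y: np.ndarray, n_splits: int = 5) -> np.ndarray:
    """Stratified K-fold CatBoost training; returns out-of-fold probabilities.

    Hyperparameters below are placeholder values for illustration.
    """
    oof = np.zeros(len(y))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    for train_idx, val_idx in skf.split(X, y):
        model = CatBoostClassifier(
            iterations=500,
            learning_rate=0.05,
            depth=6,
            auto_class_weights="Balanced",  # counteracts the heavy class imbalance
            verbose=False,
        )
        model.fit(X[train_idx], y[train_idx], eval_set=(X[val_idx], y[val_idx]))
        oof[val_idx] = model.predict_proba(X[val_idx])[:, 1]
    return oof
```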
The system is deployed on AWS using:
- AWS Lambda and API Gateway for handling requests and responses.
- S3 for data storage.
- CloudWatch for monitoring model performance.
The API accepts a base64-encoded lesion image together with patient metadata, runs both models on the request, and returns a JSON response indicating the likelihood of malignancy.
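A sketch of what the Lambda entry point behind the API might look like; the payload field names and the `predict` helper are hypothetical, not the deployed handler.

```python
import base64
import json

def predict(image_bytes: bytes, metadata: dict) -> float:
    """Placeholder for the combined CNN + gradient boosting inference pipeline."""
    return 0.5  # stub value; the real code would run both models and blend them

def lambda_handler(event, context):
    """Decode a base64 lesion image plus metadata and return a malignancy score.

    The field names (image_base64, metadata) are assumed; the real payload
    schema may differ.
    """
    body = json.loads(event["body"])
    image_bytes = base64.b64decode(body["image_base64"])
    metadata = body["metadata"]                 # e.g. {"age_approx": 55, ...}

    probability = predict(image_bytes, metadata)

    return {
        "statusCode": 200,
        "body": json.dumps({"malignancy_probability": round(float(probability), 4)}),
    }
```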
Detailed project documentation is available as a PDF in this repository.