/malaria-imaging-investigation

Investigative data science/machine learning guided tutorial on Kaggle malaria imaging dataset.


Data Investigation: Malaria Cell Imaging


Description

This is a semi-guided project tutorial designed for use by current and future students of Make School at Dominican University of California.

This project explores, analyzes, processes, and runs classification experiments on the 2018 Kaggle Malaria Cell Images dataset. The data was curated and assembled by the Lister Hill National Center for Biomedical Communications.

Currently, this project walks through intermediate machine learning processes and pipeline analyses on its target dataset, without serious emphasis on productionizing or scaling advanced models.

This may change with future updates.
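
As a first sanity check before any classification work, it can help to confirm how many images each class contains. The sketch below is only an illustration and is not taken from the project's notebooks: the function name count_images_per_class is hypothetical, and it assumes the downloaded images sit in one subdirectory per class under a root folder of your choosing (for example, somewhere under datasets/raw).

```python
from pathlib import Path

# Assumption: images are stored as PNG or JPEG files; adjust if needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images_per_class(image_root):
    """Count image files in each immediate subdirectory of `image_root`.

    Hypothetical helper: each subdirectory is treated as one class label.
    Files with other extensions are ignored.
    """
    counts = {}
    for class_dir in sorted(Path(image_root).iterdir()):
        if not class_dir.is_dir():
            continue
        counts[class_dir.name] = sum(
            1
            for path in class_dir.iterdir()
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling count_images_per_class on the folder that holds the class subdirectories returns a dictionary mapping each class name to its image count, which makes any class imbalance visible early.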

Project Hierarchy

The full directory tree for this project is shown below.

ds-project-template-repository
│   README.md
│   LICENSE
│   .gitignore
│
├───datasets
│   ├───raw
│   ├───external
│   ├───interim
│   └───processed
│
├───models
│
├───notebooks
│       01-exploratory-data-analysis.ipynb
│       02-intermediate-data-processing.ipynb
│       03-predictive-data-modeling.ipynb
│
├───production
│   ├───data
│   ├───models
│   └───visualizations
│
├───references
│
├───reports
│   └───figures
│
└───structures
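
If you want to reproduce this layout in a fresh clone or a new project, the directories can be created programmatically. The following is a minimal sketch rather than part of the repository: scaffold_project is a hypothetical helper, the directory names are copied from the tree above, and the notebooks and top-level files (README.md, LICENSE, .gitignore) are deliberately left out.

```python
from pathlib import Path

# Directory names copied from the project hierarchy in this README.
PROJECT_DIRS = [
    "datasets/raw",
    "datasets/external",
    "datasets/interim",
    "datasets/processed",
    "models",
    "notebooks",
    "production/data",
    "production/models",
    "production/visualizations",
    "references",
    "reports/figures",
    "structures",
]


def scaffold_project(root):
    """Create the directory skeleton under `root` and return the created paths.

    Hypothetical helper: safe to run repeatedly because existing
    directories are left untouched.
    """
    root = Path(root)
    created = []
    for relative in PROJECT_DIRS:
        target = root / relative
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

Running scaffold_project(".") from the repository root fills in any missing directories without touching existing files.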

Dependencies

General dependencies include NumPy and pandas.
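
Before opening the notebooks, you may want to verify that the dependencies are importable in your environment. The sketch below is one possible check, not something shipped with the project: missing_modules is a hypothetical helper, and the tuple contains only the two libraries this README names, so extend it with whatever else the notebooks import.

```python
from importlib.util import find_spec

# Only NumPy and pandas are named in this README; the notebooks likely
# need more, so treat this tuple as a starting point.
REQUIRED_MODULES = ("numpy", "pandas")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found.

    Hypothetical helper: uses find_spec so nothing is actually imported.
    """
    return [name for name in modules if find_spec(name) is None]
```

An empty list from missing_modules() means every listed module can be located; any names it returns still need to be installed.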

Credits

Thanks to the Make School community of students and professionals seeking to learn software engineering and data science for real-world applications.

Thanks to the Lister Hill National Center for Biomedical Communications and the National Institutes of Health for curating, productionizing, and distributing research datasets applicable to diagnostics and medicine.

Thanks to Arunava Chakraborty for his contributions on Kaggle, which gave peers access to this project's dataset and provided useful machine learning pipelines that inspired our own investigation.

Finally, special thanks go to two groups of individuals: to Joseph Catanzarite and Adam Braus for mentoring this project and overseeing Make School's current generation of aspiring data scientists and quantitatively literate software engineers; and to Hani Jandali and Yin Chang for going beyond their student responsibilities to help test this project and provide valuable feedback on its development and presentation.

License

The content of this project itself and the source code used to format and display that content are both licensed under the MIT license.


This project is constructed and maintained by Aakash Sudhakar.