Pima Indians Diabetes

This repository contains an IPython notebook that introduces exploratory data analysis and visualization techniques using the Pima Indians Diabetes dataset, available from the UCI Machine Learning Repository. The notebook was created alongside tutorials on visualization and descriptive statistics.

Towards the end of the notebook, after the data had been pre-processed, several machine learning models were trained on the dataset: a k-Nearest Neighbours classifier, a Support Vector Machine with a radial basis function (RBF) kernel, and a Logistic Regression classifier. Each of them predicted the onset of diabetes with over 70% accuracy.
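A minimal sketch of how the three classifiers can be compared with scikit-learn and 5-fold cross-validation is shown here. It is not the notebook's code. `make_classification` generates synthetic stand-in data with the same shape as the Pima dataset (768 rows, 8 features); replace `X, y` with the pre-processed data. The feature scaling, `n_neighbors=5` and `max_iter=1000` are illustrative assumptions, not the notebook's settings.

```python
# Sketch: compare kNN, an RBF-kernel SVM and logistic regression.
# Assumption: synthetic data stands in for the pre-processed Pima data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 768 samples, 8 features, binary target
X, y = make_classification(n_samples=768, n_features=8, random_state=0)

# Each model is scaled first, since kNN and the RBF SVM are distance-based
models = {
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

scores = {}
for name, model in models.items():
    # Mean accuracy across 5 cross-validation folds
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Cross-validated accuracy gives a less optimistic estimate than accuracy on a single train/test split, which matters on a dataset of this size.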

For further work, it may be worthwhile to try stacking or other ensembling methods, which combine multiple classification models in order to improve overall accuracy.
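One way to try stacking is scikit-learn's `StackingClassifier`, sketched here under the same assumptions as a plain comparison of the three models: synthetic stand-in data with the Pima dataset's shape, and illustrative hyperparameters that are not taken from the notebook. The three base models are combined by a logistic-regression meta-learner trained on their out-of-fold predictions.

```python
# Sketch: stack kNN, RBF SVM and logistic regression under a meta-learner.
# Assumption: synthetic data stands in for the pre-processed Pima data.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=768, n_features=8, random_state=0)

# Base (level-0) models, each with its own scaling step
base_models = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]

# The meta-learner is fitted on 5-fold out-of-fold base-model outputs,
# so it never sees predictions made on the base models' own training data
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Outer 5-fold cross-validation of the whole stacked ensemble
stack_score = cross_val_score(stack, X, y, cv=5, scoring="accuracy").mean()
print(f"Stacked ensemble: {stack_score:.3f}")
```

Whether stacking actually beats the best single model on the real data is an empirical question; comparing `stack_score` against the individual cross-validated scores is the fair test.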