A notebook on the "Pima Indians Diabetes Database" from the National Institute of Diabetes and Digestive and Kidney Diseases.
- Calculated the "Five Number Summary" using the `describe()` function.
- Displayed the data information (column types and non-null counts) using the `info()` function.
- Found the missing and inconsistent values using:
  - the Five Number Summary
  - the `seaborn.heatmap()` function
- Chose the best way to replace them for each column by inspecting its distribution with the `hist()` function.
- Replaced them using `fillna()` with the `mean()` and `median()` functions.
- Calculated "Pearson's Correlation Coefficient" and plotted it using the `seaborn.heatmap()` function, to check for any unneeded features.
- Used `StandardScaler()` to standardize the data using the Z-score method.
- Found the best value of K by looping over K in the range 1 to 27.
- The best value of K was 9, with an accuracy of 80.51948%.
- Built the KNN model using `K = 9`.
- Calculated the "Confusion Matrix" using the `metrics.confusion_matrix()` function, then plotted it using `seaborn.heatmap()`.
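The exploration steps in the list above can be sketched with a few rows of data. These steps are `describe()`, `info()`, and spotting inconsistent values from the Five Number Summary. This is a minimal sketch on hypothetical toy rows that reuse the dataset's column names, not the notebook's own code. It assumes, as is common when cleaning this dataset, that a minimum of 0 in a column such as `Insulin` flags a physiologically impossible value.

```python
import pandas as pd

# Hypothetical toy rows; the notebook would load the full dataset instead.
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "Insulin": [0, 0, 0, 94, 168],
    "Outcome": [1, 0, 1, 0, 1],
})

summary = df.describe()  # count, mean, std, min, 25%, 50%, 75%, max
# Keep only the five-number-summary rows.
five_numbers = summary.loc[["min", "25%", "50%", "75%", "max"]]
print(five_numbers)

df.info()  # column dtypes and non-null counts

# Assumed rule: a minimum of 0 marks an impossible (i.e. missing) measurement.
suspicious = [c for c in ["Glucose", "Insulin"] if five_numbers.loc["min", c] == 0]
print(suspicious)
```

With these toy rows only `Insulin` is flagged, because its minimum is 0.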
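The replacement step from the list above (`fillna()` with `mean()` or `median()`) might look like the following. This is a minimal sketch under stated assumptions. It assumes zeros encode missing values. The per-column choice of mean or median is also hypothetical: the notebook made that choice by looking at each column's `hist()`, and a common rule of thumb is the mean for roughly symmetric columns and the median for skewed ones.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows using the dataset's column names.
df = pd.DataFrame({
    "Glucose":       [148, 85, 0, 89, 137],
    "BloodPressure": [72, 66, 64, 0, 40],
    "Insulin":       [0, 0, 0, 94, 168],
    "BMI":           [33.6, 26.6, 23.3, 28.1, 0.0],
    "Outcome":       [1, 0, 1, 0, 1],
})

# Assumption: 0 is impossible for these columns, so mark it as missing (NaN).
zero_as_missing = ["Glucose", "BloodPressure", "Insulin", "BMI"]
df[zero_as_missing] = df[zero_as_missing].replace(0, np.nan)
print(df.isnull().sum())  # missing values per column
# seaborn.heatmap(df.isnull()) would visualize the same missing-value mask.

# Hypothetical per-column choices: mean for the first two, median for the skewed ones.
df["Glucose"] = df["Glucose"].fillna(df["Glucose"].mean())
df["BloodPressure"] = df["BloodPressure"].fillna(df["BloodPressure"].mean())
df["Insulin"] = df["Insulin"].fillna(df["Insulin"].median())
df["BMI"] = df["BMI"].fillna(df["BMI"].median())
```

Note that `mean()` and `median()` skip NaN by default. The fill values are therefore computed only from the real, non-missing measurements.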
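The correlation check in the list above relies on `DataFrame.corr()`, whose default method is Pearson. The sketch below shows how that matrix can point to an unneeded feature. Everything in it is hypothetical: the toy data, the deliberately redundant `BMI_x2` column, and the 0.9 threshold. The notebook itself judged the plotted heatmap by eye.

```python
import pandas as pd

# Hypothetical toy data; BMI_x2 is BMI scaled by 2, so it adds no information.
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116],
    "BMI":     [33.6, 26.6, 23.3, 28.1, 43.1, 25.6],
    "Outcome": [1, 0, 1, 0, 1, 0],
})
df["BMI_x2"] = df["BMI"] * 2

corr = df.corr(method="pearson")  # Pearson is also the default method
# seaborn.heatmap(corr, annot=True) would plot this matrix.

# Flag feature pairs whose |r| exceeds a hypothetical threshold.
threshold = 0.9
features = [c for c in corr.columns if c != "Outcome"]
redundant = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(features)
    for b in features[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(redundant)
```

Only the `BMI`/`BMI_x2` pair is flagged. A perfectly linear copy of a feature has a Pearson coefficient of 1.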
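The `StandardScaler()` step in the list above applies the Z-score z = (x - mean) / std to each column. The sketch below checks that equivalence on a small hypothetical array. Note that scikit-learn uses the population standard deviation (ddof=0), which is also NumPy's default for `std()`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values for two features (e.g. Glucose and BMI).
X = np.array([[148.0, 33.6], [85.0, 26.6], [183.0, 23.3], [89.0, 28.1]])

X_scaled = StandardScaler().fit_transform(X)

# The same Z-score computed by hand, column by column.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(X_scaled, X_manual))  # True
```

After scaling, each column has mean 0 and standard deviation 1. This matters for KNN because its distances would otherwise be dominated by large-valued features such as `Glucose`.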
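The search for the best K in the list above can be sketched as a loop that records test accuracy for each K. The sketch rests on several assumptions:

- A synthetic `make_classification` dataset stands in for the real data (768 rows and 8 features, the same shape as the Pima data), so the numbers will not match the notebook's 80.51948%.
- The split is 80/20 with a fixed `random_state`.
- The upper bound of 27 is treated as inclusive.
- The scaler is fitted on the training split only, which avoids leaking test statistics. The notebook may have scaled before splitting.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Pima data (768 rows, 8 features).
X, y = make_classification(n_samples=768, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

scores = {}
for k in range(1, 28):  # K = 1 ... 27 (assumed inclusive upper bound)
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = metrics.accuracy_score(y_test, knn.predict(X_test))

best_k = max(scores, key=scores.get)  # smallest K among those tied for best
print(best_k, round(scores[best_k] * 100, 5), "%")
```

Choosing K by test-set accuracy, as in this sketch, can be optimistic. Cross-validation on the training split (e.g. `GridSearchCV`) is a common alternative.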
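The final steps in the list above, building the model with `K = 9` and computing `metrics.confusion_matrix()`, might look like this. It is a minimal sketch on the same kind of synthetic stand-in data. The dataset, split and `random_state` are assumptions, so the matrix will differ from the notebook's.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; 20% of 768 rows gives a 154-row test set.
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)

knn = KNeighborsClassifier(n_neighbors=9)
knn.fit(scaler.transform(X_train), y_train)
y_pred = knn.predict(scaler.transform(X_test))

cm = metrics.confusion_matrix(y_test, y_pred)
# Rows are true labels, columns are predicted labels (0 = negative, 1 = positive).
tn, fp, fn, tp = cm.ravel()
accuracy = (tn + tp) / cm.sum()  # correct predictions over all test rows
print(cm)
print(accuracy)
# seaborn.heatmap(cm, annot=True, fmt="d") would draw this matrix as a heatmap.
```

The accuracy read off the diagonal of the confusion matrix equals `metrics.accuracy_score(y_test, y_pred)`.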