
Specifying the Data Analytic Question

DATASET 1 : Predict which passengers survived the Titanic disaster, based on the train and test datasets

DATASET 2 : Predict whether an email is spam or not

Metric for Success

Build a confusion matrix, create a classification report, and reach an accuracy score of 80%.
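As a rough illustration of how the confusion matrix and the accuracy target fit together, here is a minimal sketch in plain Python. The label lists are hypothetical, the helper names are made up for this example, and treating the 80% figure as a "greater than or equal to" threshold is an assumption. In a notebook these numbers would more likely come from a library such as scikit-learn.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Return counts[i][j]: samples with true label labels[i] predicted as labels[j]."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical labels for ten passengers (1 = survived, 0 = died).
y_true = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]

matrix = confusion_matrix(y_true, y_pred)
score = accuracy(y_true, y_pred)
meets_target = score >= 0.80  # assumption: the 80% target is a minimum

print(matrix)               # rows = true label, columns = predicted label
print(score, meets_target)
```

Rows of the matrix are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions and accuracy is the diagonal sum divided by the number of samples.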

Understanding the context


Dataset 1 comprises passenger information captured in the Titanic disaster, including who survived and who did not. Passengers who survived are denoted by 1, while those who died are denoted by 0. The dataset was sourced from Kaggle.


Dataset 2 contains different messages, and we are tasked with predicting whether an email is spam or not. The dataset was sourced from Kaggle.

Experimental Design

  • Data munging
  • Exploratory Data Analysis
  • Feature Engineering
  • Classification models
      • K Nearest Neighbor
      • Naive Bayes
  • Create a confusion matrix
  • Create a classification report that will show
      • Recall
      • Precision
      • F1 score
      • Support
      • Accuracy
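The modelling and reporting steps listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the two-feature points and labels are invented toy data, `knn_predict` and `per_class_report` are hypothetical helper names, and only K Nearest Neighbor is shown (Naive Bayes is omitted). It is not the notebook's actual implementation.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    def sq_dist(a, b):
        # Squared Euclidean distance; the ordering matches the true distance.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(zip(train_X, train_y), key=lambda pair: sq_dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Return ({label: precision/recall/f1/support}, overall accuracy)."""
    report = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Support is the number of true samples of this class.
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": tp + fn}
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return report, acc

# Invented toy data: two numeric features per sample, labels 0 and 1.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(1.1, 1.0), (3.0, 3.0), (2.9, 3.1), (1.0, 0.9)]
test_y = [0, 1, 1, 1]  # the last sample is deliberately hard to classify

y_pred = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
report, acc = per_class_report(test_y, y_pred)

for label, row in report.items():
    print(label, row)
print("accuracy", acc)
```

Precision is the share of predictions for a class that were correct, recall is the share of true members of a class that were found, and the F1 score is their harmonic mean.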

Done by : SamwelJane: samwelmwangi31@gmail.com