Titanic-Survivors-predictions

Using the Kaggle dataset from the Titanic Machine Learning competition to predict the survivors.

Primary language: Jupyter Notebook. License: Apache-2.0.

Titanic-dataset

Using the Titanic data to predict the survival of the passengers. Workflow of the project (work still in progress):

  1. Loading Libraries
     a. Numpy
     b. Pandas
     c. Matplotlib and seaborn
     d. sklearn for algorithms, accuracy metrics, and data preprocessing

  2. Exploratory Data Analysis

- Exploring the data: the shape (number of rows and columns) of the training and testing data, and finding the missing values in the dataset.

- Dummy encoding is done on the categorical data.

- Certain algorithms need features on a comparable scale to work well, so I have standardized the data using sklearn's StandardScaler.
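The exploration and preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the tiny DataFrame is made up (it only borrows Titanic-style column names such as `Age`, `Sex` and `Embarked`), and the choice to fill missing values with the median or the most frequent value is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy rows with Titanic-style column names (NOT the real Kaggle data).
train = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.10, 26.00, 8.05],
    "Embarked": ["S", "C", "S", "S", None, "Q"],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Basic exploration: number of rows/columns and missing values per column.
print(train.shape)
print(train.isnull().sum())

# Assumed imputation strategy: median for numeric, most frequent for categorical.
train["Age"] = train["Age"].fillna(train["Age"].median())
train["Embarked"] = train["Embarked"].fillna(train["Embarked"].mode()[0])

# Dummy-encode the categorical columns; drop_first avoids a redundant column.
features = pd.get_dummies(
    train.drop(columns="Survived"),
    columns=["Sex", "Embarked"],
    drop_first=True,
)

# Standardize the continuous columns to zero mean and unit variance.
numeric_cols = ["Age", "Fare"]
scaler = StandardScaler()
features[numeric_cols] = scaler.fit_transform(features[numeric_cols])

print(features.head())
```

With the real competition files, the same calls would run on the DataFrame read from the training CSV instead of the toy rows. Scaling matters most for distance-based models such as KNN; tree-based models are largely insensitive to it.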

  3. Training and Testing of Data: importing KNN, GaussianNB, DecisionTree, etc. from sklearn, and using train_test_split from sklearn's model_selection module to hold out part of the data, so that overfitting shows up when the model is evaluated.
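A rough sketch of this step is shown here, assuming synthetic data from `make_classification` as a stand-in for the preprocessed Titanic features. The hyperparameters (`n_neighbors=5`, `max_depth=3`) and the 80/20 split are illustrative assumptions, not the notebook's settings, and the accuracies it prints say nothing about the 0.77 reported for the real data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed feature matrix and survival labels.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)

# Hold out 20% of the rows; stratify keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Illustrative hyperparameters, chosen only for this sketch.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "GaussianNB": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Fit each model on the training part and score it on the held-out part.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

Scoring on rows the model never saw during fitting is what makes the accuracy a fair estimate. A large gap between training and held-out accuracy would be a sign of overfitting.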

Optional: Data Visualization. Tried making the notebook more interactive.

Work in progress! Got 0.77 accuracy so far; will be improving it.

To get a better understanding of the workflow of a Machine Learning project, have a read:

  1. The sklearn documentation is also recommended.

  2. https://medium.com/analytics-vidhya/workflow-guide-to-machine-learning-c0545c843f04 (my blog on machine learning; do read it!)

  3. https://medium.com/@NotAyushXD/workflow-of-a-machine-learning-project-ec1dba419b94

  4. https://www.kaggle.com/digvijayyadav/titanic-codesprediction (Do upvote it if you like my kernel)