This repository contains my machine learning submissions made on Kaggle.
Data Science has enticed me since the very beginning of my journey as a Computer Science undergraduate. The power of Python as a
programming language to read, process, manipulate, and visualize huge amounts of data is remarkable. Hence, I was inclined to use Python as a Data Science language,
both to visualize data and to make predictions by fitting Machine Learning models on training data.
After taking some courses on Data Science and Machine Learning on Udemy and Coursera, I wanted to really put my skills
to the test, and I found Kaggle after learning about it from a friend.
Getting started on Kaggle can be a bit daunting. However, after learning the basic regression techniques, one should be able to take on the Titanic: Machine Learning from Disaster
competition, the link for which is given below:
Titanic: Machine Learning from Disaster
The dataset for the Titanic competition is divided into 3 files:
- The `gender_submission.csv` file.
- The `train.csv` file.
- The `test.csv` file.

The `gender_submission.csv` file shows what the resulting output file needs to look like. The `train.csv` file contains the training data, whereas the `test.csv` file contains the test data.
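
As a rough illustration of the expected output layout, here is a minimal sketch that writes predictions as a two-column CSV. It assumes the header is `PassengerId,Survived`, as in `gender_submission.csv`. The helper `write_submission`, the passenger IDs, and the predictions are made up for the example and are not taken from this repository.

```python
import csv
import io


def write_submission(passenger_ids, predictions, out):
    """Write one row per passenger in the assumed PassengerId,Survived layout."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])  # assumed header
    for passenger_id, prediction in zip(passenger_ids, predictions):
        writer.writerow([passenger_id, int(prediction)])


# Hypothetical IDs and predictions, written to an in-memory buffer
# so the sketch runs without touching any files.
buffer = io.StringIO()
write_submission([892, 893, 894], [0, 1, 0], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To produce a real file, one could pass `open("submission.csv", "w", newline="")` instead of the in-memory buffer.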
The model trained on this dataset is built on Logistic Regression
in Python. The model got me to the top 73% on the leaderboard. Still improving!
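
To show the idea behind logistic regression, here is a minimal sketch in plain Python that fits the model with batch gradient descent. It is not the code from my notebook. The two features (`is_female` and a scaled passenger class), the toy rows, and the function names `fit_logistic` and `predict` are all assumptions made up for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Label a row 1 when the predicted probability is at least 0.5."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


# Made-up toy rows: [is_female, passenger_class / 3]; label 1 = survived.
X_train = [[1, 1 / 3], [1, 2 / 3], [1, 1.0], [0, 1 / 3],
           [0, 2 / 3], [0, 1.0], [1, 1 / 3], [0, 1.0]]
y_train = [1, 1, 1, 0, 0, 0, 1, 0]

w, b = fit_logistic(X_train, y_train)
train_predictions = predict(X_train, w, b)
```

In practice a library implementation would usually be preferable, since it handles regularization and convergence checks. The hand-written loop is only meant to make the mechanics visible.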
This is my second competition on Kaggle. I solved this problem using Linear Regression techniques. This is, again, an introductory problem on Kaggle. The link for this
problem can be found below:
House Prices: Advanced Regression Techniques
The dataset for the House Prices competition is divided into 3 files:
- The `data_description.txt` file, which describes all the features in the training and test data.
- The `train.csv` file.
- The `test.csv` file.

Also, a `sample_submission` file is provided. My output is contained in the `house_regression_analysis.ipynb` file, and the final submitted file is the `submission.csv` file.
The model trained on this dataset is built on Linear Regression
in Python. The model got me to the top 82% on the leaderboard. Still improving!
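
For intuition, here is a minimal sketch of linear regression with a single feature, fitted with the closed-form least-squares solution in plain Python. It is a simplification, not the code from my notebook, which may use many features. The feature (living area), the toy numbers, and the function name `fit_simple_linear` are assumptions made up for the example.

```python
def fit_simple_linear(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy data: living area in square feet vs. sale price.
areas = [1000, 1500, 2000, 2500]
prices = [150000, 200000, 250000, 300000]

slope, intercept = fit_simple_linear(areas, prices)
predicted_price = slope * 1800 + intercept  # prediction for a hypothetical 1800 sq ft house
```

With several features, the same least-squares idea applies, but it is usually solved with a linear algebra routine rather than by hand.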
My submission was ranked as follows: