Capstone project for the Machine Learning Nanodegree at Udacity
Quora is a popular website where people can ask and answer all kinds of questions. However, people often ask similar or nearly identical questions, which makes searching for the best answer difficult. In this project, I developed a supervised learning model to detect duplicate questions on Quora.
- numpy
- pandas
- wordcloud
- sklearn
- matplotlib
- os
- collections
- xgboost
- graphviz
All the code is in the capstone.ipynb notebook.
Both the training and test data are available on the Kaggle Quora competition website. Unfortunately, they are too large to store in this repository.
Using an XGBoost model with 6 features, I get a log loss of about 0.39.
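For readers unfamiliar with the metric, here is a minimal sketch of how binary log loss is computed. It is not taken from the notebook: the function name `log_loss`, the clipping constant `eps`, and the sample labels and probabilities are all made up for illustration.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities.

    Illustrative helper only; the notebook may compute the metric differently.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so log() stays finite.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Made-up example: 1 = duplicate pair, 0 = not a duplicate.
labels = [1, 0, 1, 0]
probs = [0.8, 0.3, 0.6, 0.1]
score = log_loss(labels, probs)
print(round(score, 4))
```

Lower is better. A model that always predicts 0.5 scores ln(2), roughly 0.693, so that value is a useful reference point when reading the result above.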