About Quora-Duplicate-Question-Prediction Application 👇

  • This Use-case lets quora to avoid the same/repetive questions to be answered over and over again. There by increasing customer experience.
  • Application is an binary classification where we've two questions entered in the form and application classifies/predicts whether the given two questions are duplicate or not!
  • Dataset link: https://www.kaggle.com/c/quora-question-pairs/data.
  • Train data: 4,04,290 are the pairs of quora question.
  • Trained using Random Model, Logistic Regression, Linear SVM, & xgboost Machine Learning Models.
  • xgboost offered accuracy: train-logloss: 0.34470 | valid-logloss: 0.35853.
  • Evaluation Metrics: Log loss & Confusion matrix .

Demo video 📹

Tech stack 🧑‍💻

Machine Learning | Python | AJAX | Bootstrap | VS CODE IDE 💻

Application directory structure 📁

Note: Not to worry if few files are missing in the repo, those file can be generated by running python source files.

Dataset Directory files 📁

How to run ? 🏃

  1. Clone/Download the project from github.
  2. create new conda/python environment, activate the environment created and install application dependencies via terminal command in root directory of the application by "pip install -r requirments.txt".
  3. Run 0.8_prediction.py file.

License 📖



Connect with me 😀