-
Capstone Document: https://github.com/azhenxuan/Quora-Question-Pairs/blob/master/Capstone%20Project.pdf
-
Python 2.7 (using Anaconda's distribution)
-
Data obtained from:
- Both train.csv and test.csv: https://www.kaggle.com/c/quora-question-pairs/data
- Google News pre-trained word2vec vectors: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit | mirror: https://github.com/mmihaltz/word2vec-GoogleNews-vectors
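Here is a minimal sketch of how the two data sources fit together. The helpers below read question pairs from `train.csv` and turn each question into an averaged word vector. They assume the Kaggle column names `question1`, `question2` and `is_duplicate`. They also assume the vectors are loaded with gensim, for example `gensim.models.KeyedVectors.load_word2vec_format(path, binary=True)`. `load_pairs`, `mean_vector` and `cosine` are hypothetical names. Averaging word vectors is one common way to compare questions, not necessarily the method used in the capstone. The code uses only the standard library and runs on Python 2.7 and 3.

```python
import csv
import math


def load_pairs(fileobj):
    """Read (question1, question2, is_duplicate) tuples from train.csv.

    Assumes the Kaggle columns id, qid1, qid2, question1, question2,
    is_duplicate. Empty question cells come back as "".
    """
    pairs = []
    for row in csv.DictReader(fileobj):
        q1 = row.get("question1") or ""
        q2 = row.get("question2") or ""
        label = int(row["is_duplicate"])
        pairs.append((q1, q2, label))
    return pairs


def mean_vector(tokens, vectors):
    """Average the vectors of the tokens present in `vectors`.

    `vectors` may be anything supporting `in` and `[]`, e.g. a dict of
    lists or a gensim KeyedVectors object. Returns None if no token is known.
    """
    known = [list(vectors[t]) for t in tokens if t in vectors]
    if not known:
        return None
    # Column-wise mean; float() keeps Python 2.7 from doing integer division.
    return [sum(col) / float(len(known)) for col in zip(*known)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

With the vectors loaded into `kv`, `cosine(mean_vector(q1.split(), kv), mean_vector(q2.split(), kv))` gives one similarity score per pair. Check for `None` first, since a question may contain no words known to the model.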
-
Libraries required:
- NLTK: https://anaconda.org/anaconda/nltk
- Keras (TensorFlow backend): https://anaconda.org/anaconda/keras
- python-Levenshtein: https://pypi.python.org/pypi/python-Levenshtein
- scikit-learn: https://anaconda.org/anaconda/scikit-learn
- matplotlib: https://anaconda.org/anaconda/matplotlib
- numpy
- pandas
- gensim: https://anaconda.org/anaconda/gensim
- TensorFlow (CPU or GPU build): https://www.tensorflow.org/install/
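python-Levenshtein computes string edit distance in C, exposed as `Levenshtein.distance(a, b)` and `Levenshtein.ratio(a, b)`. The pure-Python dynamic program below (Wagner-Fischer) is a sketch that computes the same unit-cost distance, for illustration only. The `similarity` normalisation (1 minus distance over the longer length) is an assumption of this sketch and differs from `Levenshtein.ratio`. Neither helper is meant to reproduce the project's actual feature set.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Edit similarity in [0, 1]: 1 - distance / length of the longer string.

    Not the same normalisation as Levenshtein.ratio.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / float(longest)
```

For example, `levenshtein("kitten", "sitting")` is 3, so `similarity` gives 1 - 3/7 for that pair.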
-
Requirements file to replicate environment: https://github.com/azhenxuan/Quora-Question-Pairs/blob/master/requirements.txt
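After you recreate the environment from the requirements file (for example with `pip install -r requirements.txt`), a quick import check like the sketch below can confirm that the libraries listed above are available. The package-to-module mapping reflects each package's usual import name: `python-Levenshtein` imports as `Levenshtein` and `scikit-learn` as `sklearn`. `check_requirements` and `report` are hypothetical helpers, not part of the repository. The code runs on Python 2.7 and 3.

```python
import importlib

# Listed package name -> module name used in `import` (assumed usual names).
REQUIRED_MODULES = {
    "nltk": "nltk",
    "keras": "keras",
    "python-Levenshtein": "Levenshtein",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
    "gensim": "gensim",
    "tensorflow": "tensorflow",
}


def check_requirements(modules):
    """Return {package name: True/False} depending on whether it imports."""
    status = {}
    for package, module_name in sorted(modules.items()):
        try:
            importlib.import_module(module_name)
            status[package] = True
        except Exception:  # ImportError, or a broken install raising another error
            status[package] = False
    return status


def report(status):
    """Print one line per package and return the names that failed to import."""
    for name, ok in sorted(status.items()):
        print("{:<20} {}".format(name, "ok" if ok else "MISSING"))
    return [name for name, ok in sorted(status.items()) if not ok]
```

Running `report(check_requirements(REQUIRED_MODULES))` inside the activated environment prints one line per package and returns the list of missing ones.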
azhenxuan/Quora-Question-Pairs
Exploratory data analysis and machine learning modelling for detecting duplicate question pairs.