Quora is a platform that empowers people to learn from each other. On Quora, people can ask questions and connect with others who contribute unique insights and quality answers.
This project is for the Kaggle competition Quora Insincere Questions Classification. We will be predicting whether a question asked on Quora is sincere or not. An insincere question is defined as a question intended to make a statement rather than look for helpful answers.
You can install the dependencies by running the following command in a Colab notebook:
#To install pydrive
!pip install -U -q PyDrive
To download the Kaggle dataset directly to the Google Colab disk:
- Sign in to Kaggle.
- Download your kaggle.json API token file from your Kaggle account settings.
- In Google Colab, upload that file.
- Then install the required packages:
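# Install the Kaggle command-line client
!pip install -q kaggle
# Copy the uploaded API token to the folder where the client looks for it
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
# Restrict permissions so the client does not warn about a readable key
!chmod 600 ~/.kaggle/kaggle.json
!ls ~/.kaggle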
Now, you can download the dataset:
!kaggle competitions download -c quora-insincere-questions-classification
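The Kaggle client usually saves a competition download as a zip archive. Below is a minimal sketch of unpacking it from Python. It assumes the archive is named `quora-insincere-questions-classification.zip` and sits in the working directory. Both that file name and the `data` output folder are assumptions, not taken from this repository, and `extract_archive` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="data"):
    """Extract every member of zip_path into dest_dir and return their names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Assumed archive name; adjust it to whatever the download step produced.
    archive_path = Path("quora-insincere-questions-classification.zip")
    if archive_path.exists():
        print(extract_archive(archive_path))
    else:
        print(f"{archive_path} not found; run the download command first.")
```

The returned list of member names is a quick way to check which files the archive actually contained before loading anything.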
To install the requests package:
!pip install requests
There are two datasets: train data and test data.
The train data has about 1.3M rows and 3 columns: qid, question_text, and target.
The test data has about 376k rows and only 2 columns: qid and question_text.
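Before training, it can help to confirm the column layout and see how skewed the target column is. Below is a minimal sketch using only the standard library. It assumes the training file is a CSV with the header `qid,question_text,target` and that a target of `1` marks an insincere question. The label encoding and the sample rows are assumptions for illustration, and `summarize_targets` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter


def summarize_targets(csv_file):
    """Return (row_count, Counter of target values) for an open CSV file."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row["target"] for row in reader)
    return sum(counts.values()), counts


# Tiny in-memory sample standing in for the real file (made-up rows).
sample = io.StringIO(
    "qid,question_text,target\n"
    "a1,How do I learn Python?,0\n"
    "a2,\"Why is everyone except me, wrong?\",1\n"
    "a3,What is a GRU?,0\n"
)
rows, counts = summarize_targets(sample)
print(rows, dict(counts))
```

For the real data, the same function can be called on a file opened with `open("train.csv", newline="", encoding="utf-8")`, where `train.csv` is an assumed file name.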
Sentiment analysis of the questions was done using different recurrent neural network (RNN) units, namely Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM), together with convolutional neural network (CNN) blocks. We trained the model with different hyperparameters (such as the number of convolutional and dense layers, filter sizes, and the threshold value) to find the model with the highest F1 score. F1 is the selection metric because the data is skewed.
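With skewed data, the cut-off applied to the predicted probabilities affects F1 strongly, which is why the threshold is tuned alongside the architecture. Below is a minimal sketch of F1 and a threshold sweep in plain Python. `f1_at_threshold` and `best_threshold` are hypothetical helper names, and the labels, probabilities, and candidate grid are made up for illustration rather than taken from the notebook.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score of the positive class when predicting 1 for prob > threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob > threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, candidates):
    """Return (threshold, f1) with the highest F1 among candidates.

    Ties on F1 are broken in favour of the larger threshold.
    """
    scored = [(f1_at_threshold(y_true, y_prob, t), t) for t in candidates]
    best_f1, best_t = max(scored)
    return best_t, best_f1


# Made-up validation labels and probabilities, for illustration only.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.15, 0.02, 0.40, 0.80]
candidates = [i / 100 for i in range(10, 91, 5)]
threshold, f1 = best_threshold(y_true, y_prob, candidates)
print(f"best threshold={threshold:.2f}, F1={f1:.3f}")
```

In practice the sweep would run on held-out validation predictions, and the chosen threshold would then be applied to the test predictions when writing the submission.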
S.No. | RNN Unit | Convolutional Blocks | Filter Size | Dense Layers | Threshold | Public Dataset F1 Score | Private Dataset F1 Score |
---|---|---|---|---|---|---|---|
1 | LSTM | 1 | 64 | 1 | 0.299999 | 0.61660 | 0.61996 |
2 | GRU | 1 | 128 | 1 | 0.299999 | 0.63823 | 0.64841 |
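The rows above are two points in a larger search space. One way to enumerate such a space is a Cartesian product over the tuned settings. The value lists below are illustrative assumptions (only the LSTM with 64 and GRU with 128 combinations appear in the table), and `iter_configs` and `train_and_score` are hypothetical names, not functions from this repository.

```python
from itertools import product

# Candidate values per hyperparameter (assumed for illustration).
search_space = {
    "rnn_unit": ["LSTM", "GRU"],
    "conv_blocks": [1, 2],
    "filter_size": [64, 128],
    "dense_layers": [1, 2],
}


def iter_configs(space):
    """Yield one dict per combination of the values in space."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(search_space))
print(len(configs), "configurations, e.g.", configs[0])
# Each config would then be passed to a training routine, for example
# a hypothetical train_and_score(config) that returns the validation F1.
```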
1) Quota_Insincere_Question.ipynb: This notebook is used for hyperparameter tuning. You can run it on Google Colab.
2) Kaggle Submission CuDNN GRU F2 Threshold.py: This file is used for training with GRU units, using the specific hyperparameters shown in the table above, and for creating the Kaggle submission file.