Quora-Insincere-Question-Classification

Quora is a platform that empowers people to learn from each other. On Quora, people can ask questions and connect with others who contribute unique insights and quality answers.

This is the Kaggle competition Quora Insincere Questions Classification. We predict whether a question asked on Quora is sincere or insincere. An insincere question is defined as one intended to make a statement rather than to look for helpful answers.

Dependencies

You can install the dependencies by running the following command in a Colab notebook:

# To install PyDrive
!pip install -U -q PyDrive

To download the Kaggle dataset directly to the Google Colab disk:

Sign in to Kaggle.
Download your kaggle.json API token file.
Upload that file to Google Colab.
Then install the Kaggle package and copy the token into place:

!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!ls ~/.kaggle

Now, you can download the dataset:

!kaggle competitions download -c quora-insincere-questions-classification
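The Kaggle CLI usually saves competition files as a zip archive. The following is a minimal Python sketch for extracting it. The archive name quora-insincere-questions-classification.zip and the data/ destination folder are assumptions, so check the actual file name with !ls first. In a Colab cell, !unzip does the same job.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of a zip archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest)
    return names


# Assumed archive name: verify it with `!ls` after running the download command.
ARCHIVE = "quora-insincere-questions-classification.zip"
if Path(ARCHIVE).exists():
    print(extract_archive(ARCHIVE, "data"))
```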

To install the requests package:

!pip install requests

Dataset

There are two datasets: the training data and the test data.

The training data has 1.3M rows and 3 columns: qid, question_text, and target.
The test data has 376k rows and only 2 columns: qid and question_text.
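To see how skewed the target column is, you can count the labels in the training file. This is a minimal standard-library sketch that assumes the CSV has a header row with a target column, as described above; the helper name target_distribution is hypothetical. With pandas, train["target"].value_counts(normalize=True) gives the same information.

```python
import csv
from collections import Counter


def target_distribution(csv_file):
    """Return the fraction of rows per target label in a CSV file object with a 'target' column."""
    counts = Counter(row["target"] for row in csv.DictReader(csv_file))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Usage (assumes train.csv was extracted to the working directory):
# with open("train.csv", newline="", encoding="utf-8") as f:
#     print(target_distribution(f))
```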

Sentiment Analysis

Sentiment analysis of the questions was performed with different recurrent neural network (RNN) units, namely gated recurrent units (GRU) and long short-term memory (LSTM), combined with a convolutional neural network. We trained the models with different hyperparameters (number of convolutional and dense layers, filter sizes, and decision threshold) and kept the model with the highest F1 score, because the classes are heavily imbalanced (skewed) and plain accuracy would be misleading.
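Since the model outputs a probability, the decision threshold is itself a hyperparameter that can be tuned for F1 on held-out validation predictions. Below is a minimal pure-Python sketch of that sweep, assuming binary 0/1 labels and a grid of thresholds from 0.01 to 0.99; the helper names are hypothetical, and the repository's notebook may do this differently (for example with scikit-learn's f1_score).

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probs, thresholds=None):
    """Return the (threshold, F1) pair that maximizes F1 over the candidate thresholds."""
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 100)]
    best = (0.5, -1.0)
    for th in thresholds:
        # A question is predicted insincere (1) when its probability exceeds th.
        preds = [1 if p > th else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best[1]:
            best = (th, score)
    return best
```

On an imbalanced dataset the best threshold is often well below 0.5, because lowering it trades some precision for more recall on the rare class.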

S.No.	RNN unit	Convolutional blocks	Filter size	Dense layers	Threshold	Public leaderboard F1	Private leaderboard F1
1	LSTM	1	64	1	0.299999	0.61660	0.61996
2	GRU	1	128	1	0.299999	0.63823	0.64841

Run the code

1) Quota_Insincere_Question.ipynb: This notebook is used for hyperparameter tuning. You can run it on Google Colab.
2) Kaggle Submission CuDNN GRU F2 Threshold.py: This script trains the model with GRU units using the hyperparameters shown in the table above, and creates the Kaggle submission file.
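Creating the submission amounts to binarizing the test-set probabilities at the chosen threshold and writing one row per qid. The following is a minimal sketch, assuming the competition's qid,prediction CSV format (compare with sample_submission.csv) and the 0.299999 threshold from the table; the helper name write_submission is hypothetical and not taken from the script.

```python
import csv


def write_submission(qids, probs, threshold, path="submission.csv"):
    """Binarize probabilities at `threshold` and write a qid,prediction CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["qid", "prediction"])
        for qid, p in zip(qids, probs):
            # 1 = insincere, 0 = sincere
            writer.writerow([qid, int(p > threshold)])
    return path


# Usage (hypothetical variables holding test qids and model probabilities):
# write_submission(test_qids, test_probs, threshold=0.299999)
```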

MohammadWasil/Quora-Insincere-Question-Classification