LSTM in PyTorch for identification of divisive blog posts on Quora

Primary language: Jupyter Notebook

Question_Classification

These notebooks deal with the identification of insincere posts on websites. Quora uses a combination of machine learning and manual review to identify toxic content. This project uses around 1,300,000 questions from Quora to develop and test a model. The data come from the Kaggle challenge [https://www.kaggle.com/c/quora-insincere-questions-classification].

Two approaches have been implemented. The first implements tokenization from scratch, whereas the second uses the Keras library for that step.
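As a rough illustration of what tokenization from scratch can look like, here is a minimal sketch in plain Python. It is not the code from the notebooks: the function names (`tokenize`, `build_vocab`, `encode`), the regular expression, the reserved indices for padding and unknown words, and the default sizes are all assumptions made for this example.

```python
import re
from collections import Counter

# Reserved indices (an assumed convention for this sketch).
PAD, UNK = 0, 1


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())


def build_vocab(texts, max_words=50000):
    """Map the most frequent tokens to integer ids, starting at 2."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    most_common = counts.most_common(max_words)
    return {word: i + 2 for i, (word, _) in enumerate(most_common)}


def encode(text, vocab, max_len=8):
    """Convert text to a fixed-length list of ids (truncate, then pad)."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Two made-up example questions, not taken from the Kaggle data.
questions = ["Why is the sky blue?", "Why is water wet?"]
vocab = build_vocab(questions)
encoded = [encode(q, vocab) for q in questions]
```

Each question becomes a list of `max_len` integers, which is the kind of input an embedding layer expects. Words that are not in the vocabulary map to `UNK`, and short questions are padded with `PAD`.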

The net is implemented in PyTorch. It consists of an LSTM cell and two linear layers.
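A network of this shape might look roughly like the sketch below. It is an assumption-laden illustration, not the model from the notebooks: the class name `QuestionClassifier`, the embedding layer, the use of `nn.LSTM` (rather than `nn.LSTMCell`), the ReLU between the linear layers, and all layer sizes are choices made for this example only.

```python
import torch
import torch.nn as nn


class QuestionClassifier(nn.Module):
    """Embedding -> LSTM -> two linear layers -> one logit per question."""

    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, fc_dim=16):
        super().__init__()
        # Index 0 is assumed to be the padding token.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc1 = nn.Linear(hidden_dim, fc_dim)
        self.fc2 = nn.Linear(fc_dim, 1)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) tensor of integer token ids
        embedded = self.embedding(token_ids)
        # h_n has shape (num_layers, batch, hidden_dim); take the last layer
        _, (h_n, _) = self.lstm(embedded)
        hidden = torch.relu(self.fc1(h_n[-1]))
        # Return raw logits of shape (batch,)
        return self.fc2(hidden).squeeze(1)


model = QuestionClassifier()
batch = torch.randint(1, 1000, (4, 12))  # 4 random "questions" of 12 tokens
logits = model(batch)
probs = torch.sigmoid(logits)  # probability that each question is insincere
```

Returning logits rather than probabilities allows training with `nn.BCEWithLogitsLoss`, which is numerically more stable than applying a sigmoid followed by `nn.BCELoss`.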

Files

  • Exploration

  • Questions_questions_final

  • quora_questions_classifier - model from final run