Notebooks for identifying insincere posts on websites. Quora uses a combination of machine learning and manual review to identify toxic content. For this project, around 1,300,000 questions from Quora are used to develop and test a model. For the data, see the Kaggle challenge [https://www.kaggle.com/c/quora-insincere-questions-classification].
Two approaches have been implemented. The first implements tokenization from scratch, whereas the second uses the Keras library.
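As a rough illustration of what tokenization from scratch can involve, here is a minimal sketch in plain Python. It is not the code from the notebooks: the function names (`tokenize`, `build_vocab`, `encode`), the regular expression, the reserved `PAD`/`UNK` indices, and the default sizes are all assumptions made for the example.

```python
import re
from collections import Counter

# Reserved indices for padding and unknown words (assumed convention).
PAD, UNK = 0, 1


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())


def build_vocab(texts, max_size=50000):
    """Map the most frequent tokens to integer ids, after PAD and UNK."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tok, _ in counts.most_common(max_size - len(vocab)):
        vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, max_len=8):
    """Convert text to a fixed-length list of ids, truncating or padding."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical example questions, not taken from the dataset.
questions = ["How do I learn Python?", "Why is the sky blue?"]
vocab = build_vocab(questions)
encoded = encode("How do I learn Rust?", vocab)
```

Words missing from the vocabulary (here "rust") map to `UNK`, and short questions are padded with `PAD` so that every sequence has the same length before batching.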
The net is implemented in PyTorch. It consists of an LSTM cell and two linear layers.
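The architecture described above might look roughly like the following sketch. The class name `InsincereClassifier`, the embedding layer in front of the LSTM, the ReLU between the linear layers, and all layer sizes are assumptions for illustration; the notebooks may differ in each of these.

```python
import torch
import torch.nn as nn


class InsincereClassifier(nn.Module):
    """Hypothetical sketch: embedding -> LSTM -> two linear layers."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, fc_dim=32):
        super().__init__()
        # Embedding layer is an assumption; index 0 is treated as padding.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc1 = nn.Linear(hidden_dim, fc_dim)
        self.fc2 = nn.Linear(fc_dim, 1)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)       # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(embedded)          # h_n: (1, batch, hidden_dim)
        hidden = torch.relu(self.fc1(h_n[-1]))     # (batch, fc_dim)
        return self.fc2(hidden).squeeze(1)         # (batch,) raw logits


model = InsincereClassifier(vocab_size=1000)
batch = torch.randint(1, 1000, (4, 12))  # 4 random "questions" of 12 token ids
logits = model(batch)
probs = torch.sigmoid(logits)            # probability of the insincere class
```

Returning raw logits allows training with `nn.BCEWithLogitsLoss`, which is usually more numerically stable than applying a sigmoid inside the model.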
- Exploration
- Questions_questions_final
- quora_questions_classifier - model from final run