NLP for Disaster Tweets

This project is my reproduction of a research paper submitted on August 12, 2016 by a team of researchers from the Qatar Computing Research Institute: "Rapid Classification of Crisis-Related Data on Social Networks using Convolutional Neural Networks" by Dat Tien Nguyen, Kamela Ali Al Mannai, Shafiq Joty, Hassan Sajjad, Muhammad Imran, and Prasenjit Mitra.

What is this paper about?

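The paper introduces neural-network-based methods for binary and multi-class tweet classification. It makes use of out-of-event data in the early hours of a disaster, when little labeled data from the event itself is available, and reports better results with a convolutional neural network (CNN) than with random forest (RF), logistic regression (LR), and support vector machine (SVM) classifiers.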

What did I do and why?

I reproduced this paper as a learning exercise, to become more proficient with text mining techniques and to gain a deeper understanding of neural nets.

The techniques, models, and concepts used and explained in this project are:

  • preprocessing techniques for text data
  • tokenization
  • word vector and embeddings
  • word2vec
  • Bag of words model
  • Tagging
  • TF-IDF
  • PCA
  • Convolutional Neural Network
  • Recurrent Neural Network
  • Clustering
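To make a few of the items above concrete, here is a minimal sketch of tokenization, a bag-of-words count, and TF-IDF weighting using only the Python standard library. It is an illustration, not the code from the notebooks: the function names (tokenize, bag_of_words, tf_idf) and the three sample tweets are made up for this example, and the TF-IDF formula shown (term frequency times log(N / document frequency)) is just one common variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet, drop URLs and @mentions, and split into word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z0-9']+", text)


def bag_of_words(tokens):
    """Map each token to the number of times it occurs in the tweet."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {token: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {tok: (c / len(doc)) * math.log(n_docs / df[tok])
             for tok, c in counts.items()}
        )
    return weights


# Made-up sample tweets, for illustration only.
tweets = [
    "Flood waters rising near the bridge http://example.com",
    "@friend the bridge concert was great",
    "Earthquake damage reported, flood warning issued",
]
docs = [tokenize(t) for t in tweets]
weights = tf_idf(docs)
```

In this toy corpus, a token that appears in only one tweet (such as "waters") gets a higher weight than one shared across tweets (such as "flood"), which is the intuition behind using TF-IDF rather than raw counts as classifier features.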

How to use the program

The raw dataset is from CrisisNLP. A cleaned, concatenated version of the data, covering 9 major events, is in the data folder.