Identifying Tweets about Natural Disasters

This project applies several NLP methods and classification models to a Kaggle dataset to predict whether a tweet refers to a real natural disaster. We preprocessed the data by tokenizing, lemmatizing, and extracting features, then vectorized the corpus with TF-IDF and word embeddings. On the vectorized corpus we trained several classifiers (Logistic Regression, Naive Bayes, Decision Tree, Random Forest, and Multi-layer Perceptron) to predict the target variable. Logistic Regression performed best, with an AUC of 0.86.
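As a rough illustration of the pipeline described above (tokenize, vectorize with TF-IDF, fit a logistic regression, score with AUC), here is a minimal standard-library sketch. It is not the project's code: the toy tweets, labels, function names, and hyperparameters (`lr`, `epochs`) are all assumptions made for illustration, lemmatization is omitted, and a real run would more likely use a library such as scikit-learn on the Kaggle data with a held-out split.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic tokens (lemmatization is omitted here)."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse, L2-normalized TF-IDF vector stored as a {term: weight} dict."""
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {t: w / norm for t, w in weights.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """Probability that sparse vector x belongs to the positive class."""
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in x.items()))


def train_logistic_regression(vectors, labels, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log loss (no regularization)."""
    weights, bias, n = defaultdict(float), 0.0, len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = defaultdict(float), 0.0
        for x, y in zip(vectors, labels):
            err = predict_proba(x, weights, bias) - y
            grad_b += err
            for t, v in x.items():
                grad_w[t] += err * v
        for t, g in grad_w.items():
            weights[t] -= lr * g / n
        bias -= lr * grad_b / n
    return weights, bias


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the Kaggle dataset (1 = real disaster).
tweets = [
    "Forest fire near the town, evacuations ordered",
    "Earthquake hits the city, many injured",
    "Flood waters rising, people evacuated",
    "I love this song, it is fire",
    "My exam was a disaster lol",
    "Great game tonight, the crowd was on fire",
]
labels = [1, 1, 1, 0, 0, 0]

token_lists = [tokenize(t) for t in tweets]
idf = fit_idf(token_lists)
vectors = [tfidf_vector(tokens, idf) for tokens in token_lists]
weights, bias = train_logistic_regression(vectors, labels)
scores = [predict_proba(x, weights, bias) for x in vectors]
auc = roc_auc(labels, scores)  # training-set AUC: a smoke test, not an evaluation
print(f"training AUC: {auc:.2f}")
```

Because the sketch scores the same tweets it was trained on, its AUC says nothing about generalization and is not comparable to the 0.86 reported above.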

yangxiaohan57/classify-disaster-tweet
