Kaggle - Natural Language Processing with Disaster Tweets
Kaggle Challenge
Notes taken from the dataset
- id (1 to 10.9k)
- 222 unique values (keyword) out of 7503 unique values (text)
- keyword = word that shows disaster
- Q: How does locations affect keywords and how they interact?
- Maybe subset each location and see which word is more common for each
location. Stratify for each location
- Find ways to detect metaphors of keywords --> Metaphors can lead to false
positives
- Look at words around they keywords
- Maybe depending on the emojis it can see if the message is bad or not. e.g. happy emoji does not mean danger (DO THIS LATER):
- Exclamation mark (❗️ ) could indicate danger tweet
Algorithms Result (See which one is the best)
- Long Short Term Memory Networks (LSTM)