jensdebruijn/Multimodal-flood-tweet-classification
Although text classifiers can categorize tweets, determining from the text alone whether a tweet relates to an ongoing flood event remains difficult. Including contextual hydrological information could improve the performance of such classifiers. In this study, we designed a multilingual multimodal neural network that can effectively use both textual and hydrological information. The classification data was obtained from the Twitter streaming API using flood-related keywords in English, French, Spanish and Indonesian. Subsequently, hydrological information was extracted from a global precipitation dataset based on the tweet’s timestamp and the locations mentioned in its text. We performed three experiments analyzing precision, recall and F1-scores, each comparing a network that uses hydrological information against a network that does not. Results showed that F1-scores improved significantly across all experiments when hydrological information was included. Most notably, when optimizing for precision, the network with hydrological information achieved a precision of 0.91, while the network without hydrological information failed to optimize effectively. Moreover, this study shows that including hydrological information can help transfer the classifier to unseen languages. Tweets filtered using this network can be used to organize disaster response more effectively, validate and calibrate flood risk models, and task satellites, among other applications.
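The idea of combining the two modalities and scoring the result can be illustrated with a minimal sketch. It is not the network from this repository: it assumes the text has already been encoded as a feature vector, fuses it with hydrological features by simple concatenation, and applies a single logistic unit. The names `fuse`, `predict_proba` and `precision_recall_f1`, as well as all weights and feature values, are hypothetical and chosen only for illustration.

```python
import math


def fuse(text_features, hydro_features):
    """Concatenate text and hydrological features (assumed fusion strategy)."""
    return list(text_features) + list(hydro_features)


def predict_proba(features, weights, bias):
    """Logistic unit: probability that a tweet is flood-related."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = flood-related)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical inputs: a 2-d text embedding and one rainfall feature.
    features = fuse([0.2, 0.5], [1.5])
    # Hypothetical weights; a real model would learn these from data.
    prob = predict_proba(features, weights=[0.4, 0.3, 0.8], bias=-1.0)
    print(f"flood probability: {prob:.3f}")

    # Toy labels and predictions to show the metric computation.
    print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```

Raising the decision threshold applied to the output of `predict_proba` generally trades recall for precision, which is the kind of trade-off the precision-optimized experiment explores.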
Python · MIT