Cillian Berragan [@cjberragan]¹*
Alessia Calafiore [@alel_domi]¹

¹ Geographic Data Science Lab, University of Liverpool, Liverpool, United Kingdom
* Correspondence: C.Berragan@liverpool.ac.uk
Social media presents a rich source of real-time information provided by individual users during emergency situations. However, due to its unstructured nature and high volume, extracting key information from these continuous data streams is challenging. This paper compares a deep neural classification model, known as a transformer, against a simple rule-based classifier in their ability to identify relevant flood-related Tweets. Results show that the classification model outperforms the rule-based approach, at the time cost of labelling data and training the model.
This repository contains the code for building a RoBERTa-based binary text classification model, trained to distinguish relevant from irrelevant flood-related Tweets. Model training uses a labelled corpus of Tweets extracted during past severe flood events in the United Kingdom, selected using flood zone bounding boxes.
Inference over a separate testing corpus is compared against a keyword-based classification method.
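As a point of reference, the keyword-based method amounts to a simple rule: a Tweet counts as relevant if it mentions any term from a flood-related keyword list. A minimal sketch in Python (the keyword list here is illustrative, not the one used in the paper):

```python
# Minimal sketch of a keyword-based (rule-based) classifier.
# The keyword list below is illustrative only, not the list used in the paper.
FLOOD_KEYWORDS = {"flood", "flooding", "flooded", "storm", "rainfall"}

def keyword_classify(tweet: str) -> int:
    """Return 1 (relevant) if any flood keyword appears in the Tweet, else 0."""
    text = tweet.lower()
    return int(any(kw in text for kw in FLOOD_KEYWORDS))
```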
```
src
├── common
│   ├── get_tweets.py        # download tweets to csv through twitter api
│   └── utils.py             # various utility functions
│
├── pl_data
│   ├── csv_dataset.py       # torch dataset for flood data
│   └── datamodule.py        # lightning datamodule
│
├── pl_module
│   └── classifier_model.py  # flood classification model
│
├── run.py                   # train model
└── inf.py                   # use model checkpoint for inference and compare with keywords
```
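As a rough guide to the model in `pl_module/classifier_model.py`, a RoBERTa-based binary classifier written as a PyTorch Lightning module might look like the following (a hedged sketch with placeholder hyperparameters, not the repository's exact implementation):

```python
import pytorch_lightning as pl
import torch
from transformers import AutoModelForSequenceClassification

class FloodClassifier(pl.LightningModule):
    """Binary Tweet classifier: relevant (1) vs irrelevant (0)."""

    def __init__(self, model_name: str = "roberta-base", lr: float = 2e-5):
        super().__init__()
        self.save_hyperparameters()
        # RoBERTa with a two-label sequence classification head
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name, num_labels=2
        )

    def forward(self, input_ids, attention_mask, labels=None):
        return self.model(
            input_ids=input_ids, attention_mask=attention_mask, labels=labels
        )

    def training_step(self, batch, batch_idx):
        # batch contains input_ids, attention_mask and labels;
        # the model returns a cross-entropy loss when labels are given
        out = self(**batch)
        self.log("train_loss", out.loss)
        return out.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.lr)
```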
The Tweet corpus used for model training is not distributed with this repository, due to the Twitter Terms of Service. To train using your own data, place a CSV at `data/train/labelled.csv` with `data` and `label` columns. The Docker setup currently uses `demo_data` to demonstrate model training.
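For illustration, a compatible `labelled.csv` could be produced as follows (the example Tweets and the 0/1 label encoding are assumptions, not taken from the repository):

```python
import pandas as pd

# Hypothetical example rows; `label` is assumed to be 1 = relevant, 0 = irrelevant.
df = pd.DataFrame(
    {
        "data": [
            "The river has burst its banks, roads underwater",
            "Flooded with emails after the long weekend",
        ],
        "label": [1, 0],
    }
)
df.to_csv("data/train/labelled.csv", index=False)
```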
Install dependencies using Poetry:
```
poetry install
```
Train the classifier model using the labelled flood Tweet corpus:

```
poetry run python -m src.run
```
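Under the hood, `src/run.py` will broadly follow the standard Lightning training loop. A speculative sketch, reusing the `FloodClassifier` from above with a minimal dataset in place of the repository's `pl_data` classes:

```python
import pandas as pd
import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer

class TweetDataset(Dataset):
    """Tokenises the `data` column and pairs it with the `label` column."""

    def __init__(self, csv_path: str, model_name: str = "roberta-base"):
        self.df = pd.read_csv(csv_path)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        enc = self.tokenizer(
            row["data"],
            truncation=True,
            padding="max_length",
            max_length=128,
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
            "labels": torch.tensor(row["label"], dtype=torch.long),
        }

if __name__ == "__main__":
    model = FloodClassifier()  # as sketched above
    loader = DataLoader(TweetDataset("data/train/labelled.csv"), batch_size=16)
    trainer = pl.Trainer(max_epochs=3, accelerator="auto")
    trainer.fit(model, train_dataloaders=loader)
```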
With Docker Compose:

```
docker compose up
```
Or build the image from the Dockerfile:

```
docker build . -t cjber/flood_tweets
```
Run with GPU and mapped volumes:

```
docker run --rm --gpus all -v ${PWD}/ckpts:/flood/ckpts -v ${PWD}/csv_logs:/flood/csv_logs cjber/flood_tweets
```