This code can be used to reproduce the results of our stance classification system for English tweets (based on Wojatzki & Zesch 2016).
This work has been published in:
Robin Schaefer and Manfred Stede. Improving Implicit Stance Classification in Tweets Using Word and Sentence Embeddings. In Proceedings of the 42nd German Conference on Artificial Intelligence (KI 2019). Kassel, Germany, 2019.
This code was written for Python 3.
Set up a virtualenv and run:
pip3 install -r requirements.txt
This code makes use of several pretrained embeddings. Download and store them in the respective directory:
fastText vectors (in /tsc/embeddings): https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip
GloVe vectors (200d) (in /tsc/embedings): https://nlp.stanford.edu/data/glove.twitter.27B.zip
GloVe vectors (300d) (in /tsc/embeddings): https://nlp.stanford.edu/data/glove.42B.300d.zip
The tweet data can be downloaded from here: https://github.com/muchafel/AtheismStanceCorpus
We already reshaped, tokenized and stored it in /tsc/data.
For training, testing and evaluating a model, run:
python -m tsc
You can modify the procedure by setting options for e.g. the preprocessing or the feature type.
For turning the data to lowercase, run:
python -m tsc -p l
For training the model with fastText vectors, run:
python -m tsc -f fasttext
For a full description of options and parameters, run:
python -m tsc --help
- No installation via setup.py
- Universal Sentence Encoder model needs to be fixed
MIT License