Jessica Grasso
Universität Potsdam
MSc Program in Cognitive Systems
jgrasso@uni-potsdam.de
Clayton Violand
Universität Potsdam
MSc Program in Cognitive Systems
cvioland@uni-potsdam.de
DCIT is a tool written in Python that analyzes the usage of discourse connectives in German Twitter data. Given a list of German discourse markers and one or more files containing German-language tweets, the tool counts possible discourse connectives, performs disambiguation on the ambiguous connectives, and re-counts, printing a summary of the information collected and outputting annotated versions of the tweets.
$ python run.py a.xml b.xml
$ python run.py glob
From within ~/DCIT_Tool, the following is assumed:
1. Dimlex.html is in ../connectives-xml/dimlex.xml.
2. Tweet files are in ../tweets-xml/.
3. POS-tagged files are in ../tweets-pos-tagged/, having the same name as the tweet file w/ extension -tagged.txt.
4. Results are written to file with extension _new.xml and saved to ../results/.