UD vs SUD

This repository provides the code for the experiments published in the following paper:

Tuora, R., Przepiórkowski, A., & Leczkowski, A. (2021, November). Comparing learnability of two dependency schemes: ‘semantic’ (UD) and ‘syntactic’ (SUD). In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 2987-2996).

To run the experiment, first install the dependencies:

python3 -m pip install -r requirements.txt
python3 -m pip install --no-deps --index-url https://pypi.clarin-pl.eu/simple combo==1.0.1
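
After installation, it can be useful to confirm which version of a package actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical, and the sketch assumes the distribution is registered under the name used in the pip command above (combo).

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "combo" is the distribution name taken from the install command above.
    print("combo:", installed_version("combo"))
```

If the printed value is None, the second install command has most likely not been run in the active environment.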

If you want to make sure that no errors occur outside the long phases of the experiment (training), you can do a dry run first. To do so, set DRY_RUN in constants.py to True.
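
How a dry-run flag typically shortens the expensive phases can be sketched as follows. This is an illustration only: the name DRY_RUN comes from this README, but the helpers training_epochs and subsample, as well as the concrete limits, are assumptions and not the repository's actual code.

```python
# Hypothetical sketch: only the DRY_RUN name is taken from this README.
DRY_RUN = True  # in the repository, this flag is set in constants.py


def training_epochs(full_epochs, dry_run=DRY_RUN):
    """Use a single epoch during a dry run, the full number otherwise."""
    return 1 if dry_run else full_epochs


def subsample(sentences, dry_run=DRY_RUN, limit=10):
    """Keep only the first `limit` sentences during a dry run."""
    return sentences[:limit] if dry_run else sentences
```

With such a pattern, the whole pipeline is exercised end to end, while training itself stays short enough to surface errors quickly.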

Then run this to start the experiment: python3 main.py

The process consists of the following steps:

Downloads UD and SUD treebanks
Selects treebanks which match the quality and size criteria
Preprocesses treebanks
Downloads embeddings for the selected treebanks (languages)
Downloads parsers
Calculates treebank statistics
Converts .conllu files to .conll09 files (for Mate parser)
Trains and evaluates Mate parser
Trains and evaluates UDPipe parser
Trains and evaluates COMBO parser
Trains and evaluates UUParser

If you wish to skip one or more of these steps, simply comment out the respective line in main.py.
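
The idea of skipping a step by commenting out its line can be sketched as below. All function names here are placeholders chosen for illustration; the actual names in main.py may differ.

```python
# Hypothetical sketch: every function name below is a placeholder,
# not the actual name used in main.py.

def download_treebanks(done):
    done.append("download_treebanks")


def select_treebanks(done):
    done.append("select_treebanks")


def train_and_evaluate_mate(done):
    done.append("train_and_evaluate_mate")


def train_and_evaluate_udpipe(done):
    done.append("train_and_evaluate_udpipe")


def run_pipeline():
    """Run the steps in order and return the names of those executed."""
    done = []
    download_treebanks(done)
    select_treebanks(done)
    # train_and_evaluate_mate(done)  # commented out, so this step is skipped
    train_and_evaluate_udpipe(done)
    return done
```

Keep in mind that later steps presumably rely on the output of earlier ones, so skipping a download or preprocessing step only makes sense if its results are already on disk.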

The process produces .csv files with the results:

tb_stats.csv
results_mate_final_sorted.csv
results_udpipe_final_sorted.csv
results_combo_final_sorted.csv
results_uuparser_transition_final_sorted.csv
results_uuparser--graph-base_final_sorted.csv
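
To inspect the results programmatically, the files can be read with the standard csv module. The sketch below makes two assumptions: the files are comma-separated with a header row, and they are located in the directory passed to the function. The helper name load_results is hypothetical, and no column names are assumed, since they are not documented here.

```python
import csv
from pathlib import Path

# File names as listed in this README.
RESULT_FILES = [
    "tb_stats.csv",
    "results_mate_final_sorted.csv",
    "results_udpipe_final_sorted.csv",
    "results_combo_final_sorted.csv",
    "results_uuparser_transition_final_sorted.csv",
    "results_uuparser--graph-base_final_sorted.csv",
]


def load_results(directory="."):
    """Map each result file found in `directory` to its rows (one dict per row)."""
    tables = {}
    for name in RESULT_FILES:
        path = Path(directory) / name
        if not path.exists():
            continue  # the step producing this file was skipped or has not run yet
        with path.open(newline="", encoding="utf-8") as handle:
            # Assumption: comma-separated values with a header row.
            tables[name] = list(csv.DictReader(handle))
    return tables
```

Calling load_results() in the repository root after a run returns a dictionary keyed by file name; files that were not produced (for example because a parser step was commented out) are simply absent from it.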
