This repository provides the code for the experiments published in the following paper:
Tuora, R., Przepiórkowski, A., & Leczkowski, A. (2021, November). Comparing learnability of two dependency schemes: 'semantic' (UD) and 'syntactic' (SUD). In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 2987-2996).
To run the experiment, first install the dependencies:
python3 -m pip install -r requirements.txt
python3 -m pip install --no-deps --index-url https://pypi.clarin-pl.eu/simple combo==1.0.1
You can perform a dry run if you want to make sure that no errors occur outside of the long phases of the experiment (training). To do so, set DRY_RUN in constants.py to True.
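For orientation, this is roughly what the relevant setting looks like; the snippet below is an illustrative sketch, not the actual contents of constants.py:

```python
# constants.py (illustrative excerpt)

# When True, the pipeline skips the long training phases so you can
# verify that the remaining steps run without errors.
DRY_RUN = True
```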
Then run this to start the experiment:
python3 main.py
The process consists of the following steps:
- Downloads UD and SUD treebanks
- Selects treebanks which match the quality and size criteria
- Preprocesses treebanks
- Downloads embeddings for the selected treebanks (languages)
- Downloads parsers
- Calculates treebank statistics
- Converts .conllu files to .conll09 files (for Mate parser)
- Trains and evaluates Mate parser
- Trains and evaluates UDPipe parser
- Trains and evaluates COMBO parser
- Trains and evaluates UUParser
If you wish to skip one or more of these steps, simply comment out the respective line in the main.py file.
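For illustration, a hypothetical sketch of a step-per-line main.py where individual steps can be disabled by commenting them out; the function names below are assumptions made for this example, not the actual API of the repository:

```python
# main.py -- hypothetical sketch; the real script defines its own step functions.

def download_treebanks(): ...
def select_treebanks(): ...
def preprocess_treebanks(): ...
def train_and_evaluate_parsers(): ...

if __name__ == "__main__":
    download_treebanks()
    select_treebanks()
    preprocess_treebanks()
    # train_and_evaluate_parsers()  # commented out to skip this step
```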
The process produces .csv files with the results:
tb_stats.csv
results_mate_final_sorted.csv
results_udpipe_final_sorted.csv
results_combo_final_sorted.csv
results_uuparser_transition_final_sorted.csv
results_uuparser--graph-base_final_sorted.csv
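A minimal sketch for inspecting these result files with pandas, assuming they are plain comma-separated CSVs; the column names depend on the actual experiment output and are not fixed here:

```python
# Load and preview the result tables produced by the experiment.
import glob
import pandas as pd

for path in ["tb_stats.csv"] + sorted(glob.glob("results_*_final_sorted.csv")):
    df = pd.read_csv(path)
    print(f"{path}: {len(df)} rows, columns: {list(df.columns)}")
    print(df.head(), "\n")
```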