To run the experiments, create a folder called `data` and copy into it the data folders containing the datasets for the selected languages.
You can obtain the data from the shared-task page.
So far we're using:
- Chinese (ZH)
- German (DE)
- Hindi (HI)
- Irish (GA)
- Portuguese (PT)
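
As a quick sanity check before running anything, the snippet below is a minimal sketch that creates the `data` folder and reports which language folders are still missing. It assumes one subfolder per language code (`data/ZH`, `data/DE`, ...); that layout and the helper name `missing_language_folders` are illustrative assumptions, so adjust them to match the folders you actually download.

```python
from pathlib import Path

# Language codes listed above. One subfolder per code is an assumption
# about the layout, not something the shared-task data guarantees.
LANGUAGES = ["ZH", "DE", "HI", "GA", "PT"]


def missing_language_folders(data_dir, languages=LANGUAGES):
    """Return the language codes that have no subfolder inside data_dir."""
    root = Path(data_dir)
    return [code for code in languages if not (root / code).is_dir()]


if __name__ == "__main__":
    Path("data").mkdir(exist_ok=True)  # create the data folder if it is absent
    missing = missing_language_folders("data")
    if missing:
        print("Missing language folders:", ", ".join(missing))
    else:
        print("All language folders found.")
```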
To run the experiments and obtain the evaluation results at the end, get the evaluation scripts from the official shared-task repository and:
- copy the files into a folder called `eval_scripts`
- at the top of `evaluate.py`, replace `import tsvlib` with `from . import tsvlib`.

Otherwise, run with evaluation disabled.
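
The two evaluation-script steps above can also be done programmatically. The following is a rough sketch, not part of the official tooling: `install_eval_scripts` is a hypothetical helper, and `source_dir` is assumed to be a local folder where you have already placed the shared-task evaluation scripts. It copies every file into `eval_scripts` and rewrites a line that is exactly `import tsvlib` into the relative import.

```python
from pathlib import Path
import re
import shutil


def install_eval_scripts(source_dir, target_dir="eval_scripts"):
    """Copy the evaluation scripts and make the tsvlib import relative.

    source_dir is a hypothetical local folder holding the downloaded scripts.
    Returns True if evaluate.py was changed by the patch.
    """
    target = Path(target_dir)
    target.mkdir(exist_ok=True)
    for item in Path(source_dir).iterdir():
        if item.is_file():
            shutil.copy(item, target / item.name)

    evaluate = target / "evaluate.py"
    text = evaluate.read_text(encoding="utf-8")
    # Only rewrite a line that is exactly "import tsvlib".
    patched = re.sub(
        r"^import tsvlib[ \t]*$",
        "from . import tsvlib",
        text,
        flags=re.MULTILINE,
    )
    evaluate.write_text(patched, encoding="utf-8")
    return patched != text
```

The relative import only resolves when `evaluate.py` is loaded as part of the `eval_scripts` package, which is why the files need to live in their own folder.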
- Python 3.6+
- numpy
- scikit-learn
- pytorch
- huggingface transformers
- torchtext
- skorch
- Suggestion: install the libraries in a virtual environment.
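
To confirm that the active (virtual) environment has everything listed above, a small check like the one below may help. It is only a sketch: `missing_modules` is a hypothetical helper, and it looks packages up by their import names (`sklearn` for scikit-learn, `torch` for pytorch) without checking versions.

```python
import importlib.util
import sys

# Import names for the libraries listed above
# (scikit-learn imports as sklearn, pytorch as torch).
REQUIRED_MODULES = ["numpy", "sklearn", "torch", "transformers", "torchtext", "skorch"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("Python 3.6 or newer is required.")
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```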
- Masked Language Model
- NMT approach