If you find anything useful in this work, please cite our paper:
@inproceedings{xu-koehn-2021-zero,
title = "Zero-Shot Cross-Lingual Dependency Parsing through Contextual Embedding Transformation",
author = "Xu, Haoran and
Koehn, Philipp",
booktitle = "Proceedings of the Second Workshop on Domain Adaptation for NLP",
month = apr,
year = "2021",
address = "Kyiv, Ukraine",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2021.adaptnlp-1.21",
pages = "204--213",
abstract = "Linear embedding transformation has been shown to be effective for zero-shot cross-lingual transfer tasks and achieve surprisingly promising results. However, cross-lingual embedding space mapping is usually studied in static word-level embeddings, where a space transformation is derived by aligning representations of translation pairs that are referred from dictionaries. We move further from this line and investigate a contextual embedding alignment approach which is sense-level and dictionary-free. To enhance the quality of the mapping, we also provide a deep view of properties of contextual embeddings, i.e., the anisotropy problem and its solution. Experiments on zero-shot dependency parsing through the concept-shared space built by our embedding transformation substantially outperform state-of-the-art methods using multilingual embeddings.",
}
First, create the virtual environment and install the required packages:
conda create --name clce python=3.7
conda activate clce
pip install -r requirements.txt
To reproduce the numbers reported in the paper, use our pre-trained model and mappings listed in the following table. Note that the pre-trained model and mappings have gone through iterative normalization preprocessing and therefore operate in a near-isotropic space.
Pre-trained English parser: model.zip
Language | word-level mapping | sense-level mapping |
---|---|---|
es | iter-norm-mean_es-en.th | iter-norm-multi_es-en.th |
pt | iter-norm-mean_pt-en.th | iter-norm-multi_pt-en.th |
ro | iter-norm-mean_ro-en.th | iter-norm-multi_ro-en.th |
pl | iter-norm-mean_pl-en.th | iter-norm-multi_pl-en.th |
fi | iter-norm-mean_fi-en.th | iter-norm-multi_fi-en.th |
el | iter-norm-mean_el-en.th | iter-norm-multi_el-en.th |
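The iterative normalization preprocessing mentioned above can be illustrated with a minimal sketch. It assumes the commonly described form of the technique: alternate between scaling every vector to unit length and subtracting the mean vector. The function name `iterative_normalization` and the `n_iter` parameter are hypothetical, and this is not the code used in this repository, which operates on contextual embeddings stored as tensors.

```python
import math


def iterative_normalization(vectors, n_iter=5):
    """Alternate unit-length normalization and mean centering (illustrative sketch)."""
    vecs = [list(v) for v in vectors]  # copy so the input is left untouched
    dim = len(vecs[0])
    for _ in range(n_iter):
        # Step 1: scale every vector to unit L2 norm (zero vectors are left as-is).
        for v in vecs:
            norm = math.sqrt(sum(x * x for x in v)) or 1.0
            for j in range(dim):
                v[j] /= norm
        # Step 2: subtract the mean vector so the embeddings are centered at zero.
        mean = [sum(v[j] for v in vecs) / len(vecs) for j in range(dim)]
        for v in vecs:
            for j in range(dim):
                v[j] -= mean[j]
    return vecs
```

Because centering is the last step of each round, the output has zero mean up to floating-point error, while the vector norms are only approximately 1 and get closer to 1 as the number of rounds grows.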
The zero-shot dependency parsing task is evaluated on the Universal Dependencies 2.6 treebanks, which are freely available for download.
Before using the pre-trained English model to parse treebanks in other languages, set the paths to the pre-trained model, the pre-trained mappings, and the treebanks in evaluate.sh. After that, you can run:
./evaluate.sh lang # e.g., ./evaluate.sh fi
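To evaluate all six languages from the table in one go, a small wrapper along the following lines may help. It is only a sketch: `evaluate_all` and its `dry_run` flag are hypothetical names, and it assumes `evaluate.sh` takes the language code as its single argument, as in the command above.

```python
import subprocess

# Language codes taken from the mapping table.
LANGS = ("es", "pt", "ro", "pl", "fi", "el")


def evaluate_all(langs=LANGS, script="./evaluate.sh", dry_run=True):
    """Build one `<script> <lang>` command per language; run them unless dry_run is set."""
    commands = [[script, lang] for lang in langs]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first language whose evaluation fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `evaluate_all(dry_run=False)` from the repository root runs the evaluations one after another; the default `dry_run=True` only returns the commands so they can be inspected first.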
To train the English parser yourself, you may need to change the paths to the train and dev datasets in the config file allen_configs/enbert_IN.jsonnet before running:
allennlp train allen_configs/enbert_IN.jsonnet -s PATH/TO/STORE/MODEL --include-package src
Please follow the instructions here.