In 2024, SIGTYP is hosting a Shared Task on Word Embedding Evaluation for Ancient and Historical Languages.
Since the rise of word embeddings, their evaluation has been considered a challenging task and has sparked considerable debate about the optimal approach. Over the years, researchers have developed two major strategies: intrinsic and extrinsic evaluation. The first amounts to solving specially designed problems, such as semantic proportions or “odd one out”, or comparing the word/sentence similarity scores produced by a model against human judgements. The second focuses on solving downstream NLP tasks, such as sentiment analysis or question answering, thus probing word or sentence representations in real-world applications.
In recent years, sets of downstream tasks called benchmarks have become a very popular, if not default, method to evaluate general-purpose word and sentence embeddings. Starting with decaNLP (McCann et al., 2018) and SentEval (Conneau & Kiela, 2018), multitask benchmarks for NLU keep appearing and improving every year. However, even the largest multilingual benchmarks, such as XGLUE, XTREME, XTREME-R or XTREME-UP (Hu et al., 2020; Liang et al., 2020; Ruder et al., 2021, 2023), only include modern languages. When it comes to ancient and historical languages, scholars mostly adapt/translate intrinsic evaluation datasets from modern languages or create their own diagnostic tests. We argue that there is a need for a universal evaluation benchmark for embeddings learned from ancient and historical language data and view this shared task as a proving ground for it.
The shared task involves solving the following problems for 12+ ancient and historical languages that belong to 4 language families and use 6 different scripts.
A. Constrained
- POS-tagging
- Full morphological annotation
- Lemmatisation
B. Unconstrained
- POS-tagging
- Full morphological annotation
- Lemmatisation
- Filling the gaps
  - Word-level
  - Character-level
For subtask A, participants are not allowed to use any additional data; however, they can reduce and balance the provided training datasets if they see fit. For subtask B, participants are allowed to use any additional data in any language, including pre-trained embeddings and LLMs.
For tasks 1–3, we use Universal Dependencies v. 2.12 data (Zeman et al., 2023) in 11 ancient and historical languages, complemented by 5 Old Hungarian codices from the MGTSZ website (HAS Research Institute for Linguistics, 2018), which are annotated to the same standard as the corpora available through UD. UD corpora containing fewer than 1K examples or having only a test set were excluded.
For task 4, we add historical Irish data from CELT (Ó Corráin et al., 1997), Corpas Stairiúil na Gaeilge (Acadamh Ríoga na hÉireann, 2017), and digital editions of the St. Gall glosses (Bauer et al., 2017) and the Würzburg glosses (Doyle, 2018) as a case study of how performance may vary across different historical stages of the same language. Each Irish text taken from CELT is labelled "Old", "Middle" or "Early Modern" in accordance with the language labels provided in the CELT metadata. We only include a CELT text in the dataset if it has a single Irish language label and if its dates (also provided in the metadata) fall within that period of Irish language history. Some Irish texts may contain passages in Latin.
A detailed list of text sources for each language in the dataset is provided here.
We set the upper temporal boundary to 1700 CE and do not include texts created after this date in our dataset. This choice is driven by the fact that most of the historical language data used in word embedding research dates back to the 18th century CE or later, and we would like to focus on the more challenging data that has not yet been covered.
The following table gives a brief overview of the data. We use ISO 639-3 codes wherever they exist and special 4-letter codes based on them otherwise. The “Script” column refers to the scripts used in the dataset, not to the script(s) a particular language used (e.g. our Sanskrit corpus is transliterated from Devanagari into the Latin script). The “Dating” column describes the period when texts in the dataset were created, not when a particular language existed. The dates are cited according to the electronic editions/corpora these texts come from. Finally, we provide the size of each subset in sentences (S) and tokens (T).
NB! Our training, validation and test splits are different from those in the UD repositories. The train : validation : test proportion is 0.8 : 0.1 : 0.1, split by the number of sentences.
Language | Code | Family | Branch | Script | Dating | Train-T | Valid-T | Test-T | Train-S | Valid-S | Test-S |
---|---|---|---|---|---|---|---|---|---|---|---|
Ancient Greek | grc | Indo-European | Hellenic | Greek | VIII c. BCE – 110 CE | 334,043 | 41,905 | 41,046 | 24,800 | 3,100 | 3,101 |
Ancient Hebrew | hbo | Afro-Asiatic | Semitic | Hebrew | X c. CE | 40,244 | 4,862 | 4,801 | 1,263 | 158 | 158 |
Classical Chinese | lzh | Sino-Tibetan | Sinitic | Hanzi | 47 – 220 CE | 346,778 | 43,067 | 43,323 | 68,991 | 8,624 | 8,624 |
Coptic | cop | Afro-Asiatic | Egyptian | Coptic | I – II c. CE | 57,493 | 7,282 | 7,558 | 1,730 | 216 | 217 |
Gothic | got | Indo-European | Germanic | Latin | V – VIII c. CE | 44,044 | 5,724 | 5,568 | 4,320 | 540 | 541 |
Medieval Icelandic | isl | Indo-European | Germanic | Latin | 1150 – 1680 CE | 473,478 | 59,002 | 58,242 | 21,820 | 2,728 | 2,728 |
Classical and Late Latin | lat | Indo-European | Romance | Latin | I c. BCE – IV c. CE | 188,149 | 23,279 | 23,344 | 16,769 | 2,096 | 2,097 |
Medieval Latin | latm | Indo-European | Romance | Latin | 774 – early XIV c. CE | 599,255 | 75,079 | 74,351 | 30,176 | 3,772 | 3,773 |
Old Church Slavonic | chu | Indo-European | Slavonic | Cyrillic | X – XI c. CE | 159,368 | 19,779 | 19,696 | 18,102 | 2,263 | 2,263 |
Old East Slavic | orv | Indo-European | Slavonic | Cyrillic | 1025 – 1700 CE | 250,833 | 31,078 | 32,318 | 24,788 | 3,098 | 3,099 |
Old French | fro | Indo-European | Romance | Latin | 1180 CE | 38,460 | 4,764 | 4,870 | 3,113 | 389 | 390 |
Vedic Sanskrit | san | Indo-European | Indo-Aryan | Latin (transcr.) | 1500 – 600 BCE | 21,786 | 2,729 | 2,602 | 3,197 | 400 | 400 |
Old Hungarian | ohu | Finno-Ugric | Ugric | Latin | 1440 – 1521 CE | 129,454 | 16,138 | 16,116 | 21,346 | 2,668 | 2,669 |
Old Irish | sga | Indo-European | Celtic | Latin | 600 – 900 CE | 88,774 | 11,093 | 11,048 | 8,748 | 1,093 | 1,094 |
Middle Irish | mga | Indo-European | Celtic | Latin | 900 – 1200 CE | 251,684 | 31,748 | 31,292 | 14,308 | 1,789 | 1,789 |
Early Modern Irish | ghc | Indo-European | Celtic | Latin | 1200 – 1700 CE | 673,449 | 115,163 | 79,600 | 24,440 | 3,055 | 3,056 |
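The released splits should be used as provided. For reference only, here is a minimal sketch of how an 80/10/10 split by sentence count can be computed; the shuffling, seed and rounding are illustrative assumptions, not necessarily the procedure used to prepare the data:

```python
import random

def split_sentences(sentences, seed=42):
    """Shuffle sentences and split them 0.8 : 0.1 : 0.1 by sentence count."""
    sentences = list(sentences)
    random.Random(seed).shuffle(sentences)
    n = len(sentences)
    n_train, n_valid = int(0.8 * n), int(0.1 * n)
    return (sentences[:n_train],
            sentences[n_train:n_train + n_valid],
            sentences[n_train + n_valid:])

# E.g. 31001 Ancient Greek sentences -> 24800 / 3100 / 3101,
# the Train-S / Valid-S / Test-S counts in the table above.
train, valid, test = split_sentences(f"sent_{i}" for i in range(31001))
print(len(train), len(valid), len(test))
```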
The data for tasks 1–3 is the same and can be found in the `morphology` folder in `conllu` format. Tasks 4a and 4b have their own data folders, `fill_mask_word` and `fill_mask_char`, where each file is a two-column table in `tsv` format (`quotechar="^"`). Each file is named with a language code from the table above and a train/valid/test prefix.
UPD. For your convenience, we have added extra information for tasks 4a and 4b: the indices of masked tokens and the masked tokens themselves for each sentence. You can find this information in `json` files in the `fill_mask_word` and `fill_mask_char` folders.
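A minimal sketch of loading the task 4 data with the standard library (file paths are illustrative; see the naming convention above):

```python
import csv
import json

def read_fill_mask_tsv(path):
    """Read a two-column tsv file; note the custom quote character "^"."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter="\t", quotechar="^"))

def read_fill_mask_json(path):
    """Read the accompanying json file with mask indices and gold masked tokens."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Illustrative paths: one tsv and one json per language and split.
rows = read_fill_mask_tsv("fill_mask_word/train_sga.tsv")
gold = read_fill_mask_json("fill_mask_word/train_sga.json")
print(rows[0], gold[0]["masked_tokens"])
```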
Please note that Old Hungarian texts come from diplomatic editions, i.e. they have not been normalised and use some specific orthographic notation. We left this as is, with the exception of punctuation: wherever a token in the Old Hungarian data had a `PUNCT` POS-tag, we set its form equal to its lemma, thus getting rid of complex punctuation marks such as `·Γ`, `:~`, `|Γ`, etc.
NB! Old Hungarian files don't have any metadata (#source, #text etc.), only annotated sentences.
# source = Jerome's Vulgate, Revelation 9
# text = et cruciatus eorum ut cruciatus scorpii cum percutit hominem
# sent_id = 33745
1 et et CCONJ C- _ 4 cc _ ref=REV_9.5
2 cruciatus cruciatus NOUN Nb Case=Nom|Gender=Masc|Number=Sing 4 nsubj:outer _ ref=REV_9.5
3 eorum is PRON Pp Case=Gen|Gender=Masc|Number=Plur|Person=3|PronType=Prs 2 det _ ref=REV_9.5
4 ut ut ADV Dq PronType=Rel 0 root _ ref=REV_9.5
5 cruciatus cruciatus NOUN Nb Case=Nom|Gender=Masc|Number=Sing 4 nsubj _ ref=REV_9.5
6 scorpii scorpios NOUN Nb Case=Gen|Gender=Masc|Number=Sing 5 nmod _ ref=REV_9.5
7 cum cum SCONJ G- _ 8 mark _ ref=REV_9.5
8 percutit percutio VERB V- Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act 5 acl _ ref=REV_9.5
9 hominem homo NOUN Nb Case=Acc|Gender=Masc|Number=Sing 8 obj _ ref=REV_9.5
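For tasks 1–3, the data can be read with the `conllu` Python package, for example (a minimal sketch; the file path is illustrative):

```python
from conllu import parse

# Parse a CoNLL-U file into a list of sentences (TokenList objects).
with open("morphology/train_lat.conllu", encoding="utf-8") as f:
    sentences = parse(f.read())

for token in sentences[0]:
    # Each token behaves like a dict with the standard CoNLL-U fields.
    print(token["form"], token["lemma"], token["upos"], token["feats"])
```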
For this task, 10% of tokens in each sentence were randomly replaced with a `[MASK]` token. Please keep in mind that some sentences have more than one masked token, and some (short) sentences have none.
NB! Here and below we provide examples from different languages in one table, but in the dataset each language has its own data file(s).
masked | src |
---|---|
Cé [MASK] secht [MASK] im gin sóee suilgind, co bráth, mó cech delmaimm, issed ma do-ruirminn. | Cé betis secht tengtha im gin sóee suilgind, co bráth, mó cech delmaimm, issed ma do-ruirminn. |
ѿ негоже рожает [MASK] с҃нъ преже всѣх вѣкъ | ѿ негоже рожает сѧ с҃нъ преже всѣх вѣкъ |
豈人[MASK]之子孫則必不善哉 | 豈人主之子孫則必不善哉 |
[
{
"src": "Sith do denamh doib iar sin.",
"masked": "Sith do denamh doib iar sin [MASK]",
"masked_tokens": [
{
"mask_idx": 6,
"masked_token": "."
}
]
},
{
"src": "\"Créd tucc ort cen teacht dá hiarraidh a cCarn Chaireadha theas mar ar gealladh dhuit í,\" ol Oisín.",
"masked": "\"Créd tucc ort cen teacht dá hiarraidh a cCarn [MASK] [MASK] mar ar gealladh dhuit í,\" ol Oisín.",
"masked_tokens": [
{
"mask_idx": 10,
"masked_token": "Chaireadha"
},
{
"mask_idx": 11,
"masked_token": "theas"
}
]
}
]
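The masking step itself can be reproduced roughly as follows (an illustrative sketch only: the exact tokenisation, rounding and sampling used to prepare the data may differ):

```python
import random

def mask_tokens(tokens, rate=0.10, mask="[MASK]", seed=0):
    """Randomly replace ~10% of tokens with the mask symbol."""
    rng = random.Random(seed)
    n_masks = round(len(tokens) * rate)          # short sentences may get 0 masks
    positions = sorted(rng.sample(range(len(tokens)), n_masks))
    masked = [mask if i in positions else tok for i, tok in enumerate(tokens)]
    gold = [{"mask_idx": i, "masked_token": tokens[i]} for i in positions]
    return " ".join(masked), gold

# Token positions are counted over the tokenised sentence, as in the json above.
masked, gold = mask_tokens("Sith do denamh doib iar sin .".split())
print(masked, gold)
```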
For languages with alphabetic writing systems, sentences were split into individual characters. For Classical Chinese, each Hanzi character was decomposed into individual strokes with the help of the `hanzipy` package (we used the deepest decomposition level available, "graphical"). Then, 5% of characters in each sentence were randomly replaced with a `[_]` token. Please keep in mind that some sentences have more than one masked character, and some (short) sentences have none.
masked | src |
---|---|
Cé betis se[_]ht te[_]gtha im gin s[_]ee suilgind, co bráth, mó cech[_]delmaimm, isse[_] ma do-ruirminn. | Cé betis secht tengtha im gin sóee suilgind, co bráth, mó cech delmaimm, issed ma do-ruirminn. |
ѿ негоже рожает с[_] с҃нъ п[_]еже всѣх вѣкъ | ѿ негоже рожает сѧ с҃нъ преже всѣх вѣкъ |
丨凵一口丷一人一一[_]一[_]一一丨一丶戈㇝㇇亅一㇇亅一㇒㇛丶亅八[_]二八亅丨𠁼㇃丿一丿丨丶一二丨丷丷一口一丨一戈口 | 豈人主之子孫則必不善哉 |
[
{
"src": "Sith do denamh doib iar sin.",
"masked": "Sith do denamh doib iar [_]in.",
"masked_tokens": [
{
"mask_idx": 24,
"masked_token": "s"
}
]
}
]
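For languages with alphabetic scripts, the `mask_idx` in the example above reads as a zero-based character offset into the unmasked sentence (`"Sith do denamh doib iar sin."[24] == "s"`). A quick check of that reading, under the assumption that it holds throughout (illustrative path; Classical Chinese, where masking operates on stroke sequences, would need separate handling):

```python
import json

with open("fill_mask_char/valid_sga.json", encoding="utf-8") as f:
    data = json.load(f)

for entry in data:
    for m in entry["masked_tokens"]:
        # Flag any entry where the index is not a character offset into `src`.
        if entry["src"][m["mask_idx"]] != m["masked_token"]:
            print("Unexpected index:", m, entry["src"])
```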
The shared task is hosted on CodaLab. Following the authors of GLUE and SuperGLUE (Wang et al., 2019, 2020), we weigh each task equally and provide a macro-average of per-task scores as an overall score. For tasks with multiple metrics (e.g., F1 and accuracy), we use an unweighted average of the metrics as the score for the task when computing the overall macro-average.
Task | Metrics |
---|---|
POS-tagging | Accuracy @1, F1 |
Detailed morphological annotation | Macro-average of Accuracy @1 per tag |
Lemmatisation | Accuracy @1, Accuracy @3 |
Filling the gaps (word-level) | Accuracy @1, Accuracy @3 |
Filling the gaps (character-level) | Accuracy @1, Accuracy @3 |
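To illustrate the scoring scheme: metrics are first averaged within each task, and the overall score is the unweighted macro-average across the five tasks. A sketch with made-up numbers (official scoring runs on CodaLab):

```python
# Per-task metric values (made-up numbers for illustration only).
task_scores = {
    "pos_tagging":    {"accuracy@1": 0.95, "f1": 0.93},
    "morph_features": {"macro_accuracy@1": 0.88},
    "lemmatisation":  {"accuracy@1": 0.90, "accuracy@3": 0.94},
    "fill_mask_word": {"accuracy@1": 0.35, "accuracy@3": 0.47},
    "fill_mask_char": {"accuracy@1": 0.60, "accuracy@3": 0.72},
}

# Unweighted average of metrics within each task...
per_task = {task: sum(m.values()) / len(m) for task, m in task_scores.items()}
# ...then an unweighted macro-average across tasks.
overall = sum(per_task.values()) / len(per_task)
print(per_task, round(overall, 4))
```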
You should submit a `.zip` file with the following folder structure. Please make sure that your folder and file names are exactly as displayed below, and that the subfolders are in the root directory!
📂 fill_mask_char
├── chu.json
├── cop.json
├── fro.json
└── ...
📂 fill_mask_word
├── chu.json
├── cop.json
├── fro.json
└── ...
📂 pos_tagging
├── chu.json
├── cop.json
├── fro.json
└── ...
📂 morph_features
├── chu.json
├── cop.json
├── fro.json
└── ...
📂 lemmatisation
├── chu.json
├── cop.json
├── fro.json
└── ...
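A minimal way to package a submission with this layout (directory and archive names are illustrative; what matters is that the five task folders sit at the root of the zip):

```python
import shutil

# Assume ./my_submission/ contains the five task folders
# (fill_mask_char, fill_mask_word, pos_tagging, morph_features, lemmatisation),
# each holding one <language_code>.json file per language.
# root_dir places those folders at the root of the resulting archive.
shutil.make_archive("submission", "zip", root_dir="my_submission")
```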
Please upload your embeddings to this folder as a zip archive with the same name as your username on CodaLab.
You should submit a `.json` file with a list of sentences. Each sentence is a list of `(token, POS-tag)` tuples.
[
[
["quem", "PRON"],
["me", "PRON"],
["arbitramini", "VERB"],
["non", "ADV"],
["sum", "AUX"],
["ego", "PRON"]
],
[
["hospitalitatem", "NOUN"],
["sectantes", "VERB"]
],
[
["secundum", "ADP"],
["hominem", "NOUN"],
["dico", "VERB"]
]
]
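A sketch of writing such a file, assuming per-sentence predictions are already available (the tagger below is a hypothetical placeholder for your model):

```python
import json

def tag_sentence(tokens):
    # Hypothetical placeholder: returns one UPOS tag per token.
    return ["NOUN"] * len(tokens)

test_sentences = [["hospitalitatem", "sectantes"],
                  ["secundum", "hominem", "dico"]]

submission = [[[token, tag] for token, tag in zip(tokens, tag_sentence(tokens))]
              for tokens in test_sentences]

with open("pos_tagging/lat.json", "w", encoding="utf-8") as f:
    json.dump(submission, f, ensure_ascii=False, indent=2)
```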
Your submission is again a `.json` file, but with a slightly more complicated structure. It is a list of sentences; each sentence is a list of tokens, and each token is a dictionary that contains the form, its POS-tag and all morphological features. The inventory of morphological features, as well as their names, is specific to each language (please refer to the training data). The `UPOS` and `Token` keys are universal, i.e. valid for every language. NB! You have to submit UPOS in this task, but you don't have to submit the lemma.
[
[
{
"Case": "Acc",
"Gender": "Fem",
"Number": "Sing",
"UPOS": "NOUN",
"Token": "hospitalitatem"
},
{
"Case": "Nom",
"Gender": "Masc",
"Number": "Plur",
"Tense": "Pres",
"VerbForm": "Part",
"Voice": "Act",
"UPOS": "VERB",
"Token": "sectantes"
}
],
[
{
"UPOS": "ADP",
"Token": "secundum"
},
{
"Case": "Acc",
"Gender": "Masc",
"Number": "Sing",
"UPOS": "NOUN",
"Token": "hominem"
},
{
"Mood": "Ind",
"Number": "Sing",
"Person": "1",
"Tense": "Pres",
"VerbForm": "Fin",
"Voice": "Act",
"UPOS": "VERB",
"Token": "dico"
}
]
]
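The per-token dictionaries mirror the CoNLL-U FEATS column plus the `UPOS` and `Token` keys. A sketch of the conversion, which works both for tokens parsed with the `conllu` package and for plain dicts:

```python
def token_to_dict(token):
    """Turn a CoNLL-U style token (form/upos/feats) into the submission dictionary."""
    features = dict(token.get("feats") or {})   # language-specific feature inventory
    features["UPOS"] = token["upos"]
    features["Token"] = token["form"]
    return features

example = {"form": "hominem", "upos": "NOUN",
           "feats": {"Case": "Acc", "Gender": "Masc", "Number": "Sing"}}
print(token_to_dict(example))
```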
Similarly to POS-tagging, you should submit a `.json` file with a list of sentences, but in this case each sentence is a list of `(token, (lemma1, lemma2, lemma3))` tuples. The second element of each tuple is another tuple/list with your top-3 lemma predictions. Remember that the order of your predictions matters! If you have fewer than 3 predictions, you can submit empty strings.
[
[
["quem", ["quis", "ques", "que"]],
["me", ["ego", "me", "messe"]],
["arbitramini", ["arbitror", "arbitrar", "arbitramini"]],
["esse", ["sum", "esse", "ego"]],
["non", ["non", "no", ""]],
["sum", ["sum", "esse", "sunt"]],
["ego", ["ego", "me", "esse"]]
],
[
["hospitalitatem", ["hospitalitas", "hospitalis", "hospitalitus"]],
["sectantes", ["secto", "sectum", "sectant"]]
],
[
["secundum", ["secundum", "secundus", "secund"]],
["hominem", ["homo", "homus", "hominem"]],
["dico", ["dico", "dixi", ""]]
]
]
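A sketch of assembling one sentence in this format, padding to exactly three candidates with empty strings (the candidate lists are made up):

```python
def top3(candidates):
    """Keep the three highest-ranked lemma candidates, padding with empty strings."""
    candidates = list(candidates)[:3]
    return candidates + [""] * (3 - len(candidates))

# Hypothetical ranked candidates for two tokens of one sentence.
sentence = [["hominem", top3(["homo", "homus", "hominem"])],
            ["dico", top3(["dico", "dixi"])]]
print(sentence)  # [['hominem', ['homo', 'homus', 'hominem']], ['dico', ['dico', 'dixi', '']]]
```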
Submissions should be `json` files with a list of sentences, where each sentence is a dictionary. The key "masked" is the masked sentence provided, the key "text" is the restored sentence, and the key "masked_tokens" is a list of your predictions for the masked tokens. We advise providing your top-3 predictions for each masked token, but if you have fewer, you can submit empty strings. NB! Both the order of the masked tokens and the order within your top-3 predictions are important! In the example below, `["ni", "na", ""]` is the list of top-3 predictions for the first masked token, where `"ni"` has the highest probability; `["⁊", "&", "ocus"]` is the list of top-3 predictions for the second masked token, and so on.
[
{
"text": "‘Sech ni ricfe iluc, ⁊ ni toruis húc’.",
"masked": "‘Sech [MASK] ricfe iluc, [MASK] ni toruis húc’.",
"masked_tokens": [
["ni", "na", ""],
["⁊", "&", "ocus"]
]
},
{
"text": ".i. ni fiad chách",
"masked": "[MASK] ni fiad chách",
"masked_tokens": [
[".i.", "i.e.", "éd"]
]
}
]
[
{
"text": "Ó domun co brait ar Zedechias mac Iosias."
"masked": "Ó do[_]un co brait ar Zedechias mac[_]Iosias.",
"masked_tokens": [
["m", "n", ""],
[" ", "-", "c"]
]
}
]
Participants will be invited to describe their systems in a paper for the SIGTYP workshop proceedings. The task organisers will write an overview paper that describes the task, summarises the different approaches taken, and analyses their results.
Paper submission instructions will be the same as for the workshop. Each team participating in the shared task is expected to submit a paper of 4 to 8 pages, plus unlimited additional pages for references, formatted according to the workshop guidelines: https://github.com/acl-org/acl-style-files. Both long and short paper submissions must follow the two-column format of ACL proceedings. All submissions must be in PDF format.
The paper should describe the system and the resources used along with the libraries used to develop the system. The methodology/strategy should be documented in such a way that the readers and other researchers are able to replicate the work from the system description in the paper.
- 05 Nov 2023: Release of training and validation data
- 02 Jan 2024: Release of test data
- ~~12~~ 15 Jan 2024: Submission of results
- ~~13~~ 15 Jan 2024: Notification of results
- ~~20~~ 22 Jan 2024: Submission of shared task papers
- ~~27~~ 29 Jan 2024: Notification of acceptance to authors
- ~~03~~ 05 Feb 2024: Camera-ready papers due
- ~~15 Mar~~ 18 Feb 2024: Video recordings due on Underline
- 22 Mar 2024: SIGTYP workshop
- Constrained subtask on CodaLab
- Unconstrained subtask on CodaLab
- Registration
- SIGTYP 2024 style guide
- Please cite these if you are using the dataset (APA-style citations can be found in the list of references below).
- Underline Form
- If you need help uploading your presentation to Underline, see this.
- Oksana Dereza, Insight SFI Research Centre for Data Analytics, Data Science Institute, University of Galway
- Priya Rani, SFI Centre for Research and Training in AI, Data Science Institute, University of Galway
- Atul Kr. Ojha, Insight SFI Research Centre for Data Analytics, Data Science Institute, University of Galway
- Adrian Doyle, Insight SFI Research Centre for Data Analytics, Data Science Institute, University of Galway
- Pádraic Moran, School of Languages, Literatures and Cultures, Moore Institute, University of Galway
- John P. McCrae, Insight SFI Research Centre for Data Analytics, Data Science Institute, University of Galway
This shared task is in part supported by the Irish Research Council under grant number IRCLA/2017/129 (CARDAMOM: Comparative Deep Models of Language for Minority and Historical Languages) and co-funded by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (Insight) and SFI/12/RC/2289_P2 (Insight_2). We would also like to thank Universal Dependencies, University College Cork, the Royal Irish Academy and the HAS Research Institute for Linguistics for providing the source data.
- Acadamh Ríoga na hÉireann. (2017). Corpas Stairiúil na Gaeilge 1600-1926. Acadamh Ríoga na hÉireann. http://corpas.ria.ie/index.php?fsg_function=1
- Bauer, B., Hofman, R., & Moran, P. (2017). St. Gall Priscian Glosses, version 2.0. http://www.stgallpriscian.ie/
- Conneau, A., & Kiela, D. (2018). SentEval: An Evaluation Toolkit for Universal Sentence Representations. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). https://aclanthology.org/L18-1269/
- Doyle, A. (2018). Würzburg Irish Glosses. https://wuerzburg.ie/
- Ge, H., Sun, C., Xiong, D., & Liu, Q. (2021). Chinese WPLC: A Chinese Dataset for Evaluating Pretrained Language Models on Word Prediction Given Long-Range Context. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 3770–3778. https://doi.org/10.18653/v1/2021.emnlp-main.306
- HAS Research Institute for Linguistics. (2018). Old Hungarian Codices. Hungarian Generative Diachronic Syntax. http://oldhungariancorpus.nytud.hu/en-codices.html
- Hu, J., Ruder, S., Siddhant, A., Neubig, G., Firat, O., & Johnson, M. (2020). XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization. Proceedings of the 37th International Conference on Machine Learning (ICML 2020), 4411–4421.
- Kondratyuk, D., & Straka, M. (2019). 75 Languages, 1 Model: Parsing Universal Dependencies Universally. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2779–2795. https://doi.org/10.18653/v1/D19-1279
- Liang, Y., Duan, N., Gong, Y., Wu, N., Guo, F., Qi, W., Gong, M., Shou, L., Jiang, D., Cao, G., Fan, X., Zhang, R., Agrawal, R., Cui, E., Wei, S., Bharti, T., Qiao, Y., Chen, J.-H., Wu, W., … Zhou, M. (2020). XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 6008–6018. https://doi.org/10.18653/v1/2020.emnlp-main.484
- McCann, B., Keskar, N. S., Xiong, C., & Socher, R. (2018). The Natural Language Decathlon: Multitask Learning as Question Answering (arXiv:1806.08730). arXiv. http://arxiv.org/abs/1806.08730
- Ó Corráin, D., Morgan, H., Färber, B., Hazard, B., Purcell, E., Ó Dónaill, C., Lavelle, H., Nyhan, J., & McCarthy, E. (1997). CELT: Corpus of Electronic Texts. University College Cork. http://www.ucc.ie/celt
- Ruder, S., Clark, J. H., Gutkin, A., Kale, M., Ma, M., Nicosia, M., Rijhwani, S., Riley, P., Sarr, J.-M. A., Wang, X., Wieting, J., Gupta, N., Katanova, A., Kirov, C., Dickinson, D. L., Roark, B., Samanta, B., Tao, C., Adelani, D. I., … Talukdar, P. (2023). XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages (arXiv:2305.11938). arXiv.
- Ruder, S., Constant, N., Botha, J., Siddhant, A., Firat, O., Fu, J., Liu, P., Hu, J., Garrette, D., Neubig, G., & Johnson, M. (2021). XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 10215–10245. https://doi.org/10.18653/v1/2021.emnlp-main.802
- Simon, E. (2014). Corpus building from Old Hungarian codices. In: Katalin É. Kiss (ed.): The Evolution of Functional Left Peripheries in Hungarian Syntax. Oxford: Oxford University Press.
- Wang, A., Pruksachatkun, Y., Nangia, N., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2020, February 12). SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019).
- Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2019, February 22). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Proceedings of the 7th International Conference on Learning Representations (ICLR 2019).
- Zeman, D., Nivre, J., Abrams, M., Ackermann, E., Aepli, N., Aghaei, H., Agić, Ž., Ahmadi, A., Ahrenberg, L., Ajede, C. K., Akkurt, S. F., Aleksandravičiūtė, G., Alfina, I., Algom, A., Alnajjar, K., Alzetta, C., Andersen, E., Antonsen, L., Aoyama, T., … Ziane, R. (2023). Universal Dependencies 2.12 [dataset]. http://hdl.handle.net/11234/1-5150