RuSemShift is a large-scale human-annotated dataset of diachronic semantic shifts in Russian. Lexical semantic change is the process by which a word changes its meaning over time. This dataset provides information about the relative degree of historical semantic change for dozens of Russian words. It is based on sentence contexts extracted from the Russian National Corpus.
This is the first semantic change dataset for Russian created in a large-scale crowd-sourcing annotation effort. The annotation followed the DURel framework, which makes RuSemShift fully compatible with semantic change datasets developed for other languages. It can be used to evaluate automatic systems for semantic change detection. Please check our paper for more details:
Julia Rodina and Andrey Kutuzov. RuSemShift: a dataset of historical lexical semantic changes in Russian. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020). Association for Computational Linguistics, Barcelona, Spain (2020).
RuSemShift consists of two subsets, each covering a specific pair of time periods:
- RuSemShift_1: pre-Soviet and Soviet times (1682-1916 -> 1918-1990)
- RuSemShift_2: Soviet and post-Soviet times (1918-1990 -> 1991-2016)
See the description of each subset in its respective directory.
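As noted above, the dataset can be used to evaluate automatic systems for semantic change detection. The sketch below shows one common way to do this for graded change scores: rank-correlate the system's per-word scores with the gold scores for one subset. Spearman's rho is our assumption here, as this README does not prescribe a metric. The word keys and all scores are invented placeholders, not values from RuSemShift, and the code does not assume any particular file layout; load the gold scores as described in the subset directories. The sign of the result depends on how the gold and system scores are oriented.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def evaluate(gold, predicted):
    """Correlate scores over the words present in both mappings."""
    words = sorted(set(gold) & set(predicted))
    return spearman([gold[w] for w in words], [predicted[w] for w in words])


# Placeholder data: hypothetical words and scores, for illustration only.
gold_scores = {"word_a": 1.2, "word_b": 3.5, "word_c": 2.1, "word_d": 3.9}
system_scores = {"word_a": 0.1, "word_b": 0.2, "word_c": 0.3, "word_d": 0.9}

rho = evaluate(gold_scores, system_scores)
print(f"Spearman rho: {rho:.3f}")
```

Each subset covers a different pair of time periods, so a system would normally be scored on RuSemShift_1 and RuSemShift_2 separately, using models trained on the matching periods.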
RuSemShift by Julia Rodina and Andrey Kutuzov is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The creation of RuSemShift was supported by the Russian Science Foundation grant 20-18-00206.