DeepMistake

The solution of the DeepMistake team for the RuShiftEval-2021 competition


Lexical Semantic Change Detection (LSCD) for the Russian language by the DeepMistake team.


This repository contains code to reproduce the best results from the paper:

Arefyev Nikolay, Maksim Fedoseev, Vitaly Protasov, Daniil Homskiy, Adis Davletov, Alexander Panchenko. "DeepMistake: Which Senses are Hard to Distinguish for a Word-in-Context Model" in Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2021”.

DeepMistake was the 2nd best system in the RuShiftEval-2021 competition.

After the competition, we improved the system and outperformed the winning system (see the table below).

Citation

If you use any part of this system, please cite the paper above.

Reproduction of the best results

Installation

Clone repositories:

git clone https://github.com/Daniil153/DeepMistake
cd DeepMistake
git clone https://github.com/davletov-aa/mcl-wic

Install requirements

pip install -r mcl-wic/requirements.txt
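
If you prefer an isolated environment, the dependencies can be installed into a fresh virtual environment first. A minimal sketch (the venv directory name is arbitrary):

python3 -m venv venv        # create an isolated Python environment
source venv/bin/activate    # activate it for the current shell
pip install -r mcl-wic/requirements.txt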

Solution for the RuShiftEval-2021 shared task on LSCD

Download the data. You can also download it from the command line:

bash download_files.sh

Download models:

bash download_models.sh first_concat mean_dist_l1ndotn_MSE mean_dist_l1ndotn_CE
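
If download_models.sh is unavailable or fails, the same three checkpoints can be fetched directly from the Zenodo links listed in the results table below. A sketch assuming wget and unzip are installed (check the evaluation scripts for the directory layout they expect):

# download the three checkpoints referenced in the results table
wget https://zenodo.org/record/4981585/files/first_concat.zip
wget https://zenodo.org/record/4992633/files/mean_dist_l1ndotn_MSE.zip
wget https://zenodo.org/record/4992613/files/mean_dist_l1ndotn_CE.zip
# unpack them in the working directory
unzip first_concat.zip && unzip mean_dist_l1ndotn_MSE.zip && unzip mean_dist_l1ndotn_CE.zip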

To reproduce the best result from the evaluation phase, run:

bash eval_best_eval_model.sh

To reproduce the best result from the post-evaluation phase, run:

bash eval_best_post-eval_model.sh

To reproduce the second-best result from the post-evaluation phase, run:

bash eval_2best_post-eval_model.sh
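
To run all three evaluations in one pass, the scripts can simply be chained. A sketch (assumes the data and all three models have already been downloaded as described above):

# run the evaluation-phase and both post-evaluation configurations in sequence
for script in eval_best_eval_model.sh eval_best_post-eval_model.sh eval_2best_post-eval_model.sh; do
    bash "$script"
done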

Results

Results on the RuShiftEval LSCD task are shown in the table below. To reproduce them, follow the installation and evaluation instructions above.

| Model | RuShiftEval avg | RuShiftEval1 | RuShiftEval2 | RuShiftEval3 | Script |
|---|---|---|---|---|---|
| first+concat on MCLen-accCE → RSSdev2-sentSpearMSE, LinReg (https://zenodo.org/record/4981585/files/first_concat.zip) | 0.795 | 0.812 | 0.780 | 0.795 | eval_best_eval_model.sh |
| mean+dist_l1ndotn-hs0 on MCLnen-accCE → RSSdev2-sentSpearMSE, Mean (https://zenodo.org/record/4992633/files/mean_dist_l1ndotn_MSE.zip) | 0.833 | 0.839 | 0.834 | 0.826 | eval_2best_post-eval_model.sh |
| mean+dist_l1ndotn-hs0 on MCLnen-accCE → RSSdev2-sentSpearCE, Mean (https://zenodo.org/record/4992613/files/mean_dist_l1ndotn_CE.zip) | 0.850 | 0.863 | 0.854 | 0.834 | eval_best_post-eval_model.sh |

Solution for SemEval-2020 Task 1

Work in progress.

Train models

You can also train the three best models with the following scripts:

train_best_eval_model.sh
train_best2_post-eval_model.sh
train_best_post-eval_model.sh