
Word Sense Induction with BERT MLM

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Towards better substitution-based word sense induction - Word Sense Induction with BERT

A follow-up to https://github.com/asafamr/SymPatternWSI, adapted to BERT.

Paper: Towards better substitution-based word sense induction - https://arxiv.org/abs/1905.12598
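The README does not spell out the algorithm (see the paper for that). As a hedged illustration of the general substitution-based idea only: each occurrence of a target word is represented by the words a masked language model would put in its place, and occurrences with overlapping substitutes are grouped into senses. The sketch below skips the BERT step entirely (the substitute lists are hand-written toy inputs) and uses plain average-linkage clustering on cosine similarity. It is not this repository's implementation, and `induce_senses` and `cosine` are hypothetical names.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from typing import List, Sequence


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of substitute words."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def induce_senses(substitutes: Sequence[Sequence[str]], n_senses: int) -> List[int]:
    """Hypothetical sketch: cluster instances by substitute overlap (average linkage).

    substitutes[i] holds the substitute words for occurrence i of the target word;
    here they are given directly instead of being predicted by BERT.
    """
    if n_senses < 1:
        raise ValueError("n_senses must be at least 1")
    vectors = [Counter(subs) for subs in substitutes]
    clusters: List[List[int]] = [[i] for i in range(len(vectors))]

    def avg_similarity(pair) -> float:
        # Average pairwise cosine similarity between two clusters.
        left, right = clusters[pair[0]], clusters[pair[1]]
        total = sum(cosine(vectors[a], vectors[b]) for a in left for b in right)
        return total / (len(left) * len(right))

    while len(clusters) > n_senses:
        # Merge the most similar pair; combinations yields i < j, so deleting j is safe.
        i, j = max(combinations(range(len(clusters)), 2), key=avg_similarity)
        clusters[i].extend(clusters[j])
        del clusters[j]

    labels = [0] * len(vectors)
    for sense, members in enumerate(clusters):
        for member in members:
            labels[member] = sense
    return labels


if __name__ == "__main__":
    # Toy substitutes for four occurrences of "bass" (illustrative only).
    toy = [["guitar", "drums", "cello"], ["guitar", "piano", "drums"],
           ["trout", "salmon", "fish"], ["fish", "trout", "carp"]]
    print(induce_senses(toy, n_senses=2))  # [0, 0, 1, 1]
```

Average linkage is used here only because it is easy to write with the standard library; the paper's actual representation and clustering choices may differ.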

Prerequisites:

Python 3.7
Install the requirements with pip install -r requirements.txt.
This installs the required Python packages, including PyTorch and huggingface's BERT port.
(For CUDA support, first install PyTorch according to its official instructions.)
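A quick way to confirm the environment before running anything is a small check script. This is a hypothetical helper, not part of the repository; it only reports the interpreter version, whether PyTorch is importable, and whether PyTorch sees a CUDA device.

```python
import importlib.util
import sys


def check_prerequisites() -> dict:
    """Hypothetical helper: report whether the interpreter and PyTorch look usable."""
    report = {
        "python_version": sys.version.split()[0],
        # The README asks for Python 3.7; other versions are not guaranteed to work.
        "python_is_3_7": sys.version_info[:2] == (3, 7),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

If cuda_available is False on a GPU machine, PyTorch was most likely installed without CUDA support, which is why the CUDA build should be installed before requirements.txt.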

Run download_resources.sh to download the datasets.

WSI:

Run wsi_bert.py to perform sense induction on both the SemEval 2010 and SemEval 2013 WSI task datasets.
Logs should be written to the "debug" directory.
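After a run, the newest log can be located with a few lines of Python. The README does not document the log file names, so this hypothetical helper simply returns the most recently modified file in the "debug" directory, assuming the script is run from the repository root.

```python
from pathlib import Path
from typing import Optional


def latest_log(debug_dir: str = "debug") -> Optional[Path]:
    """Hypothetical helper: return the newest file in debug_dir, or None if there is none."""
    directory = Path(debug_dir)
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    # max() with default avoids an error when the directory is empty.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    log = latest_log()
    print(log if log is not None else "no logs found in ./debug")
```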

Results (state of the art at the time of publication):

SemEval 2013 WSI, mean (STD) over 10 runs:
FNMI (fuzzy NMI): 21.4 (0.5), FBC (fuzzy B-Cubed): 64.0 (0.5), geometric mean: 37.0 (0.5)
(previous SOTA: FNMI 11.3, FBC 57.5, geometric mean 25.4)
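The single summary number for SemEval 2013 is the geometric mean of the two clustering scores. A minimal check with the mean values reported above:

```python
import math


def geometric_mean(a: float, b: float) -> float:
    """Geometric mean of two clustering scores, e.g. FNMI and FBC."""
    return math.sqrt(a * b)


# Mean FNMI and FBC reported above for SemEval 2013.
print(round(geometric_mean(21.4, 64.0), 1))  # 37.0
```

Recomputing from rounded published components can differ from a reported figure in the last digit, so small discrepancies of about 0.1 are expected.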


SemEval 2010 WSI, mean (STD) over 10 runs:
F-S (F-Score): 71.3 (0.1), V-M (V-Measure): 40.4 (1.8), geometric mean: 53.6 (1.2)
(previous SOTA: F-S 61.7, V-M 9.8, geometric mean 24.59)
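Mean (STD) figures over several runs can be aggregated as sketched below. This is an assumption about the procedure, not taken from the repository: the geometric mean is computed per run and then averaged, and the sample standard deviation is used because the README does not say which one is reported. `summarize_runs` is a hypothetical name.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def summarize_runs(f_s: Sequence[float], v_m: Sequence[float]) -> Dict[str, Tuple[float, float]]:
    """Hypothetical helper: (mean, STD) of F-S, V-M and the per-run geometric mean."""
    if len(f_s) != len(v_m) or not f_s:
        raise ValueError("need the same non-zero number of F-S and V-M scores")
    per_run_geo = [math.sqrt(a * b) for a, b in zip(f_s, v_m)]

    def mean_std(xs: Sequence[float]) -> Tuple[float, float]:
        # Sample standard deviation (assumption); a single run gets STD 0.0.
        return statistics.mean(xs), (statistics.stdev(xs) if len(xs) > 1 else 0.0)

    return {"F-S": mean_std(f_s), "V-M": mean_std(v_m), "Geom. mean": mean_std(per_run_geo)}
```

Under this aggregation the averaged geometric mean can never exceed the geometric mean of the averaged scores (Cauchy-Schwarz inequality), and the gap grows with run-to-run variance. That is one plausible reason the reported 53.6 sits slightly below sqrt(71.3 * 40.4), about 53.67, given the comparatively large V-M STD of 1.8.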