This repository contains the scripts and datasets used in the study named above.
data/
: contains the SimLex-999 dataset, and cleaned data collected from human participants.

embeddings/
: contains word embeddings trained on a subset of the Corpus of Contemporary American English (COCA) and an SgE corpus collated by Lin et al. (2022).

scripts/
: contains Python scripts used in data cleaning and analysis.

clean/
: scripts used in data cleaning.

compare/
: scripts used in data analysis.

train/
: scripts used in the training of word embeddings.