ro_sgns

Implementation of Riemannian optimization for skip-gram negative sampling (ACL 2017)

Primary language: Python. License: MIT.

Introduction

This repository implements the method from the ACL 2017 paper "Riemannian Optimization for Skip-Gram Negative Sampling" (Fonarev, Hrinchuk et al.).

@inproceedings{fonarev2017ro_sgns,
  title={Riemannian Optimization for Skip-Gram Negative Sampling},
  author={Alexander Fonarev and Oleksii Hrinchuk and Gleb Gusev and Pavel Serdyukov and Ivan Oseledets},
  booktitle={ACL},
  year={2017}
}

Prerequisites

pip install numpy scipy pandas gensim nltk bs4 

Usage

  • Clone the ro_sgns repository:
git clone https://github.com/AlexGrinch/ro_sgns.git
cd ro_sgns
  • Download the enwik9 dataset and preprocess the raw data with the Perl script main_.pl:
wget http://mattmahoney.net/dc/enwik9.zip
unzip enwik9.zip
mkdir data
perl main_.pl enwik9 > data/enwik9.txt
  • Open the experiments notebook:
jupyter notebook enwik_experiments.ipynb

Algorithm

Figure 1. Riemannian optimization for skip-gram negative sampling (RO-SGNS) algorithm.
Figure 2. Geometric interpretation of one step of the Riemannian optimization procedure: the point is first projected onto the tangent space and then retracted to the manifold.
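
To make the step in Figure 2 concrete, here is a minimal NumPy sketch of one iteration under simplifying assumptions. It is not the repository's code: it takes a plain gradient step on the SGNS objective and retracts the result to the set of rank-d matrices with a truncated SVD, skipping an explicit tangent-space projection. The names `D` (word-context co-occurrence counts), `B` (negative-sampling weights, k * #(w) * #(c) / |D|), `retract_to_rank`, and `ro_step` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sgns_objective(X, D, B):
    # F(X) = sum_{w,c} D[w,c] * log(sigmoid(X[w,c])) + B[w,c] * log(sigmoid(-X[w,c]))
    # log(sigmoid(x)) is computed as -logaddexp(0, -x) for numerical stability.
    return np.sum(-D * np.logaddexp(0.0, -X) - B * np.logaddexp(0.0, X))


def sgns_gradient(X, D, B):
    # Element-wise derivative of the objective above with respect to X.
    return D * sigmoid(-X) - B * sigmoid(X)


def retract_to_rank(Y, d):
    # Assumption: truncated SVD is used as the retraction onto rank-d matrices.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :d] * s[:d]) @ Vt[:d, :]


def ro_step(X, D, B, d, step_size):
    # Gradient ascent step in the ambient space, then retraction to rank d.
    return retract_to_rank(X + step_size * sgns_gradient(X, D, B), d)


# Toy data (randomly generated, not from enwik9).
rng = np.random.default_rng(0)
n_words, n_contexts, d, k = 30, 30, 5, 5
D = rng.poisson(2.0, size=(n_words, n_contexts)).astype(float)
B = k * np.outer(D.sum(axis=1), D.sum(axis=0)) / D.sum()

X = retract_to_rank(rng.normal(scale=0.1, size=D.shape), d)
for _ in range(20):
    X = ro_step(X, D, B, d, step_size=1e-3)
```

After the loop, `X` is a rank-d matrix of the same shape as `D`; word and context embeddings could then be read off from a factorization of `X`, for example from its SVD factors.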

Results

Table 1. Spearman’s correlation between predicted similarities and the manually assessed ones.
Table 2. Examples of the semantic neighbors (in terms of cosine similarity) for the word usa.
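
The two evaluations named in the captions can be sketched as follows. This is an illustrative example, not the notebook's code: the vocabulary, the 2-dimensional vectors, and the "human" scores are made-up toy values, and the helper names are hypothetical. The Spearman helper ranks with a double argsort and therefore assumes no ties; a library routine such as `scipy.stats.spearmanr` handles ties properly.

```python
import numpy as np


def cosine_similarity_matrix(W):
    # Normalize rows to unit length, then take pairwise dot products.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    Wn = W / np.maximum(norms, 1e-12)
    return Wn @ Wn.T


def nearest_neighbors(W, vocab, word, top_n=3):
    # Words sorted by decreasing cosine similarity to `word`, excluding itself.
    idx = vocab.index(word)
    sims = cosine_similarity_matrix(W)[idx]
    order = np.argsort(-sims)
    return [vocab[j] for j in order if j != idx][:top_n]


def spearman(a, b):
    # Pearson correlation of ranks; assumes no tied values in a or b.
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]


# Toy embeddings (hypothetical values, for illustration only).
vocab = ["usa", "america", "canada", "banana"]
W = np.array([[1.0, 0.1], [0.9, 0.2], [0.7, 0.6], [0.0, 1.0]])

neighbors = nearest_neighbors(W, vocab, "usa", top_n=2)

# Toy word pairs with made-up human similarity scores.
pairs = [("usa", "america"), ("usa", "canada"), ("usa", "banana")]
human_scores = np.array([9.0, 7.5, 1.0])
S = cosine_similarity_matrix(W)
predicted = np.array([S[vocab.index(a), vocab.index(b)] for a, b in pairs])
rho = spearman(predicted, human_scores)
```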