NLP Paper Reading Notes

1. A Simple but Tough-to-Beat Baseline for Sentence Embeddings

Address: https://openreview.net/forum?id=SyK00v5xx

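This paper proposes a simple sentence embedding method, Smooth Inverse Frequency (SIF). Given word embeddings, first compute each sentence embedding as a weighted average of the word embeddings in the sentence:

$$v_s = \frac{1}{|s|} \sum_{w \in s} \frac{a}{a + p(w)} \, v_w$$

where v_w is the embedding of word w, p(w) is its estimated unigram probability, |s| is the number of words in sentence s, and a is a constant (0.001 or 0.0001 tends to give the best results).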
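
The weighting step can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' code: the function name `sif_weighted_average` and the dictionary inputs `word_vectors` (word to vector) and `word_prob` (word to unigram probability) are hypothetical names chosen for this note. It also assumes that words without an embedding are skipped and that a word with unknown frequency gets p(w) = 0.

```python
import numpy as np

def sif_weighted_average(tokens, word_vectors, word_prob, a=1e-3):
    """Weighted average of word vectors with weights a / (a + p(w)).

    tokens:       list of words in one sentence
    word_vectors: dict mapping word -> 1-D array-like (hypothetical input format)
    word_prob:    dict mapping word -> unigram probability p(w)
    a:            smoothing constant (the notes suggest 1e-3 or 1e-4)
    """
    vecs, weights = [], []
    for w in tokens:
        if w not in word_vectors:
            continue  # assumption: skip words that have no embedding
        vecs.append(word_vectors[w])
        # assumption: unseen words get p(w) = 0, i.e. weight 1
        weights.append(a / (a + word_prob.get(w, 0.0)))
    if not vecs:
        dim = len(next(iter(word_vectors.values())))
        return np.zeros(dim)
    vecs = np.asarray(vecs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Sum the weighted vectors, then divide by the number of words used.
    return (weights[:, None] * vecs).sum(axis=0) / len(vecs)
```

Frequent words such as "the" receive a weight close to zero, while rare words keep a weight close to one, so the sentence vector is dominated by the more informative words.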

Then, form a matrix whose columns are the sentence embeddings from the step above (e.g. [v1.T, v2.T, ..., vn.T]), and let u be its first singular vector.

Finally, remove from each sentence embedding its projection onto u:

$$v_s \leftarrow v_s - u u^\top v_s$$
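
The last two steps, finding the first singular vector u and subtracting each embedding's projection onto it, can be sketched with NumPy. This is a sketch under assumptions: the function name `remove_first_component` is hypothetical, the input is taken to be an array with one sentence embedding per row, and the projection removal v - u u^T v follows the SIF paper.

```python
import numpy as np

def remove_first_component(sentence_vecs):
    """Remove the projection onto the first left singular vector.

    sentence_vecs: array-like of shape (n_sentences, dim), one embedding per row.
    Returns an array of the same shape.
    """
    # Transpose so that columns are sentence embeddings, as in the notes.
    X = np.asarray(sentence_vecs, dtype=float).T
    # Left singular vectors are the columns of U; the first one is u.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    u = U[:, 0]
    # Subtract u u^T v from every column v of X.
    X_clean = X - np.outer(u, u) @ X
    return X_clean.T
```

After this step every sentence embedding is orthogonal to u, which removes the direction shared by most sentences in the set.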