/nlp-project


NLP Word embeddings project

Papers

  • Bengio et al., 2003, A Neural Probabilistic Language Model. Overview of early work. Uses a feedforward net. link
  • Morin and Bengio, 2005, Hierarchical Probabilistic Neural Network Language Model. Introduces hierarchical softmax. link
  • Mikolov et al., 2013, Efficient Estimation of Word Representations in Vector Space. Overview of recurrent models and introduction of two log-linear models of word embeddings: continuous bag-of-words (the averaged embeddings of the surrounding words predict the center word) and continuous skip-gram (the center word's embedding predicts the surrounding words). link
  • Mikolov et al., 2013, Distributed Representations of Words and Phrases and their Compositionality. Describes hierarchical softmax. Introduces negative sampling (approximate the softmax denominator by scoring a handful of sampled negative words; see the sketch below) and subsampling of frequent words (frequent words are randomly discarded so rare words get relatively more training updates). Learns phrase embeddings (e.g. New York Times). Some notes on the additive structure of the vectors. link
  • Socher et al., 2013, Parsing with Compositional Vector Grammars. Combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations. link

CVGs combine the advantages of standard probabilistic context-free grammars (PCFGs) with those of recursive neural networks (RNNs). The former can capture the discrete categorization of phrases into NP or PP while the latter can capture fine-grained syntactic and compositional-semantic information on phrases and words. This information can help in cases where syntactic ambiguity can only be resolved with semantic information, such as in the PP attachment of the two sentences: They ate udon with forks. vs. They ate udon with chicken.
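
As a rough illustration of the skip-gram model with negative sampling described in the two Mikolov et al. papers above, here is a minimal NumPy sketch on a toy corpus. The corpus, window size, and hyperparameters are arbitrary, and negatives are drawn uniformly rather than from the smoothed unigram distribution used in the paper, so treat it as a sketch rather than a faithful reimplementation.

```python
import numpy as np

# Toy skip-gram with negative sampling: illustrative only, not the papers' exact setup.
rng = np.random.default_rng(0)

corpus = "they ate udon with forks they ate udon with chicken".split()
vocab = sorted(set(corpus))
word2id = {w: i for i, w in enumerate(vocab)}
ids = [word2id[w] for w in corpus]

V, D, window, k, lr, epochs = len(vocab), 16, 2, 3, 0.05, 200

W_in = rng.normal(0, 0.1, (V, D))   # "input" (word) embeddings
W_out = rng.normal(0, 0.1, (V, D))  # "output" (context) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(epochs):
    for pos, center in enumerate(ids):
        for off in range(-window, window + 1):
            ctx = pos + off
            if off == 0 or ctx < 0 or ctx >= len(ids):
                continue
            # one positive (observed) context word plus k uniformly sampled negatives,
            # in place of the full softmax over the vocabulary
            targets = [ids[ctx]] + list(rng.integers(0, V, size=k))
            labels = [1.0] + [0.0] * k
            v = W_in[center].copy()
            grad_v = np.zeros(D)
            for t, y in zip(targets, labels):
                g = sigmoid(v @ W_out[t]) - y   # gradient of the logistic loss wrt the score
                grad_v += g * W_out[t]
                W_out[t] -= lr * g * v
            W_in[center] -= lr * grad_v

# nearest neighbours of "udon" by cosine similarity
q = W_in[word2id["udon"]]
sims = W_in @ q / (np.linalg.norm(W_in, axis=1) * np.linalg.norm(q))
print([vocab[i] for i in np.argsort(-sims)[:3]])
```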

  • Mikolov, Thesis, ch. 3. Doesn't seem to add a lot. link
  • Mikolov et al., 2013, Linguistic Regularities in Continuous Space Word Representations. Introduces a syntactic and semantic interpretation of vector offsets (e.g. king − man + woman ≈ queen); see the analogy sketch below. link
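
The vector-offset method from the last paper is easy to state in code. Below is a sketch of an analogy query a : b :: c : ?, written against a generic (V, D) embedding matrix; the function name and the pretrained vectors it expects are placeholders, not part of this repo.

```python
import numpy as np

def analogy(a, b, c, vectors, word2id, id2word):
    """Return the word whose vector is most cosine-similar to v_b - v_a + v_c,
    excluding the three query words (the vector-offset method)."""
    target = vectors[word2id[b]] - vectors[word2id[a]] + vectors[word2id[c]]
    target = target / np.linalg.norm(target)
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ target
    for i in np.argsort(-sims):
        word = id2word[i]
        if word not in (a, b, c):
            return word

# With good pretrained embeddings one would expect, e.g.:
# analogy("man", "king", "woman", vectors, word2id, id2word) == "queen"
```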