WordEmbeddingfromMultiDomain


Examining the Effect of Varying Domains on Vector-Space Representations

Final Project for LING 28610 (Autumn 2020)

Members

  • Nancy Li
  • Deniz Türkçapar
  • Zhou Xing

Abstract

This paper explores how word embeddings differ between vector-space models trained on different domains. The domains analyzed in this study were literature and news. Because figurative language is more frequent in literature than in news, it was hypothesized that the model trained on literature would exhibit embeddings more closely tied to abstract meanings, while the model trained on news would exhibit embeddings more closely tied to concrete meanings. Overall, many words shared by the two models had vastly different nearest neighbors, suggesting that the meanings constructed by the two models were indeed different. Compared with the news model, the literature model also aligned more closely with benchmark datasets reflecting human judgments.
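As a rough illustration of the setup described above, the sketch below trains one gensim Word2Vec model per domain and compares each model's nearest neighbors for a shared word. The corpus file names, preprocessing, and hyperparameters here are illustrative assumptions, not the project's actual configuration.

```python
# Sketch: train one Word2Vec model per domain and compare nearest
# neighbors for a word present in both vocabularies. File names and
# hyperparameters are assumptions for illustration only.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

def read_corpus(path):
    """Yield one tokenized sentence per line of a plain-text file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = simple_preprocess(line)
            if tokens:
                yield tokens

lit_model = Word2Vec(sentences=list(read_corpus("literature.txt")),
                     vector_size=100, window=5, min_count=5)
news_model = Word2Vec(sentences=list(read_corpus("news.txt")),
                      vector_size=100, window=5, min_count=5)

word = "light"  # any word shared by both vocabularies
print("literature:", lit_model.wv.most_similar(word, topn=5))
print("news:      ", news_model.wv.most_similar(word, topn=5))
```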

Visualization

  1. Go to the TensorFlow Embedding Projector
  2. On the left panel, click Load and upload the metadata and vector-data TSV files that share the same model name from the tsv directory (a sketch for generating such files is shown below)
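For reference, here is a minimal sketch of how the two TSV files the Projector expects could be produced from a trained model. The model and output file names are assumptions, not necessarily those in the tsv directory.

```python
# Sketch: export a trained model's vocabulary to the two TSV files the
# Embedding Projector expects: one tab-separated vector per line, plus
# a metadata file with the matching word on each line.
from gensim.models import Word2Vec

model = Word2Vec.load("literature.model")  # hypothetical model file

with open("literature_vecdata.tsv", "w", encoding="utf-8") as vec_f, \
     open("literature_metadata.tsv", "w", encoding="utf-8") as meta_f:
    for word in model.wv.index_to_key:
        vec_f.write("\t".join(str(x) for x in model.wv[word]) + "\n")
        meta_f.write(word + "\n")
```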

Models

Analysis

  • Top Similar Words for Model Pair: here
  • Concreteness Analysis: here
  • Similarity Comparison: here (a sketch of this kind of comparison follows this list)
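The similarity comparison presumably correlates each model's similarity scores with human ratings, matching the abstract's claim about alignment with benchmark datasets. Below is a minimal sketch assuming a WordSim-353-style benchmark file with one word1, word2, rating triple per line; the file names and format are assumptions, not the project's actual evaluation code.

```python
# Sketch: Spearman correlation between model cosine similarities and
# human similarity ratings. Assumes a headerless CSV benchmark with
# rows of the form: word1, word2, rating.
import csv
from gensim.models import Word2Vec
from scipy.stats import spearmanr

model = Word2Vec.load("literature.model")  # hypothetical model file

model_scores, human_scores = [], []
with open("wordsim353.csv", encoding="utf-8") as f:
    for w1, w2, rating in csv.reader(f):
        # Only score pairs where both words are in the model's vocabulary
        if w1 in model.wv and w2 in model.wv:
            model_scores.append(model.wv.similarity(w1, w2))
            human_scores.append(float(rating))

rho, p = spearmanr(model_scores, human_scores)
print(f"Spearman rho = {rho:.3f} (p = {p:.3g}, n = {len(model_scores)})")
```

A higher Spearman correlation indicates that the model's notion of similarity tracks human judgments more closely, which is how one model can be said to align better with a benchmark than another.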