WordEmbeddingfromMultiDomain


Examining the Effect of Varying Domains on Vector-Space Representations

Final Project for LING 28610 (Autumn 2020)

Members

  • Nancy Li
  • Deniz Türkçapar
  • Zhou Xing

Abstract

This paper explores how word embeddings differ between vector-space models trained on different domains. The domains analyzed in this study were literature and news. Because figurative language is more frequent in literature than in news, it was hypothesized that the model trained on literature would exhibit embeddings more closely tied to abstract meanings, while the model trained on news would exhibit embeddings more closely tied to concrete meanings. Overall, many words shared by the two models had vastly different nearest neighbors, suggesting that the meanings constructed by the two models were indeed different. Compared with the news model, the literature model also aligned more closely with benchmark datasets reflecting human judgments.
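As a rough illustration of the setup described above, the sketch below trains one gensim Word2Vec model per domain and compares each model's nearest neighbors for a shared word. The corpus file names, preprocessing, and hyperparameters here are illustrative assumptions, not the project's actual configuration.

```python
# Sketch: train one Word2Vec model per domain and compare nearest
# neighbors for a word present in both vocabularies. File names and
# hyperparameters are assumptions for illustration only.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

def read_corpus(path):
    """Yield one tokenized sentence per line of a plain-text file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = simple_preprocess(line)
            if tokens:
                yield tokens

lit_model = Word2Vec(sentences=list(read_corpus("literature.txt")),
                     vector_size=100, window=5, min_count=5)
news_model = Word2Vec(sentences=list(read_corpus("news.txt")),
                      vector_size=100, window=5, min_count=5)

word = "light"  # any word shared by both vocabularies
print("literature:", lit_model.wv.most_similar(word, topn=5))
print("news:      ", news_model.wv.most_similar(word, topn=5))
```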

Visualization

  1. Go to the TensorFlow Embedding Projector
  2. On the left panel, click Load and upload the metadata and vector-data TSV files that share the same model name from the tsv directory (a sketch for generating such files is shown below)
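For reference, here is a minimal sketch of how the two TSV files the Projector expects could be produced from a trained model. The model and output file names are assumptions, not necessarily those in the tsv directory.

```python
# Sketch: export a trained model's vocabulary to the two TSV files the
# Embedding Projector expects: one tab-separated vector per line, plus
# a metadata file with the matching word on each line.
from gensim.models import Word2Vec

model = Word2Vec.load("literature.model")  # hypothetical model file

with open("literature_vecdata.tsv", "w", encoding="utf-8") as vec_f, \
     open("literature_metadata.tsv", "w", encoding="utf-8") as meta_f:
    for word in model.wv.index_to_key:
        vec_f.write("\t".join(str(x) for x in model.wv[word]) + "\n")
        meta_f.write(word + "\n")
```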

Models

Analysis

  • Top Similar Words for Model Pair: here
  • Concreteness Analysis: here
  • Similarity Comparison: here (a sketch of this kind of comparison follows this list)
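The similarity comparison presumably correlates each model's similarity scores with human ratings, matching the abstract's claim about alignment with benchmark datasets. Below is a minimal sketch assuming a WordSim-353-style benchmark file with one word1, word2, rating triple per line; the file names and format are assumptions, not the project's actual evaluation code.

```python
# Sketch: Spearman correlation between model cosine similarities and
# human similarity ratings. Assumes a headerless CSV benchmark with
# rows of the form: word1, word2, rating.
import csv
from gensim.models import Word2Vec
from scipy.stats import spearmanr

model = Word2Vec.load("literature.model")  # hypothetical model file

model_scores, human_scores = [], []
with open("wordsim353.csv", encoding="utf-8") as f:
    for w1, w2, rating in csv.reader(f):
        # Only score pairs where both words are in the model's vocabulary
        if w1 in model.wv and w2 in model.wv:
            model_scores.append(model.wv.similarity(w1, w2))
            human_scores.append(float(rating))

rho, p = spearmanr(model_scores, human_scores)
print(f"Spearman rho = {rho:.3f} (p = {p:.3g}, n = {len(model_scores)})")
```

A higher Spearman correlation indicates that the model's notion of similarity tracks human judgments more closely, which is how one model can be said to align better with a benchmark than another.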