prodriguezsosa/EmbeddingRegression

Semantic Projection of Predicted Embeddings

Closed this issue · 5 comments

Hi all,

I've been working with conText since '21 and have become a big fan of the method. I wondered if the authors might be available to comment on my approach to assessing the substantive significance of my model results, given that this exact approach is not included in the APSR conText paper and may pose issues that I have not yet anticipated.

To better understand the substance of a change in my model's predicted embeddings that results from changes in my covariates, I have been using semantic projection onto dimensions defined by relation vectors. I largely follow the approach described in Kozlowski, Taddy, and Evans 2018 and Grand et al. 2018 in constructing these dimensions.

I build a list of antonym pairs, take the difference for each pair, and average these to produce an embedding corresponding to my dimension of interest. I then take the cosine similarity of the model's predicted embeddings and each dimension to place the former relative to the dimension's poles. For instance, using the examples given in the paper, I might ask whether Republicans systematically use the term "immigration" in ways which are closer to the "danger" pole of a danger/safety semantic dimension than Democrats.
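For concreteness, here is a minimal sketch of that construction in Python/NumPy (the thread itself uses the R conText package; the vectors, pair names, and helper functions below are illustrative stand-ins, not anything from the actual workflow):

```python
import numpy as np

def semantic_dimension(antonym_pairs):
    """Average the difference vectors of antonym pairs into a single
    dimension, then unit-normalize it."""
    diffs = [pos - neg for pos, neg in antonym_pairs]
    dim = np.mean(diffs, axis=0)
    return dim / np.linalg.norm(dim)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d vectors standing in for pretrained embeddings of the pole words.
rng = np.random.default_rng(0)
danger, safety = rng.normal(size=4), rng.normal(size=4)
threat, security = rng.normal(size=4), rng.normal(size=4)

dim = semantic_dimension([(danger, safety), (threat, security)])

# Stand-in for a model's predicted embedding; positive cosine means it
# sits closer to the "danger" pole of the dimension.
predicted = rng.normal(size=4)
print(cosine(predicted, dim))
```

Unit-normalizing the dimension is optional for cosine similarity (it is scale-invariant), but keeps the dimension comparable if one later switches to dot-product projections.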

I vary one covariate at a time (±2 standard deviations for continuous covariates, or toggling the level for categorical ones) and plot the change this induces in the cosine similarity between the predicted embedding and each of my semantic dimensions, relative to the model's predicted embedding when all covariates are held at their mean/reference values.
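That comparative-statics step can be sketched as follows. This is a toy illustration, not the actual conText code: `predict` here is a hypothetical linear stand-in for the fitted embedding-regression model, and the covariate names and coefficients are made up.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def projection_shift(predict, baseline, covariate, sd, dimension):
    """Change in the projection onto `dimension` when one covariate moves
    +/- 2 SD away from the baseline (all covariates at mean/reference)."""
    ref = cosine(predict(baseline), dimension)
    shifts = {}
    for delta in (-2 * sd, +2 * sd):
        x = dict(baseline)
        x[covariate] = baseline[covariate] + delta
        shifts[delta] = cosine(predict(x), dimension) - ref
    return shifts

# Toy "model": the predicted embedding is linear in the covariates.
beta0 = np.array([0.2, 0.1, -0.3])       # intercept embedding
beta_party = np.array([0.5, -0.2, 0.1])  # made-up covariate coefficient
predict = lambda x: beta0 + x["party"] * beta_party

dim = np.array([1.0, 0.0, 0.0])          # stand-in semantic dimension
print(projection_shift(predict, {"party": 0.0}, "party", sd=1.0, dimension=dim))
```

For a categorical covariate, the loop would toggle the level instead of adding ±2 SD, with the reference category as the baseline.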

Does this sound like a reasonable approach?

I appreciate this is a niche issue to raise on GitHub so please do not feel any pressure to respond unless you have sufficient time and interest to do so. That being said, I would greatly appreciate any and all guidance!

Best regards,
Charlie Carter

PhD Candidate
Department of International Relations
London School of Economics

@CharlieCarter thanks for your interest. Just to understand a bit better, on this first part,

I build a list of antonym pairs, take the difference for each pair, and average these to produce an embedding corresponding to my dimension of interest.

The pairs are pairs of embeddings, yes (e.g. one for danger and one for safety)? And where do those embeddings come from -- GloVe or something similar? As in, you are not estimating these embeddings from the same texts you are using to produce the embeddings for e.g. immigration?

Hi @ArthurSpirling, thank you for your reply.

Yes, that's exactly right. The pairs are pairs of embeddings and these are drawn from the pretrained FastText set. I have not updated or trained these embeddings on the texts in my corpus.

OK, thanks @CharlieCarter. Just putting aside any problems of uncertainty estimation for this setup, I think the primary issue on the descriptive end is whether it makes sense to compare (via cosine or something else) embeddings estimated on different data and in different ways.

That is, you are getting the dimension from 'vanilla' pretrained FastText embeddings fit to Wikipedia/Common Crawl etc., and then getting the target embedding from ALC/conText fit to a local corpus. I'll let @prodriguezsosa @bstewart chime in here, because they've thought about these space comparisons a bit more than I have.

@prodriguezsosa @ArthurSpirling Thank you both for your helpful comments.

My actual corpus of interest is composed of speeches delivered by state representatives at the United Nations. I am interested in how these representatives use the term "terrorism". The language used in this context is quite distinct from the texts the FastText embeddings are trained on, so it is definitely worth fitting my own embeddings here to capture these nuances.

The Kindel working paper is great — thank you @prodriguezsosa — and I will think about how I can apply its findings to my methods.

I greatly appreciate the help!