prodriguezsosa/EmbeddingRegression

Semantic Projection of Predicted Embeddings

Closed this issue · 5 comments

Hi all,

I've been working with conText since '21 and have become a big fan of the method. I wondered if the authors might be available to comment on my approach to assessing the substantive significance of my model results, given that this exact approach is not included in the APSR conText paper and may pose issues that I have not yet anticipated.

To better understand the substance of a change in my model's predicted embeddings that results from changes in my covariates, I have been using semantic projection onto dimensions defined by relation vectors. I largely follow the approach described in Kozlowski, Taddy, and Evans 2018 and Grand et al. 2018 in constructing these dimensions.

I build a list of antonym pairs, take the difference for each pair, and average these to produce an embedding corresponding to my dimension of interest. I then take the cosine similarity of the model's predicted embeddings and each dimension to place the former relative to the dimension's poles. For instance, using the examples given in the paper, I might ask whether Republicans systematically use the term "immigration" in ways which are closer to the "danger" pole of a danger/safety semantic dimension than Democrats.
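For concreteness, here is a minimal sketch of that construction in Python/NumPy (the thread itself uses the R conText package; the vectors, pair names, and helper functions below are illustrative stand-ins, not anything from the actual workflow):

```python
import numpy as np

def semantic_dimension(antonym_pairs):
    """Average the difference vectors of antonym pairs into a single
    dimension, then unit-normalize it."""
    diffs = [pos - neg for pos, neg in antonym_pairs]
    dim = np.mean(diffs, axis=0)
    return dim / np.linalg.norm(dim)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d vectors standing in for pretrained embeddings of the pole words.
rng = np.random.default_rng(0)
danger, safety = rng.normal(size=4), rng.normal(size=4)
threat, security = rng.normal(size=4), rng.normal(size=4)

dim = semantic_dimension([(danger, safety), (threat, security)])

# Stand-in for a model's predicted embedding; positive cosine means it
# sits closer to the "danger" pole of the dimension.
predicted = rng.normal(size=4)
print(cosine(predicted, dim))
```

Unit-normalizing the dimension is optional for cosine similarity (it is scale-invariant), but keeps the dimension comparable if one later switches to dot-product projections.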

I vary one covariate at a time (±2 standard deviations for continuous covariates, or toggling the level for categorical ones) and plot the change this induces in the cosine similarity between the predicted embedding and each of my semantic dimensions, relative to the model's predicted embedding when all covariates are held at their mean/reference values.
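That comparative-statics step can be sketched as follows. This is a toy illustration, not the actual conText code: `predict` here is a hypothetical linear stand-in for the fitted embedding-regression model, and the covariate names and coefficients are made up.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def projection_shift(predict, baseline, covariate, sd, dimension):
    """Change in the projection onto `dimension` when one covariate moves
    +/- 2 SD away from the baseline (all covariates at mean/reference)."""
    ref = cosine(predict(baseline), dimension)
    shifts = {}
    for delta in (-2 * sd, +2 * sd):
        x = dict(baseline)
        x[covariate] = baseline[covariate] + delta
        shifts[delta] = cosine(predict(x), dimension) - ref
    return shifts

# Toy "model": the predicted embedding is linear in the covariates.
beta0 = np.array([0.2, 0.1, -0.3])       # intercept embedding
beta_party = np.array([0.5, -0.2, 0.1])  # made-up covariate coefficient
predict = lambda x: beta0 + x["party"] * beta_party

dim = np.array([1.0, 0.0, 0.0])          # stand-in semantic dimension
print(projection_shift(predict, {"party": 0.0}, "party", sd=1.0, dimension=dim))
```

For a categorical covariate, the loop would toggle the level instead of adding ±2 SD, with the reference category as the baseline.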

Does this sound like a reasonable approach?

I appreciate this is a niche issue to raise on GitHub so please do not feel any pressure to respond unless you have sufficient time and interest to do so. That being said, I would greatly appreciate any and all guidance!

Best regards,
Charlie Carter

PhD Candidate
Department of International Relations
London School of Economics

@CharlieCarter thanks for your interest. Just to understand a bit better, on this first part,

I build a list of antonym pairs, take the difference for each pair, and average these to produce an embedding corresponding to my dimension of interest.

The pairs are pairs of embeddings, yes (e.g. one for danger and one for safety)? And where do those embeddings come from -- GloVe or something similar? As in, you are not estimating these embeddings from the same texts you are using to produce the embeddings for e.g. immigration?

Hi @ArthurSpirling, thank you for your reply.

Yes, that's exactly right. The pairs are pairs of embeddings and these are drawn from the pretrained FastText set. I have not updated or trained these embeddings on the texts in my corpus.

OK, thanks @CharlieCarter. Just putting aside any problems of uncertainty estimation for this setup, I think the primary issue on the descriptive end is whether it makes sense to compare (via cosine or something else) embeddings estimated on different data and in different ways.

That is, you are getting the dimension from 'vanilla' pretrained FastText embeddings fit to Wikipedia/Common Crawl etc., and then getting the target embedding from ALC/conText fit to a local corpus. I'll let @prodriguezsosa @bstewart chime in here, because they've thought about these space comparisons a bit more than I have.

@prodriguezsosa @ArthurSpirling Thank you both for your helpful comments.

My actual corpus of interest is composed of speeches delivered by state representatives at the United Nations. I am interested in how these representatives use the term "terrorism". The language used in this context is quite distinct from the texts the FastText embeddings are trained on, so it is definitely worth fitting my own embeddings here to capture these nuances.

The Kindel working paper is great — thank you @prodriguezsosa — and I will think about how I can apply its findings to my methods.

I greatly appreciate the help!