kanishkamisra/minicons

A question about extracting word representations


Hi! I have a question about extracting word representations.
If the sentence contains two instances of the target word, for example, "There are two books. The red one is mine, and the other one is yours.", then when using model.extract_representation(['There are two books. The red one is mine, and the other one is yours.', 'one'], layer = 12), which representation is extracted? Is it the representation of the first "one"?
Many thanks!

It will indeed default to the first occurrence. But you can circumvent this by providing the character span of the word you want! For instance, if you want the last occurrence of 'one', you could do it like this:

import re

sentence = 'There are two books. The red one is mine, and the other one is yours.'

# character span of the last occurrence of 'one'
span = list(re.finditer(r'one', sentence))[-1].span()
print(span)
#> (56, 59)

# extract the representation for that span:
model.extract_representation([sentence, span], layer = 12)
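
And if you need every occurrence of the word rather than just one, here is a minimal sketch along the same lines; it simply reuses the call pattern above once per span (model is the same object as in your call):

import re

sentence = 'There are two books. The red one is mine, and the other one is yours.'

# character spans of every occurrence of 'one'
spans = [m.span() for m in re.finditer(r'one', sentence)]
print(spans)
#> [(29, 32), (56, 59)]

# one representation per occurrence, using the same call as above
reps = [model.extract_representation([sentence, span], layer = 12) for span in spans]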

does this help/make sense?

Thank you so much! It helps me a lot.
Here is another question. I am doing a psychological study on ambiguous words, and I am wondering which layer of BERT provides lexical representations, excluding higher-level representations such as grammar or position information. Do you have any suggestions? :)

I asked around a bit and found the following:

In this paper, the authors found that the middle layers of BERT-base are best at predicting word similarity, that is, paradigmatic (WordNet-like) relations between words: https://aclanthology.org/2020.conll-1.17/

And this kind of echoes the idea that the middle layers are better at semantics and the later layers better at syntax, e.g. https://arxiv.org/abs/1905.05950

Unsure if these help, but you could perhaps also determine this empirically on your data! Try with multiple layers if you can!
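
To make the empirical check concrete, here is a rough sketch of what trying multiple layers could look like: compare an ambiguous word in two same-sense contexts against a different-sense context, layer by layer. The example sentences are purely illustrative, and it assumes the representation returned by extract_representation can be converted to a numpy array; the layer whose same-sense similarity most clearly exceeds the different-sense similarity would be a reasonable candidate for your "lexical" layer.

import numpy as np

# contexts for the ambiguous word 'bank' (illustrative sentences)
ctx_river_1 = ['She sat on the bank of the river.', 'bank']
ctx_river_2 = ['The fisherman stood on the bank and cast his line.', 'bank']
ctx_money = ['He deposited the check at the bank.', 'bank']

def cosine(u, v):
    # flatten to 1-D vectors and compute cosine similarity
    u, v = np.ravel(u), np.ravel(v)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

for layer in range(1, 13):  # BERT-base has 12 layers
    # assumes the returned representations can be viewed as numpy arrays
    a = np.asarray(model.extract_representation(ctx_river_1, layer = layer))
    b = np.asarray(model.extract_representation(ctx_river_2, layer = layer))
    c = np.asarray(model.extract_representation(ctx_money, layer = layer))
    print(layer, 'same sense:', round(cosine(a, b), 3), 'different sense:', round(cosine(a, c), 3))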